| Message type | Trigger event | Message payload | Examples |
|---|---|---|---|
| `*.file.#` | When any file is uploaded | added file ID; added filename; destination dataset ID, if applicable | `clowder.file.image.png`, `clowder.file.text.csv`, `clowder.file.application.json` |
| `*.file.image.#`, `*.file.text.#`, ... | When any file of the given MIME type is uploaded (this is just a more specific match) | added file ID; added filename; destination dataset ID, if applicable | see above |
| `*.dataset.file.added` | When a file is added to a dataset | added file ID; dataset ID; full list of files in dataset | `clowder.dataset.file.added` |
| `*.dataset.file.removed` | When a file is removed from a dataset | removed file ID; dataset ID; full list of files in dataset | `clowder.dataset.file.removed` |
| `*.metadata.added` | When metadata is added to a file or dataset | file or dataset ID; the metadata that was added | `clowder.metadata.added` |
| `*.metadata.removed` | When metadata is removed from a file or dataset | file or dataset ID | `clowder.metadata.removed` |
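The message types in the table are wildcard patterns. The sketch below assumes they follow RabbitMQ topic-exchange binding rules: a routing key is a list of dot-separated words, `*` matches exactly one word, and `#` matches zero or more words. It shows which keys a given pattern would match. `topic_matches` is a hypothetical helper for illustration and is not a pyClowder or RabbitMQ function.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if routing_key matches an AMQP-style topic pattern.

    '*' matches exactly one dot-separated word; '#' matches zero or more words.
    """
    pat = pattern.split(".")
    key = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        # Pattern fully consumed: it matches only if the key is consumed too.
        if i == len(pat):
            return j == len(key)
        if pat[i] == "#":
            # '#' either matches nothing (skip it) or swallows one more word.
            return match(i + 1, j) or (j < len(key) and match(i, j + 1))
        if j == len(key):
            return False
        if pat[i] == "*" or pat[i] == key[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


# Pattern/key pairs taken from the table above
for pattern, key in [
    ("*.file.#", "clowder.file.image.png"),
    ("*.file.image.#", "clowder.file.image.png"),
    ("*.file.text.#", "clowder.file.image.png"),
    ("*.dataset.file.added", "clowder.dataset.file.added"),
]:
    print(f"{pattern!r:26} {key!r:32} {topic_matches(pattern, key)}")
```

Because `#` can match zero words, under these rules `*.file.#` would also match a bare `clowder.file` key.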

 


Typical extractor structure


  • extractor_info.json contains metadata about the extractor that is used for registration and documentation. For more documentation on this file, refer to the extractor_info.json documentation page.
  • Many extractors also include a Dockerfile for building a Docker image of the extractor.
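The following is a minimal sketch of generating an extractor_info.json from Python. It also shows one way to translate an old pyClowder 1 `messageType` such as `*.file.image.#` into the MIME-like form used in extractor_info.json. The field names (`name`, `version`, `description`, `author`, `process`) and the `process` → `file` → list-of-patterns layout are assumptions modeled on pyClowder 2 sample extractors. Check the extractor_info.json documentation for the authoritative schema. `routing_key_to_mime`, `build_extractor_info`, and the example values are hypothetical.

```python
import json


def routing_key_to_mime(message_type: str) -> str:
    """Hypothetical helper: turn a pyClowder 1 messageType such as
    '*.file.image.#' into a MIME-like pattern such as 'image/*'."""
    parts = message_type.split(".")
    # Expected shape: <prefix>.file.<type>[.<subtype or #>]
    if len(parts) < 3 or parts[1] != "file":
        raise ValueError(f"not a file message type: {message_type!r}")
    if parts[2] == "#":
        return "*"  # assumption: '*' stands for "any file"
    major = parts[2]
    has_subtype = len(parts) > 3 and parts[3] != "#"
    minor = parts[3] if has_subtype else "*"
    return f"{major}/{minor}"


def build_extractor_info(name, version, description, author, message_types):
    """Assemble a minimal extractor_info dict; field names are assumptions."""
    return {
        "name": name,
        "version": version,
        "description": description,
        "author": author,
        # Assumed layout: resource type -> list of MIME-like patterns
        "process": {"file": [routing_key_to_mime(mt) for mt in message_types]},
    }


info = build_extractor_info(
    name="example.image.extractor",  # hypothetical name
    version="1.0",
    description="Example extractor that runs when an image is uploaded",
    author="Jane Doe <jane@example.org>",  # placeholder author
    message_types=["*.file.image.#"],
)
# Save this output as extractor_info.json next to the extractor script
print(json.dumps(info, indent=2))
```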


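```bash
# Start every Clowder Upstart job that has a clowder-*.conf file in /etc/init
cd /etc/init || exit 1
for x in clowder-*.conf; do
  # The Upstart job name is the file name without the .conf suffix
  start "$(basename "$x" .conf)"
done
```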


Converting from pyClowder to pyClowder2

Migrating an extractor written for pyClowder 1 to pyClowder 2 is fairly straightforward.

Key differences

  • config.py is no longer used or needed.
    • Several of the common config.py entries are available to all extractors through the basic Extractor class, which also lists their defaults: https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse/pyclowder/extractors.py#66
    • You can implement your own command line arguments to replace any special parameters from config.py, or read them from environment variables instead.
    • The 'messageType' parameter (which tells the extractor what types of messages to listen for) moves into extractor_info.json and uses a more MIME-like definition format.
  • Your extractor is now a subclass of pyClowder 2's Extractor class, which provides many useful methods.
    • __init__ is where you define custom command line arguments beyond the standard ones.
    • check_message and process_message now receive explicit parameters such as the Clowder host and secret key, instead of having them embedded in a 'params' object. Information about the Clowder entity that triggered the extraction (file, dataset, etc.) is in the 'resource' parameter. The old 'parameters' argument is kept for backward compatibility but is deprecated.
  • Because config.py is no longer used, parameters should be provided at runtime, for example:
    • python my_extractor.py --rabbitmqExchange="terra" --rabbitmqURI="rabbitmq.ncsa.illinois.edu/clowder-dev"
  • pyClowder 2 provides new, cleaner functions for interacting with Clowder, organized into packages for files, datasets, etc.
    • OLD - extractors.upload_file_to_dataset(outfile, parameters)
    • NEW - pyclowder.files.upload_to_dataset(connector, host, secret_key, resource['id'], outfile)
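With config.py gone, extractor-specific settings become command line arguments, optionally falling back to environment variables. The following is a minimal standard-library sketch of that pattern. The options `--outputDir` and `--maxImages`, the variables `OUTPUT_DIR` and `MAX_IMAGES`, and their defaults are all hypothetical. In a pyClowder 2 extractor you would add such arguments to the parser the Extractor base class provides (`self.parser`) instead of creating your own.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Parser for extractor-specific settings that used to live in config.py."""
    parser = argparse.ArgumentParser(description="example extractor settings")
    # Hypothetical setting: command line value wins, then the OUTPUT_DIR
    # environment variable (read when the parser is built), then a default.
    parser.add_argument(
        "--outputDir",
        dest="output_dir",
        default=os.getenv("OUTPUT_DIR", "/tmp/extractor-output"),
        help="directory for generated files (env: OUTPUT_DIR)",
    )
    # Environment variables are strings, so cast the fallback explicitly;
    # type=int converts values given on the command line.
    parser.add_argument(
        "--maxImages",
        dest="max_images",
        type=int,
        default=int(os.getenv("MAX_IMAGES", "10")),
        help="maximum number of images to process (env: MAX_IMAGES)",
    )
    return parser


# Equivalent to: python my_extractor.py --maxImages 5
args = build_parser().parse_args(["--maxImages", "5"])
print(args.output_dir, args.max_images)
```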

Migration steps

  1. If any parameters in config.py differ from the defaults linked under Key differences, list them as command line parameters in your new extractor class's __init__, or simply hard-code them in the script. They can also be read from environment variables.
    1. https://github.com/terraref/extractors-stereo-rgb/pull/3/files - in this example, the setting at
      1. https://github.com/terraref/extractors-stereo-rgb/pull/3/files#diff-6be4f9dea03b90eac1407a1012cdf34eL42 is moved to
      2. https://github.com/terraref/extractors-stereo-rgb/pull/3/files#diff-f53b0090553dbecd9e15f5eb59549c00R32
    2. Below the self.parser.add_argument calls, the input values can be adjusted before they are assigned to self.args (e.g. casting a string to an int):
      1. https://github.com/terraref/extractors-stereo-rgb/pull/3/files#diff-f53b0090553dbecd9e15f5eb59549c00R48
    3. Add the messageType from config.py into extractor_info.json
      1. Before: https://github.com/terraref/extractors-stereo-rgb/pull/3/files#diff-38d737ae3b969ee995bd1b34ebe93be4L25
      2. After: https://github.com/terraref/extractors-stereo-rgb/pull/3/files#diff-40099abc8fb726838bb4c7a44b8b5958R10 
  2. Move your extractor python functions into a new Extractor subclass 
    1. https://github.com/terraref/extractors-stereo-rgb/pull/3/files#diff-924a575b0595fcd52d5531433471b109R23 Here a new extractor class called StereoBin2JpgTiff is created.
    2. main() -> __init__(self), but only for handling inputs; the processing logic goes into process_message().
    3. check_message() and process_message() must be named as such now, and receive explicit inputs:
      1. self, connector, host, secret_key, resource, parameters
      2. typically, old references to parameters['xyz'] can be replaced either with resource['xyz'] or with secret_key, host, etc.
      3. If you aren't sure which fields are available, use print(resource) while testing your extractor to see what it contains.
  3. Replace old extractor.method() calls with the new pyclowder.files.method() or pyclowder.datasets.method() equivalents:
    1. https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse/pyclowder/files.py
    2. https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse/pyclowder/datasets.py
    3. https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse/pyclowder/collections.py
    4. https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse/pyclowder/sections.py
    5. https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse/pyclowder/utils.py
    6. more to come
  4. Finally, replace the call to main() with an instantiation of your extractor class followed by a call to start():
    1. https://github.com/terraref/extractors-stereo-rgb/pull/3/files#diff-924a575b0595fcd52d5531433471b109R174
    2. extractor = StereoBin2JpgTiff(); extractor.start()
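The migration steps can be summarized in a skeleton. The sketch has to run without pyClowder installed, so `Extractor` here is a hypothetical stand-in. It only mimics what this page describes: a `self.parser` for custom arguments and a `start()` entry point. In a real extractor you would subclass pyClowder 2's Extractor class and call the pyclowder.files / pyclowder.datasets functions. `ExampleExtractor`, `--threshold`, and the output file name are hypothetical.

```python
import argparse


class Extractor:
    """Hypothetical stand-in for pyClowder 2's Extractor base class."""

    def __init__(self):
        self.parser = argparse.ArgumentParser()
        self.args = None
        self.started = False

    def start(self):
        # The real base class connects to RabbitMQ and dispatches messages to
        # check_message()/process_message(); this stub only records the call.
        self.started = True


class ExampleExtractor(Extractor):
    def __init__(self, argv=None):
        Extractor.__init__(self)
        # Step 1: settings from config.py become command line arguments.
        self.parser.add_argument("--threshold", default="0.5",
                                 help="hypothetical extractor setting")
        args = self.parser.parse_args(argv)
        # Adjust values before assigning to self.args (cast str -> float).
        args.threshold = float(args.threshold)
        self.args = args

    # Step 2: these exact method names, with explicit parameters.
    def check_message(self, connector, host, secret_key, resource, parameters):
        # print(resource) while testing shows which fields are available.
        return True

    def process_message(self, connector, host, secret_key, resource, parameters):
        # Old code read parameters['...']; use resource[...], host, secret_key.
        outfile = "result_%s.csv" % resource["id"]  # hypothetical output name
        # Step 3: a real extractor would upload with pyClowder 2, e.g.
        # pyclowder.files.upload_to_dataset(connector, host, secret_key,
        #                                   resource['id'], outfile)
        return outfile


# Step 4: the old call to main() becomes instantiation plus start().
# argv=[] keeps this sketch from reading the real command line; omit it normally.
extractor = ExampleExtractor(argv=[])
extractor.start()
```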

Registering an Extractor

Clowder maintains a catalog of extractors. If you create an extractor and want to make it available for widespread use, you can submit it to the catalog (registration required). Your extractor will then be published alongside those created by the Clowder developers and other Clowder community developers.

  1. If you haven't registered you may do so at the registration page (to be linked soon). Registration will need to be approved, so this will not be an instantaneous process. Once approved you will be able to sign in, and submit extractors.
  2. Sign in to the catalog at the sign in page.
  3. Click the Contribute link at the top, or visit the contribution page.
  4. Paste your extractor info (the contents of extractor_info.json) into the text box on the contribution page. The catalog uses all of the information in the extractor info, so a fully detailed extractor info is important.
  5. Select Extractor from the radio buttons under the text box (Converter is for converting one file type to another, and there may be more options added for other tools at a later date).
  6. Submit the extractor.
  7. Your extractor may not appear on the home page immediately: once submitted, it must be approved by a member of the Clowder development team. As long as the links to where the extractor can be obtained are valid, it will likely be approved. If there are correctable issues, we may contact you.