...

This is a way to manage the fact that, for dataset-level extractors, it is not possible to specify a MIME type or other parameter for triggering. Instead, a dataset extractor must either evaluate every dataset message (which is what rulechecker does) or be triggered through alternative means (which is how the extractors that rulechecker triggers operate).


Daisy-chaining directly

From there, each extractor can directly trigger the next extractor in the chain, where possible, at the end of its process_message() function.

  • bin2tif, for example, uses pyclowder's submit_extraction() function to pass its geoTIFF outputs along to 3 additional extractors (see the sketch below).

This is the simplest option when extractor pipelines are simple, direct 1-to-1 paths.
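As a rough illustration of this pattern (not TERRA's actual bin2tif code): the pyclowder 2.x signatures for files.upload_to_dataset() and files.submit_extraction() are assumed here, and the downstream extractor queue names are placeholders.

    # Minimal sketch of daisy-chaining at the end of process_message().
    # Assumptions: pyclowder 2.x signatures for files.upload_to_dataset() and
    # files.submit_extraction(); the downstream extractor names are placeholders.
    import pyclowder.files
    from pyclowder.extractors import Extractor


    class Bin2TifLikeExtractor(Extractor):
        def process_message(self, connector, host, secret_key, resource, parameters):
            dataset_id = resource["id"]  # dataset-level extractor: resource is the dataset

            # ... convert the raw sensor .bin input into a geoTIFF here ...
            output_tif = "/tmp/output.tif"

            # Upload the geoTIFF back to the dataset in Clowder.
            tif_id = pyclowder.files.upload_to_dataset(
                connector, host, secret_key, dataset_id, output_tif)

            # Daisy-chain: hand the new geoTIFF directly to the downstream extractors.
            for next_extractor in ["terra.example.plotclipper",
                                   "terra.example.canopycover",
                                   "terra.example.fullfield"]:
                pyclowder.files.submit_extraction(
                    connector, host, secret_key, tif_id, next_extractor)

Because the submission happens at the end of process_message(), the chain only advances once the current extractor's outputs have been uploaded.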


Collection-level extractors

For extractors such as the fieldmosaic stitcher, which mosaics together 9000+ images from a single day, the rulechecker extractor is used once again. Each geoTIFF is passed back to rulechecker, which triggers a special rule to add that geoTIFF to a PSQL database maintaining a list of geoTIFFs for a specific day or scan. Once a threshold is met, all 9000+ geoTIFFs (each from a different dataset) are passed to the fieldmosaic extractor to be stitched at once.
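A rough sketch of this accumulate-and-trigger pattern is shown below. It is not the actual TERRA rule: the rule function's signature and return shape, the PSQL table layout, the "date" parameter, and the fieldmosaic queue name are all assumptions, and it uses psycopg2 directly for the bookkeeping.

    # Hypothetical "collect geoTIFFs, then trigger fieldmosaic" rule.
    # The rule signature/return shape, table layout, parameter names, and
    # extractor queue name are assumptions for illustration only.
    import psycopg2

    DAILY_THRESHOLD = 9000  # assumed count of geoTIFFs expected for one scan day


    def full_field_mosaic_rule(connector, host, secret_key, resource, parameters):
        """Record each incoming geoTIFF; trigger fieldmosaic once a day's set is complete."""
        results = {}
        conn = psycopg2.connect("dbname=rulechecker")
        cur = conn.cursor()

        scan_date = parameters.get("date", "unknown")  # assumed parameter name
        tif_id = resource["id"]

        # Track progress toward the trigger condition across many datasets.
        cur.execute("INSERT INTO geotiffs (scan_date, file_id) VALUES (%s, %s)",
                    (scan_date, tif_id))
        conn.commit()

        cur.execute("SELECT count(*) FROM geotiffs WHERE scan_date = %s", (scan_date,))
        (count,) = cur.fetchone()

        if count >= DAILY_THRESHOLD:
            # Threshold met: hand the full day's file list to fieldmosaic in one job.
            cur.execute("SELECT file_id FROM geotiffs WHERE scan_date = %s", (scan_date,))
            file_ids = [row[0] for row in cur.fetchall()]
            results["terra.geotiff.fieldmosaic"] = {"process": True,
                                                    "parameters": {"file_ids": file_ids}}

        cur.close()
        conn.close()
        return results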


Rulechecker is useful when:

  • a pipeline has many dataset extractors but also carries a lot of traffic that is not relevant to any individual dataset extractor. For TERRA this makes sense because we have 10 different sensors and ~30 different products from those sensors. Without a switchboard filter, all the RabbitMQ queues for those extractors would be filled with irrelevant messages; rulechecker reduces the traffic to those queues immensely.
  • an extractor needs to trigger only when certain cross-dataset or cross-file conditions are met, e.g. "only when we have 100 CSVs from June 3 extracted" or "only when all 8000 datasets in this collection have Height metadata". Rulechecker is backed by a PSQL database by default, and the rule_utils library included with rulechecker provides shortcut methods for reading and writing that database as a means of tracking progress toward triggering a "big" extractor.

For an in-depth example, see the terraref_switchboard() function in TERRA's rules.py, the file required by a rulechecker deployment that defines which rules to execute on each dataset (https://opensource.ncsa.illinois.edu/bitbucket/users/mburnet2/repos/terraref-rulechecker/browse/rules.py).
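As an illustration of what such a switchboard rule does (this is not the actual terraref_switchboard() code; the filename checks, extractor queue names, and the rule's signature and return shape are assumptions), a rule along these lines inspects each incoming dataset and decides which extractor, if any, should receive it:

    # Hypothetical switchboard rule: inspect a dataset message and decide which
    # downstream extractors should be triggered. Filename patterns, extractor
    # queue names, and the return shape are assumptions for illustration only.
    def example_switchboard(connector, host, secret_key, resource, parameters):
        results = {}
        filenames = [f["filename"] for f in resource.get("files", [])]

        # Route stereo camera datasets (left/right .bin pairs) to bin2tif.
        if any(n.endswith("_left.bin") for n in filenames) and \
           any(n.endswith("_right.bin") for n in filenames):
            results["terra.example.bin2tif"] = {"process": True, "parameters": {}}

        # Route 3D scanner datasets (.ply point clouds) to a different converter.
        elif any(n.endswith(".ply") for n in filenames):
            results["terra.example.ply2las"] = {"process": True, "parameters": {}}

        # Everything else returns an empty result, keeping irrelevant traffic
        # out of those extractors' RabbitMQ queues.
        return results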