...

This is a way to manage the fact that, for dataset-level extractors, it is not possible to specify a MIME type or other parameter for triggering. Instead, a dataset extractor must either evaluate every dataset message (which is what rulechecker does) or be triggered through alternative means (which is how the extractors that rulechecker triggers operate).


Daisy-chaining directly

From there, each extractor can directly trigger the next extractor in the chain, where possible, at the end of its process_message() function.

  • bin2tif, for example, uses pyclowder's submit_extraction() function to pass its geoTIFF outputs along to 3 additional extractors (see the sketch below).

This is the simplest option when extractor pipelines are simple, direct 1-to-1 paths.
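As a rough illustration of this pattern (not TERRA's actual bin2tif code): the pyclowder 2.x signatures for files.upload_to_dataset() and files.submit_extraction() are assumed here, and the downstream extractor queue names are placeholders.

    # Minimal sketch of daisy-chaining at the end of process_message().
    # Assumptions: pyclowder 2.x signatures for files.upload_to_dataset() and
    # files.submit_extraction(); the downstream extractor names are placeholders.
    import pyclowder.files
    from pyclowder.extractors import Extractor


    class Bin2TifLikeExtractor(Extractor):
        def process_message(self, connector, host, secret_key, resource, parameters):
            dataset_id = resource["id"]  # dataset-level extractor: resource is the dataset

            # ... convert the raw sensor .bin input into a geoTIFF here ...
            output_tif = "/tmp/output.tif"

            # Upload the geoTIFF back to the dataset in Clowder.
            tif_id = pyclowder.files.upload_to_dataset(
                connector, host, secret_key, dataset_id, output_tif)

            # Daisy-chain: hand the new geoTIFF directly to the downstream extractors.
            for next_extractor in ["terra.example.plotclipper",
                                   "terra.example.canopycover",
                                   "terra.example.fullfield"]:
                pyclowder.files.submit_extraction(
                    connector, host, secret_key, tif_id, next_extractor)

Because the submission happens at the end of process_message(), the chain only advances once the current extractor's outputs have been uploaded.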


Collection-level extractors

For extractors such as the fieldmosaic stitcher, which mosaics together 9000+ images from a single day, the rulechecker extractor is used once again. Each geoTIFF is passed back to rulechecker, which triggers a special rule to add that geoTIFF to a PSQL database maintaining a list of geoTIFFs for a specific day or scan. Once a threshold is met, all 9000+ geoTIFFs (each from a different dataset) are passed to the fieldmosaic extractor to be stitched at once.
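A rough sketch of this accumulate-and-trigger pattern is shown below. It is not the actual TERRA rule: the rule function's signature and return shape, the PSQL table layout, the "date" parameter, and the fieldmosaic queue name are all assumptions, and it uses psycopg2 directly for the bookkeeping.

    # Hypothetical "collect geoTIFFs, then trigger fieldmosaic" rule.
    # The rule signature/return shape, table layout, parameter names, and
    # extractor queue name are assumptions for illustration only.
    import psycopg2

    DAILY_THRESHOLD = 9000  # assumed count of geoTIFFs expected for one scan day


    def full_field_mosaic_rule(connector, host, secret_key, resource, parameters):
        """Record each incoming geoTIFF; trigger fieldmosaic once a day's set is complete."""
        results = {}
        conn = psycopg2.connect("dbname=rulechecker")
        cur = conn.cursor()

        scan_date = parameters.get("date", "unknown")  # assumed parameter name
        tif_id = resource["id"]

        # Track progress toward the trigger condition across many datasets.
        cur.execute("INSERT INTO geotiffs (scan_date, file_id) VALUES (%s, %s)",
                    (scan_date, tif_id))
        conn.commit()

        cur.execute("SELECT count(*) FROM geotiffs WHERE scan_date = %s", (scan_date,))
        (count,) = cur.fetchone()

        if count >= DAILY_THRESHOLD:
            # Threshold met: hand the full day's file list to fieldmosaic in one job.
            cur.execute("SELECT file_id FROM geotiffs WHERE scan_date = %s", (scan_date,))
            file_ids = [row[0] for row in cur.fetchall()]
            results["terra.geotiff.fieldmosaic"] = {"process": True,
                                                    "parameters": {"file_ids": file_ids}}

        cur.close()
        conn.close()
        return results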


Rulechecker is useful when:

  • a pipeline has many dataset extractors but also carries a lot of traffic that is not relevant to any individual dataset extractor. For TERRA this makes sense because we have 10 different sensors and ~30 different products from those sensors. Without a switchboard filter, all the RabbitMQ queues for those extractors would be filled with irrelevant messages; rulechecker reduces the traffic to those queues immensely.
  • an extractor needs to trigger only when certain cross-dataset or cross-file conditions are met, e.g. "only when we have 100 CSVs from June 3 extracted" or "only when all 8000 datasets in this collection have Height metadata". Rulechecker is backed by a PSQL database by default, and the rule_utils library included with rulechecker provides shortcut methods for reading and writing that database as a means of tracking progress toward triggering a "big" extractor.

For an in-depth example, see the terraref_switchboard() function in TERRA's rules.py, the file required by a rulechecker deployment that defines which rules to execute on each dataset (https://opensource.ncsa.illinois.edu/bitbucket/users/mburnet2/repos/terraref-rulechecker/browse/rules.py).
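As an illustration of what such a switchboard rule does (this is not the actual terraref_switchboard() code; the filename checks, extractor queue names, and the rule's signature and return shape are assumptions), a rule along these lines inspects each incoming dataset and decides which extractor, if any, should receive it:

    # Hypothetical switchboard rule: inspect a dataset message and decide which
    # downstream extractors should be triggered. Filename patterns, extractor
    # queue names, and the return shape are assumptions for illustration only.
    def example_switchboard(connector, host, secret_key, resource, parameters):
        results = {}
        filenames = [f["filename"] for f in resource.get("files", [])]

        # Route stereo camera datasets (left/right .bin pairs) to bin2tif.
        if any(n.endswith("_left.bin") for n in filenames) and \
           any(n.endswith("_right.bin") for n in filenames):
            results["terra.example.bin2tif"] = {"process": True, "parameters": {}}

        # Route 3D scanner datasets (.ply point clouds) to a different converter.
        elif any(n.endswith(".ply") for n in filenames):
            results["terra.example.ply2las"] = {"process": True, "parameters": {}}

        # Everything else returns an empty result, keeping irrelevant traffic
        # out of those extractors' RabbitMQ queues.
        return results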