We currently track extractor messages by resource: for a dataset or file, the user can view a list of Extractor events associated with that resource.
This allows us to build a UI that filters and sorts these events without major modifications to the UI and without taking a large performance hit in the frontend.
There is still an open question of who is responsible for creating this unique ID. On one hand, it should be up to Clowder to create and manage these IDs, since Clowder manages the larger concept of a "Job" that spans multiple status updates. On the other hand, will this account for failover? If an extractor fails mid-job, is it acceptable to restart the job with the same ID, or would this require generating an entirely new ID?
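To make the two options concrete, here is a minimal sketch of the ID lifecycle. The function name and retry behavior are illustrative assumptions, not part of the current Clowder or pyClowder API: reusing the ID on a retry keeps all status updates for a logical job grouped together, while generating a fresh one treats the retry as a separate job.

```python
import uuid


def start_job(existing_job_id=None):
    """Return a job ID for an extraction run (hypothetical helper).

    If a previous run failed mid-job and we want the retry grouped
    under the same job, pass its ID back in; otherwise a new job
    gets a freshly generated ID.
    """
    if existing_job_id is not None:
        return existing_job_id  # resume/retry under the same job ID
    return str(uuid.uuid4())    # brand-new job, brand-new ID
```

Either policy is workable; the choice mostly affects whether the UI shows a failed attempt and its retry as one job or two.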
The API changes proposed should be fairly minimal, as we are simply extending the existing Extraction API to account for the new identifier that will need to be added to pyClowder.
NOTE: Extraction already has a field named id, which is unique per status update. We may want to create a new field (named job_id?) rather than changing existing behavior and risking compatibility issues.
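A small sketch of what status-update payloads might look like under this proposal. The field name job_id and the status values shown are assumptions for illustration; the only point taken from the note above is that the existing per-update id stays unique while the new field ties updates for one job together.

```python
import uuid

# One logical job produces many status updates. Each update keeps
# its own unique "id" (existing behavior), while the hypothetical
# "job_id" field is shared across all updates for the job.
job_id = str(uuid.uuid4())

updates = [
    {"id": str(uuid.uuid4()), "job_id": job_id, "status": "STARTED"},
    {"id": str(uuid.uuid4()), "job_id": job_id, "status": "PROCESSING"},
    {"id": str(uuid.uuid4()), "job_id": job_id, "status": "DONE"},
]
```

Adding a new field this way is backward compatible: existing consumers that only read id are unaffected, and consumers that understand job_id can opt in to grouping.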
Further changes would likely break the existing implementation of Extractors/Extractions, and should be considered thoroughly before execution.
The UI takes on the brunt of the changes here, adding further grouping/categorization based on the new job ID.
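The grouping the UI would perform can be sketched as a simple pass over the event list. The function and field names here are illustrative assumptions, showing one way the frontend (or an API endpoint backing it) could bucket status updates by the new job ID:

```python
from collections import defaultdict


def group_by_job(events):
    """Group status updates by job_id (hypothetical helper).

    Returns a dict mapping each job_id to the list of its events,
    e.g. so the UI can render one collapsible row per job.
    """
    jobs = defaultdict(list)
    for event in events:
        jobs[event["job_id"]].append(event)
    return dict(jobs)
```

Doing this grouping server-side keeps the frontend cheap: it receives events already bucketed per job instead of scanning the full event list itself.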
Displaying extractor events currently looks as follows: