...

In addition to resolving the problem stated above, expanding extractor job tracking would also allow extractor developers to better debug and analyze their extractor jobs, and to identify potential performance improvements.

Current Behavior

API

The API defines the ExtractorInfo and Extraction models. The MongoDBExtractorService can be used to create and interact with those models within MongoDB.
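
For context, a rough sketch of what these two models conceptually hold is below; the field names are illustrative approximations, not the exact Clowder schema.

```python
# Illustrative sketch only -- field names approximate the concepts behind the
# Clowder models, not the exact schema.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ExtractorInfo:
    """Metadata about a registered extractor, kept fresh via heartbeats."""
    name: str                        # e.g. "ncsa.image.preview"
    version: str
    description: str
    last_heartbeat: Optional[datetime] = None


@dataclass
class Extraction:
    """A single status update for one file + extractor combination."""
    id: str                          # unique per status update
    file_id: str
    extractor: str
    status: str                      # e.g. "SUBMITTED", "PROCESSING", "DONE"
    start: Optional[datetime] = None
```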

On Startup

The API for file uploads includes an optional call to the RabbitMqPlugin.

If the RabbitMQ plugin is enabled, Clowder subscribes to RabbitMQ to receive heartbeat messages from each extractor and uses them to determine whether that extractor is still online and functioning. (This information is stored in the ExtractorInfo model.)
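
A minimal sketch of that subscription is below, using pika; the exchange and queue names are assumptions for illustration, not Clowder's actual configuration.

```python
# Minimal heartbeat consumer sketch (pika); exchange/queue names are assumed.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Assume extractors broadcast heartbeats on a fanout exchange.
channel.exchange_declare(exchange="extractors", exchange_type="fanout")
queue = channel.queue_declare(queue="", exclusive=True).method.queue
channel.queue_bind(exchange="extractors", queue=queue)


def on_heartbeat(ch, method, properties, body):
    # In Clowder this would upsert the ExtractorInfo record and refresh
    # its "last seen" timestamp.
    info = json.loads(body)
    print("heartbeat from", info.get("queue"), "version", info.get("version"))


channel.basic_consume(queue=queue, on_message_callback=on_heartbeat, auto_ack=True)
channel.start_consuming()
```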

When a heartbeat is received, or when an extractor is registered manually via the API, Clowder subscribes to a "reply queue" to receive status updates back from that extractor.
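
For the manual path, registration might look roughly like the snippet below; the endpoint path, key parameter, and payload are assumptions for illustration rather than the verified Clowder API.

```python
# Hypothetical manual registration call -- the endpoint and parameters are
# assumptions, not the verified Clowder API.
import json
import requests

with open("extractor_info.json") as f:
    extractor_info = json.load(f)

resp = requests.post(
    "https://clowder.example.org/api/extractors",   # assumed endpoint
    params={"key": "YOUR_API_KEY"},                 # hypothetical API key parameter
    json=extractor_info,
)
resp.raise_for_status()
```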

On Upload

Each new file upload pushes an event into the appropriate extractor queue(s) in RabbitMQ, based on the file types each extractor has registered for.
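
A sketch of that routing is below; the queue names, message fields, and reply queue are illustrative assumptions.

```python
# Sketch of the upload-time publish: one job message per matching extractor
# queue. Queue names and message fields are illustrative assumptions.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

matching_queues = ["ncsa.image.preview", "ncsa.image.metadata"]  # from registered types
event = {
    "file_id": "<file-id>",
    "dataset_id": "<dataset-id>",
    "reply_to": "clowder.reply",    # assumed reply queue name
}

for queue in matching_queues:
    channel.queue_declare(queue=queue, durable=True)
    channel.basic_publish(exchange="", routing_key=queue, body=json.dumps(event))
```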

When a reply comes back for a given file + extractor combination, Clowder creates a new Extraction record in the database to store that status message.
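
On the Clowder side, handling such a reply amounts to something like the sketch below (pymongo; the collection name and message fields are assumptions).

```python
# Sketch of turning a reply message into an Extraction record; the collection
# name and message fields are assumptions for illustration.
import datetime
import json

import pymongo

db = pymongo.MongoClient("mongodb://localhost:27017")["clowder"]


def on_reply(ch, method, properties, body):
    msg = json.loads(body)
    db.extractions.insert_one({
        "file_id": msg["file_id"],
        "extractor": msg["extractor"],
        "status": msg["status"],              # e.g. "PROCESSING", "DONE"
        "created": datetime.datetime.utcnow(),
    })
```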

pyClowder

On Startup

pyClowder includes, among other utilities, a client for subscribing to the appropriate queues in RabbitMQ. It knows how to automatically send back heartbeat signals and status updates to Clowder. 

When a pyClowder-based extractor starts up, it begins sending heartbeat messages to RabbitMQ indicating the extractor's status; Clowder consumes these to keep the ExtractorInfo model current, as described above.
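
A bare-bones extractor following the pattern of the pyClowder sample extractors looks roughly like this; once start() is called, the library handles queue subscription, heartbeats, and status replies.

```python
# Skeleton pyClowder extractor, modeled on the pyClowder sample extractors.
import logging

from pyclowder.extractors import Extractor


class MyExtractor(Extractor):
    def __init__(self):
        Extractor.__init__(self)
        self.setup()  # parse command-line / environment configuration
        logging.getLogger("pyclowder").setLevel(logging.INFO)

    def process_message(self, connector, host, secret_key, resource, parameters):
        # Called once per message pulled from this extractor's queue.
        local_path = resource["local_paths"][0]
        logging.getLogger(__name__).info("processing %s", local_path)


if __name__ == "__main__":
    MyExtractor().start()
```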

On Upload

TBD

If an extractor is idle and sees a new message in the queue it is watching, it grabs that message and begins processing the referenced file.

As it processes, the extractor sends status updates back on the reply queue; Clowder uses these to create the corresponding Extraction records, as described above.
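
For illustration, a status reply is conceptually a small JSON message along these lines (field names are assumptions, not pyClowder's exact wire format).

```python
# Conceptual shape of a status reply; field names are assumptions, not the
# exact wire format pyClowder uses.
status_update = {
    "file_id": "<file-id>",
    "extractor": "ncsa.example.extractor",
    "status": "PROCESSING",
    "start": "2024-01-01T00:00:00Z",   # placeholder timestamp
}
```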

UI

The UI for viewing a File offers an Extractions tab, which lists all Extraction objects associated with that file, grouped by extractor.
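
The grouping itself is straightforward; conceptually it is just the following (data shape is illustrative).

```python
# Sketch of the grouping the Extractions tab performs: collect a file's
# Extraction records by extractor name (data shape is illustrative).
from itertools import groupby

extractions = [
    {"extractor": "ncsa.image.preview", "status": "DONE"},
    {"extractor": "ncsa.image.metadata", "status": "PROCESSING"},
    {"extractor": "ncsa.image.preview", "status": "SUBMITTED"},
]

by_extractor = {
    name: list(records)
    for name, records in groupby(
        sorted(extractions, key=lambda e: e["extractor"]),
        key=lambda e: e["extractor"],
    )
}
```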

Changes Proposed

pyClowder

...

This way, we can easily build a UI that filters and sorts through these events without major modifications to the existing views and without taking a large performance hit in the frontend.

API

The API changes proposed should be fairly minimal, as we are simply extending the existing Extraction API to account for the new identifier that will need to be added to pyClowder.
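
As a rough illustration of the shape of that change, the reply handler would carry the new identifier through into the stored record; the job_id field name is the working assumption of this proposal.

```python
# Sketch of the API-side change: persist the new job identifier carried in
# each status reply alongside the existing Extraction fields. The job_id
# field name is this proposal's working assumption.
def extraction_from_reply(msg):
    return {
        "file_id": msg["file_id"],
        "extractor": msg["extractor"],
        "status": msg["status"],
        "job_id": msg.get("job_id"),  # new: ties all updates for one job together
    }
```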

UI

The UI bears the brunt of the changes here.

...

  • What do we do about past extractions/jobs that are missing this identifier? Is it worth preserving the existing display in the UI for this purpose?
  • Extraction already has an id field that is unique per status update. Would adding a job_id field keep this data/view as backward-compatible as possible? (One possible approach is sketched below.)
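
One possible answer to the backward-compatibility question, sketched below: group on job_id when it exists and fall back to the current per-extractor grouping for legacy records.

```python
# Sketch of one backward-compatible option: group on job_id when present,
# otherwise fall back to grouping legacy records by extractor.
from collections import defaultdict


def group_extractions(extractions):
    jobs = defaultdict(list)
    for e in extractions:
        key = e.get("job_id") or ("legacy", e["extractor"])
        jobs[key].append(e)
    return jobs
```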