...
- Should extractors now default to binding only to extractor-specific submissions, and not to MIME type?
- Should we have the ability to register extractors at the instance level if that's the case?
- Is the list of extractors retrieved from RabbitMQ or from the list of registered extractors?
- Does every file that is submitted to a space get submitted to an extractor?
Clowder and RabbitMQ
DISCUSSION DRAFT
The following are some steps we can take to create per-space extractors. The thinking behind this list is that it is a path to implementation.
- List of extractors is kept per space.
- When a file is uploaded to Clowder, it will trigger the normal message to RabbitMQ. At the same time, for a file, Clowder will look up which space(s) the file belongs to and will send a message on the normal exchange to each extractor explicitly listed for those spaces.
- Quick to implement: we only need to add logic to the RabbitMQ plugin to find the space and send the additional messages. However, this can result in duplicate messages being sent to a queue. For example, a message might be sent because of the binding image/* → preview, and again because the space explicitly lists that extractor.
- Assumption: all extractors register themselves with that specific Clowder instance, as well as with the exchange (this is assumed now as well).
- List of bindings is kept in Clowder, which removes the exchanges.
- When a file is uploaded to Clowder, it will look in the global bindings as well as the space bindings, make a unique list of all extractors, and fire a message for each extractor.
- All logic for deciding which extractor fires is now controlled by Clowder, so there are no duplicate messages.
- Assumption: all extractors register themselves with that specific Clowder instance (this is assumed now as well).
- List of extractors is queried
- Extractors will bind themselves to a queue and also have a command queue. The command queue is non-persistent. Each extractor will pick up messages from both queues, but give preference to the command queue. Clowder will send a command message (cmd) to the extractor exchange that is picked up by all command queues, allowing the extractors to then register themselves with that specific Clowder instance. All extractor.* messages will go to the normal extractor queue.
- Clowder can easily get a list of all extractors and refresh this list every 15 minutes or so.
- More complex logic for extractors
- Extractors can now have more complex logic, such as (file added, mimetype=image/jpg, filesize>5MB). This is part of extractor_info.json; Clowder can use this more complex logic to decide whether an extractor should fire.
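
To make the "bindings kept in Clowder" and "more complex logic" steps more concrete, here is a minimal Python sketch of how the selection could work. It is only an illustration for discussion, not Clowder code: the Rule class, its field names, the "file.added" event name, the extractor names, and the extractors_for function are all hypothetical, and the real extractor_info.json layout may look different. The idea is that global and per-space rules are evaluated together and each matching extractor ends up in the list exactly once, so only one message would be fired per extractor.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical trigger rule, modelled on (file added, mimetype, filesize)."""
    extractor: str                     # name of the extractor to message
    event: str = "file.added"          # assumed event name, not a Clowder constant
    mimetype: str = "*/*"              # glob pattern such as "image/*"
    larger_than: Optional[int] = None  # size threshold in bytes, None = no limit

    def matches(self, event: str, mimetype: str, size: int) -> bool:
        if event != self.event:
            return False
        if not fnmatchcase(mimetype, self.mimetype):
            return False
        if self.larger_than is not None and size <= self.larger_than:
            return False
        return True


def extractors_for(
    event: str,
    mimetype: str,
    size: int,
    spaces: List[str],
    global_rules: List[Rule],
    space_rules: Dict[str, List[Rule]],
) -> List[str]:
    """Return the unique list of extractors that should receive a message."""
    # Global bindings first, then the bindings of every space the file is in.
    candidates = list(global_rules)
    for space in spaces:
        candidates.extend(space_rules.get(space, []))

    selected: List[str] = []
    for rule in candidates:
        # Skip extractors that are already selected to avoid duplicate messages.
        if rule.matches(event, mimetype, size) and rule.extractor not in selected:
            selected.append(rule.extractor)
    return selected


MB = 1024 * 1024

# Example data: "preview" is bound globally on image/* and also listed
# explicitly by the space; "big-image" only fires for large JPEG files.
global_rules = [Rule("preview", mimetype="image/*")]
space_rules = {
    "space-a": [
        Rule("preview"),
        Rule("big-image", mimetype="image/jpg", larger_than=5 * MB),
    ],
}

targets = extractors_for(
    "file.added", "image/jpg", 6 * MB, ["space-a"], global_rules, space_rules
)
print(targets)  # ['preview', 'big-image']
```

In this sketch "preview" matches twice (once through the global image/* binding and once through the space) but appears only once in the result, which is the duplicate-message case from the first step. A real implementation would then publish one message per entry in the returned list.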