...
- List of extractors is kept per space / clowder instance
- When a file is uploaded to clowder, it triggers the normal message to rabbitmq. For a file, clowder at the same time looks up which space(s) the file belongs to and sends a message through the normal exchange to each extractor explicitly listed for those spaces.
- Quick to implement: only need to add logic to the rabbitmq plugin to find the space and send the additional messages. However, this can result in duplicate messages sent to a queue; for example, one message might be sent because of the binding image/* → preview, and a second because the space explicitly lists that extractor.
- Assumption: all extractors register themselves with that specific clowder instance, as well as with the exchange (this is assumed now as well).
- List of bindings is kept in clowder; this removes the exchanges
- When a file is uploaded to clowder, it will look in the global bindings as well as the space bindings, make a unique list of all extractors, and fire a message for each extractor
- All logic for which extractor fires is now controlled by clowder; no duplicate messages
- Assumption: all extractors register themselves with that specific clowder instance (this is assumed now as well).
- List of extractors is queried
- Extractors will bind themselves to a queue and also have a command queue. The command queue is non-persistent. Each extractor will pick up messages from both queues, but gives preference to the command queue. Clowder will send a command message (cmd) to the extractor exchange that is picked up by all command queues, allowing the extractors to then register themselves with that specific clowder instance. All extractor.* messages will go to the normal extractor queue.
- Can easily get a list of all extractors and refresh this list every 15 or so minutes.
- More complex logic for extractors
- Extractors can now have more complex logic, such as (file added, mimetype=image/jpg, filesize>5MB). This is part of the extractor_info.json; clowder can use this logic to decide whether an extractor should fire.
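A rule such as (file added, mimetype=image/jpg, filesize>5MB) could be evaluated roughly as follows. This is only a minimal sketch: the key names (`event`, `mimetype`, `min_size`), the event name `file.added`, and the function `rule_matches` are assumptions made for illustration and are not the actual extractor_info.json schema.

```python
from fnmatch import fnmatchcase

MB = 1024 * 1024

# Hypothetical rule mirroring (file added, mimetype=image/jpg, filesize>5MB).
# Key names and the event name are assumptions, not the real schema.
RULE = {"event": "file.added", "mimetype": "image/jpg", "min_size": 5 * MB}


def rule_matches(rule, event, mimetype, size):
    """Return True when every condition present in the rule holds."""
    if "event" in rule and rule["event"] != event:
        return False
    # Mimetype patterns may use wildcards such as image/*.
    if "mimetype" in rule and not fnmatchcase(mimetype, rule["mimetype"]):
        return False
    # The file size must be strictly greater than the threshold.
    if "min_size" in rule and not size > rule["min_size"]:
        return False
    return True


print(rule_matches(RULE, "file.added", "image/jpg", 6 * MB))
print(rule_matches(RULE, "file.added", "image/png", 6 * MB))
```

A condition that is missing from the rule is treated as "always true", so a plain mimetype binding like image/* stays expressible as a rule with only the `mimetype` key.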
Bonus
- Adding code so we use the key for the user that is responsible for the event instead of the global key.
UPDATE 2018-05-11
- Need to add global list for extractors, use the same mongo collection for global extractors
- Re-visit per space list of extractors
- RabbitMQ plugin needs to know what space a file belongs to
- check list of extractors in space / global
- Use mimetype to filter list (be ready for more complex rules in future)
- when we move a dataset into a new space, run all extractors on files in the space
- private space for now only uses global
- Keep exchange + routing key for now, mark as deprecated, remove in 2.0
- new extractors, removing bindings to exchange
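The selection step in the update notes (check the list of extractors in the space and the global list, then filter by mimetype) might look roughly like the sketch below. It assumes each list is stored as a mapping from extractor name to mimetype patterns; that layout, the function name `select_extractors`, and the `ocr` extractor are hypothetical and only serve the example.

```python
from fnmatch import fnmatchcase


def select_extractors(mimetype, global_extractors, space_extractors):
    """Return a sorted, duplicate-free list of extractor names to trigger.

    Both mappings go from extractor name to a list of mimetype patterns
    (assumed layout; the notes do not specify how the lists are stored).
    """
    candidates = {}
    # Merge the global and per-space lists. An extractor present in both
    # ends up with the union of its patterns, but is only listed once.
    for source in (global_extractors, space_extractors):
        for name, patterns in source.items():
            candidates.setdefault(name, set()).update(patterns)
    # Keep only extractors with at least one pattern matching the file.
    return sorted(
        name
        for name, patterns in candidates.items()
        if any(fnmatchcase(mimetype, pattern) for pattern in patterns)
    )


global_extractors = {"preview": ["image/*"]}
space_extractors = {"preview": ["image/*"], "ocr": ["image/*", "application/pdf"]}
print(select_extractors("image/png", global_extractors, space_extractors))
```

Keeping the filter as a separate predicate over patterns leaves room to swap in more complex rules later without changing how the two lists are merged.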
WORK:
- rob does pyclowder 2 + simple python code for others
- mike does UI for extractor selection
- luigi/rob do clowder things