...

The service needs to know the type of a file in order to find the extractors that can process it. There are two options here: either the client determines the MIME type of the file and sends it to the service, or the client sends the file extension and the service derives the MIME type from that extension. Each result entry returned by the service should contain the extractor name, extractor ID, Docker image name, and Git repository name.
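The two lookup options and the shape of a result entry can be sketched as follows. This is a minimal illustration, not the service's actual design: the `ExtractorEntry` class, the `REGISTRY` contents, and the function names are all hypothetical, and the extension-to-MIME mapping simply uses Python's standard `mimetypes` module as a stand-in for whatever the service would really use.

```python
import mimetypes
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ExtractorEntry:
    """One result entry, carrying the four fields named in the text."""
    name: str
    extractor_id: str
    docker_image: str
    git_repository: str


# Hypothetical in-memory registry keyed by MIME type (placeholder values only).
REGISTRY: Dict[str, List[ExtractorEntry]] = {
    "image/png": [
        ExtractorEntry(
            name="example-image-extractor",
            extractor_id="example-id-1",
            docker_image="example/image-extractor",
            git_repository="example-image-extractor-repo",
        )
    ],
}


def mime_type_from_extension(extension: str) -> Optional[str]:
    """Option 2: the service derives the MIME type from a file extension."""
    ext = extension if extension.startswith(".") else "." + extension
    mime_type, _encoding = mimetypes.guess_type("file" + ext.lower())
    return mime_type


def find_extractors(mime_type: Optional[str] = None,
                    extension: Optional[str] = None) -> List[ExtractorEntry]:
    """Return the entries registered for a MIME type.

    Option 1: the client passes mime_type directly.
    Option 2: the client passes only the extension and the service resolves it.
    """
    if mime_type is None and extension is not None:
        mime_type = mime_type_from_extension(extension)
    if mime_type is None:
        return []
    return list(REGISTRY.get(mime_type, []))
```

A client that already knows the MIME type would call `find_extractors(mime_type="image/png")`, while a client that only knows the file name would call `find_extractors(extension="png")`. Both paths end in the same registry lookup.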

Possible Locations for the Service

...

There are many places within the Brown Dog environment where this service can reside.

Independent Service

If implemented as an independent service, it will run on a VM but will be accessible only through Fence (BD-API), since the service itself won't have any authentication or authorization in place. The advantage is that its implementation can be changed independently of other modules such as Clowder and Tools Catalog. The drawback is that it adds one more component to Brown Dog, which increases the setup overhead.

Inside Tools Catalog

Tools Catalog manages the tools (extractors and converters) that are part of Brown Dog. It's like an app store where all the tools are stored permanently to enable tool reuse. The service could also be part of Tools Catalog, since it is closely associated with the tools that Tools Catalog manages.

Inside BD-Clowder

BD-Clowder is the web application that handles Brown Dog content management. It stores the files that are submitted for processing by different BD clients, sends those files to be processed by remote extractor services, and stores the generated metadata and other auxiliary information for future use. Currently a Mongo database is used as the data store. In this database, a collection titled extractors.info stores the details of each extractor that registers with this Clowder instance. Those details come from the extractor_info.json file that is now part of every extractor. In the future, there is also a plan to include in extractor_info.json the file types on which an extractor will fire. This means that all information needed to find the extractors that can process a file type

...