...

Moving compute to the data is one of the ways Brown Dog will address big data. Rather than sending large files to remote extraction services for processing, BD clients will download the extraction services to the local machine as Docker containers and process the large files locally using those containers. To do this, BD clients need to find out which extractors can process a given file. This is where the Extractor Info Fetcher service comes in.

...

Extractors are currently linked to file types through the routing keys they use. To find the extractors that can process a given file, a client first calls a service (provisionally named the "Extractor Info Fetcher Service") through the BD-API endpoint /extractors. This service returns a list of extractors (including details such as Docker image name and extractor name) that can process files of a given file type. The returned list should be specific to the BD-API instance used (Dev, Prod, etc.): only extractors bound to that Brown Dog instance should be returned. Ideally, these extractors should be either currently running or available, i.e., not obsolete.
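The selection rule described above could be sketched roughly as follows. This is an illustration only, not the actual service: the record fields (`file_types`, `instance`, `status`), the status values, the sample extractor names and image names, and the function `find_extractors` are all assumptions made for the example. The real service would derive the file types from the routing keys.

```python
from dataclasses import dataclass

# Hypothetical record describing one extractor; field names are assumptions.
@dataclass(frozen=True)
class ExtractorInfo:
    name: str
    docker_image: str
    file_types: frozenset  # file types taken from the extractor's routing keys
    instance: str          # Brown Dog instance the extractor is bound to
    status: str            # assumed values: "running", "available", "obsolete"

# Statuses that count as usable, i.e. not obsolete (assumed vocabulary).
USABLE_STATUSES = {"running", "available"}

def find_extractors(registry, file_type, instance):
    """Return extractors that handle file_type, are bound to the given
    instance, and are currently running or available."""
    return [
        e for e in registry
        if file_type in e.file_types
        and e.instance == instance
        and e.status in USABLE_STATUSES
    ]

# Made-up sample data for illustration.
REGISTRY = [
    ExtractorInfo("ocr", "example/ocr", frozenset({"image/png", "image/jpeg"}), "dev", "running"),
    ExtractorInfo("faces", "example/faces", frozenset({"image/jpeg"}), "prod", "available"),
    ExtractorInfo("old-ocr", "example/old-ocr", frozenset({"image/jpeg"}), "dev", "obsolete"),
    ExtractorInfo("pdf-text", "example/pdf-text", frozenset({"application/pdf"}), "dev", "available"),
]

matches = find_extractors(REGISTRY, "image/jpeg", "dev")
print([(e.name, e.docker_image) for e in matches])
```

With this sample data, a request for "image/jpeg" on the "dev" instance matches only the "ocr" extractor: "faces" is bound to another instance, "old-ocr" is obsolete, and "pdf-text" handles a different file type.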

...