
Make changes to (main program). Treat the process_file method as the main method of the extractor: it must contain the main logic. From this method you can call other methods in your Python code, after importing the necessary modules into this file.
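A minimal sketch of this structure follows. It uses a plain Python class rather than the real pyclowder base class, and it assumes the framework hands process_file a dict whose hypothetical `inputfile` key holds the local path of the downloaded file. The class name `WordCountExtractor` and the helper `count_words` are illustrative; in a real project the helper could live in its own module imported into this file.

```python
import os
import tempfile


class WordCountExtractor:
    """Hypothetical extractor; not pyclowder's real base class."""

    def count_words(self, path):
        """Helper called from process_file: count lines, words and characters."""
        with open(path, encoding="utf-8") as f:
            text = f.read()
        return {
            "lines": len(text.splitlines()),
            "words": len(text.split()),
            "characters": len(text),
        }

    def process_file(self, parameters):
        """Main method: all extraction logic starts here."""
        # 'inputfile' is an assumed key holding the local path of the input file
        input_path = parameters["inputfile"]
        # Delegate the real work to other methods (or to imported modules)
        metadata = self.count_words(input_path)
        # A real extractor would upload this dict to Clowder as metadata
        return metadata


if __name__ == "__main__":
    # Demo: run the extractor logic on a throwaway text file
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write("hello world\nsecond line here\n")
    try:
        print(WordCountExtractor().process_file({"inputfile": tmp.name}))
    finally:
        os.remove(tmp.name)
```

Keeping the counting logic in a separate method means it can be tested without a message broker; process_file only wires the input to it and returns what a real extractor would upload.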

3. Edit


Edit extractor configuration file

  1. Change the RabbitMQ queue name: in this case, replace "wordCount" with an appropriate name for your extractor.
  2. Change the messageType field to reflect the MIME type(s) of the files for which you are writing the extractor.
  3. Update other fields, such as rabbitmqURL, rabbitmqExchange and sslVerify, to include
  4. If your extractor needs other custom parameters, they need to be added to
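To illustrate what a messageType value selects, the sketch below assumes a Python configuration module whose field names follow the list above (`extractorName` as the queue-name field is a hypothetical name) and whose values are placeholders. It also assumes messageType holds RabbitMQ topic-exchange binding patterns, in which `*` matches exactly one dot-separated word and `#` matches zero or more. The routing keys used in the demo are made-up examples.

```python
# Hypothetical Python config module: field names follow the list above,
# values are placeholders for a local test setup.
rabbitmqURL = "amqp://guest:guest@localhost:5672/%2f"
rabbitmqExchange = "clowder"
extractorName = "ncsa.wordcount"      # queue name (hypothetical)
messageType = ["*.file.text.#"]       # assumed binding pattern for text/* files
sslVerify = False


def topic_matches(pattern, routing_key):
    """Return True if routing_key matches an AMQP topic-exchange pattern.

    '*' matches exactly one dot-separated word; '#' matches zero or more.
    """
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            # Pattern exhausted: succeed only if the key is exhausted too
            return j == len(k)
        if p[i] == "#":
            # Try letting '#' absorb 0, 1, 2, ... of the remaining words
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


if __name__ == "__main__":
    for key in ("clowder.file.text.plain", "clowder.file.image.png"):
        print(key, any(topic_matches(m, key) for m in messageType))
```

The recursive matcher is written for clarity rather than speed; routing keys are short, so the backtracking on `#` is not a concern here.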

This file contains metadata about the extractor in JSON-LD format. Update all relevant fields as needed.
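Because the file is plain JSON, fields can also be updated programmatically. The sketch below is one way to do that with the standard json module; treating a missing `@context` key as an error is a choice made for this sketch, and the field names in the usage example are placeholders rather than a confirmed schema.

```python
import json


def update_extractor_info(path, updates):
    """Apply field updates to a JSON-LD metadata file and write it back."""
    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    # Without "@context" the keys carry no linked-data meaning; this sketch
    # treats a missing context as a sign the wrong file was opened.
    if "@context" not in info:
        raise ValueError(f"{path} has no @context; is it JSON-LD?")
    info.update(updates)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(info, f, indent=2)
        f.write("\n")
    return info
```

For instance, `update_extractor_info("path/to/metadata.json", {"version": "1.1"})` would change only the hypothetical `version` field and leave every other field intact.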

4. Configuration Parameters [Draft]

Extractors obtain their configuration from command-line arguments or environment variables.
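The parameter list itself is not given here, so the following sketch uses illustrative names only: the flags `--rabbitmqURI`, `--rabbitmqExchange`, `--sslVerify` and the environment variables `RABBITMQ_URI`, `RABBITMQ_EXCHANGE`, `SSL_VERIFY` are assumptions. It shows one common precedence scheme built on the standard argparse module: an explicit command-line flag wins over an environment variable, which wins over a built-in default.

```python
import argparse
import os


def str_to_bool(value):
    """Parse common true/false spellings used in flags and env vars."""
    v = value.strip().lower()
    if v in ("1", "true", "yes", "on"):
        return True
    if v in ("0", "false", "no", "off"):
        return False
    raise argparse.ArgumentTypeError(f"not a boolean: {value!r}")


def parse_config(argv=None, environ=None):
    """Read settings: command-line flag > environment variable > built-in default."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Extractor settings (sketch)")
    # Each default is looked up in the environment first, so an explicit
    # flag on the command line still wins.
    parser.add_argument(
        "--rabbitmqURI",
        default=env.get("RABBITMQ_URI", "amqp://guest:guest@localhost:5672/%2f"),
    )
    parser.add_argument(
        "--rabbitmqExchange",
        default=env.get("RABBITMQ_EXCHANGE", "clowder"),
    )
    parser.add_argument(
        "--sslVerify",
        type=str_to_bool,
        default=str_to_bool(env.get("SSL_VERIFY", "true")),
    )
    return parser.parse_args(argv)
```

With this scheme, `RABBITMQ_EXCHANGE=custom` in the environment changes the default, while passing `--rabbitmqExchange other` on the command line overrides it. Accepting an `environ` argument keeps the function testable without touching the real process environment.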

5. Edit the Dockerfile

Update the Dockerfile to install your software dependencies, using RUN instructions for the necessary commands. You will need to add a line to the Dockerfile that switches to the root user (USER root) so that these commands have the required permissions. For example, to install the ImageMagick package using apt-get, add the following commands to the Dockerfile: