...

    • my_python_program.py (required): The Python file that contains the main function. For simplicity, this guide calls the file my_python_program.py, the main function my_main_function, and your extractor my_extractor.
    • extractor_info.json (required): Contains metadata about the extractor

    • Dockerfile (required): Contains instructions to create a docker image of your extractor

    • requirements.txt (optional): Contains names of Python packages that will be installed using the pip command.

    • packages.apt (optional): Contains names of Linux packages that will be installed using the apt-get command.
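As an illustration of the first file in the list above, here is a minimal sketch of what my_python_program.py might contain. It assumes that the simple extractor wrapper calls the main function with the path of the input file and reads a dictionary with a "metadata" key from the return value; that calling convention is not stated in this section, so check it against the pyclowder2 documentation. The word-count logic is only a placeholder for your own code.

```python
# my_python_program.py (illustrative sketch, not the official template)


def my_main_function(input_file_path):
    """Count lines, words, and characters in a text file.

    Assumption: the wrapper passes the path of the file to process and
    expects a dict whose 'metadata' entry holds the extracted values.
    """
    lines = words = characters = 0
    with open(input_file_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            lines += 1
            words += len(line.split())
            characters += len(line)
    return {
        "metadata": {
            "lines": lines,
            "words": words,
            "characters": characters,
        }
    }
```

Because the function is plain Python, it can be called directly on a local file to check its output before any Docker image is built.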

  1. Create and save extractor_info.json in your source directory using any text editor. This file contains the metadata about the extractor that you are creating and follows the JSON-LD standard. Fill in the relevant details: name, version, author, contributors, source code repository, Docker image name, the data types on which the extractor will work, external services used, dependent libraries, and BibTeX-format citations for the publications that the extractor refers to. A template extractor_info.json is provided below for reference:

    {
       "@context": "<context root URL>",
       "name": "<extractor name>",
       "version": "<version number>",
       "description": "<extractor description>",
       "author": "<first name> <last name> <<email address>>",
       "contributors": [
         "<first name> <last name> <<email address>>",
         "<first name> <last name> <<email address>>"
       ],
       "contexts": [
         {
           "<metadata term 1>": "<URL definition of metadata term 1>",
           "<metadata term 2>": "<URL definition of metadata term 2>"
         }
       ],
       "repository": [
         {
           "repType": "git",
           "repUrl": "<source code URL>"
         }
       ],
       "process": {
         "file": [
           "<MIME type/subtype>",
           "<MIME type/subtype>"
         ]
       },
       "external_services": [],
       "dependencies": [],
       "bibtex": []
    }
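Before building the image, it can help to confirm that extractor_info.json parses as JSON and contains the fields shown in the template above. The following is a minimal sketch of such a check. The helper names (missing_keys, check_extractor_info) are hypothetical, and treating every template key as required is an assumption of this sketch, not a rule stated by Clowder.

```python
import json

# Keys copied from the template; treating all of them as required is an
# assumption made for this sketch.
EXPECTED_KEYS = [
    "@context", "name", "version", "description", "author",
    "contributors", "contexts", "repository", "process",
    "external_services", "dependencies", "bibtex",
]


def missing_keys(extractor_info):
    """Return the template keys that are absent from the parsed metadata."""
    return [key for key in EXPECTED_KEYS if key not in extractor_info]


def check_extractor_info(path="extractor_info.json"):
    """Parse the file and report which template keys are missing.

    json.load raises json.JSONDecodeError if the file is not valid JSON,
    for example when a list ends with a trailing comma.
    """
    with open(path, "r", encoding="utf-8") as handle:
        info = json.load(handle)
    return missing_keys(info)
```

An empty list from check_extractor_info() means that every key from the template is present; it does not verify that the values are meaningful.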


  2. Download the Docker Compose file from
    1. https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/raw/docker-compose.yml
    2. Alternatively, use the curl command to download it from a terminal:

      curl "https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/raw/docker-compose.yml?at=refs%2Fheads%2FBD-2226-add-docker-compose-file-to-pyclowder2" --output docker-compose.yml


  3. Start up the Clowder services stack (Clowder, RabbitMQ, MongoDB, and Elasticsearch) by running the following command from the directory that contains the downloaded docker-compose.yml file. The first run may take a few minutes:

    docker-compose up


  4. Create and save a Dockerfile in your existing source code directory using any text editor on your computer. The Dockerfile must contain the following lines, where you should replace my_python_program.py and my_main_function with their actual names:

    FROM clowder/extractors-simple-extractor:onbuild
    ENV EXTRACTION_FUNC="my_main_function"
    ENV EXTRACTION_MODULE="my_python_program.py"


  5. If your code requires any Python or Linux packages, list the Python packages in requirements.txt and the Linux packages in packages.apt, both in the source code directory. Put each package on its own line.
  6. Create the Docker image for your extractor using the command below. Open a terminal, change to the directory that contains your Dockerfile using the cd command, and then run the command. Note the dot (.) at the end of the command. The build also installs the Python packages from requirements.txt and the Linux packages from packages.apt:

    docker build -t my_extractor .

    In the terminal where you ran docker-compose up, you should be able to see the logs of the services that are part of the Clowder stack.

  7. From another terminal window, you can now run your extractor using the following command:

    docker run -t -i --rm --network clowder_clowder my_extractor

    You should be able to see the extractor's startup logs in this terminal window.
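If you rebuild and rerun the extractor often, the build and run commands from the two steps above can be wrapped in a small script. The sketch below only assembles the same docker commands shown in this guide and passes them to subprocess; the function names are hypothetical, and the defaults (my_extractor, clowder_clowder) are the placeholder values used on this page.

```python
import subprocess


def build_command(image_name="my_extractor", context_dir="."):
    """Return the docker build command as an argument list."""
    return ["docker", "build", "-t", image_name, context_dir]


def run_command(image_name="my_extractor", network="clowder_clowder"):
    """Return the docker run command as an argument list."""
    return ["docker", "run", "-t", "-i", "--rm", "--network", network, image_name]


def build_and_run(image_name="my_extractor"):
    """Build the image, then start the extractor.

    check=True makes subprocess raise CalledProcessError if docker exits
    with a non-zero status, so a failed build is not followed by a run.
    """
    subprocess.run(build_command(image_name), check=True)
    subprocess.run(run_command(image_name), check=True)
```

Calling build_and_run() from the directory that contains the Dockerfile should behave like typing the two commands by hand, provided Docker is installed and the Clowder stack is already running.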

  8. You can always test your Python code on its own before wrapping it as an extractor. To test the built extractor, you need to sign up for an account in your local Clowder instance. Follow the steps below:
    1. Open your web browser and go to http://<ip_address>:9000/signup, replacing <ip_address> with your computer’s IP address. To find the IP address, run the ifconfig (Mac/Linux) or ipconfig (Windows) command from a terminal window.
    2. On the sign-up page, create an account using your email address, as shown in the figure. Click "Create Account" after you enter your email address.
      Image Added
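As an alternative to reading the address from ifconfig or ipconfig output, the sign-up URL can be assembled with a few lines of Python. This is a best-effort sketch: the helper names are hypothetical, the address lookup relies on the operating system's routing table and falls back to 127.0.0.1 when no route is available, and port 9000 is the value used in this guide.

```python
import socket


def local_ip_address():
    """Guess the IP address of this computer's outgoing network interface.

    Calling connect() on a UDP socket only selects a route and a local
    address; no packet is sent. 192.0.2.1 is a reserved documentation
    address and is never actually contacted.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network connection).
        return "127.0.0.1"
    finally:
        sock.close()


def signup_url(ip_address, port=9000):
    """Build the Clowder sign-up URL for the given address."""
    return f"http://{ip_address}:{port}/signup"
```

For example, signup_url(local_ip_address()) returns the URL to paste into the browser; on machines with several network interfaces, the guessed address may not be the one you expect.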
  9. :