...

Step-by-Step Instructions

Create Project and Runtime Environment

A few basic software dependencies need to be installed; after that, Docker handles the rest. Docker Compose runs from within a Python virtual environment for your project. It starts several Docker containers and connects them to form an extractor runtime environment, and it also deploys your own project code into a Docker container connected to that environment.

  1. Install the prerequisite software. Installation methods depend on your operating system:
    1. VirtualBox (or an equivalent Docker-compatible virtualization environment)
    2. Docker
    3. Python and pip
  2. Clone this extractor template project from the repository, substituting your extractor project name:

    git clone http://bitbucket.ncsa.illinois.edu/path/to/extractor-template.git <extractor project name>
  3. Create and activate a Python virtual environment for your new project, then install Docker Compose into it:

      cd <extractor project name>
      virtualenv .
      source bin/activate
      pip install docker-compose

    NOTE: To reactivate this virtual environment in a later terminal session, repeat the "source bin/activate" step.

  4. Start the extractor runtime environment using Docker Compose, then run the template's tests against it:
    1. TODO: docker-compose.yml for extractor runtime environment

      docker-compose up -d   # -d (detached) returns control to the shell so the next command can run
      ./tests.py
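Before starting Docker Compose, you may want to confirm that the prerequisites from step 1 are installed and that the virtual environment from step 3 is active. The following Python script is a minimal sketch of such a check and is not part of the template: the file name check_env.py and the tools listed in REQUIRED_TOOLS are assumptions, so adjust them for your platform.

```python
"""Hypothetical check_env.py: sanity-check the extractor development setup."""
import shutil
import sys

# Executables this guide relies on (assumption: adjust for your platform).
REQUIRED_TOOLS = ("docker", "docker-compose", "pip")


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment.

    Classic virtualenv (before version 20) sets sys.real_prefix; venv and
    virtualenv 20+ make sys.prefix differ from sys.base_prefix.
    """
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def main():
    """Print a short report and return a shell-style exit code."""
    ok = True
    if not in_virtualenv():
        print("Virtual environment is not active; run: source bin/activate")
        ok = False
    for name in missing_tools():
        print("Not found on PATH:", name)
        ok = False
    if ok:
        print("Environment looks ready.")
    return 0 if ok else 1
```

To use it as a command, end the file with 'if __name__ == "__main__": sys.exit(main())' and run "python check_env.py" from the activated environment. Docker Compose is checked last on purpose in the sense that it only appears on PATH after step 3 has installed it into the virtual environment.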

Add Sample Files and Create Tests

At this point, you have seen the template extractor deployed and working within your local runtime environment. Now it's time to add your own sample input files and write your custom code. We recommend developing your extractor in a test-driven manner: first add sample input files, then modify the test script to validate that the extractor's results are correct.

  1. Select a few representative sample files and add them to the "sample_files" folder.
  2. Edit the tests.py script to add tests for your sample files. You can remove the template example file and tests.
  3. Run the new tests:

    $ ./tests.py
  4. The tests will fail, of course, but the output logged to the console shows why they failed.
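The template's tests.py is not reproduced in this guide, so the following is only a sketch of one way to organize data-driven tests over the files in sample_files. EXPECTED, extract(), and the line_count field are hypothetical stand-ins: in your project, extract() would run your extractor (or query the running runtime environment) on the file, and EXPECTED would list the metadata you expect for each sample.

```python
"""Hypothetical sketch of a data-driven tests.py for sample files."""
import os

# Folder named in this guide; sample files are looked up relative to it.
SAMPLE_DIR = "sample_files"

# Hypothetical expectations: sample file name -> metadata fields it should yield.
EXPECTED = {
    "example.txt": {"line_count": 3},
}


def extract(path):
    """Placeholder for your extractor (assumption: returns a dict of metadata)."""
    with open(path) as f:
        return {"line_count": sum(1 for _ in f)}


def run_tests(sample_dir=SAMPLE_DIR, expected=EXPECTED):
    """Compare extractor output with `expected` and return failure messages."""
    failures = []
    for name, want in sorted(expected.items()):
        path = os.path.join(sample_dir, name)
        if not os.path.isfile(path):
            failures.append(f"{name}: sample file missing")
            continue
        got = extract(path)
        for key, value in want.items():
            if got.get(key) != value:
                failures.append(
                    f"{name}: {key} expected {value!r}, got {got.get(key)!r}"
                )
    return failures


def main():
    """Print each failure and return a shell-style exit code."""
    failures = run_tests()
    for message in failures:
        print("FAIL", message)
    print(f"{len(failures)} failure(s)")
    return 1 if failures else 0
```

Each failure message names the sample file and the field that differed, which is the kind of console output step 4 relies on. To run the file as ./tests.py, give it a "#!/usr/bin/env python" first line, make it executable, and end it with 'if __name__ == "__main__": sys.exit(main())' (after importing sys).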

Develop Extractor Code

Now that we have sample files and failing tests, we can start writing code to make those tests pass. You will probably also modify your test code as you learn more about your extractor's output.

  1. Edit extractor.py to add your data processing steps in the commented areas.
  2. In particular, make sure you edit the MIME type filter (link to line) so that your extractor will only run on relevant input files.
  3. Redeploy the code into the runtime environment by issuing a docker-compose command:

    $ docker-compose ????
  4. Run tests.py again:

    $ ./tests.py

...

  1. Repeat steps 1 - 3 until the tests pass.
  2. Try adding some more sample files and tests.
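The template's own MIME type filter is not shown in this guide, so the following is a standalone Python sketch of the idea behind editing it: decide from a file's MIME type whether the extractor should process the file. It guesses the type from the file name with the standard mimetypes module; ACCEPTED_MIME_TYPES and should_process() are hypothetical names, not part of the template or of any Brown Dog API.

```python
"""Hypothetical sketch of a MIME type filter for an extractor."""
import mimetypes

# Types this example extractor handles (assumption: replace with your own).
ACCEPTED_MIME_TYPES = {"image/png", "image/jpeg"}


def should_process(filename, accepted=ACCEPTED_MIME_TYPES):
    """Return True if the MIME type guessed from `filename` is in `accepted`.

    mimetypes.guess_type() works from the file extension only and returns
    (None, None) when it cannot guess, so unknown files are skipped.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type in accepted
```

With the set above, should_process("scan.png") returns True and should_process("notes.txt") returns False. Because the guess depends only on the extension, a misnamed file can slip past the filter or be wrongly skipped; if your framework reports the MIME type it detected, comparing that value instead avoids the problem.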

Contribute the Extractor Tool to the Brown Dog Service

Extractor Guidelines

  • ??
  • Handling Failure