...
At this point, a Brown Dog admin is notified that a new transformation tool has been submitted; the admin reviews and tests the tool and approves it for use. The tool then appears in the "Browse" list of all tools in the current installation of the Tools Catalog.
Deploying an Extractor from a Single Call Method
This section describes the entire process of taking a working piece of code and deploying it as a Brown Dog extractor. It assumes that the method can be invoked with a single call. In this example, we use the Python extractor wrapper to invoke a Python function. In a very similar fashion, a method developed in a language other than Python can be invoked using subprocess.
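As a minimal sketch of the subprocess approach mentioned above, the hypothetical helper below runs a single-call external method on one input file and returns its standard output, which an extractor could then turn into metadata. The helper name and the convention of appending the input path as the last argument are assumptions; adapt them to whatever your own method expects.

```python
import subprocess


def run_external_method(command, input_path):
    """Run a single-call external method on one file and return its stdout.

    `command` is a list such as ["mytool", "--summary"] (hypothetical tool);
    the input file path is appended as the last argument.
    """
    result = subprocess.run(
        command + [input_path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_external_method(["wc", "-l"], "notes.txt")` would return the output of `wc -l notes.txt`. Raising on a non-zero exit code lets the extractor report a failed run instead of silently storing empty metadata.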
The main steps:
- Dockerize the extractor
- Deploy the extractor
- Add the extractor to the tool catalog
Creating an Extractor from Working Python Code
In this section, we describe the process of creating an extractor from working Python code. We assume that you have working Python code that extracts some kind of metadata from a data file, and that you have installed Python, Git, a Python virtual environment tool, Docker, and any other software your extractor needs on your computer.
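Before starting, it can help to confirm that the prerequisites are on your PATH. The sketch below is only an illustration: the executable names in `REQUIRED` are assumptions (for example, your Python may be installed as `python3`, and you may use the built-in `venv` module instead of `virtualenv`), so edit the list to match your setup.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed executable names; add anything your own extractor needs.
REQUIRED = ["python", "git", "virtualenv", "docker", "docker-compose"]

missing = missing_tools(REQUIRED)
print("Missing tools:", ", ".join(missing) if missing else "none")
```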
Install pyClowder, a Python library that makes it easy to communicate with Clowder, one of the backend services of Brown Dog. The advantage of using this library is that it manages all communication with Clowder and RabbitMQ (the messaging bus), so the developer does not have to handle these tasks. An extractor can also be written in plain Python without pyClowder, but doing so is more time-consuming.
```shell
pip install git+https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git
```
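To confirm that the installation succeeded in the environment you will run the extractor from, you can check whether the `pyclowder` module is importable. This is a small generic sketch using only the standard library; it does not exercise any pyClowder functionality.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


print("pyclowder installed:", is_installed("pyclowder"))
```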
Get your code together
We have developed a template, or example, extractor written in Python. It is a simple word-count extractor that counts the lines, words, and characters in a text file. Clone the template extractor and rename the directory to an appropriate name that reflects the purpose of your extractor:
```shell
git clone https://opensource.ncsa.illinois.edu/bitbucket/scm/bd/extractors-template.git
mv extractors-template/ <your_extractor_name>
cd <your_extractor_name>
```
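The core of the word-count template is the counting step itself. The function below is a pure-Python sketch of that step, not the template's actual code (the template may, for instance, call the `wc` command instead); the function name and result keys are hypothetical. Your own extractor would replace this function with the code that extracts your metadata.

```python
def count_text(path):
    """Count lines, words, and characters in a text file, similar to `wc`.

    Returns a dict that an extractor could upload as metadata. The key names
    are illustrative, not the template's actual output format.
    """
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    return {
        "lines": text.count("\n"),   # newline count, as `wc -l` reports it
        "words": len(text.split()),  # whitespace-separated tokens
        "characters": len(text),     # characters after decoding (`wc -c` counts bytes)
    }
```

For a file containing `hello world\nfoo\n`, this returns 2 lines, 3 words, and 16 characters.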
...
Edit the extractor configuration file, config.py:
- Change the RabbitMQ queue name; in this case, replace "wordCount" with an appropriate name for your extractor.
- Change the messageType field to reflect the MIME type(s) of the files for which you are writing the extractor.
- Update other fields, such as rabbitmqURL, rabbitmqExchange, and sslVerify, to match your installation.
- If your extractor needs other custom parameters, add them to config.py.
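A config.py following the list above might look like the sketch below. Only messageType, rabbitmqURL, rabbitmqExchange, and sslVerify are named in this guide; the variable holding the queue name (`extractorName`), the routing-key style of `messageType`, and every value shown are assumptions, so check the template's own config.py for the exact names and defaults.

```python
# config.py (sketch): all values are placeholders for illustration.

# Name of the RabbitMQ queue the extractor listens on (replaces "wordCount").
# The variable name is an assumption; use the one in the template.
extractorName = "myExtractor"

# Which files trigger the extractor. Assumed routing-key form of the MIME
# type (here: any text/* file); adjust to your file type(s).
messageType = "*.file.text.#"

# RabbitMQ connection settings for your installation (placeholder values).
rabbitmqURL = "amqp://guest:guest@localhost:5672/%2f"
rabbitmqExchange = "clowder"

# Whether to verify SSL certificates when contacting Clowder.
sslVerify = False

# Hypothetical custom parameter for your own extractor.
maxFileSizeMB = 100
```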
...
Update Dockerfile
To install your software dependencies, add the necessary instructions to the Dockerfile using the RUN command. You will also need to add a line to the Dockerfile that switches to the root user (USER root) so that the installation has the proper permissions. For example, to install the ImageMagick package using apt-get, add the following commands to the Dockerfile:
```dockerfile
USER root
RUN apt-get update && apt-get install -y imagemagick
```
Test Docker
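```shell
# Start the services from the template's docker-compose file (including RabbitMQ) in the background
docker-compose up -d
# Build the extractor image from the Dockerfile in the current directory
docker build -t <your_extractor_name> .
# Run the extractor interactively, linked to the RabbitMQ container started above
docker run --rm -i -t --link <your_extractor_name_with_only_alphabets>_rabbitmq_1:rabbitmq <your_extractor_name>
```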
You should see the following in the terminal, which means that the extractor is running and waiting for messages:
```text
INFO : pyclowder.extractors - Waiting for messages. To exit press CTRL+C
```
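The `--link` target in the `docker run` command is the RabbitMQ container created by docker-compose, whose name is derived from your directory name. The sketch below builds that argument, assuming the naming rule used by older docker-compose (v1) releases as we understand it: the project name is the directory name lowercased with everything except letters and digits removed, followed by `_<service>_<index>`. Newer docker-compose versions name containers differently, so confirm the real name with `docker ps` if the link fails.

```python
import re


def compose_link_argument(directory_name, service="rabbitmq", index=1):
    """Build the --link value for `docker run`, e.g. "myextractor_rabbitmq_1:rabbitmq".

    Assumes the older docker-compose (v1) naming scheme described above.
    """
    project = re.sub(r"[^a-z0-9]", "", directory_name.lower())
    container = f"{project}_{service}_{index}"
    return f"{container}:{service}"
```

For a directory named `my-Word_count2`, this returns `mywordcount2_rabbitmq_1:rabbitmq`.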
Creating a Converter
In this section, we describe how to create a converter, using an image converter built on ImageMagick as the example.
Get the template converter code
We have developed a template, or example, converter. It is a simple image converter that converts images between different formats using the ImageMagick tool. Clone the template converter and rename the directory to an appropriate name that reflects the purpose of your converter:
```shell
git clone https://opensource.ncsa.illinois.edu/bitbucket/scm/bd/convertors-template.git
mv convertors-template/ <your_converter_name>
cd <your_converter_name>
```
...
- Full path to input file
- Full path to output file (including filename)
- Full local path to available scratch space (optional)
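The template's converter script (ImageMagick_convert.sh) is a shell script, so the Python below is not a drop-in replacement; it only illustrates the argument contract, assuming the three items above arrive as positional arguments in that order. It maps them onto an ImageMagick `convert <input> <output>` command line without executing anything; `tool` stands in for your own program (MyTool), and the function name is hypothetical.

```python
def build_convert_command(argv, tool="convert"):
    """Map converter arguments to a command line for the conversion tool.

    argv: [input_path, output_path] or [input_path, output_path, scratch_dir].
    Returns (command_list, scratch_dir_or_None).
    """
    if len(argv) not in (2, 3):
        raise ValueError("expected: <input> <output> [scratch_dir]")
    input_path, output_path = argv[0], argv[1]
    # The scratch directory is optional; ImageMagick's convert does not need it.
    scratch_dir = argv[2] if len(argv) == 3 else None
    return [tool, input_path, output_path], scratch_dir
```

Because the output path includes the file name, its extension (for example `.bmp`) is what tells ImageMagick which format to write.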
...
Modify the Dockerfile in the converter directory to replace ImageMagick with MyTool; specifically, change lines 11, 15, 16, and 17. You also need to change other fields, such as the maintainer, and you may need to add instructions to install any specific software required by your converter. For example, the example Dockerfile contains the instructions to install the ImageMagick software:
```dockerfile
# Create softwareserver for polyglot.
FROM ncsapolyglot/polyglot:develop
MAINTAINER Rob Kooper <kooper@illinois.edu>
USER root
# - install requirements
# - enable shellscripts to be scanned
# - enable imagemagick conversion by adding to .aliases.txt
RUN apt-get update && apt-get -y install vim nano imagemagick && \
    /bin/sed -i -e 's/^\([^#]*Scripts=\)/#\1/' -e 's/^#\(ShellScripts=\)/\1/' /home/polyglot/polyglot/SoftwareServer.conf && \
    echo "ImageMagick" > /home/polyglot/polyglot/scripts/sh/.aliases.txt
# copy convert file to scripts/sh folder in container
# this is done to keep cache so you can debug script easily
COPY ImageMagick_convert.sh /home/polyglot/polyglot/scripts/sh/
RUN chown polyglot /home/polyglot/polyglot/scripts/sh/ImageMagick_convert.sh && \
    chmod +x /home/polyglot/polyglot/scripts/sh/ImageMagick_convert.sh
# back to polyglot
CMD ["softwareserver"]
```
Modify:
```dockerfile
echo "ImageMagick" > /home/polyglot/polyglot/scripts/sh/.aliases.txt
```
To:
```dockerfile
echo "MyTool" > /home/polyglot/polyglot/scripts/sh/.aliases.txt
```
Modify:
```dockerfile
COPY ImageMagick_convert.sh /home/polyglot/polyglot/scripts/sh/
```
To:
```dockerfile
COPY MyTool_convert.sh /home/polyglot/polyglot/scripts/sh/
```
Modify:
```dockerfile
RUN chown polyglot /home/polyglot/polyglot/scripts/sh/ImageMagick_convert.sh && \
    chmod +x /home/polyglot/polyglot/scripts/sh/ImageMagick_convert.sh
```
To:
```dockerfile
RUN chown polyglot /home/polyglot/polyglot/scripts/sh/MyTool_convert.sh && \
    chmod +x /home/polyglot/polyglot/scripts/sh/MyTool_convert.sh
```
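The Modify/To edits above all swap the capitalized name ImageMagick for MyTool, so they can be applied in one pass. The sketch below does this with a case-sensitive replacement; the helper names are hypothetical, "MyTool" is the placeholder name used in this guide, and the lowercase `imagemagick` apt-get package and comment are deliberately left alone (remove or replace that package yourself if MyTool does not need it). You still need to rename the script file itself to MyTool_convert.sh.

```python
from pathlib import Path


def retarget_dockerfile(text, old="ImageMagick", new="MyTool"):
    """Replace the case-sensitive tool name in the alias, COPY, chown and chmod lines."""
    return text.replace(old, new)


def retarget_file(path, old="ImageMagick", new="MyTool"):
    """Rewrite a Dockerfile on disk in place."""
    dockerfile = Path(path)
    dockerfile.write_text(retarget_dockerfile(dockerfile.read_text(), old, new))
```

For example, `retarget_file("Dockerfile")` run inside the converter directory rewrites the template's Dockerfile in place; review the result with `git diff` before building.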
Build the Docker image and start the converter
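```shell
# Stop the services started by docker-compose
docker-compose stop
# Rebuild the converter image with your changes
docker build -t mytool .
# Start the services again in the foreground, showing their logs
docker-compose up
```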
Usage of BD-tmux
BD-tmux runs the necessary dockerized Brown Dog Data Transformation Services (Polyglot, Clowder, Fence, the ImageMagick converter, and the OCR extractor) and combines them into one integrated program.
- After downloading BD-tmux, start it by running its bash script from the command line:
```shell
bd.sh
```
The BD-tmux script will split your terminal into panes and start each of the services needed for the Brown Dog Data Transformation Services. It provides a useful and convenient way to view the logs of running services in panes.
...
Here is an example of performing a conversion from jpg to bmp.
...