TODO (use lots of pictures!):
- XSEDE Tutorial content
- BD-tmux
- Create a converter
- Create an extractor
- Dockerization
- Tools catalog
Usage of BD-tmux
BD-tmux runs the necessary dockerized Brown Dog Data Transformation Services (Polyglot, Clowder, Fence, the ImageMagick converter and the OCR extractor) and combines them into one integrated program.
- After downloading BD-tmux, users can start it by running the bash script from the command line:
bd.sh
The BD-tmux script will split your terminal into panes and start each of the services needed for the Brown Dog Data Transformation Services. It provides a useful and convenient way to view the logs of running services in panes.
Users can switch between panes using tmux commands. In the bottom pane, users can run BDCLI commands to interact with the Brown Dog Data Transformation Services (username: fence, password: testing).
Here is an example that converts a jpg file to bmp.
- To do an extraction, users first need to log in to the local Clowder web service (http://user_machine_ipaddress:9000, with username admin@test.com and password testing0909) and accept the "Terms of Service", then run bd commands as in the example below:
Deploying an Extractor from a Single Call Method
This section describes the entire process of taking a working piece of code and deploying it as a Brown Dog extractor. It is assumed that the method can be invoked with a single call. In this example, we use the Python extractor wrapper and invoke a Python function. A method developed in a language other than Python can be invoked in a very similar fashion using subprocess.
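As a rough illustration of the subprocess route mentioned above, the sketch below wraps an external program in a single Python call. The function name run_external_tool and the stand-in command are hypothetical and not part of the extractor template; a child Python process is used here only so the example runs anywhere.

```python
import subprocess
import sys


def run_external_tool(command, inputfile):
    """Run an external program on inputfile and return its stdout as text.

    `command` is a list such as ["mytool", "--flag"]; the input file path is
    appended as the last argument. Both names are illustrative assumptions.
    """
    completed = subprocess.run(
        list(command) + [inputfile],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if completed.returncode != 0:
        # Surface the tool's own error message instead of failing silently.
        raise RuntimeError(
            "tool failed (exit %d): %s"
            % (completed.returncode, completed.stderr.strip())
        )
    return completed.stdout


# Stand-in for a real non-Python tool: prints the length of its argument.
STAND_IN = [sys.executable, "-c", "import sys; print(len(sys.argv[1]))"]
print(run_external_tool(STAND_IN, "photo.jpg"))
```

A real extractor would replace STAND_IN with the command line of the actual tool and parse its output into metadata.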
The main steps:
- Dockerize the extractor
- Deploy the extractor
- Add the extractor to the tool catalog
Creating an extractor from working python code:
In this example, the extractor is called "killed_photos". Substitute an appropriate name for your own extractor.
Install pyClowder
pip install git+https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git
- Get your code together
Clone the extractor template and rename the directory to an appropriate name
git clone ssh://git@opensource.ncsa.illinois.edu:7999/bd/extractors-template.git
mv extractors-template/ killed_photos
Bring in working python code
Create a source directory within the extractor template
cd killed_photos/
mkdir src
cd src
Put the existing code into the src directory.
- It is assumed here that the entry point into the working code (the callable function) is at the top level of the src directory.
- Your directory structure should look like this. The source code will probably be a lot more complicated, but the key is that there is an entry point, in this case main.py.
Then add an empty __init__.py file to the src directory so that it can be imported as a Python package.
touch __init__.py
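To make the layout concrete, the following sketch builds a throwaway copy of it in a temporary directory and imports the entry point the way an extractor could. The function run inside main.py is a hypothetical placeholder; your own entry point will have a different name and body.

```python
import importlib
import os
import shutil
import sys
import tempfile


def build_layout(root):
    """Create root/src/__init__.py and root/src/main.py with a toy entry point."""
    src = os.path.join(root, "src")
    os.makedirs(src)
    # An empty __init__.py marks src as an importable package.
    open(os.path.join(src, "__init__.py"), "w").close()
    with open(os.path.join(src, "main.py"), "w") as handle:
        handle.write("def run(path):\n    return 'processed ' + path\n")


def load_entry_point(root):
    """Import src.main from the given extractor directory."""
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    return importlib.import_module("src.main")


root = tempfile.mkdtemp()
build_layout(root)
entry = load_entry_point(root)
print(entry.run("photo.jpg"))
shutil.rmtree(root)  # remove the throwaway directory
```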
Test Docker
cd ..
docker-compose up
<open a new terminal in the same directory>
docker build -t killed-photos .
docker run --rm -i -t --link killedphotos_rabbitmq_1:rabbitmq killed-photos
If you see the following in the terminal, you're ready to continue:
INFO : pyclowder.extractors - Waiting for messages. To exit press CTRL+C
Edit extractor configuration:
- Change the RabbitMQ queue name: in this case, replace "wordCount" with "killed_photo", but choose a descriptive name for your own extractor.
- Edit extractor_info.json
- Make sure to change the values in the red boxes. Populate the other fields as needed.
- Edit extractor.py
- main process
- The red box above marks the actual processing step. In the template extractor, 'wc' (a Linux command that counts the number of words in a document) is called using subprocess. We are going to change this portion so that the Python code is called from the extractor instead.
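The change to the main process can be sketched roughly as follows. This is not the template's actual code: the function names, the parameters dictionary with an "inputfile" key, and the wc flags are assumptions made for illustration, and analyze_file stands in for whatever entry point your src/main.py provides.

```python
import os
import subprocess
import tempfile


def analyze_file(path):
    """Hypothetical stand-in for the entry point defined in src/main.py."""
    with open(path, "rb") as handle:
        return {"size_bytes": len(handle.read())}


def process_with_wc(parameters):
    """Template-style step: shell out to wc and parse the word count."""
    output = subprocess.check_output(["wc", "-w", parameters["inputfile"]])
    return {"words": int(output.split()[0])}


def process_with_python(parameters):
    """Replacement step: call the Python entry point directly."""
    return analyze_file(parameters["inputfile"])


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(b"three short words")
result = process_with_python({"inputfile": tmp.name})
os.remove(tmp.name)
print(result)
```

Calling the function directly avoids parsing command output, and the result can be returned as structured metadata.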
Adding to and Deploying from the Tools Catalog
Navigate to https://browndog.ncsa.illinois.edu/toolscatalog and log in using the button at the top right with your Brown Dog username/email and password.
Adding or Contributing a Tool to the Tools Catalog
To begin the process of adding your dockerized tool to the Tools Catalog, click "Contribute" at the top of the page.
You will be taken to a page that shows the tools you have already contributed and a "Create Tool" button for creating a new tool.
Click "Create Tool". You will be taken to an entry screen:
Enter:
- Title of the tool - note the size restrictions - no other restrictions as to format currently - *Required
- URL - location of the dockerized tool - *Required
- At a glance - a short description of the tool - Max of 140 characters - *Required
- Description - a full description of the tool
- Citation - instructions on how someone would cite the tool if they used it in their work
- YouTube Video URL - link to an instructional video if available
- Tool Type - Indicate if the tool is a Converter or Extractor
and click "Submit".
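Before filling in the form, it can help to check an entry against the constraints listed above. The sketch below is only a local pre-check, not a Tools Catalog API; the dictionary keys are hypothetical names for the form fields.

```python
REQUIRED_FIELDS = ("title", "url", "at_a_glance")  # fields marked *Required
AT_A_GLANCE_MAX = 140                              # character limit from the form
TOOL_TYPES = ("Converter", "Extractor")


def validate_tool_entry(entry):
    """Return a list of problems found in a tool entry (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field, "").strip():
            problems.append("missing required field: " + field)
    if len(entry.get("at_a_glance", "")) > AT_A_GLANCE_MAX:
        problems.append("at_a_glance exceeds %d characters" % AT_A_GLANCE_MAX)
    tool_type = entry.get("tool_type")
    if tool_type is not None and tool_type not in TOOL_TYPES:
        problems.append("tool_type must be Converter or Extractor")
    return problems


example_entry = {
    "title": "killed_photos",
    "url": "https://example.org/killed_photos",  # placeholder URL
    "at_a_glance": "Extracts metadata from photos.",
    "tool_type": "Extractor",
}
print(validate_tool_entry(example_entry))
```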
When the system returns, the tool has been created and its information is displayed.
Review the tool and edit if necessary.
Click "Add Brown Dog Script" to proceed.
Enter:
- Title - this is the title of the script and not the tool - there are no naming conventions currently - *Required
- Description - this is the description of what the script does - there are no length standards - *Required
Click "Next".
Additional fields will display to allow files to be submitted for the script.
Click "Choose File" to submit the following files, and fill in the remaining fields:
- Brown Dog Script File
- Docker File
- Example Input file - for testing and illustration purposes
- Example Output - for testing and illustration purposes
- Queue Name - can be left blank - advanced users may know the specific queue that will be used in the Elasticity Module - Must be the same as the Extractor Name in the code itself (extractor_info.json)
- Docker Image Name - can be left blank
- VM Image Name - can be left blank
Click "Submit".
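Since the Queue Name, when given, must match the extractor name in extractor_info.json, a quick local check can catch mismatches before submitting. This sketch assumes the extractor name is stored under a top-level "name" key; verify that against your own file. The sample JSON is made up for illustration.

```python
import json


def queue_matches_extractor(info_text, queue_name):
    """Compare a queue name with the "name" field of extractor_info.json text."""
    info = json.loads(info_text)
    return info.get("name") == queue_name


def queue_matches_extractor_file(info_path, queue_name):
    """Same check, reading the JSON from a file path."""
    with open(info_path) as handle:
        return queue_matches_extractor(handle.read(), queue_name)


# Made-up sample content; a real file has more fields.
EXAMPLE_INFO = '{"name": "killed_photos", "version": "1.0"}'
print(queue_matches_extractor(EXAMPLE_INFO, "killed_photos"))
```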
Review the script information and click "Author Approve" to submit the script. "Edit" is available to change any details.
At this point a Brown Dog Admin is notified that a new Brown Dog Script has been submitted, and they will review and test the script and approve it for use.
The tool will then be viewable in the "Browse" list of all tools in the current installation of the Tools Catalog.