The Brown Dog DTS is intended to be an extensible, distributed service, with new transformation capabilities being added and hosted at various sites over time. We keep a catalog of these transformation capabilities in the Brown Dog Tools Catalog. To access it, navigate to:

https://browndog.ncsa.illinois.edu/toolscatalog/

and log in using the button at the top right with your Brown Dog username/email and password.

Adding a Tool to the Brown Dog DTS

To begin adding your dockerized tool to the Tools Catalog, click "Contribute" at the top of the page. You will be taken to a page that lists the tools you have already contributed and provides a "Create Tool" button for creating a new tool.

Click "Create Tool". You will be taken to an entry screen:

 

Enter:

  • Title of the tool - note the size restrictions - no other restrictions as to format currently - Required
  • URL - location of the dockerized tool - Required
  • At a glance - a short description of the tool - Max of 140 characters - Required
  • Description - a full description of the tool
  • Citation - instructions on how someone would cite the tool if they used it in their work
  • YouTube Video URL - link to an instructional video if available
  • Tool Type - Indicate if the tool is a Converter or Extractor

and click "Submit". When the system returns, the tool has been created and its information is displayed. Review the tool and edit it if necessary.

Tool Level indicates the following:

  • Level 1 = Just a link (not usable out of box)
  • Level 2 = a script (usable with a little work, i.e. setting up a VM)
  • Level 3 = a container (usable with a click of a button)
  • Level 4 = test data included (automatically usable and testable, the best case situation!)

Click "Add Brown Dog Script" to proceed.

Enter:

  • Title - this is the title of the script and not the tool - there are no naming conventions currently - Required
  • Description - this is the description of what the script does - there are no length standards - Required

Click "Next".  Additional fields will display to allow files to be submitted for the script.

Click "Choose File" to submit the following files, and fill in the remaining fields:

  • Brown Dog Script File
  • Docker File
  • Example Input file - for testing and illustration purposes
  • Example Output - for testing and illustration purposes
  • Queue Name - can be left blank - advanced users may know the specific queue that will be used in the Elasticity Module - must be the same as the Extractor Name in the code itself (extractor_info.json)
  • Docker Image Name - can be left blank
  • VM Image Name - can be left blank

Click "Submit".  Review the script information and click "Author Approve" to submit the script.  "Edit" is available to change any details.

At this point a Brown Dog Admin is notified that a new transformation tool has been submitted, and they will review and test the tool and approve it for use.  The tool will then be viewable in the "Browse" list of all tools in the current installation of the Tools Catalog.

Deploying an Extractor from a Single Call Method

This section describes the entire process of taking a working piece of code and deploying it as a Brown Dog extractor. It assumes that the method can be invoked with a single call. The example uses the Python extractor wrapper and invokes a Python function. A method written in a language other than Python can be invoked in a very similar fashion using subprocess.

The main steps:

  1. Dockerize the extractor
  2. Deploy the extractor
  3. Add the extractor to the tool catalog
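
The subprocess approach mentioned above might look roughly like the following. This is a minimal sketch, not part of the extractor wrapper: `run_external_tool` is a hypothetical helper, and the demo command is a Python one-liner standing in for a real tool written in another language.

```python
import subprocess
import sys


def run_external_tool(command, input_path):
    """Run an external program on input_path and return its standard output.

    `command` is a list such as ["mytool", "--flag"] (hypothetical); the input
    path is appended as the last argument. check=True raises
    CalledProcessError on a non-zero exit status, so failures are not
    silently ignored.
    """
    completed = subprocess.run(
        command + [input_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout


# Stand-in for a real tool: prints the length of the argument it receives.
demo_command = [sys.executable, "-c", "import sys; print(len(sys.argv[1]))"]
demo_output = run_external_tool(demo_command, "example.txt")
print(demo_output.strip())
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.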

Creating an Extractor from Working Python Code

In this section, we describe the process of creating an extractor from working Python code. We assume that you have working Python code that extracts some kind of metadata from a data file, and that you have installed Python, Git, a Python virtual environment, Docker, and any other software needed by your extractor on your computer.

  1. Install pyClowder, which is a Python library that helps to easily communicate with Clowder - one of the backend services of Brown Dog. The advantage of using this library is that it manages all communications with Clowder and RabbitMQ (messaging bus) and the developer doesn't have to take care of such tasks. Needless to say, an extractor can also be written in native Python without the use of pyClowder, but it would be more time consuming.

    pip install git+https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git
  2. Get your code together
    We have developed a template or example extractor written in Python. It is a simple wordcount extractor that counts lines, words, and characters in a text file. Clone the template extractor and rename the directory to an appropriate name that reflects the purpose of your extractor:

    git clone https://opensource.ncsa.illinois.edu/bitbucket/scm/bd/extractors-template.git
    mv extractors-template/ <your_extractor_name>
    cd <your_extractor_name>


    Bring in your working Python code and make the necessary changes to extractors.py (the main program). Treat the process_file method as the main method of the extractor; it needs to contain the main logic. You can call other methods in your Python code from this method after importing the necessary modules into this file.

  3. Edit extractor configuration file config.py:

    1. Change the rabbitmq queue name - in this case replace "wordCount" with an appropriate name for your extractor

    2. Change the messageType field to reflect the MIME type(s) of the file for which you are writing the extractor
    3. Update other fields, such as rabbitmqURL, rabbitmqExchange, and sslVerify, as needed
    4. If your extractor needs other custom parameters, they need to be added to config.py


  4. Edit extractor_info.json
    This file contains metadata about the extractor in JSON-LD format. Update all relevant fields as needed.


  5. Update Dockerfile
    To install your software dependencies, add the necessary instructions to the Dockerfile using the RUN command. You will need to add a line to the Dockerfile that switches to the root user (USER root) to get the proper permissions. For example, to install the ImageMagick package using apt-get, add the following commands to the Dockerfile:

    USER root
    RUN apt-get update && apt-get install -y imagemagick
  6. Test Docker

    docker-compose up -d
    docker build -t <your_extractor_name> .
    docker run --rm -i -t --link <your_extractor_name_with_only_alphabets>_rabbitmq_1:rabbitmq <your_extractor_name>

    You should see the following in the terminal. This means that the extractor is running and waiting for messages:

    INFO    : pyclowder.extractors -  Waiting for messages. To exit press CTRL+C
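
To illustrate step 2, the main logic of a word count extractor could be sketched as below. This is an illustration under stated assumptions rather than the template's actual code: `count_text` is a hypothetical helper, the `"inputfile"` key of the `parameters` dictionary is an assumption about what the wrapper passes to process_file, and the upload of the results to Clowder is omitted.

```python
import os
import tempfile


def count_text(path):
    """Return line, word, and character counts for a text file."""
    lines = words = characters = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            lines += 1
            words += len(line.split())
            characters += len(line)
    return {"lines": lines, "words": words, "characters": characters}


def process_file(parameters):
    """Main extractor logic: compute metadata for the file being processed.

    ASSUMPTION: `parameters` is a dict whose "inputfile" key holds the local
    path of the file; adapt this to what the template actually provides.
    """
    return count_text(parameters["inputfile"])


# Self-check using a temporary file instead of a real message from the queue.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("hello brown dog\nsecond line\n")
demo_counts = process_file({"inputfile": tmp.name})
os.remove(tmp.name)
print(demo_counts)
```

Keeping the counting logic in its own function means it can be tested locally before the extractor is connected to RabbitMQ and Clowder.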

Creating a Converter

In this section, we describe how to create a converter, using an image converter based on ImageMagick as the example.

  1. Get the template converter code.
    We have developed a template or example converter. It is a simple image converter that converts images between different formats using the ImageMagick tool. Clone the template converter and rename the directory to an appropriate name that reflects the purpose of your converter:

    git clone https://opensource.ncsa.illinois.edu/bitbucket/scm/bd/convertors-template.git
    mv convertors-template/ <your_converter_name>
    cd <your_converter_name>
  2. Rename and edit ImageMagick_convert.sh script to wrap your converter logic. This script file should be named in the format <alias>_convert.<script_type>. Here <alias> needs to be replaced by the name of the conversion tool with which the converter registers with Polyglot and <script_type> needs to be replaced by the extension of the script (e.g. py, sh, etc.). For the sake of ease of explanation, we will rename the script file as MyTool_convert.sh. This script accepts three parameters: 
    1. Full path to input file
    2. Full path to output file (including filename)
    3. Full local path to available scratch space (optional)

    This script will be used by the Software Server to perform the conversion. The example script ImageMagick_convert.sh uses the ImageMagick tool to convert images between different formats. Every conversion script begins with a specific header, written as comments:
    1. First line is the shebang line
    2. Second line contains the name of the converter followed by version if any
    3. Third line refers to the type of the data that it can convert
    4. Fourth line contains a comma-separated list of input file formats accepted by this converter
    5. Fifth line contains a comma-separated list of output file formats that this converter can generate
    6. This is followed by the actual code that does conversion.

  3. Modify the Dockerfile in the converter directory to replace ImageMagick with MyTool. Specifically, change lines 11, 15, 16, and 17. You also need to change other fields, such as the maintainer, and may need to add instructions to install any software required by your converter. For example, the example Dockerfile contains an instruction to install ImageMagick:

    Dockerfile
    # Create softwareserver for polyglot.
    FROM ncsapolyglot/polyglot:develop
    MAINTAINER Rob Kooper <kooper@illinois.edu>
    
    USER root
    # - install requirements
    # - enable shellscripts to be scanned
    # - enable imagemagick conversion by adding to .aliases.txt
    RUN apt-get update && apt-get -y install vim nano imagemagick && \
    	/bin/sed -i -e 's/^\([^#]*Scripts=\)/#\1/' -e 's/^#\(ShellScripts=\)/\1/' /home/polyglot/polyglot/SoftwareServer.conf && \
    	echo "ImageMagick" > /home/polyglot/polyglot/scripts/sh/.aliases.txt
    
    # copy convert file to scripts/sh folder in container
    # this is done to keep cache so you can debug script easily
    COPY ImageMagick_convert.sh /home/polyglot/polyglot/scripts/sh/
    RUN chown polyglot /home/polyglot/polyglot/scripts/sh/ImageMagick_convert.sh && \
        chmod +x /home/polyglot/polyglot/scripts/sh/ImageMagick_convert.sh
    
    # back to polyglot
    CMD ["softwareserver"]
    1. Modify:

      echo "ImageMagick" > /home/polyglot/polyglot/scripts/sh/.aliases.txt

      To:

      echo "MyTool" > /home/polyglot/polyglot/scripts/sh/.aliases.txt
    2. Modify:

      COPY ImageMagick_convert.sh /home/polyglot/polyglot/scripts/sh/

      To:

      COPY MyTool_convert.sh /home/polyglot/polyglot/scripts/sh/
    3. Modify:

      RUN chown polyglot /home/polyglot/polyglot/scripts/sh/ImageMagick_convert.sh && \
          chmod +x /home/polyglot/polyglot/scripts/sh/ImageMagick_convert.sh

      To:

      RUN chown polyglot /home/polyglot/polyglot/scripts/sh/MyTool_convert.sh && \
          chmod +x /home/polyglot/polyglot/scripts/sh/MyTool_convert.sh
  4. Build the Dockerfile and start the converter

    docker-compose stop
    docker build -t mytool .
    docker-compose up
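
Before building the image, it can help to confirm that the conversion script follows the header layout described in step 2. The sketch below is one possible check and is not part of Polyglot: `parse_converter_header` is a hypothetical helper, it assumes that header lines 2 to 5 each start with `#`, and the version and formats in the example header are placeholders.

```python
def parse_converter_header(script_text):
    """Parse the five-line header of a conversion script.

    Expects a shebang line followed by four comment lines: converter name
    (with optional version), data type, accepted input formats, and
    produced output formats.
    """
    lines = script_text.splitlines()
    if len(lines) < 5:
        raise ValueError("header needs at least five lines")
    if not lines[0].startswith("#!"):
        raise ValueError("first line must be a shebang")
    fields = []
    for line in lines[1:5]:
        if not line.startswith("#"):
            raise ValueError("header lines 2-5 must be comments: %r" % line)
        fields.append(line[1:].strip())
    name, data_type, inputs, outputs = fields
    return {
        "name": name,
        "type": data_type,
        "inputs": [fmt.strip() for fmt in inputs.split(",") if fmt.strip()],
        "outputs": [fmt.strip() for fmt in outputs.split(",") if fmt.strip()],
    }


# Hypothetical header for MyTool_convert.sh; the formats are placeholders.
example_script = """#!/bin/sh
#MyTool (v1.0)
#image
#jpg, png
#bmp, gif
echo "conversion logic goes here"
"""
header = parse_converter_header(example_script)
print(header)
```

A check like this catches a missing or misordered header line early, before the script is copied into the container.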

Usage of BD-tmux

BD-tmux runs the necessary dockerized Brown Dog Data Transformation Services (Polyglot, Clowder, Fence, the ImageMagick converter, and the OCR extractor) and combines them into one integrated program.

  • After downloading BD-tmux, start it by running the bash script from the command line:

    bd.sh

  • The BD-tmux script splits your terminal into panes and starts each of the services needed for the Brown Dog Data Transformation Services. This provides a convenient way to view the logs of the running services in panes.

  • Users can switch between panes using tmux commands. In the bottom pane, users can run BDCLI commands to interact with the Brown Dog Data Transformation Services (username: fence, password: testing).

  • There is an example to perform a conversion from jpg to bmp.


  • To run an extraction, users first need to log in to the local Clowder web service (http://user_machine_ipaddress:9000, username: admin@test.com, password: testing0909) and accept the "Terms of Service", then run bd commands as in the following example:


