Proposal submitted: XSEDE16TutorialProposal_v1.docx
Abstract
When: July 18th
Where
Tutorial Session Design
Introduction to Brown Dog (30m)
This is a presentation and demo of the Brown Dog project and services
How to use Brown Dog Services (1h 30m)
20-30 mins setup
2 problems + 1 optional problem
1 extraction - Problem 1: face, OCR, and audio extractors
1 conversion - Problem 3: a collection of images/audio in old file formats
1 combined - Problem 6: a single script combining extraction and conversion
1 optional - Problem 5
This is a session to teach how to use Brown Dog Services
- Participants will use their own laptops for this part
- We will provide a VM with everything pre-installed in it through Nebula.
- Rob Kooper will talk to Doug about whether we can spawn 50 VMs on Nebula for the tutorial session. (DONE) We will get 50 VMs on Nebula.
- Smruti Padhy: Order 50+ flash drives for backup that will contain the VMs
- Create a VM with everything installed in it and take a snapshot which will then be deployed within Nebula. Approx. time required - 2 days
- Make a list of all software required and the directory structure for the tutorial
- local installation of fence and local authentication.
- 50 concurrent users to perform conversion/extractions tests
- Luigi Marini: Determine the max size of files to be uploaded
- Not sure of Jetstream yet.
- Provide clear instructions on how to access the VMs in Nebula with proper credentials (e.g., through ssh), from different OSes.
- Do we need training accounts on nebula?
- (Before tutorial: wiki pages with clear instructions) Install Python/R/MATLAB/cURL to use the BD service, along with the required libraries, in case anyone is interested in using the BD services in the future.
- Create wiki pages with clear instructions
- Demonstration of use of BD Fiddle
- Sign up for Brown Dog Service
- Obtain a key/token using curl or Postman or use of IPython notebook
- Use the token and the BD Fiddle interface to see BD in action.
- Copy-paste the Python code snippet and use it in the application to be explained next.
- Create a document for the demo with step-by-step screenshots
- Fix the CORS error for file url option (I think it is a known issue)
- Delay when file is uploaded from local directory
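The sign-up/key/token steps above could be sketched in Python as below. The gateway URL, the endpoint paths (/keys, /keys/<key>/tokens), and the response field names are assumptions to be checked against the current BD REST API documentation before the tutorial:

```python
import base64
import json
import urllib.request

BD_API = "https://bd-api.ncsa.illinois.edu"  # assumed gateway URL

def keys_url(base):
    """Endpoint used to request an API key (assumed path)."""
    return base.rstrip("/") + "/keys"

def tokens_url(base, key):
    """Endpoint used to exchange a key for a session token (assumed path)."""
    return base.rstrip("/") + "/keys/" + key + "/tokens"

def basic_auth_header(username, password):
    """HTTP Basic auth header, as curl -u would send it."""
    cred = base64.b64encode((username + ":" + password).encode()).decode()
    return {"Authorization": "Basic " + cred}

def post_json(url, headers):
    """POST with an empty body and parse the JSON response."""
    req = urllib.request.Request(url, data=b"", headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def get_token(base, username, password):
    """Obtain a key, then a token, mirroring the curl/Postman steps."""
    hdrs = basic_auth_header(username, password)
    key = post_json(keys_url(base), hdrs)["api-key"]        # assumed field name
    return post_json(tokens_url(base, key), hdrs)["token"]  # assumed field name
```

The same two calls can be shown in curl or Postman on the wiki page; the snippet is only meant to match the copy-paste step above.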
- Create applications using BD services
Three applications:
- Problem 1: Given a collection of images with text embedded in them, search the images based on their content. (Emphasizes extraction from unstructured data, indexing, and content-based retrieval)
- Images can be uploaded from a local directory or obtained via an external web service.
- Create an example dataset of images with interesting queries
- Provide a code snippet using an external service to obtain images, e.g., the Flickr API.
- This will only be provided as an example and will not be used for the rest of the code.
- Let the participants use the BD Python library to obtain a key/token and submit requests to the BD-API gateway
- Provide the link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using the Python library.
- Write a Python script that will serve as a stub for the BD client
- The participants will fill in the BD REST API calls to submit their requests.
- Make sure OCR and face extractor are running before starting the demo
- Python
- Make sure Elasticsearch is started before the example files are submitted to the BD service
- Provide instructions to start Elasticsearch and start a web client to it for visualization.
- Make sure the cluster name in the config.yml differs for each participant.
- Once technical metadata is obtained from BD, index its tags and technical metadata in a locally running Elasticsearch.
- Write a python script that will index the technical metadata in ES
- Search for the image using ES query
- Provide ES query for search
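One possible shape for the indexing step of the Problem 1 stub is sketched below. The metadata field names ('extractor', 'content', 'tags') are assumptions about the BD extractor output, and the actual upload/extraction call is left to the stub script:

```python
def es_doc(filename, metadata):
    """Build the document indexed into Elasticsearch for one image.

    `metadata` is assumed to be a list of per-extractor results returned
    by BD; the 'extractor', 'content', and 'tags' keys are illustrative."""
    # Keep only OCR output as the searchable text body.
    text = " ".join(m.get("content", "") for m in metadata
                    if m.get("extractor", "").endswith("ocr"))
    # Collect tags from all extractors (e.g., the face extractor).
    tags = [t for m in metadata for t in m.get("tags", [])]
    return {"filename": filename, "ocr_text": text, "tags": tags}

def match_query(term):
    """Elasticsearch query body used to search images by embedded text."""
    return {"query": {"match": {"ocr_text": term}}}
```

The stub can then PUT each document into the local Elasticsearch (index name, document type, and ES endpoint to be fixed once the VM setup is decided) and run match_query against it.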
- Problem 2: Given a collection of text files from a survey or reviews for a book/movie, use the sentiment analysis extractor to calculate the sentiment value for each file and group similar values together. (Emphasizes extraction from unstructured data and useful analysis)
- A collection of text files with reviews
- Obtain an examples dataset from the web.
- Let the participants use the BD Python library to obtain a key/token and submit requests to the BD-API gateway
- Provide the link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using the Python library.
- Write a Python script that will serve as a stub for the BD client
- The participants will fill in the BD REST API calls to submit their requests.
- Make sure the Sentiment Analysis extractor is running
- Save the results for each text file in a single file with the corresponding values
- Provide code for this in stub script
- Create separate folders and move the file based on the sentiment value
- Provide code in the stub that will perform the above action
- (Optional) Index text files along with the sentiment values and use ES visualization tool to search for documents with sentiment value less than some number.
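The grouping step for the Problem 2 stub could look like the sketch below. The sentiment scale and the 0.5 bin width are assumptions; they should match whatever the sentiment analysis extractor actually returns:

```python
import os
import shutil

def bucket(value, width=0.5):
    """Folder name for a given sentiment score (bin width of 0.5 assumed)."""
    return "sentiment_%.1f" % (int(value // width) * width)

def group_files(scores, root="."):
    """Move each review file into the folder for its sentiment bucket.

    `scores` maps filename -> sentiment value, as parsed from the
    (assumed) sentiment extractor metadata."""
    for fname, value in scores.items():
        folder = os.path.join(root, bucket(value))
        os.makedirs(folder, exist_ok=True)
        shutil.move(fname, os.path.join(folder, os.path.basename(fname)))
```

The optional Elasticsearch step can then index each filename with its sentiment value and query with a range filter (e.g., sentiment less than some number).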
- Problem 3: Use BD conversion to convert a collection of images/ps/odp files to png/pdf/ppt. This demonstrates that if you have a directory with files in old file formats, you can just use BD to get them all converted. (Emphasizes conversion)
- Provide a Python script for this and let participants use the Python library to use the BD service
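The batch-conversion script could be organized around two small helpers like these. The format mapping is an example set, and the /dap/convert/<format>/ endpoint path is an assumption to be verified against the BD conversion API docs:

```python
import os

# Example mapping of legacy input formats to modern targets (assumed set).
CONVERSIONS = {".ps": ".pdf", ".odp": ".pptx", ".bmp": ".png"}

def target_name(path, conversions=CONVERSIONS):
    """Output filename for a legacy-format file, or None if not converted."""
    root, ext = os.path.splitext(path)
    new_ext = conversions.get(ext.lower())
    return root + new_ext if new_ext else None

def convert_url(base, out_fmt):
    """Assumed BD conversion endpoint: POST the file to
    <gateway>/dap/convert/<output-format>/ with the token header."""
    return base.rstrip("/") + "/dap/convert/" + out_fmt.lstrip(".") + "/"
```

The stub then walks the directory, skips files where target_name returns None, and POSTs the rest to convert_url with the participant's token.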
- Problem 4: Given a collection of *.xlsx files, obtain some results based on some column values. (Emphasizes extraction and analysis of scientific data)
An example could be: given a *.xlsx file with max and min temperature for each day of a month, calculate the monthly average max/min temperature and standard deviation.
- Convert the *.xlsx files to *.csv using the conversion API so that you can see the content of the files on the VM. We are not installing any office software on the VM.
- Use the extraction API to extract columns from the file
- Perform some analysis and add the results to the technical metadata
- Write an extractor/converter for this problem. This should be an enticing yet simple problem that can handle many spreadsheets and produce a result.
- Ideas
- An algebra 101, traveling trains problem. 2 trains leave 2 different stations on tracks heading toward a junction. Given a spreadsheet with departure times, distances, velocities, etc., upload all the spreadsheets and determine if they will crash.
- This problem is simple and would provide the user an easily understood problem that can clearly be scaled to much more involved traffic problems.
- However, it doesn't really present a cool new idea. It may be preferable to think of something involving more cutting-edge technology
- A bacterial growth model. Given a culture with varied conditions, eg. pH, stored in multiple spreadsheets, determine the growth rate. Might be able to base this on http://mathinsight.org/bacteria_growth_initial_model
- This would require a few minutes of explanation of the model and would require some learning by the developer of the extractor.
- Still maybe not that enticing.
- Better Ideas?
- Provide a Python script for obtaining the input files and use the BD REST API to obtain the result.
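The analysis half of the Problem 4 example (after the xlsx-to-csv conversion) could be as small as this. The column names date/max_temp/min_temp and ISO dates are assumptions about the example spreadsheet, which doesn't exist yet:

```python
import csv
import io
import statistics

def monthly_stats(csv_text, column="max_temp"):
    """Per-month mean and population stdev of one temperature column.

    Assumes the xlsx-to-csv conversion yields columns date,max_temp,min_temp
    with ISO dates; adjust to the real spreadsheet layout."""
    by_month = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["date"][:7]  # 'YYYY-MM'
        by_month.setdefault(month, []).append(float(row[column]))
    return {m: (statistics.mean(v), statistics.pstdev(v))
            for m, v in by_month.items()}
```

If the spreadsheet analysis instead lives in an extractor (as suggested above), this same function would run inside the extractor and its result would be attached as technical metadata.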
- Problem 5: Obtain Ameriflux data and convert it into the *.clim format (similar to CSV, but tab-separated) for the SNIPET model. Calculate the average air temperature and its standard deviation. (This will emphasize both conversion and analysis)
- Write an R/Python script to call the BD conversion API, get the data in *.clim format, and calculate the average air temperature. Also plot a graph of the data.
- Installation of the RStudio server version.
- (Optional) To calculate the average temperature, call the BD extraction service. For this, write an extractor that accepts a *.clim file and outputs the average temperature.
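The analysis step on the converted *.clim file could be sketched as follows; the column position of air temperature in the tab-separated output is an assumption to be fixed once the converter is written:

```python
import statistics

def air_temp_stats(clim_text, col=2):
    """Mean and population stdev of the air-temperature column in a
    tab-separated *.clim file; the column index is an assumption."""
    temps = [float(line.split("\t")[col])
             for line in clim_text.strip().splitlines()]
    # A plot could be added here with matplotlib, e.g. plt.plot(temps).
    return statistics.mean(temps), statistics.pstdev(temps)
```

The optional extractor variant would wrap this same computation and return the result as metadata instead of printing it.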
- Problem 6: Audio converter and speech-to-text extractor combined in a single script
How to add Your Tool to Brown Dog Services (1h 30m)
This is a session to teach how to add a user's tool to Brown Dog Services.
- Part 1: Write an extractor
- Start with the bd-template extractor, which is the word count extractor.
- Ask participants to modify the extractor to use 'grep' to find a specific pattern within the file.
- Include yes/no in the metadata depending on whether the pattern is found.
- Give a brief description of JSON-LD support.
- Provide the intuition behind JSON-LD and an example
- As another example, write the extractor that will be used for Problem 4.
- Provide Step-by-step instructions/screenshots of updating the extractor and the output as seen at the Clowder GUI.
- This needs to be more simplified than what we have, targeting users at beginner/intermediate level
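The grep-style modification and its JSON-LD-flavored result could look like the sketch below. The @vocab URL and the metadata field names are hypothetical, meant only to illustrate the yes/no output and the idea of a JSON-LD @context:

```python
import re

# Hypothetical vocabulary URL for the JSON-LD @context.
CONTEXT = {"@vocab": "http://example.org/tutorial/"}

def check_pattern(text, pattern):
    """The grep-style check participants add: 'yes' if the pattern occurs."""
    return "yes" if re.search(pattern, text) else "no"

def to_jsonld(filename, pattern, result):
    """Wrap the result as JSON-LD-style metadata; the field names are
    illustrative of what an extractor attaches, not the exact schema."""
    return {"@context": CONTEXT,
            "file": filename,
            "pattern": pattern,
            "pattern_found": result}
```

In the actual bd-template extractor, to_jsonld's output would be what gets posted back as metadata and shown in the Clowder GUI.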
- Part 2: Write a converter
- Start with the bd-template for converters: ImageMagick
- Ask the participants to modify the converter input/output formats in the comment section, and see the result using the Polyglot web UI for POST and GET
- Think of another software for which creating a converter is easy and interesting.
- Provide step-by-step instructions/screenshots of modifying imagemagick
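The comment-section modification could be illustrated with a wrapper sketch like the one below. The comment header (tool name, data type, input formats, output formats) mimics the bd-template convention described above, but the exact header field order must be checked against the actual template:

```python
#!/usr/bin/env python
#ImageMagick
#image
#bmp, gif, jpg, png, tif
#bmp, gif, jpg, png, tif
import subprocess
import sys

def convert_command(src, dst):
    """ImageMagick call; the output format is inferred from dst's extension."""
    return ["convert", src, dst]

if __name__ == "__main__" and len(sys.argv) == 3:
    subprocess.check_call(convert_command(sys.argv[1], sys.argv[2]))
```

Participants would edit only the two format lines in the header and watch the change appear in the Polyglot web UI.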
- Part 3: Uploading a converter or an extractor to locally installed Tools Catalog.
- Step-by-step procedure to upload a tool, an input file and an output file without a docker file
- Part 4 (Optional - For advanced user): Dockerize the tool
Use Contributors Landing Page for this part of the session.
Participants will be provided with a VM with all the required setup so that they can create their own tools.
Wrap up
- Tutorial feedback form
- Announcement of next user workshop