...

  • We will provide a VM with everything pre-installed in it through Nebula. 
    •  Rob Kooper will talk to Doug about whether we can spawn 50 VMs on Nebula for the tutorial session. (DONE) We will get 50 VMs on Nebula.

    •  Smruti Padhy (who else?) Make a list of all software required and the directory structure for the tutorial
    •  Smruti Padhy, Marcus Slavenas (who else?) Create a VM with everything installed in it and take a snapshot which will then be deployed within Nebula. Approx. time required - 2 days
    •  Write a script for deployment of 50 VMs from the VM snapshot that we created. 
    •  Test the BD service with 50 concurrent users performing conversion/extraction tasks
    •  Luigi Marini The maximum size of file that can be uploaded to Brown Dog needs to be controlled. This is required to ensure no one uploads excessively large files.
    •  Not sure of Jetstream yet. 
    • Provide clear instructions on how to access VMs in Nebula with proper credentials.
      •  Clear instructions on how to access the VMs (e.g., through SSH) from different OSes (Linux, Mac, Windows).
      •  Need training accounts on Nebula. Provide SSH key pairs to each participant.
    • (Before tutorial - wiki pages with clear instructions) Install Python/R/MATLAB/cURL to use the BD service, along with the required libraries, in case anyone is interested in using BD services in the future.
      •  Create wiki pages with clear instructions

  • Backup - Provide the VM on USB sticks in case of network interruption
    In addition to the installation required for the VMs in Nebula, the following extra steps are required for the backup VMs
    • TODO
      •  Convert the VM created (using the OpenStack image) to VirtualBox format (*.vdi) and test the configuration.
      •  Smruti Padhy Order 50+ flash drives for backup; these will contain the VMs
      •  Test a local installation of Fence with local authentication instead of Crowd. This is for the backup to be provided in the preinstalled VM.
      •  bdfiddle installation

...

  • Create applications using BD services (50 mins)

    • Conversion Example (15 mins):  Convert a collection of images/ps/odp/audio/video files to png/pdf/ppt/mp3.  This demonstrates that if you have a directory of files in old formats, you can use BD to convert them all without installing any software. (Emphasizes conversion)
      • Make sure imagemagick and ffmpeg converters are running before the demo
      • Obtain the BD token/key - ask participants to refer to the previous bdfiddle step or use the Python library
      • Ask the participant to check the available output formats for specific input formats
      • Ask the participant to use the Python library to call the BD service
      • TODO:
        •  Provide a Python script for this and let participants use the Python library to call the BD service
        •  Provide step-by-step instructions with screenshots for doing this

    • Extraction/Indexing/Retrieval Example (20 mins): Given a collection of images with embedded text, along with audio files, search the images/audio files based on their content. (Emphasizes extraction on unstructured data, indexing, and content-based retrieval)

      • Make sure the OCR, face, and speech2text extractors are running before starting the demo
      • One can upload images from a local directory or use an external web service to obtain them.
      • Let the participant use the BD Python library to obtain a key/token and submit extraction requests to the BD-API gateway
      • Once technical metadata is obtained from BD, write the tags and technical metadata to a local file / Python dictionary.
      • Search for files based on tags/technical metadata via a linear search over the index file

      • TODO
        •  Create an example dataset with images and audio files against which we can make interesting queries
        •  (Optional) Provide a code snippet for using an external service to obtain images, e.g., the Flickr API.
            •  This will only be provided as an example and will not be used for the rest of the code.
        •  Provide a link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using the Python library.
        •  Write a Python script that will serve as a stub for the BD client
          • The participant will fill in the code to use the Python library to call the BD REST API and submit their requests.
          •  The Python script should write the tags and technical metadata to a local file. (Probably can use the Python library's index method, which writes them as feature vectors.)
        •  Write a Python script to make interesting searches/queries against the index file. Again, probably use the Python library's find method or just read the local file.

    •  Conversion & Extraction Example (15 min):  Given an audio file in a different format, convert it to a format that the speech2text extractor accepts, and obtain the text for the audio.
      • TODO
        •  Write a script that does the conversion and then sends the converted file to the BD service.
        •  Create step-by-step instructions with screenshots.
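A minimal sketch of the conversion half of that script, assuming ffmpeg is the converter and that speech2text accepts WAV input (both assumptions should be confirmed against the deployed extractors):

```python
import os

def ffmpeg_command(input_path, target_ext="wav"):
    """Build the ffmpeg invocation to convert `input_path` to `target_ext`.

    WAV as the speech2text input format is an assumption; adjust to
    whatever the deployed extractor actually requires.
    """
    base, _ = os.path.splitext(input_path)
    output_path = base + "." + target_ext
    # -y overwrites an existing output file without prompting
    return ["ffmpeg", "-y", "-i", input_path, output_path], output_path
```

The script would run this command via `subprocess` and then submit `output_path` to the BD service.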

    • (Optional) Combination of Conversion & Extraction Example: Obtain Ameriflux data and convert it into *.clim format (similar to CSV but tab-separated) for the SNIPET model.  Calculate the average air temperature and its standard deviation. (This will emphasize both conversion and analysis)

...

This is a session teaching how to add a user's tool to the Brown Dog services.

  • Part 1: Teach to write an extractor (30 mins)
    • Start with the bd-template extractor, which is the word count extractor.
    • Ask participants to modify the extractor to use 'grep' to find a specific pattern within the file.
    • Ask them to change the name of the extractor from ncsa.wordcount to ncsa.grep.
    • Include yes/no in the metadata indicating whether the pattern is found.
    • Briefly describe JSON-LD support. Provide the intuition behind JSON-LD with a simple example. No need to go into details of RDF.
    • TODO
      •  As another example - write the extractor that will be used for Problem 4.
        •  Provide step-by-step instructions/screenshots of updating the extractor and the output as seen in the Clowder GUI
        •  This needs to be simpler than what we have, aimed at beginner/intermediate-level users
        •  Also provide a link to JSON-LD for further reading. Provide minimum software requirements for development, such as Clowder, RabbitMQ, MongoDB, pyclowder, Python libraries, etc.
      •  Write an extractor that does grep along with the word count for demonstration purposes
      •  (Optional) Write an extractor that accepts a CSV file with, say, 3 columns (probably with values from a weather or bacterial growth model; see Problem 2.2 below) and calculates the average of a specific column
        •  Provide step-by-step screenshots for writing such an extractor.
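The core logic of the ncsa.grep teaching extractor might be sketched as below; this is only the text-processing piece, with the pyclowder wiring and JSON-LD wrapping omitted:

```python
import re

def process_text(text, pattern):
    """Core logic of the ncsa.grep teaching extractor (sketch only).

    Returns the metadata fields the tutorial asks for: a word count and
    a yes/no flag for whether `pattern` occurs in the file's text.
    """
    found = re.search(pattern, text) is not None
    return {
        "word_count": len(text.split()),
        "pattern_found": "yes" if found else "no",
    }
```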

  • Part 2: Teach to write a converter (30 mins)
    • Start with the bd-template for a converter - imagemagick
    • Ask the participant to modify the converter's input/output formats in the comment section, and see the result using the Polyglot web UI for POST and GET
    • Think of another software package for which creating a converter is easy and interesting.
    • Another example - the ffmpeg converter for audio & video
    •  Provide step-by-step instructions/screenshots of modifying the imagemagick converter
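A rough Python sketch of what the modified imagemagick converter does under the hood; the declared format sets mirror the comment section participants edit and are illustrative only, not the template's actual lists:

```python
import os
import subprocess

# Formats declared in the converter's comment section (participants edit these;
# the sets below are illustrative, not the bd-template's actual lists)
INPUT_FORMATS = {"bmp", "gif", "jpg", "png", "tiff"}
OUTPUT_FORMATS = {"jpg", "png", "pdf"}

def convert_image(input_path, output_path, run=subprocess.call):
    """Validate formats, then shell out to ImageMagick's `convert`."""
    in_ext = os.path.splitext(input_path)[1].lstrip(".").lower()
    out_ext = os.path.splitext(output_path)[1].lstrip(".").lower()
    if in_ext not in INPUT_FORMATS or out_ext not in OUTPUT_FORMATS:
        raise ValueError("unsupported conversion: %s -> %s" % (in_ext, out_ext))
    # ImageMagick picks the target format from the output file's extension
    return run(["convert", input_path, output_path])
```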

  • Part 3: Teach to upload a converter or an extractor to a locally installed Tools Catalog (30 mins)
    •  Step-by-step procedure to upload a tool, an input file, and an output file without a Dockerfile
  • Part 4 (Optional - For advanced user): Dockerize the tool

...

-------- OPTIONAL EXAMPLES --------

  • Problem 2.1: Given a collection of text files from a survey or reviews for a book/movie, use the sentiment analysis extractor to calculate a sentiment value for each file and group similar values together. (Emphasizes extraction on unstructured data and useful analysis)
    • A collection of text files with reviews
      •  Obtain an example dataset from the web.
    • Let the participant use the BD Python library to obtain a key/token and submit requests to the BD-API gateway
      •  Provide a link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using the Python library.
      •  Write a Python script that will serve as a stub for the BD client
          • The participant will fill in the code to call the BD REST API and submit their requests.
    • Make sure the Sentiment Analysis extractor is running
    • Save the results for each text file in a single file with corresponding values
      •  Provide code for this in the stub script
    • Create separate folders and move the files based on their sentiment value
      •  Provide code that will do the above action in the stub
    • (Optional) Index text files along with the sentiment values and use ES visualization tool to search for documents with sentiment value less than some number.
  • Problem 2.2: Given a collection of *.xlsx files, obtain results based on some column values. (Emphasizes extraction and analysis on scientific data)

...