...

2 problems + 1 optional problem
1 Extraction example - Given a collection of images with embedded text and audio files, extract all metadata using the face, OCR, and speech2text extractors, index it, and do content-based retrieval
1 Conversion example - Given a collection of images and audio files in old file formats, convert them to a format that participants can open
1 Combined example - Given an audio file, convert it to *.mp3 and then use the speech2text extractor to obtain the text.
1 Optional combined example - Problem 5

Environment Set-up Details (20 mins)

Participants will use their own laptops for this part

TODOS

We will provide a VM with everything pre-installed in it through Nebula.
- Rob Kooper will talk to Doug about whether we can spawn 50 VMs on Nebula for the tutorial session. (DONE) We will get 50 VMs on Nebula.
- Smruti Padhy Order 50+ flash drives, as a backup, that will contain the VMs
- Smruti Padhy, Marcus Slavenas, (who else?) Create a VM with everything installed in it and take a snapshot which will then be deployed within Nebula. Approx. time required - 2 days
- Convert the VM created (using the OpenStack image) to VirtualBox format (*.vdi) and test the configurations.
- Make a list of all software required and the directory structure for the tutorial
- Write a script for deployment of 50 VMs from the VM snapshot that we created.
- Local installation of fence with local authentication. This is for backup to be provided in the preinstalled VM.
- Test the BD service with 50 concurrent users performing conversion/extraction tasks
- Luigi Marini The maximum size of a file that can be uploaded to Brown Dog needs to be controlled. This is required to ensure no one uploads any large files.
- Not sure of Jetstream yet.
- Provide clear instructions on how to access the VMs in Nebula with proper credentials.
  - Clear instructions on how to access the VMs (e.g., through ssh) from different OSes.
  - Need training accounts on Nebula. Provide SSH key pairs to each participant.
- (Before tutorial - wiki pages with clear instructions) Install Python/R/MATLAB/cURL, along with the required libraries, for anyone interested in using the BD services in the future.
  - Create wiki pages with clear instructions
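
The 50-concurrent-user test listed above could be driven by a small harness along these lines. This is only a sketch: `submit_request` is a hypothetical placeholder that sleeps instead of contacting Brown Dog, and its body would need to be replaced with a real conversion or extraction request before the timings mean anything.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def submit_request(user_id):
    """Hypothetical placeholder: replace the body with a real BD request."""
    time.sleep(0.01)  # simulates network latency only; no service is called
    return {"user": user_id, "ok": True}


def run_load_test(num_users=50, task=submit_request):
    """Run one task per simulated user concurrently and time each one."""
    def timed(user_id):
        start = time.perf_counter()
        result = task(user_id)
        return result, time.perf_counter() - start

    # One worker thread per simulated user, so all requests overlap.
    with ThreadPoolExecutor(max_workers=num_users) as pool:
        return list(pool.map(timed, range(num_users)))


if __name__ == "__main__":
    outcomes = run_load_test()
    failures = [result for result, _ in outcomes if not result["ok"]]
    slowest = max(elapsed for _, elapsed in outcomes)
    print(f"{len(outcomes)} requests, {len(failures)} failures, "
          f"slowest {slowest:.2f}s")
```

Threads are used here because the work is expected to be network-bound; the number of simulated users is a parameter so the same harness can be tried with a handful of users first.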

...

Demonstration of BD Fiddle (20 mins)
- Sign up for Brown Dog Service
- Obtain a key/token using curl, Postman, or an IPython notebook
- Use the token and the BD Fiddle interface to see BD in action.
- Copy and paste the Python code snippet and use it in the application explained next.
- Create a document for the demo with step-by-step screenshots for all above steps.
- Eugene Roeder Fix the CORS error for file url option (I think it is a known issue). Please add the JIRA issue number here.
- Fix the delay experienced when a file is uploaded from a local directory to the BD Fiddle UI

...

Create applications using BD services
Three applications:
(50 mins)
- Conversion Example (15 mins): Problem 1: Given a collection of images/ps/odp/audio files, convert them to png/pdf/ppt/mp3. This demonstrates that if you have a directory of files in old file formats, you can just use BD to get them all converted. (Emphasizes conversion)
  - TODO:
    - Provide a Python script for this and let participants use the Python library to access the BD service
    - Provide step-by-step instructions with screenshots to do this
- Extraction/Indexing/Retrieval Example (20 mins): Given a collection of images with embedded text and audio files, search the images/audio files based on their content. (Emphasizes extraction from unstructured data, indexing, and content-based retrieval)
  - Make sure the OCR, face, and speech2text extractors are running before starting the demo
  - Images can be uploaded from a local directory or obtained from an external web service.
  - Let the participants use the BD Python library to obtain a key/token and submit an extraction request to the BD-API gateway
  - Once the technical metadata is obtained from BD, index it: write the tags and technical metadata to a local file or a Python dictionary.
  - Search for files based on the tags/technical metadata by searching the index file
  - TODO
    - Create an example dataset of images and audio files that supports interesting queries
    - (Optional) Provide a code snippet that uses an external service to obtain images, e.g. the Flickr API.
      - This will only be provided as an example and will not be used for the rest of the code.
    - Provide the link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using python library.
    - Write a Python script that will serve as a stub for the BD client
      - The participant will fill in the code to BD REST API call to submit their requests.
    - Write a Python script that will index the tags and technical metadata in ES or write them to a file. (Can probably use the BD-CLI library.)
- Problem 2: Given a collection of text files from a survey or reviews of a book/movie, use the sentiment analysis extractor to calculate the sentiment value for each file and group similar values together. (Emphasizes extraction from unstructured data and useful analysis)
  - A collection of text files with reviews
    - Obtain an example dataset from the web.
  - Let the participant use the python library of BD to obtain key/token and submit request to BD-API gateway
    - Provide the link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using python library.
    - Write a Python script that will serve as a stub for the BD client
      - The participant will fill in the code to BD REST API call to submit their requests.
  - Make sure the Sentiment Analysis extractor is running
  - Save the results for all text files in a single file, with each file's corresponding value
    - Provide code for this in the stub script
  - Create separate folders and move the files based on their sentiment value
    - Provide code in the stub that does this
  - (Optional) Index text files along with the sentiment values and use ES visualization tool to search for documents with sentiment value less than some number.
- Problem 4: Given a collection of *.xlsx files, obtain some results based on the values in certain columns. (Emphasizes extraction and analysis of scientific data)
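
The local post-processing steps in the examples above (indexing tags in a Python dictionary, searching that index, and moving files into folders by sentiment value) might look roughly like the sketch below. It makes no BD calls. The input shapes are assumptions made for illustration: a mapping from file name to a list of tag strings, and a mapping from file name to a numeric sentiment score. The actual extractor output would have to be reshaped into these forms first. The function names, the `positive`/`negative` folder names, and the threshold of 0.0 are likewise hypothetical.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def build_index(metadata_by_file):
    """Map each lower-cased tag to the set of files that carry it."""
    index = defaultdict(set)
    for filename, tags in metadata_by_file.items():
        for tag in tags:
            index[tag.lower()].add(filename)
    return index


def search(index, query):
    """Return the files whose tags include every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


def group_by_sentiment(scores, source_dir, threshold=0.0):
    """Move each file into a 'positive' or 'negative' subfolder."""
    source_dir = Path(source_dir)
    for filename, score in scores.items():
        # Assumed convention: scores at or above the threshold are positive.
        label = "positive" if score >= threshold else "negative"
        target = source_dir / label
        target.mkdir(exist_ok=True)
        shutil.move(str(source_dir / filename), str(target / filename))
```

For example, `search(build_index({"sign.jpg": ["stop", "face"]}), "stop")` returns the set containing `"sign.jpg"`. A plain dictionary keeps the exercise self-contained; the optional ES step would replace `build_index` and `search` rather than extend them.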

...
