...

How to use Brown Dog Services (1h 30m)

20-30 mins setup

2 problems + 1 optional problem

1 extraction - Problem 1 - face and OCR, audio extractor

1 conversion - Problem 3 - collection of images and audio files in old file formats

1 combined - Problem 6 - a single script combining both

1 optional - Problem 5

 

This is a session to teach how to use Brown Dog Services

  • Participants will use their own laptops for this part
    • We will provide a VM with everything pre-installed in it through Nebula.
      •  Rob Kooper will ask Doug whether we can spawn 50 VMs on Nebula for the tutorial session. (DONE) We will get 50 VMs on Nebula.
      •  Smruti Padhy: Order 50+ flash drives as a backup that will contain the VMs
      •  Create a VM with everything installed in it and take a snapshot, which will then be deployed within Nebula. Approx. time required: 2 days
      •  Make a list of all software required and the directory structure for the tutorial
      •  Local installation of Fence and local authentication.
      •  Run conversion/extraction tests with 50 concurrent users
      •  Luigi Marini: Max size of a file to be uploaded
      •  Not sure about Jetstream yet.
    • Provide clear instructions on how to access the VMs in Nebula with proper credentials.
      •  Clear instructions on how to access the VMs (e.g., through ssh) from different OSes.
      •  Do we need training accounts on Nebula?
    • (Before the tutorial: wiki pages with clear instructions) Install Python/R/MATLAB/cURL to use the BD Service, along with the required libraries, in case anyone is interested in using the BD services in the future.
      •  Create wiki pages with clear instructions
  • Demonstration of use of BD Fiddle 
    • Sign up for Brown Dog Service
    • Obtain a key/token using curl, Postman, or an IPython notebook
    • Use the token and the BD Fiddle interface to see BD in action.
    • Copy and paste the Python code snippet and use it in the application explained next.
    •  Create a document for the demo with step-by-step screenshots
    •  Fix the CORS error for file url option (I think it is a known issue)
    •  Delay when file is uploaded from local directory
  • Create applications using BD services
    Three applications:
    • Problem 1: Given a collection of images with embedded text, search the images based on their content. (Emphasizes extraction from unstructured data, indexing, and content-based retrieval)
      • One can upload images from a local directory or use an external web service to obtain images.
          •  Create an example dataset of images with interesting queries
          •  Provide a code snippet that uses an external service to obtain images, e.g. the Flickr API.
            •  This will only be provided as an example and will not be used for the rest of the code.

      • Let the participants use the BD Python library to obtain a key/token and submit requests to the BD-API gateway
        •  Provide the link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using the Python library.
        •  Write a Python script that will serve as a stub for the BD client
            • The participants will fill in the code for the BD REST API calls to submit their requests.
      • Make sure OCR and face extractor are running before starting the demo
      • Python
      • Make sure Elasticsearch is started before the example files are submitted to the BD service
        •  Provide instructions to start Elasticsearch and start a web client for it for visualization.
          • Make sure the cluster name in the config.yml differs for each participant.
      • Once the technical metadata is obtained from BD, index its tags and technical metadata in a locally running Elasticsearch.
        •  Write a Python script that will index the technical metadata in ES
      • Search for the image using ES query
        •  Provide ES query for search
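The indexing and search steps of Problem 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`filename`, `ocr_text`, `tags`) and the sample values are hypothetical, and the actual metadata returned by the BD extractors may be shaped differently. The sketch only builds the JSON bodies and does not contact Elasticsearch.

```python
import json


def build_index_document(filename, ocr_text, tags):
    """Shape one image's extracted metadata as a JSON-serializable document.

    The field names are hypothetical; adapt them to the metadata BD returns.
    """
    return {"filename": filename, "ocr_text": ocr_text, "tags": list(tags)}


def build_search_query(words):
    """Build an Elasticsearch Query DSL body: a full-text `match` on the OCR text."""
    return {"query": {"match": {"ocr_text": words}}}


# Hypothetical sample values, for illustration only.
doc = build_index_document("poster.png", "Brown Dog tutorial", ["text", "poster"])
query = build_search_query("tutorial")

print(json.dumps(doc, indent=2))
print(json.dumps(query, indent=2))
```

Both bodies would then be sent to the locally running Elasticsearch through its REST API; the exact index and search endpoints depend on the Elasticsearch version installed on the VM.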
    • Problem 2: Given a collection of text files from a survey or reviews of a book/movie, use the sentiment analysis extractor to calculate the sentiment value for each file and group similar values together. (Emphasizes extraction from unstructured data and useful analysis)
      • A collection of text files with reviews
        •  Obtain an example dataset from the web.
      • Let the participants use the BD Python library to obtain a key/token and submit requests to the BD-API gateway
        •  Provide the link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using the Python library.
        •  Write a Python script that will serve as a stub for the BD client
            • The participants will fill in the code for the BD REST API calls to submit their requests.
      • Make sure the Sentiment Analysis extractor is running
      • Save the results for all text files in a single file with the corresponding values
        •  Provide code for this in the stub script
      • Create separate folders and move the files based on their sentiment value
        •  Provide code that does the above in the stub
      • (Optional) Index the text files along with the sentiment values and use the ES visualization tool to search for documents with a sentiment value less than some number.
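The "save results" and "group into folders" steps of Problem 2 might look like the sketch below. It assumes the sentiment values have already been obtained from BD and are available as a plain dictionary; the score scale, the threshold of 0.0, the folder names, and the results file name are all hypothetical choices made for illustration.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def bucket_for(score, threshold=0.0):
    """Map a sentiment score to a folder name (threshold is an assumption)."""
    return "positive" if score >= threshold else "negative"


def save_and_group(scores, source_dir, results_name="sentiment_results.csv"):
    """Write all scores to one CSV file, then move each file into its bucket folder."""
    source_dir = Path(source_dir)
    results_path = source_dir / results_name
    with open(results_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "sentiment"])
        for name, score in sorted(scores.items()):
            writer.writerow([name, score])
    for name, score in scores.items():
        target = source_dir / bucket_for(score)
        target.mkdir(exist_ok=True)
        shutil.move(str(source_dir / name), str(target / name))
    return results_path


# Demo on a temporary directory with made-up scores (not real BD output).
demo_dir = Path(tempfile.mkdtemp())
demo_scores = {"review1.txt": 0.8, "review2.txt": -0.4}
for name in demo_scores:
    (demo_dir / name).write_text("placeholder review text")

results_file = save_and_group(demo_scores, demo_dir)
print(sorted(p.name for p in demo_dir.iterdir()))
```

Using more than two buckets only requires changing `bucket_for`; the rest of the sketch stays the same.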


    • Problem 3: Use BD conversion to convert a collection of images/ps/odp files to png/pdf/ppt. This will demonstrate that if you have a directory with files in old file formats, you can just use BD to get them all converted. (Emphasizes conversion)
      •  Provide a Python script for this and let the participants use the Python library to call the BD service
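The batch part of Problem 3 could be structured along these lines. This is a sketch, not the BD client: the pairing of source extensions to target formats (including the `.bmp` and `.tif` examples) is an assumption, and the `convert` argument is a hypothetical placeholder for whatever call the BD Python library provides for a single-file conversion.

```python
import tempfile
from pathlib import Path

# Assumed pairing of old formats to targets; adjust to the tutorial dataset.
TARGET_FORMATS = {".ps": "pdf", ".odp": "ppt", ".bmp": "png", ".tif": "png"}


def plan_conversions(directory, target_formats=TARGET_FORMATS):
    """List (path, target_format) pairs for every convertible file in a directory."""
    plan = []
    for path in sorted(Path(directory).iterdir()):
        target = target_formats.get(path.suffix.lower())
        if path.is_file() and target is not None:
            plan.append((path, target))
    return plan


def convert_all(directory, convert):
    """Apply `convert(path, target_format)` to each planned file and collect results."""
    return [convert(path, target) for path, target in plan_conversions(directory)]


# Demo with empty placeholder files and a stand-in for the real conversion call.
demo_dir = Path(tempfile.mkdtemp())
for name in ["slides.odp", "figure.ps", "scan.BMP", "notes.txt"]:
    (demo_dir / name).write_bytes(b"")


def fake_convert(path, target):
    """Stand-in that only computes the output file name; no conversion happens."""
    return path.with_suffix("." + target).name


converted = convert_all(demo_dir, fake_convert)
print(converted)
```

Keeping the conversion call behind a single function argument means the participants only need to fill in that one function in the stub script.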

    • Problem 4: Given a collection of *.xlsx files, obtain results based on the values in certain columns. (Emphasizes extraction and analysis of scientific data)

...

      •    Write an R/Python script to call the BD conversion API, get the data in *.clim format, and calculate the average air temperature. Also plot a graph of the data.
        •  Installation of the RStudio Server version.
      •  (Optional) To calculate the average temperature, call the BD extraction service. For this, write an extractor that accepts a *.clim file and outputs the average temperature.
  •    Problem 6: Audio converter, speech-to-text extractor

...

 

How to add Your Tool to Brown Dog Services (1h 30m)

...