...

  • Create applications using BD services (50 mins)
    • Conversion Example (15 mins): Convert a collection of image/ps/odp/audio/video files to png/pdf/ppt/mp3. This demonstrates that if you have a directory of files in old file formats, you can use BD to convert them all without installing any software. (Emphasizes conversion)
      • Make sure imagemagick and ffmpeg converters are running before the demo
      • Obtain the 
      • TODO:
        • Provide a Python script for this and let participants use the Python library to call the BD service
        • Provide step-by-step instructions with screenshots for this
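
As a starting point for the conversion script in the TODO above, here is a minimal sketch of its local part: walking a directory, deciding a target format per file, and writing the converted output. The extension-to-format table is an assumption based on the png/pdf/ppt/mp3 targets named above (video is left out because its target is not stated), and `convert` is a hypothetical callable standing in for the real BD conversion call, which is not specified here.

```python
from pathlib import Path

# Assumed mapping from input extension to target format; adjust it to the
# conversions the running BD converters actually offer.
TARGET_FORMAT = {
    ".jpg": "png", ".gif": "png", ".bmp": "png", ".tif": "png",
    ".ps": "pdf",
    ".odp": "ppt",
    ".wav": "mp3", ".ogg": "mp3", ".flac": "mp3",
}


def plan_conversions(directory):
    """Return (input_path, target_format) pairs for files with a known type."""
    plan = []
    for path in sorted(Path(directory).iterdir()):
        target = TARGET_FORMAT.get(path.suffix.lower())
        if path.is_file() and target:
            plan.append((path, target))
    return plan


def convert_directory(directory, convert, output_dir):
    """Convert every planned file and write the results to output_dir.

    `convert(path, fmt)` is a hypothetical stand-in for the BD conversion
    request; it must return the converted file content as bytes.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path, fmt in plan_conversions(directory):
        dest = out / f"{path.stem}.{fmt}"
        dest.write_bytes(convert(path, fmt))
        written.append(dest)
    return written
```

Passing the BD call in as a function keeps the directory handling testable without a running service; participants would only need to supply `convert`.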

    • Extraction/Indexing/Retrieval Example (20 mins): Given a collection of images with embedded text, and audio files, search the images/audio files based on their content. (Emphasizes extraction from unstructured data, indexing, and content-based retrieval)

      • Make sure OCR, face, and speech2text extractors are running before starting the demo
      • Images can be uploaded from a local directory or obtained from an external web service.
      • Let the participant use the BD Python library to obtain a key/token and submit an extraction request to the BD-API gateway
      • Once the technical metadata is obtained from BD, write the tags and technical metadata to a local file or Python dictionary.
      • Search the files by their tags/technical metadata using a linear search over the index file

      • TODO
        • Create an example dataset of images and audio files that supports interesting queries
        • (Optional) Provide a code snippet that uses an external service to obtain images, e.g. the Flickr API.
            • This will only be provided as an example and will not be used for the rest of the code.
        • Provide the link to the current BD REST API and create a document/wiki page with step-by-step screenshots of obtaining a key/token using the Python library.
        • Write a Python script that will serve as a stub for the BD client
          • The participant will fill in the code that uses the Python library to call the BD REST API and submit requests.
          • The Python script should write the tags and technical metadata to a local file. (It can probably use the Python library's index method, which writes them as feature vectors.)
        • Write a Python script to run interesting searches/queries against the index file. Again, probably use the Python library's find method, or just read the local file.
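
The "write to a local file, then linear search" steps above could look roughly like the sketch below. It assumes the extraction results have already been collected into a dictionary keyed by filename with `tags` and `metadata` entries; that layout and the function names are illustrative and are not the BD library's own index/find methods.

```python
import json


def write_index(results, index_path):
    """Save {filename: {"tags": [...], "metadata": {...}}} as a JSON file."""
    with open(index_path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2, sort_keys=True)


def search_index(index_path, term):
    """Linear scan: return filenames whose tags or metadata mention `term`."""
    with open(index_path, encoding="utf-8") as fh:
        index = json.load(fh)
    needle = term.lower()
    hits = []
    for filename, entry in index.items():
        # Compare against every tag and every metadata value as text.
        values = [str(tag) for tag in entry.get("tags", [])]
        values += [str(v) for v in entry.get("metadata", {}).values()]
        if any(needle in value.lower() for value in values):
            hits.append(filename)
    return sorted(hits)
```

A case-insensitive substring match is enough for a small demo dataset; anything larger would want a real index rather than a scan.
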
    • Problem 4: Given a collection of *.xlsx files, compute results from selected column values. (Emphasizes extraction and analysis of scientific data)

                     An example could be: given a *.xlsx file with the max and min temperature for each day of a month, calculate the average, max, min, and standard deviation of the temperature for each month.

    • Conversion & Extraction Example (15 min): Given an audio file in an arbitrary format, convert it to a format that the speech2text extractor accepts and obtain the text of the audio.
      • TODO
        • Write a script that does the conversion and then sends the converted file to the BD service.
        • Create a step-by-step instruction document with screenshots.
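
The convert-then-extract flow described above can be sketched as a small function. The set of accepted formats is an assumption (check what the speech2text extractor really takes), and `convert` and `extract` are hypothetical callables standing in for the BD conversion and extraction requests.

```python
from pathlib import Path

# Assumed formats the speech2text extractor accepts; verify before the demo.
ACCEPTED = {"wav"}


def transcribe(path, convert, extract, accepted=ACCEPTED, target="wav"):
    """Convert `path` to `target` if needed, then return the extracted text.

    `convert(path, fmt)` returns the path of the converted file and
    `extract(path)` returns the transcript; both stand in for BD calls.
    """
    path = Path(path)
    fmt = path.suffix.lower().lstrip(".")
    if fmt not in accepted:
        # Only go through the conversion service when the format is not accepted.
        path = Path(convert(path, target))
    return extract(path)
```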

    • (Optional) Combination of Conversion & Extraction Example
    • Convert the *.xlsx file to *.csv using the conversion API so that the content of the file can be viewed on the VM. We are not installing any office software on the VM.
    • Use the extraction API to extract columns from the file
    • Perform some analysis and add the results to the technical metadata
    • Write an extractor/converter for this problem
      • This should be an enticing yet simple problem that can handle many spreadsheets and get a result.
      • Ideas
        1. An Algebra 101 traveling-trains problem. Two trains leave two different stations on tracks heading toward a junction. Given spreadsheets with departure times, distances, velocities, etc., upload all the spreadsheets and determine whether the trains will crash.
          1. This problem is simple and would give the user an easily understood problem that can clearly be scaled up to much more involved traffic problems.
          2. However, it doesn't really present a cool new idea. It may be preferable to think of something that uses more cutting-edge technology.
        2. A bacterial growth model. Given a culture under varied conditions, e.g. pH, stored in multiple spreadsheets, determine the growth rate. Might be able to base this on http://mathinsight.org/bacteria_growth_initial_model
          1. This would require a few minutes of explanation of the model and some learning by the developer of the extractor.
          2. Still maybe not that enticing.
        • Better ideas?
    • Provide a Python script for obtaining the input files and using the BD REST API to obtain the result.
    • Problem 5: Obtain AmeriFlux data and convert it into *.clim format (similar to CSV but tab-separated) for the SIPNET model. Calculate the average air temperature and its standard deviation. (This will emphasize both conversion and analysis)
      • Write an R/Python script to call the BD conversion API, get the data in *.clim format, and calculate the average air temperature. Also plot a graph of the data.
        • Installation of RStudio Server.
      • (Optional) To calculate the average temperature, call the BD extraction service. For this, write an extractor that accepts a *.clim file and outputs the average temperature.
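
The analysis half of Problem 5 needs only the standard library once the tab-separated data is in hand. The sketch below assumes the converted text has a header row and an air-temperature column named `tair`; both are assumptions for illustration, since the exact *.clim layout is not given here.

```python
import csv
import io
import statistics


def air_temperature_stats(clim_text, column="tair"):
    """Return (mean, sample standard deviation) of one tab-separated column.

    The header row and the column name "tair" are assumptions; adapt them
    to the layout of the converted file.
    """
    reader = csv.DictReader(io.StringIO(clim_text), delimiter="\t")
    # Skip rows where the value is missing or empty.
    values = [float(row[column]) for row in reader if row[column] not in ("", None)]
    return statistics.mean(values), statistics.stdev(values)
```
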
  • Problem 6: Audio converter, speech-to-text extractor

...

 

How to add Your Tool to Brown Dog Services (1h 30m)

...

  • Tutorial feedback form
  • Announcement of next user workshop

 

-------- OPTIONAL EXAMPLES -------

  • Problem 2: Given a collection of text files from a survey or reviews of a book/movie, use the sentiment analysis extractor to calculate the sentiment value for each file and group similar values together. (Emphasizes extraction from unstructured data and useful analysis)
    • A collection of text files with reviews
      • Obtain an example dataset from the web.
    • Let the participant use the BD Python library to obtain a key/token and submit a request to the BD-API gateway
      • Provide the link to the current BD REST API and create a document/wiki page with step-by-step screenshots of obtaining a key/token using the Python library.
      • Write a Python script that will serve as a stub for the BD client
          • The participant will fill in the code that calls the BD REST API to submit requests.
    • Make sure the Sentiment Analysis extractor is running
    • Save the results for all text files in a single file with the corresponding values
      • Provide code for this in the stub script
    • Create separate folders and move each file based on its sentiment value
      • Provide code that does this in the stub
    • (Optional) Index the text files along with their sentiment values and use the ES visualization tool to search for documents with a sentiment value less than some number.
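
The "save all results in one file" and "move files into folders by sentiment" steps for the stub could be sketched as follows. The score range (roughly -1 to 1), the thresholds, and the three folder names are assumptions chosen for illustration; the real extractor output may use a different scale.

```python
import csv
import shutil
from pathlib import Path


def bucket(score, negative_below=-0.25, positive_above=0.25):
    """Map a sentiment score to a folder name; thresholds are illustrative."""
    if score < negative_below:
        return "negative"
    if score > positive_above:
        return "positive"
    return "neutral"


def save_and_group(scores, source_dir, results_csv):
    """Write all scores to one CSV, then move each file into its bucket folder.

    `scores` maps a filename in `source_dir` to a numeric sentiment value,
    however the stub obtained it from the extractor.
    """
    source = Path(source_dir)
    with open(results_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "sentiment"])
        for name, score in sorted(scores.items()):
            writer.writerow([name, score])
    for name, score in scores.items():
        folder = source / bucket(score)
        folder.mkdir(exist_ok=True)
        shutil.move(str(source / name), str(folder / name))
```
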
  • Problem 4: Given a collection of *.xlsx files, compute results from selected column values. (Emphasizes extraction and analysis of scientific data)

         An example could be: given a *.xlsx file with the max and min temperature for each day of a month, calculate the average, max, min, and standard deviation of the temperature for each month.

      • Convert the *.xlsx file to *.csv using the conversion API so that the content of the file can be viewed on the VM. We are not installing any office software on the VM.
      • Use the extraction API to extract columns from the file
      • Perform some analysis and add the results to the technical metadata
      • Write an extractor/converter for this problem
        • This should be an enticing yet simple problem that can handle many spreadsheets and get a result.
        • Ideas
          1. An Algebra 101 traveling-trains problem. Two trains leave two different stations on tracks heading toward a junction. Given spreadsheets with departure times, distances, velocities, etc., upload all the spreadsheets and determine whether the trains will crash.
            1. This problem is simple and would give the user an easily understood problem that can clearly be scaled up to much more involved traffic problems.
            2. However, it doesn't really present a cool new idea. It may be preferable to think of something that uses more cutting-edge technology.
          2. A bacterial growth model. Given a culture under varied conditions, e.g. pH, stored in multiple spreadsheets, determine the growth rate. Might be able to base this on http://mathinsight.org/bacteria_growth_initial_model
            1. This would require a few minutes of explanation of the model and some learning by the developer of the extractor.
            2. Still maybe not that enticing.
          • Better ideas?

      • Provide a Python script for obtaining the input files and using the BD REST API to obtain the result.
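
For the temperature example in Problem 4, the analysis after the *.xlsx to *.csv conversion might look like the sketch below. It assumes the CSV has `date`, `tmax`, and `tmin` columns with dates in YYYY-MM-DD form; those names and the date format are hypothetical, and the population standard deviation is used so that a month with a single row still yields a value.

```python
import csv
import io
import statistics
from collections import defaultdict


def monthly_stats(csv_text):
    """Summarise tmax and tmin per month from rows of date,tmax,tmin.

    Column names and the YYYY-MM-DD date format are assumptions.
    """
    by_month = defaultdict(lambda: {"tmax": [], "tmin": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["date"][:7]  # "YYYY-MM" prefix identifies the month
        by_month[month]["tmax"].append(float(row["tmax"]))
        by_month[month]["tmin"].append(float(row["tmin"]))
    summary = {}
    for month, cols in sorted(by_month.items()):
        summary[month] = {
            name: {
                "mean": statistics.mean(vals),
                "min": min(vals),
                "max": max(vals),
                "stdev": statistics.pstdev(vals),
            }
            for name, vals in cols.items()
        }
    return summary
```

The returned dictionary can be written straight into the technical metadata or printed per month for the demo.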