...
- Create applications using BD services (50 mins)
- Conversion Example (15 mins): Convert a collection of images/ps/odp/audio/video files to png/pdf/ppt/mp3. This will demonstrate that if you have a directory of files in old formats, you can use BD to convert them all without installing any software. (Emphasizes conversion)
- Make sure the ImageMagick and FFmpeg converters are running before the demo
- Obtain the
- TODO:
- Provide a Python script for this and let participants use the BD Python library to call the BD service
- Provide step-by-step instructions with screenshots for this
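A minimal sketch of the local half of the conversion demo. It follows the outline's mapping (images to png, ps to pdf, odp to ppt, and, as we read it, both audio and video to mp3). The extension lists in `TARGET_FORMAT` and the helper `plan_conversions` are hypothetical, and the actual BD submission is left as a comment because the BD Python library's conversion call is not specified here.

```python
from pathlib import Path

# Hypothetical extension lists; the target formats follow the outline
# (images -> png, ps -> pdf, odp -> ppt, audio/video -> mp3).
TARGET_FORMAT = {
    ".gif": "png", ".bmp": "png", ".tif": "png", ".tiff": "png",
    ".ps": "pdf",
    ".odp": "ppt",
    ".wav": "mp3", ".ogg": "mp3", ".flac": "mp3",
    ".avi": "mp3", ".mov": "mp3",
}


def plan_conversions(filenames):
    """Pair each convertible filename with its target format; skip the rest."""
    plan = []
    for name in sorted(filenames):
        target = TARGET_FORMAT.get(Path(name).suffix.lower())
        if target is not None:
            plan.append((name, target))
    return plan


if __name__ == "__main__":
    files = [p.name for p in Path(".").iterdir() if p.is_file()]
    for name, target in plan_conversions(files):
        # Placeholder: submit `name` to the BD conversion API here,
        # requesting `target` as the output format.
        print(f"{name} -> {target}")
```

Keeping the planning step separate from the BD call lets participants check which files will be converted before anything is uploaded.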
- Extraction/Indexing/Retrieval Example (20 mins): Given a collection of images with embedded text, plus audio files, search the images/audio files based on their content. (Emphasizes extraction from unstructured data, indexing, and content-based retrieval)
- Make sure OCR, face, and speech2text extractors are running before starting the demo
- Images can be uploaded from a local directory or obtained from an external web service.
- Let participants use the BD Python library to obtain a key/token and submit extraction requests to the BD-API gateway
- Once the technical metadata is obtained from BD, write the tags and technical metadata to a local file/Python dictionary.
- Search the files by their tags/technical metadata using a linear search over the index file
- TODO
- Create an example dataset of images and audio files against which we can make interesting queries
- (Optional) Provide a code snippet that uses an external service to obtain images, e.g., the Flickr API.
- This will only be provided as an example and will not be used for the rest of the code.
- Provide the link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using the Python library.
- Write a Python script that will serve as a stub for the BD client
- Participants will fill in the code that uses the Python library to call the BD REST API and submit their requests.
- The Python script should write the tags and technical metadata to a local file. (We can probably use the Python library's index method, which writes them as feature vectors.)
- Write a Python script that runs interesting searches/queries against the index file, again probably using the Python library's find method or simply reading the local file.
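If the "just read the local file" route is taken, the index and linear search could look like the sketch below. It assumes BD's output has already been reduced to a list of tags plus a flat metadata dictionary per file; the keys `ocr_text` and `transcript` are hypothetical stand-ins for OCR and speech2text results, not BD's actual field names.

```python
import json


def add_to_index(index, filename, tags, metadata):
    """Store the tags and technical metadata BD returned for one file."""
    index[filename] = {"tags": sorted(set(tags)), "metadata": metadata}


def save_index(index, path):
    """Write the whole index to a local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2)


def load_index(path):
    """Read an index previously written by save_index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def search(index, query):
    """Linear search: files whose tags or metadata values contain `query`."""
    query = query.lower()
    hits = []
    for filename, entry in index.items():
        fields = entry["tags"] + [str(v) for v in entry["metadata"].values()]
        if any(query in field.lower() for field in fields):
            hits.append(filename)
    return hits


index = {}
# Hypothetical metadata keys standing in for OCR and speech2text output.
add_to_index(index, "sign.jpg", ["text", "outdoor"], {"ocr_text": "STOP"})
add_to_index(index, "talk.mp3", ["speech"], {"transcript": "hello brown dog"})
print(search(index, "stop"))
```

A linear scan is fine for a workshop-sized dataset; a real deployment would hand this to a search engine instead.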
- Problem 4: Given a collection of *.xlsx files, obtain some results based on the values in certain columns. (Emphasizes extraction and analysis of scientific data)
For example, given a *.xlsx file with the maximum and minimum temperatures for each day of a month, calculate the average maximum/minimum temperature and its standard deviation for each month.
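A sketch of the analysis step for the temperature example, assuming the spreadsheet has already been turned into CSV text (for instance through the BD conversion API) with hypothetical columns `date` (YYYY-MM-DD), `tmax`, and `tmin`. It uses the population standard deviation so that a month with a single day does not raise an error.

```python
import csv
import io
import statistics
from collections import defaultdict


def monthly_temperature_stats(csv_text):
    """Group daily rows by month (YYYY-MM) and summarize tmax/tmin."""
    by_month = defaultdict(lambda: {"tmax": [], "tmin": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["date"][:7]  # "2016-07-14" -> "2016-07"
        by_month[month]["tmax"].append(float(row["tmax"]))
        by_month[month]["tmin"].append(float(row["tmin"]))

    stats = {}
    for month, cols in sorted(by_month.items()):
        stats[month] = {
            "avg_max": statistics.mean(cols["tmax"]),
            "avg_min": statistics.mean(cols["tmin"]),
            # Population standard deviation: defined even for one data point.
            "std_max": statistics.pstdev(cols["tmax"]),
            "std_min": statistics.pstdev(cols["tmin"]),
        }
    return stats


sample = """date,tmax,tmin
2016-07-01,30,18
2016-07-02,32,20
2016-08-01,28,16
"""
print(monthly_temperature_stats(sample)["2016-07"])
```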
- Conversion & Extraction Example (15 mins): Given an audio file in a different format, convert it to a format that the speech2text extractor accepts and obtain the text of the audio.
- TODO
- Write a script that does the conversion and then sends the converted file to the BD service.
- Create a document with step-by-step instructions and screenshots.
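The conversion-then-extraction script could be organized as below. This is a sketch under stated assumptions: the set of formats the speech2text extractor accepts (`ACCEPTED_AUDIO`) and the default target `wav` are hypothetical, and `convert`/`extract` are placeholders passed in by the caller where the real BD library calls would go. The fake implementations exist only so the workflow can run offline.

```python
from pathlib import Path

# Hypothetical: formats we assume the speech2text extractor accepts.
ACCEPTED_AUDIO = {"wav", "flac"}


def transcribe(path, convert, extract, accepted=ACCEPTED_AUDIO, target="wav"):
    """Convert `path` only when its format is not accepted, then extract text.

    `convert(path, fmt)` and `extract(path)` stand in for the BD conversion
    and extraction calls; passing them in keeps the workflow testable offline.
    """
    fmt = Path(path).suffix.lstrip(".").lower()
    if fmt not in accepted:
        path = convert(path, target)
    return extract(path)


# Offline stand-ins for the BD calls, used only to exercise the workflow.
def fake_convert(path, fmt):
    return str(Path(path).with_suffix("." + fmt))


def fake_extract(path):
    return f"transcript of {path}"


print(transcribe("interview.m4a", fake_convert, fake_extract))
```

Participants would replace `fake_convert` and `fake_extract` with functions that call the BD REST API.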
- TODO
- (Optional) Combination of Conversion & Extraction Example
- Convert the *.xlsx files to *.csv using the conversion API so that you can view their contents on the VM. We are not installing any office software on the VM.
- Use the extraction API to extract columns from the file
- Perform some analysis and add the results to the technical metadata
- This should be an enticing yet simple problem that can handle many spreadsheets and produce a result.
- Ideas
- An Algebra 101 traveling-trains problem: two trains leave two different stations on tracks heading toward a junction. Given spreadsheets with departure times, distances, velocities, etc., upload all the spreadsheets and determine whether the trains will crash.
- This problem is simple and easy for users to understand, and it clearly scales to much more involved traffic problems.
- However, it does not present a particularly novel idea; something involving more cutting-edge technology may be preferable.
- A bacterial growth model. Given cultures under varied conditions (e.g., pH) stored in multiple spreadsheets, determine the growth rate. This could be based on http://mathinsight.org/bacteria_growth_initial_model
- This would require a few minutes of explanation of the model, and the extractor developer would need to learn it.
- It may still not be that enticing.
- Provide a Python script that obtains the input files and uses the BD REST API to obtain the result.
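For the traveling-trains idea, the core calculation an extractor would perform is small. The sketch below assumes constant speeds and treats a "crash" as both trains reaching the junction within a safety margin of each other (hypothetically 0.05 h, i.e. 3 minutes). The column names `depart1`, `dist1`, `speed1`, etc. are made up for illustration; the real spreadsheets would define their own.

```python
import csv
import io


def arrival_time(depart_h, distance_km, speed_kmh):
    """Hour at which a train moving at constant speed reaches the junction."""
    return depart_h + distance_km / speed_kmh


def will_crash(row, margin_h=0.05):
    """True if both trains reach the junction within `margin_h` hours."""
    a1 = arrival_time(float(row["depart1"]), float(row["dist1"]), float(row["speed1"]))
    a2 = arrival_time(float(row["depart2"]), float(row["dist2"]), float(row["speed2"]))
    return abs(a1 - a2) <= margin_h


sheet = """scenario,depart1,dist1,speed1,depart2,dist2,speed2
A,0,60,60,0.5,40,80
B,0,60,60,0,100,50
"""
for row in csv.DictReader(io.StringIO(sheet)):
    print(row["scenario"], will_crash(row))
```

In scenario A both trains arrive at hour 1.0, so it is flagged; in scenario B they arrive an hour apart.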
- : Obtain Ameriflux data and convert it into *.clim format (similar to CSV but tab-separated) for the SIPNET model. Calculate the average air temperature and its standard deviation. (This emphasizes both conversion and analysis)
- Write an R/Python script that calls the BD conversion API to get the data in *.clim format and calculates the average air temperature. Also plot a graph of the data.
- Install the RStudio Server version.
- (Optional) To calculate the average temperature, call the BD extraction service. For this, write an extractor that accepts a *.clim file and outputs the average temperature.
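Once BD has returned the data in *.clim format, the statistics step might look like this Python sketch. It assumes air temperature is the sixth tab-separated field (`column=5`), which we believe matches the SIPNET layout but which should be checked against real files; the three sample rows are made up.

```python
import statistics


def air_temperature_stats(clim_text, column=5):
    """Mean and sample standard deviation of one column of a .clim file.

    `column=5` assumes air temperature is the sixth field; verify this
    against your own .clim files before relying on it.
    """
    temps = []
    for line in clim_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")  # .clim rows are tab-separated
        temps.append(float(fields[column]))
    return statistics.mean(temps), statistics.stdev(temps)


# Hypothetical three-row sample; only the selected column matters here.
rows = [
    ["0", "2006", "1", "0.0", "10800", "10.0", "8.0"],
    ["0", "2006", "1", "3.0", "10800", "12.0", "8.5"],
    ["0", "2006", "1", "6.0", "10800", "14.0", "9.0"],
]
clim_sample = "\n".join("\t".join(r) for r in rows)
mean, sd = air_temperature_stats(clim_sample)
print(f"mean={mean:.2f} sd={sd:.2f}")
```

Plotting the temperature series (for example with matplotlib) is left out to keep the sketch free of third-party dependencies.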
- Problem 6: Audio converter, speech-to-text extractor
...
How to add Your Tool to Brown Dog Services (1h 30m)
...
- Tutorial feedback form
- Announcement of next user workshop
-------- OPTIONAL EXAMPLES -------
- Problem 2: Given a collection of text files from a survey or reviews of a book/movie, use the sentiment analysis extractor to calculate the sentiment value for each file and group files with similar values together. (Emphasizes extraction from unstructured data and useful analysis)
- A collection of text files with reviews
- Obtain an example dataset from the web.
- Let participants use the BD Python library to obtain a key/token and submit requests to the BD-API gateway
- Provide the link to the current BD REST API and create a document/wiki page showing step-by-step screenshots of obtaining a key/token using the Python library.
- Write a Python script that will serve as a stub for the BD client
- Participants will fill in the BD REST API call that submits their requests.
- Make sure the Sentiment Analysis extractor is running
- Save the result for each text file, with its corresponding value, in a single file
- Provide code for this in the stub script
- Create separate folders and move each file into one based on its sentiment value
- Provide code that does this in the stub
- (Optional) Index the text files along with their sentiment values and use the ES visualization tool to search for documents with a sentiment value below some threshold.
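The stub code for saving the results and grouping files could be sketched as follows. We assume the sentiment extractor's scores have already been collected into a `{filename: score}` dictionary and lie roughly between -1 and 1; the thresholds in `sentiment_bucket`, the folder names, and the results filename are all hypothetical choices.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def sentiment_bucket(score, low=-0.25, high=0.25):
    """Map a sentiment score to a folder name (thresholds are hypothetical)."""
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"


def save_and_group(scores, src_dir, results_name="sentiment_results.csv"):
    """Write every score to one CSV, then move each file into a bucket folder."""
    src = Path(src_dir)
    with open(src / results_name, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "sentiment"])
        for name, score in sorted(scores.items()):
            writer.writerow([name, score])
    for name, score in scores.items():
        folder = src / sentiment_bucket(score)
        folder.mkdir(exist_ok=True)
        shutil.move(str(src / name), str(folder / name))


# Demo in a throwaway directory with made-up scores.
with tempfile.TemporaryDirectory() as tmp:
    demo_scores = {"review1.txt": 0.8, "review2.txt": -0.6, "review3.txt": 0.1}
    for name in demo_scores:
        (Path(tmp) / name).write_text("sample review\n", encoding="utf-8")
    save_and_group(demo_scores, tmp)
    print(sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.txt")))
```

Writing the results file before moving anything means the scores survive even if a move fails partway through.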
- Problem 4: Given a collection of *.xlsx files, obtain some results based on the values in certain columns. (Emphasizes extraction and analysis of scientific data)
For example, given a *.xlsx file with the maximum and minimum temperatures for each day of a month, calculate the average maximum/minimum temperature and its standard deviation for each month.
- Convert the *.xlsx files to *.csv using the conversion API so that you can view their contents on the VM. We are not installing any office software on the VM.
- Use the extraction API to extract columns from the file
- Perform some analysis and add the results to the technical metadata
- Write an extractor/converter for this problem. This should be an enticing yet simple problem that can handle many spreadsheets and produce a result.
- Ideas
- An Algebra 101 traveling-trains problem: two trains leave two different stations on tracks heading toward a junction. Given spreadsheets with departure times, distances, velocities, etc., upload all the spreadsheets and determine whether the trains will crash.
- This problem is simple and easy for users to understand, and it clearly scales to much more involved traffic problems.
- However, it does not present a particularly novel idea; something involving more cutting-edge technology may be preferable.
- A bacterial growth model. Given cultures under varied conditions (e.g., pH) stored in multiple spreadsheets, determine the growth rate. This could be based on http://mathinsight.org/bacteria_growth_initial_model
- This would require a few minutes of explanation of the model, and the extractor developer would need to learn it.
- It may still not be that enticing.
- Better Ideas?
- Provide a Python script that obtains the input files and uses the BD REST API to obtain the result.
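For the bacterial-growth idea, an extractor could estimate a growth rate per spreadsheet along these lines. This assumes the simple discrete model B[t+1] = (1 + r) * B[t], which is how we read the linked Math Insight model, and assumes each spreadsheet holds counts measured at equally spaced times. It fits ln B[t] = ln B[0] + t * ln(1 + r) by least squares; the condition labels and counts are made up.

```python
import math


def growth_rate(populations):
    """Estimate r in B[t+1] = (1 + r) * B[t] from equally spaced counts.

    Fits ln B[t] = ln B[0] + t * ln(1 + r) by ordinary least squares and
    converts the fitted slope back to r.
    """
    n = len(populations)
    if n < 2:
        raise ValueError("need at least two measurements")
    logs = [math.log(b) for b in populations]
    t_mean = (n - 1) / 2  # mean of t = 0, 1, ..., n - 1
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(logs))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return math.exp(num / den) - 1


# Hypothetical spreadsheet contents: one series of counts per pH condition.
cultures = {
    "pH 6": [100, 150, 225, 337.5],
    "pH 7": [100, 200, 400, 800],
}
for condition, counts in cultures.items():
    print(condition, round(growth_rate(counts), 3))
```

Fitting on the log scale uses every measurement rather than only the first and last, which makes the estimate less sensitive to a single noisy count.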