...
Upcoming Milestones:
...
- Tutorial - July 18th
- Advisory Board meeting - Mid August
- Review - Mid October
- User Workshop (end user engagement, hands-on tutorials) - After review (late 2016, early 2017)
Priorities:
- Polyglot refactoring - needed to support docker deployments!
- Update all extractors to latest tech stack
- JSON-LD
- Docker containers
- Extractor metadata registration
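As a sketch of what the JSON-LD metadata an extractor registers might look like (the context URL and term names here are hypothetical placeholders, not Clowder's actual vocabulary):

```json
{
  "@context": {
    "extractor": "http://example.org/terms/extractor",
    "ocr_text": "http://example.org/terms/ocr_text"
  },
  "extractor": "ncsa.image.ocr",
  "ocr_text": "text recovered from the image"
}
```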
- pyclowder
- Add status messages to all extractors and fix level granularity
- Make status constants (DONE, ERROR)
- ArcGIS multiprocessing extractor
- Register on on-demand queues
- Standardize around Python logging
- Add entry to Tools catalog, with icon, sample input/output
- Update converters in a similar fashion
- Separate repos vs. all in scripts folder in main repo
- dockerfiles
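A minimal sketch of the status constants and consistent granularity called for above (this is an illustration, not the actual pyclowder API; a real extractor would also publish the message back over the message bus):

```python
import logging

# Shared status constants so every extractor reports the same states.
DONE = "DONE"
ERROR = "ERROR"
PROCESSING = "PROCESSING"

logger = logging.getLogger("extractor")

def report_status(file_id, status, detail=""):
    """Format and log one status message in a single consistent shape."""
    msg = f"{file_id}: {status}" + (f" ({detail})" if detail else "")
    if status == ERROR:
        logger.error(msg)
    else:
        logger.info(msg)
    return msg
```

Keeping the constants at module level means every extractor imports one vocabulary instead of inventing its own strings.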
- Tutorial materials for end-to-end deployments (Tutorial Milestone in July!!)
- adding extractors/converters
- admin deploying extractors/converters (all current extractors/converters available for demonstration)
- building applications around API (finalized API, BD Fiddle, toy problem/application?)
- Polyglot information loss
- Implement the alpha-beta loss estimation described here: https://opensource.ncsa.illinois.edu/jira/browse/BD-680
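The actual estimation scheme is specified in BD-680; as a rough illustration of the idea (my reading, not the ticket's spec), each conversion edge's information retention could be modeled as the mean of a Beta(alpha, beta) estimate built from observed good/bad comparisons, multiplied along a conversion path:

```python
def edge_retention(alpha, beta):
    """Mean of Beta(alpha, beta): alpha ~ favorable comparisons, beta ~ unfavorable."""
    return alpha / (alpha + beta)

def path_retention(edges):
    """Cumulative information retention along a conversion path (1.0 = lossless)."""
    r = 1.0
    for alpha, beta in edges:
        r *= edge_retention(alpha, beta)
    return r
```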
- Provenance trails available for all requests
- DataWolf workflows
- Polyglot: add a .log and .wf file (e.g. file.jpg.log and file.jpg.wf) for each output file
- Clowder: new endpoint for workflow? (each step is one of the extractors executed on the specific file)
- File verification info at each step using a Siegfried file format check
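The per-step trail with verification info could be as simple as a JSON sidecar that records each step and a checksum of its output. A hedged sketch (field names are assumptions; the format check would additionally come from Siegfried):

```python
import hashlib
import json

def append_step(trail, step_name, output_bytes):
    """Record one workflow step with a checksum of its output, so each
    hop in the provenance trail can later be verified."""
    trail["steps"].append({
        "step": step_name,
        "sha256": hashlib.sha256(output_bytes).hexdigest(),
    })
    return trail

trail = {"input": "file.jpg", "steps": []}
append_step(trail, "image-metadata-extractor", b"example output")
sidecar = json.dumps(trail, indent=2)  # would be written next to the file, e.g. as its .wf sidecar
```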
- Add new tools
- Look at the ones in Jira labeled as "Extractors" and "Converters"
- Ankit's new tools
- Praveen's students' new tools
- Josh's new tools?
- Support students in doing this rather than doing it ourselves!
- Large dataset support
- Move data vs. move computation
- Low-hanging fruit implementation?
- Add DTS/DAP endpoints that provide a downloadable list of the containers needed for a specified request (as opposed to returning the results of executing those containers)
- Implement a lightweight client application, e.g. modify the Python command line interface, to download these containers and run them client-side
- Cache/host large datasets locally? - canned example for NARR data previously done; can we generalize? How do we automate the decision of when to host large files locally?
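The client half of the container-list idea can be sketched as follows; the endpoint shape is an assumption (something like `GET .../containers` for a request), so only the client-side mechanics are shown:

```python
def docker_commands(images, data_dir):
    """Build the `docker run` invocations for a list of container images,
    as the proposed DTS/DAP endpoint would return them."""
    return [
        ["docker", "run", "--rm", "-v", f"{data_dir}:/data", image]
        for image in images
    ]

# A client would fetch the image list from the new endpoint, then:
for cmd in docker_commands(["ncsa/example-extractor:latest"], "/tmp/data"):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # left commented: requires Docker locally
```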
- Logstash and Kibana
- Add Logstash to the Dockerfile
- Make extractor and software server logs consistent
- Standardize around Python logging
- Don't forget Java extractors (Versus, audio)
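One way to make extractor and software server logs consistent for Logstash is a shared formatter that emits one JSON object per line; a minimal sketch (the field names are a choice, not a fixed schema, and Java services would need an equivalent layout):

```python
import json
import logging
import sys

class JSONLineFormatter(logging.Formatter):
    """One log line shape for extractors and software servers alike,
    so Logstash can parse every service the same way."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONLineFormatter())
logger = logging.getLogger("extractor.example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("file received")  # emits one JSON object on stdout
```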
- Automatic Process Adjustments
    - Multiple results panes
        - Extraction Results
        - Conversion Results
    - Remove colon on Extractors/Converters
        - Extract
        - Convert To
    - Flip the conversion and extraction boxes for screen real estate
- Website Security
    - Use an anonymous token/key with limits on file size and submissions. (Long Term - Not In Scope)
    - Login using username and password
        - Sign-In page first
        - Get key
        - Fetch token
        - Key and token displayed at top of page
    - Indent code snippet buttons to line up with code pane
    - Links for setup by code snippets
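The sign-in flow above (log in, get a key, fetch a token) can be sketched client-side; the endpoint paths and JSON field names below are placeholders, not the real gateway API:

```python
import base64

def basic_auth_header(username, password):
    """Authorization header for the initial username/password sign-in."""
    cred = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {cred}"}

def token_url(base_url, key):
    """Where a client would POST to exchange its API key for a token."""
    return f"{base_url}/keys/{key}/tokens"

# Usage with e.g. `requests` (field names are assumptions):
#   hdrs = basic_auth_header("alice", "secret")
#   key = requests.post(f"{base}/keys", headers=hdrs).json()["api-key"]
#   token = requests.post(token_url(base, key)).json()["token"]
```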
- Manual Process
- Metadata (Extraction)
- Allow selection of multiple metadata tools
- Pick only one tool to start
- Display error from extractor if it fails -> Need clear errors in the extractors
- List each tool specifically -> Get tools from the Tools catalog
- Conversion
- Populate output (conversion) based on the input type of the file
- The user will then select a conversion format, which will populate a list of tools to do the conversion; Polyglot will give the list of available tools by conversion format
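The conversion flow above can be sketched against Polyglot's conversion graph; the data here is a toy stand-in (input extension -> output format -> tools), not the real registry:

```python
CONVERSIONS = {
    "jpg": {"png": ["ImageMagick"], "pdf": ["ImageMagick", "LibreOffice"]},
    "doc": {"pdf": ["LibreOffice"], "txt": ["LibreOffice"]},
}

def output_formats(input_ext):
    """Formats to offer once the input file's type is known."""
    return sorted(CONVERSIONS.get(input_ext, {}))

def tools_for(input_ext, output_ext):
    """Tools able to perform the selected conversion."""
    return CONVERSIONS.get(input_ext, {}).get(output_ext, [])
```

This keeps the UI logic to two lookups: populate the format dropdown from the input type, then populate the tool list from the chosen format.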