Upcoming Milestones:
- XSEDE Tutorial - July 18th
- Advisory Board meeting - Mid August
- Review - Mid October
- User Workshop (end-user engagement, hands-on tutorials) - after review (late 2016 / early 2017)
Priorities:
- Polyglot refactoring - needed to support docker deployments!
- Update all extractors to latest tech
- JSONLD
- Docker containers
- Extractor metadata registration
- pyclowder
- Add status messages to all extractors and fix level granularity
- Make status constants (DONE, ERROR)
- Arcgis multiprocessing extractor
    - Register on on-demand queues
- Standardize around python logging
    - Add entry to Tools catalog, with icon and sample input/output
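A minimal sketch of what shared status constants and a consistent status message could look like (the names and message format here are assumptions, not pyclowder's actual API):

```python
"""Sketch of shared status constants for extractors.
Hypothetical names/format; pyclowder's real API may differ."""

# Proposed coarse-grained status levels, shared by all extractors
STATUS_STARTED = "STARTED"
STATUS_PROCESSING = "PROCESSING"
STATUS_DONE = "DONE"
STATUS_ERROR = "ERROR"

VALID_STATUSES = {STATUS_STARTED, STATUS_PROCESSING, STATUS_DONE, STATUS_ERROR}

def format_status(extractor_name, status, detail=""):
    """Build a consistent status string so all extractors report identically."""
    if status not in VALID_STATUSES:
        raise ValueError("unknown status: %s" % status)
    message = "%s: %s" % (extractor_name, status)
    if detail:
        message += " - " + detail
    return message
```

Keeping the constants in one shared module would also fix the level-granularity issue, since every extractor would pick from the same fixed set.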
- Update converters in a similar fashion
    - Separate repos vs. everything in the scripts folder of the main repo
- dockerfiles
- Tutorial materials for end-to-end deployments (Tutorial milestone in July!!)
- adding extractors/converters
- admin deploying extractors/converters (all current extractors/converters available for demonstration)
- building applications around API (finalized API, BD Fiddle, toy problem/application?)
- Polyglot information loss
    - Implement the alpha-beta loss estimation described at https://opensource.ncsa.illinois.edu/jira/browse/BD-680
- Provenance trails available for all requests
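As a generic illustration only (not the alpha-beta scheme, which should be implemented as specified in BD-680): loss along a conversion path can be estimated by multiplying per-step retention fractions.

```python
"""Illustrative information-loss estimate along a conversion path.
NOT the BD-680 alpha-beta scheme; a simple multiplicative model
where each step retains a fraction of the information."""

def path_retention(step_retentions):
    """Fraction of information surviving a chain of conversions."""
    retention = 1.0
    for r in step_retentions:
        if not 0.0 <= r <= 1.0:
            raise ValueError("retention must be in [0, 1]")
        retention *= r
    return retention

def path_loss(step_retentions):
    """Estimated cumulative loss for the whole path."""
    return 1.0 - path_retention(step_retentions)
```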
- DataWolf workflows
- Polyglot: added file.extension.log and file.extension.wf to each output file
    - Clowder: new endpoint for workflow? (each step is one of the extractors executed on the specific file)
    - File verification info at each step using Siegfried
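A sketch of how Polyglot could write the two provenance sidecars next to each output file (the sidecar contents and JSON shape are assumptions):

```python
"""Sketch: write provenance sidecars next to a Polyglot output file.
file.ext.log holds the conversion log; file.ext.wf lists the workflow
steps (one entry per extractor/converter executed on the file).
The formats here are assumptions."""
import json

def write_provenance(output_path, log_lines, workflow_steps):
    # file.ext.log: raw conversion log, one line per event
    with open(output_path + ".log", "w") as f:
        f.write("\n".join(log_lines) + "\n")
    # file.ext.wf: workflow steps as JSON
    with open(output_path + ".wf", "w") as f:
        json.dump({"steps": workflow_steps}, f, indent=2)
```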
- Add new tools
- Look at the ones in Jira labeled as "Extractors" and "Converters"
    - Ankit's new tools
- Praveen's students tools
- Josh's new tools?
    - Support students in doing this rather than doing it ourselves!
- Large dataset support
    - Low-hanging fruit implementation?
    - Add dts/dap endpoints that provide a downloadable list of the containers needed for a specified request (as opposed to returning the results of executing those containers)
    - Implement a lightweight client application, e.g. modify the Python command-line interface, to download these containers and run them client side
    - Cache/host large datasets locally? - a canned example was previously done for NARR data; can we generalize? How do we automate deciding when to do this?
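A minimal sketch of that lightweight client. The `/dap/containers` endpoint and its JSON response shape are assumptions (the actual endpoint is still to be added); the client just fetches the container list and runs each image locally with the Docker CLI.

```python
"""Sketch of a client-side runner for large datasets: fetch the list of
containers a request would need, then pull and run them locally instead
of shipping the data to the server. Endpoint name and JSON shape are
assumptions, not the current dts/dap API."""
import json
import subprocess
import urllib.request

def container_list_url(server, input_ext, output_ext):
    """Build the (hypothetical) URL for the downloadable container list."""
    return "%s/dap/containers?from=%s&to=%s" % (server, input_ext, output_ext)

def fetch_container_list(server, input_ext, output_ext):
    # Assumed response: a JSON array of image names, e.g. ["bd/convert-a"]
    with urllib.request.urlopen(container_list_url(server, input_ext, output_ext)) as resp:
        return json.loads(resp.read().decode("utf-8"))

def run_containers(images, data_dir):
    """Pull each image and run it over the local data directory."""
    for image in images:
        subprocess.check_call(["docker", "pull", image])
        subprocess.check_call(
            ["docker", "run", "--rm", "-v", "%s:/data" % data_dir, image]
        )
```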
- Logstash and Kibana
    - Add Logstash to the Dockerfile
    - Make extractor and Software Server logs consistent
- Standardize around python logging
    - Don't forget Java extractors (Versus, audio)
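One way to standardize around Python logging so every extractor emits the same line format for Logstash to parse (the format string itself is an assumption, a shared module would pin it down):

```python
"""Sketch of a shared logging setup for extractors: one format, one
stdout handler, so `docker logs` output is uniform and Logstash can
parse it with a single pattern. The format string is an assumption."""
import logging
import sys

LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s %(message)s"

def get_extractor_logger(name):
    """Return a logger writing the shared format to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

Java extractors would need an equivalent pattern in their log4j/logback config so Logstash sees one format across languages.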