Upcoming Milestones:

...

  • (end-user engagement, hands-on tutorials) - after review (late 2016 / early 2017)

Priorities:

  1. Polyglot refactoring - needed to support Docker deployments!
  2. Update all extractors to the latest tech:
    1. JSON-LD (metadata sketch after this list)
    2. Docker containers
    3. Extractor metadata registration
    4. pyclowder
    5. Add status messages to all extractors and fix status-level granularity
      1. Make status constants (DONE, ERROR) - see the status sketch after this list
      2. ArcGIS multiprocessing extractor
    6. Register on on-demand queues
    7. Standardize on Python's logging module (logging sketch after this list)
  3. Update converters in a similar fashion
    1. separate repos vs. everything in the scripts folder of the main repo
    2. Dockerfiles
  4. Tutorial materials for end-to-end deployments (Tutorial Milestone in July!!)
    1. adding extractors/converters
    2. admin deploying extractors/converters (all current extractors/converters available for demonstration)
    3. building applications around the API (finalized API, BD Fiddle, toy problem/application?)
  5. Polyglot information loss
    1. Implement the alpha/beta loss estimation described at https://opensource.ncsa.illinois.edu/jira/browse/BD-680 (rough sketch after this list)
  6. Provenance trails available for all requests
    1. DataWolf workflows
      1. Polyglot: add file.<extension>.log and file.<extension>.wf alongside each output file (e.g. file.jpg.log, file.jpg.wf; sidecar sketch after this list)
      2. Clowder: new endpoint for workflows? (each step is one of the extractors executed on the specific file)
      3. File verification info at each step, using Siegfried to check the file format (sketch after this list)
  7. Add new tools
    1. Look at the ones in Jira labeled as "Extractors" and "Converters"
    2. Ankit's new tools
    3. Praveen's students' new extractor tools
    4. Josh's new tools?
    5. Support students in doing this rather than doing it ourselves!
  8. Large dataset support - move data vs. move computation
    1. Low-hanging fruit implementation first?
    2. Add DTS/DAP endpoints that provide a downloadable list of the containers needed for a specified request (as opposed to returning the results of executing those containers)
      1. Implement a lightweight client application, e.g. modify the Python command-line interface, to download these containers and run them client side (client sketch after this list)
    3. Cache/host large datasets locally? - a canned example for NARR data was done previously; can we generalize? How do we automate the decision of when to keep large files local?
  9. Logstash and Kibana
    1. Add Logstash to the Dockerfile
    2. Make extractor and software server logs consistent
      1. Standardize on Python's logging module (logging sketch below)
      2. Don't forget Java extractors (Versus, audio)
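
Implementation sketches (untested; assumptions flagged inline):

Priority 2.1, JSON-LD metadata: a minimal sketch of the kind of JSON-LD payload an updated extractor might attach to a file. The context URL, agent fields, and content keys are written from memory of Clowder's metadata format, not a confirmed schema - check the current API docs.

    # Hypothetical JSON-LD metadata body for a Clowder extractor (priority 2.1).
    # Field names and the context URL are assumptions; verify against the API.
    metadata = {
        "@context": ["https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld"],
        "agent": {
            "@type": "cat:extractor",
            "extractor_id": "https://clowder.example.org/api/extractors/ncsa.image.ocr",  # hypothetical id
        },
        "content": {"ocr_text": "hello world"},  # extractor-specific payload
    }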
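Priority 2.5, status constants: a sketch of shared status levels plus one consistent message format. Only DONE and ERROR come from the notes above; PROCESSING and the status_message() helper are hypothetical, not pyclowder API.

    # Shared status constants for all extractors (priority 2.5).
    DONE = "DONE"              # extraction finished successfully
    ERROR = "ERROR"            # extraction failed
    PROCESSING = "PROCESSING"  # assumed intermediate level, not from the notes

    def status_message(extractor, status, detail=""):
        """Format status lines identically across extractors (hypothetical helper)."""
        return f"{status}: [{extractor}] {detail}".rstrip()

    # e.g. status_message("ncsa.image.ocr", DONE, "3 pages extracted")
    #   -> "DONE: [ncsa.image.ocr] 3 pages extracted"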
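Priorities 2.7 and 9.2.1, one logging setup: the calls below are standard-library logging; only the format string is a suggested convention so extractor and software server logs parse the same way in Logstash.

    # One shared logging configuration for extractors and software servers.
    import logging

    def setup_logging(name):
        logging.basicConfig(
            level=logging.INFO,
            format="%(asctime)s %(name)s %(levelname)s %(message)s",
        )
        return logging.getLogger(name)

    log = setup_logging("extractors.image")
    log.info("extraction started")  # 2016-07-01 12:00:00,000 extractors.image INFO extraction started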
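Priority 5.1, information loss: the actual alpha/beta model is specified in BD-680 and is not reproduced here; this sketch only guesses at the shape of the computation, with each conversion edge carrying a made-up fraction of information retained and a path scored as the product of its edges.

    # Guess at path-loss scoring for Polyglot (priority 5.1); numbers are invented.
    retained = {
        ("doc", "pdf"): 0.95,
        ("pdf", "txt"): 0.60,
        ("doc", "txt"): 0.70,
    }

    def path_retained(path):
        """Fraction of information kept along a conversion path, e.g. ["doc", "pdf", "txt"]."""
        score = 1.0
        for src, dst in zip(path, path[1:]):
            score *= retained[(src, dst)]
        return score

    print(path_retained(["doc", "pdf", "txt"]))  # 0.57 - worse than direct doc->txt (0.70)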
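Priority 6.1.1, provenance sidecars: a sketch of writing <name>.<extension>.log and <name>.<extension>.wf next to each Polyglot output; the JSON step list used for the .wf body is an assumed format.

    # Write provenance sidecars next to each output file (priority 6.1.1).
    import json
    from pathlib import Path

    def write_sidecars(output_file, steps, log_text):
        out = Path(output_file)
        out.with_name(out.name + ".log").write_text(log_text)          # e.g. file.jpg.log
        out.with_name(out.name + ".wf").write_text(json.dumps(steps))  # e.g. file.jpg.wf

    write_sidecars("out/file.jpg",
                   [{"tool": "ImageMagick", "from": "png", "to": "jpg"}],  # assumed .wf shape
                   "converted png -> jpg\n")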
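Priority 6.1.3, file verification: Siegfried's command-line tool is sf and it can emit JSON; the exact field names in the report below are from memory and should be verified against the installed version.

    # Identify a file's format with Siegfried at each workflow step (priority 6.1.3).
    import json
    import subprocess

    def identify(path):
        """Return Siegfried's format matches (e.g. PRONOM ids) for one file."""
        out = subprocess.check_output(["sf", "-json", path])
        report = json.loads(out)
        return report["files"][0]["matches"]  # field names assumed from sf -json output

    # identify("output/file.jpg") -> e.g. a match with PRONOM id fmt/43 (JPEG)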
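Priority 8.2.1, lightweight client: a sketch of a client that asks a DTS/DAP endpoint for the container images a request needs, then pulls and runs them locally instead of shipping the data to the server. The /dap/containers route, its JSON response shape, and the docker run arguments are all assumptions.

    # Pull and run a request's containers client side (priority 8.2.1).
    import json
    import subprocess
    import urllib.request

    def fetch_container_list(server, input_name, output_fmt):
        url = f"{server}/dap/containers?file={input_name}&output={output_fmt}"  # assumed route
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)  # assumed: JSON list of image names

    def run_locally(images, data_dir):
        for image in images:
            subprocess.check_call(["docker", "pull", image])
            # assumed convention: mount the data dir, let the container convert in place
            subprocess.check_call(["docker", "run", "--rm", "-v", f"{data_dir}:/data", image, "/data"])

    # run_locally(fetch_container_list("http://bd.example.org", "file.doc", "txt"), "/tmp/job")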