Upcoming Milestones:

...

  • (end-user engagement, hands-on tutorials) - after review (late 2016 / early 2017)

Priorities:

  1. Polyglot refactoring - needed to support Docker deployments!
  2. Update all extractors to the latest tech:
    1. JSON-LD (metadata sketch after this list)
    2. Docker containers
    3. Extractor metadata registration
    4. pyclowder
    5. Add status messages to all extractors and fix status-level granularity
      1. Make status constants (DONE, ERROR) - see the status sketch after this list
      2. ArcGIS multiprocessing extractor
    6. Register on on-demand queues
    7. Standardize on Python's logging module (logging sketch after this list)
  3. Update converters in a similar fashion
    1. separate repos vs. everything in the scripts folder of the main repo
    2. Dockerfiles
  4. Tutorial materials for end-to-end deployments (Tutorial Milestone in July!!)
    1. adding extractors/converters
    2. admin deploying extractors/converters (all current extractors/converters available for demonstration)
    3. building applications around the API (finalized API, BD Fiddle, toy problem/application?)
  5. Polyglot information loss
    1. Implement the alpha/beta loss estimation described at https://opensource.ncsa.illinois.edu/jira/browse/BD-680 (rough sketch after this list)
  6. Provenance trails available for all requests
    1. DataWolf workflows
      1. Polyglot: add file.<extension>.log and file.<extension>.wf alongside each output file (e.g. file.jpg.log, file.jpg.wf; sidecar sketch after this list)
      2. Clowder: new endpoint for workflows? (each step is one of the extractors executed on the specific file)
      3. File verification info at each step, using Siegfried to check the file format (sketch after this list)
  7. Add new tools
    1. Look at the ones in Jira labeled as "Extractors" and "Converters"
    2. Ankit's new tools
    3. Praveen's students' new extractor tools
    4. Josh's new tools?
    5. Support students in doing this rather than doing it ourselves!
  8. Large dataset support - move data vs. move computation
    1. Low-hanging fruit implementation first?
    2. Add DTS/DAP endpoints that provide a downloadable list of the containers needed for a specified request (as opposed to returning the results of executing those containers)
      1. Implement a lightweight client application, e.g. modify the Python command-line interface, to download these containers and run them client side (client sketch after this list)
    3. Cache/host large datasets locally? - a canned example for NARR data was done previously; can we generalize? How do we automate the decision of when to keep large files local?
  9. Logstash and Kibana
    1. Add Logstash to the Dockerfile
    2. Make extractor and software server logs consistent
      1. Standardize on Python's logging module (logging sketch below)
      2. Don't forget Java extractors (Versus, audio)
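
Implementation sketches (untested; assumptions flagged inline):

Priority 2.1, JSON-LD metadata: a minimal sketch of the kind of JSON-LD payload an updated extractor might attach to a file. The context URL, agent fields, and content keys are written from memory of Clowder's metadata format, not a confirmed schema - check the current API docs.

    # Hypothetical JSON-LD metadata body for a Clowder extractor (priority 2.1).
    # Field names and the context URL are assumptions; verify against the API.
    metadata = {
        "@context": ["https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld"],
        "agent": {
            "@type": "cat:extractor",
            "extractor_id": "https://clowder.example.org/api/extractors/ncsa.image.ocr",  # hypothetical id
        },
        "content": {"ocr_text": "hello world"},  # extractor-specific payload
    }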
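Priority 2.5, status constants: a sketch of shared status levels plus one consistent message format. Only DONE and ERROR come from the notes above; PROCESSING and the status_message() helper are hypothetical, not pyclowder API.

    # Shared status constants for all extractors (priority 2.5).
    DONE = "DONE"              # extraction finished successfully
    ERROR = "ERROR"            # extraction failed
    PROCESSING = "PROCESSING"  # assumed intermediate level, not from the notes

    def status_message(extractor, status, detail=""):
        """Format status lines identically across extractors (hypothetical helper)."""
        return f"{status}: [{extractor}] {detail}".rstrip()

    # e.g. status_message("ncsa.image.ocr", DONE, "3 pages extracted")
    #   -> "DONE: [ncsa.image.ocr] 3 pages extracted"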
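Priorities 2.7 and 9.2.1, one logging setup: the calls below are standard-library logging; only the format string is a suggested convention so extractor and software server logs parse the same way in Logstash.

    # One shared logging configuration for extractors and software servers.
    import logging

    def setup_logging(name):
        logging.basicConfig(
            level=logging.INFO,
            format="%(asctime)s %(name)s %(levelname)s %(message)s",
        )
        return logging.getLogger(name)

    log = setup_logging("extractors.image")
    log.info("extraction started")  # 2016-07-01 12:00:00,000 extractors.image INFO extraction started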
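Priority 5.1, information loss: the actual alpha/beta model is specified in BD-680 and is not reproduced here; this sketch only guesses at the shape of the computation, with each conversion edge carrying a made-up fraction of information retained and a path scored as the product of its edges.

    # Guess at path-loss scoring for Polyglot (priority 5.1); numbers are invented.
    retained = {
        ("doc", "pdf"): 0.95,
        ("pdf", "txt"): 0.60,
        ("doc", "txt"): 0.70,
    }

    def path_retained(path):
        """Fraction of information kept along a conversion path, e.g. ["doc", "pdf", "txt"]."""
        score = 1.0
        for src, dst in zip(path, path[1:]):
            score *= retained[(src, dst)]
        return score

    print(path_retained(["doc", "pdf", "txt"]))  # 0.57 - worse than direct doc->txt (0.70)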
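Priority 6.1.1, provenance sidecars: a sketch of writing <name>.<extension>.log and <name>.<extension>.wf next to each Polyglot output; the JSON step list used for the .wf body is an assumed format.

    # Write provenance sidecars next to each output file (priority 6.1.1).
    import json
    from pathlib import Path

    def write_sidecars(output_file, steps, log_text):
        out = Path(output_file)
        out.with_name(out.name + ".log").write_text(log_text)          # e.g. file.jpg.log
        out.with_name(out.name + ".wf").write_text(json.dumps(steps))  # e.g. file.jpg.wf

    write_sidecars("out/file.jpg",
                   [{"tool": "ImageMagick", "from": "png", "to": "jpg"}],  # assumed .wf shape
                   "converted png -> jpg\n")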
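Priority 6.1.3, file verification: Siegfried's command-line tool is sf and it can emit JSON; the exact field names in the report below are from memory and should be verified against the installed version.

    # Identify a file's format with Siegfried at each workflow step (priority 6.1.3).
    import json
    import subprocess

    def identify(path):
        """Return Siegfried's format matches (e.g. PRONOM ids) for one file."""
        out = subprocess.check_output(["sf", "-json", path])
        report = json.loads(out)
        return report["files"][0]["matches"]  # field names assumed from sf -json output

    # identify("output/file.jpg") -> e.g. a match with PRONOM id fmt/43 (JPEG)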
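Priority 8.2.1, lightweight client: a sketch of a client that asks a DTS/DAP endpoint for the container images a request needs, then pulls and runs them locally instead of shipping the data to the server. The /dap/containers route, its JSON response shape, and the docker run arguments are all assumptions.

    # Pull and run a request's containers client side (priority 8.2.1).
    import json
    import subprocess
    import urllib.request

    def fetch_container_list(server, input_name, output_fmt):
        url = f"{server}/dap/containers?file={input_name}&output={output_fmt}"  # assumed route
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)  # assumed: JSON list of image names

    def run_locally(images, data_dir):
        for image in images:
            subprocess.check_call(["docker", "pull", image])
            # assumed convention: mount the data dir, let the container convert in place
            subprocess.check_call(["docker", "run", "--rm", "-v", f"{data_dir}:/data", image, "/data"])

    # run_locally(fetch_container_list("http://bd.example.org", "file.doc", "txt"), "/tmp/job")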