Upcoming Milestones:

  • XSEDE Tutorial - July 18th
  • Advisory Board meeting - Mid August
  • Review - Mid October
  • User Workshop (end-user engagement, hands-on tutorials) - After review (late 2016 / early 2017)

Priorities:

  1. Polyglot refactoring - needed to support Docker deployments!
  2. Update all extractors to latest tech
    1. JSONLD
    2. Docker containers
    3. Extractor metadata registration
    4. pyclowder
    5. Add status messages to all extractors and fix level granularity
      1. Make status constants (DONE, ERROR)
      2. Arcgis multiprocessing extractor
    6. Register on on-demand queues
    7. Standardize on Python logging
    8. Add entry to Tools catalog, with icon and sample input/output
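Items 5 and 7 above could share one small module. The sketch below, with hypothetical constant names and a helper function (none of this is an agreed convention yet), shows status constants like DONE/ERROR plus a standardized Python logging setup that every extractor would reuse:

```python
import logging

# Hypothetical status constants (naming is an assumption, not agreed yet)
STATUS_DONE = "DONE"
STATUS_ERROR = "ERROR"

def setup_logging(extractor_name, level=logging.INFO):
    """Standardized logger configuration shared by all extractors."""
    logger = logging.getLogger(extractor_name)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger

def report_status(logger, status, detail=""):
    """Emit a status message at a log level matching its severity,
    so message granularity stays consistent across extractors."""
    if status == STATUS_ERROR:
        logger.error("%s %s", status, detail)
    else:
        logger.info("%s %s", status, detail)
```

An extractor would then call `report_status(log, STATUS_DONE, "processed file")` instead of ad hoc prints, which also feeds cleanly into the Logstash work in priority 9.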
  3. Update converters in a similar fashion
    1. Separate repos vs. all in a scripts folder in the main repo
    2. Dockerfiles
  4. Tutorial materials for end-to-end deployments (Tutorial Milestone in July!!)
    1. adding extractors/converters
    2. admin deploying extractors/converters (all current extractors/converters available for demonstration)
    3. building applications around API (finalized API, BD Fiddle, toy problem/application?)
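For the toy application in 4.3, a minimal client sketch may help tutorial attendees. The `/convert/{format}/{url}` endpoint layout, host name, and parameter shapes below are assumptions, since the API is not finalized:

```python
from urllib.parse import quote
from urllib.request import urlopen

def conversion_url(host, output_format, file_url):
    """Build the request URL for an assumed /convert/{format}/{url}
    endpoint; the source file URL is percent-encoded into the path."""
    return "%s/convert/%s/%s" % (
        host.rstrip("/"), output_format, quote(file_url, safe=""))

def convert(host, output_format, file_url):
    """Request the conversion and return the response body (bytes)."""
    with urlopen(conversion_url(host, output_format, file_url)) as resp:
        return resp.read()
```

A tutorial exercise could be as small as `convert("http://dap.example.org", "pdf", "http://files.example.org/report.doc")`, with the hostnames swapped for the real deployment.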
  5. Polyglot information loss
    1. Implement alpha beta loss estimation described here, https://opensource.ncsa.illinois.edu/jira/browse/BD-680
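As a placeholder for the BD-680 estimation, the sketch below assumes a simple multiplicative model: each conversion edge gets a retention score in [0, 1] and a path's loss is one minus the product. The actual alpha/beta method is specified in the ticket and may differ:

```python
def path_retention(edge_scores):
    """Combine per-conversion retention scores (0..1) multiplicatively;
    chaining lossy conversions compounds the information loss."""
    retention = 1.0
    for score in edge_scores:
        retention *= score
    return retention

def path_loss(edge_scores):
    """Estimated information loss for a conversion path (0 = lossless)."""
    return 1.0 - path_retention(edge_scores)
```

Under this model a lossless edge (1.0) followed by a half-lossy edge (0.5) gives a path loss of 0.5, which Polyglot could use to rank alternative conversion paths.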
  6. Provenance trails available for all requests
    1. DataWolf workflows
      1. Polyglot: add file.extension.log and file.extension.wf for each output file
      2. Clowder: new endpoint for workflows? (each added step is one of the extractors executed on the specific file)
      3. File verification info at each step using Siegfried
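The per-output `.wf` trail above could be a small JSON file appended to at each step. The field names and schema below are assumptions for illustration; only the "one trail file per output, one record per tool run, with verification info" idea comes from the notes:

```python
import hashlib
import json
import time

def sha1_of(path):
    """Checksum used as the file verification info at each step."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def append_step(wf_path, tool, input_path, output_path):
    """Record one extractor/converter execution in the .wf trail,
    creating the trail file on first use."""
    try:
        with open(wf_path) as f:
            trail = json.load(f)
    except FileNotFoundError:
        trail = {"steps": []}
    trail["steps"].append({
        "tool": tool,
        "input": input_path,
        "output": output_path,
        "output_sha1": sha1_of(output_path),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    })
    with open(wf_path, "w") as f:
        json.dump(trail, f, indent=2)
```

A complete trail like this could also back the proposed Clowder workflow endpoint or be imported into DataWolf.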
  7. Add new tools
    1. Look at the ones in Jira labeled as "Extractors" and "Converters"
    2. Ankit's new tools
    3. Praveen's students tools
    4. Josh's new tools?
    5. Support students in doing this rather than us!
  8. Large dataset support
    1. Low-hanging fruit implementation?
    2. Add DTS/DAP endpoints that provide a downloadable list of the containers needed for a specified request (as opposed to returning the results of executing those containers)
      1. Implement a lightweight client application, e.g. modify the Python command line interface, to download these containers and run them client side
    3. Cache/host large datasets locally? - canned example for NARR data previously done, can we generalize?  How do we automate the decision process of knowing when to do this?
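For 8.2, the lightweight client could be little more than a loop that turns the endpoint's container list into local `docker pull`/`docker run` commands. The response shape (a list of objects with an `image` field) and image names are assumptions for this sketch:

```python
def docker_commands(containers, data_dir):
    """Turn a container list returned by the assumed DTS/DAP endpoint
    into docker commands that run the pipeline client side, mounting
    the user's data directory into each container."""
    cmds = []
    for container in containers:
        cmds.append("docker pull %s" % container["image"])
        cmds.append("docker run --rm -v %s:/data %s"
                    % (data_dir, container["image"]))
    return cmds
```

The Python command line interface would fetch the list, print or execute these commands, and leave the large dataset on the client's disk the whole time.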
  9. Logstash and Kibana
    1. Add Logstash to the Dockerfile
    2. Make extractor and Software Server logs consistent
      1. Standardize on Python logging
      2. Don't forget Java extractors (Versus, audio)
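One way to make the logs consistent for Logstash is a shared JSON formatter, so Kibana gets the same fields from every service without per-service grok patterns. The field names below are assumptions, not an agreed schema, and the Java extractors would need an equivalent formatter:

```python
import json
import logging

class JSONFormatter(logging.Formatter):
    """Emit one JSON object per log line so Logstash can ingest
    extractor and Software Server logs with a single input config."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "service": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        })
```

Attaching it is one line per service, e.g. `handler.setFormatter(JSONFormatter())`, which pairs with the "standardize on Python logging" items above.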