...
Upcoming Milestones:
...
- Tutorial - July 18th
- Advisory Board meeting - Mid August
- Review - Mid October
- User Workshop (end user engagement, hands-on tutorials) - After review (late 2016, early 2017)
Priorities:
- Polyglot refactoring - needed to support docker deployments!
- Update all extractors to latest tech stack
- JSON-LD
- Docker containers
- Extractor metadata registration
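As a sketch of what the JSON-LD metadata an extractor registers might look like (the context URL and term names here are hypothetical placeholders, not Clowder's actual vocabulary):

```json
{
  "@context": {
    "extractor": "http://example.org/terms/extractor",
    "ocr_text": "http://example.org/terms/ocr_text"
  },
  "extractor": "ncsa.image.ocr",
  "ocr_text": "text recovered from the image"
}
```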
- pyclowder
- Add status messages to all extractors and fix level granularity
- Make status constants (DONE, ERROR)
- ArcGIS multiprocessing extractor
- Register on on-demand queues
- Standardize around Python logging
- Add entry to Tools catalog, with icon, sample input/output
- Update converters in a similar fashion
- Separate repos vs. all in scripts folder in main repo
- dockerfiles
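A minimal sketch of the status constants and consistent granularity called for above (this is an illustration, not the actual pyclowder API; a real extractor would also publish the message back over the message bus):

```python
import logging

# Shared status constants so every extractor reports the same states.
DONE = "DONE"
ERROR = "ERROR"
PROCESSING = "PROCESSING"

logger = logging.getLogger("extractor")

def report_status(file_id, status, detail=""):
    """Format and log one status message in a single consistent shape."""
    msg = f"{file_id}: {status}" + (f" ({detail})" if detail else "")
    if status == ERROR:
        logger.error(msg)
    else:
        logger.info(msg)
    return msg
```

Keeping the constants at module level means every extractor imports one vocabulary instead of inventing its own strings.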
- Tutorial materials for end-to-end deployments (Tutorial Milestone in July!!)
- adding extractors/converters
- admin deploying extractors/converters (all current extractors/converters available for demonstration)
- building applications around API (finalized API, BD Fiddle, toy problem/application?)
- Polyglot information loss
- Implement the alpha-beta loss estimation described here: https://opensource.ncsa.illinois.edu/jira/browse/BD-680
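The actual estimation scheme is specified in BD-680; as a rough illustration of the idea (my reading, not the ticket's spec), each conversion edge's information retention could be modeled as the mean of a Beta(alpha, beta) estimate built from observed good/bad comparisons, multiplied along a conversion path:

```python
def edge_retention(alpha, beta):
    """Mean of Beta(alpha, beta): alpha ~ favorable comparisons, beta ~ unfavorable."""
    return alpha / (alpha + beta)

def path_retention(edges):
    """Cumulative information retention along a conversion path (1.0 = lossless)."""
    r = 1.0
    for alpha, beta in edges:
        r *= edge_retention(alpha, beta)
    return r
```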
- Provenance trails available for all requests
- DataWolf workflows
- Polyglot: add a .log and .wf file (e.g. file.jpg.log and file.jpg.wf) for each output file
- Clowder: new endpoint for workflow? (each step is one of the extractors executed on the specific file)
- File verification info at each step using a Siegfried file format check
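The per-step trail with verification info could be as simple as a JSON sidecar that records each step and a checksum of its output. A hedged sketch (field names are assumptions; the format check would additionally come from Siegfried):

```python
import hashlib
import json

def append_step(trail, step_name, output_bytes):
    """Record one workflow step with a checksum of its output, so each
    hop in the provenance trail can later be verified."""
    trail["steps"].append({
        "step": step_name,
        "sha256": hashlib.sha256(output_bytes).hexdigest(),
    })
    return trail

trail = {"input": "file.jpg", "steps": []}
append_step(trail, "image-metadata-extractor", b"example output")
sidecar = json.dumps(trail, indent=2)  # would be written next to the file, e.g. as its .wf sidecar
```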
- Add new tools
- Look at the ones in Jira labeled as "Extractors" and "Converters"
- Ankit's new tools
- Praveen's students' new tools
- Josh's new tools?
- Support students in doing this rather than doing it ourselves!
- Large dataset support
- Move data vs. move computation
- Low-hanging fruit implementation?
- Add DTS/DAP endpoints that provide a downloadable list of the containers needed for a specified request (as opposed to returning the results of executing those containers)
- Implement a lightweight client application, e.g. modify the Python command line interface, to download these containers and run them client-side
- Cache/host large datasets locally? - canned example for NARR data previously done; can we generalize? How do we automate the decision of when to host large files locally?
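The client half of the container-list idea can be sketched as follows; the endpoint shape is an assumption (something like `GET .../containers` for a request), so only the client-side mechanics are shown:

```python
def docker_commands(images, data_dir):
    """Build the `docker run` invocations for a list of container images,
    as the proposed DTS/DAP endpoint would return them."""
    return [
        ["docker", "run", "--rm", "-v", f"{data_dir}:/data", image]
        for image in images
    ]

# A client would fetch the image list from the new endpoint, then:
for cmd in docker_commands(["ncsa/example-extractor:latest"], "/tmp/data"):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # left commented: requires Docker locally
```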
- Logstash and Kibana
- Add Logstash to the Dockerfile
- Make extractor and software server logs consistent
- Standardize around Python logging
- Don't forget Java extractors (Versus, audio)
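One way to make extractor and software server logs consistent for Logstash is a shared formatter that emits one JSON object per line; a minimal sketch (the field names are a choice, not a fixed schema, and Java services would need an equivalent layout):

```python
import json
import logging
import sys

class JSONLineFormatter(logging.Formatter):
    """One log line shape for extractors and software servers alike,
    so Logstash can parse every service the same way."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONLineFormatter())
logger = logging.getLogger("extractor.example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("file received")  # emits one JSON object on stdout
```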
- Automatic Process Adjustments
    - Multiple results panes
        - Extraction Results
        - Conversion Results
    - Remove colon on Extractors/Converters
        - Extract
        - Convert To
    - Flip the conversion and extraction boxes for screen real estate
- Website Security
    - Use an anonymous token/key with limits on file size and submissions. (Long Term - Not In Scope)
    - Login using username and password
        - Sign-In page first
        - Get key
        - Fetch token
        - Key and token displayed at top of page
    - Indent code snippet buttons to line up with code pane
    - Links for setup by code snippets
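The sign-in flow above (log in, get a key, fetch a token) can be sketched client-side; the endpoint paths and JSON field names below are placeholders, not the real gateway API:

```python
import base64

def basic_auth_header(username, password):
    """Authorization header for the initial username/password sign-in."""
    cred = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {cred}"}

def token_url(base_url, key):
    """Where a client would POST to exchange its API key for a token."""
    return f"{base_url}/keys/{key}/tokens"

# Usage with e.g. `requests` (field names are assumptions):
#   hdrs = basic_auth_header("alice", "secret")
#   key = requests.post(f"{base}/keys", headers=hdrs).json()["api-key"]
#   token = requests.post(token_url(base, key)).json()["token"]
```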
- Manual Process
- Metadata (Extraction)
- Allow selection of multiple metadata tools
- Pick only one tool to start
- Display error from extractor if it fails -> Need clear errors in the extractors
- List each tool specifically -> Get tools from the Tools catalog
- Conversion
- Populate output (conversion) based on the input type of the file
- The user will then select a conversion format, which will populate a list of tools to do the conversion; Polyglot will give the list of available tools by conversion format
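The conversion flow above can be sketched against Polyglot's conversion graph; the data here is a toy stand-in (input extension -> output format -> tools), not the real registry:

```python
CONVERSIONS = {
    "jpg": {"png": ["ImageMagick"], "pdf": ["ImageMagick", "LibreOffice"]},
    "doc": {"pdf": ["LibreOffice"], "txt": ["LibreOffice"]},
}

def output_formats(input_ext):
    """Formats to offer once the input file's type is known."""
    return sorted(CONVERSIONS.get(input_ext, {}))

def tools_for(input_ext, output_ext):
    """Tools able to perform the selected conversion."""
    return CONVERSIONS.get(input_ext, {}).get(output_ext, [])
```

This keeps the UI logic to two lookups: populate the format dropdown from the input type, then populate the tool list from the chosen format.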