CZO: Geostreaming Data Framework Integration

Goals

CSV files uploaded to Clowder are annotated with information about the variables contained within the file using standard vocabularies.
This metadata, together with metadata about the location or sensor attached to a dataset, is used to automatically ingest data into the Geostreaming API.

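The likelihood that a prior mapping applies to a new file could be measured as the fraction of its columns that the mapping matches. For example, if only 9 out of 10 columns match a prior mapping, the likelihood is 90%.
Alternatively, it could be the percentage of files seen with this type of mapping.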
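Both candidate measures can be sketched in a few lines of Python. This is a minimal illustration, not the VMS implementation: it assumes headers are compared as exact strings, that a mapping is a dict from column header to vocabulary URI, and that unique-file counts per mapping are available. The function names and the `example.org` URIs are hypothetical placeholders.

```python
from collections import Counter


def column_match_likelihood(headers, mapping):
    """Fraction of a file's column headers covered by a prior mapping.

    `mapping` is assumed to be a dict from column header to a standard
    vocabulary URI; headers are compared as exact strings.
    """
    if not headers:
        return 0.0
    matched = sum(1 for h in headers if h in mapping)
    return matched / len(headers)


def file_frequency_likelihood(mapping_id, files_per_mapping):
    """Share of all seen files that used the given mapping.

    `files_per_mapping` is assumed to map mapping ids to unique-file counts.
    """
    total = sum(files_per_mapping.values())
    if total == 0:
        return 0.0
    return files_per_mapping.get(mapping_id, 0) / total


if __name__ == "__main__":
    headers = [f"col{i}" for i in range(10)]
    # Placeholder URIs; a real mapping would use standard vocabulary terms.
    prior = {f"col{i}": f"http://example.org/vocab/col{i}" for i in range(9)}
    print(column_match_likelihood(headers, prior))  # 0.9

    counts = Counter({"mapping-a": 30, "mapping-b": 10})
    print(file_frequency_likelihood("mapping-a", counts))  # 0.75
```

The first measure scores a mapping against one file; the second scores it by its popularity across all files, so the two could also be combined when ranking.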

Variables Mapping Service (VMS) BD-2310
- POST/GET/PUT/DELETE mappings
- A MongoDB collection stores the mappings, one document per mapping
  - Each mapping document pairs strings (column headers) with terms from standard vocabularies (URIs)
  - It records how many times the mapping has been seen (the number of unique files)
  - When a mapping is incomplete, i.e. only a subset of the columns can be identified, it should record how many columns were successfully identified
    - For example, if a CSV file has 10 columns but only 4 can be tagged, the accuracy is 40%
- Possibly keep a separate collection recording which files match which mapping
- SEARCH for mappings that match a set of CSV headers and return them ordered by accuracy
  - The client submits a list of CSV column names; the service returns a list of potential mappings with their accuracies.
- Dockerize the service:
  - BD-2318
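The mapping documents, unique-file tracking, and SEARCH behaviour described above can be prototyped without MongoDB. The sketch below is an assumption-laden stand-in: `Mapping` and `MappingStore` are hypothetical names, a Python set plays the role of the per-mapping file list, and accuracy is taken to be the fraction of submitted headers a mapping can tag (4 of 10 gives 0.4), with ties broken by how many unique files have used the mapping.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Mapping:
    """Hypothetical in-memory stand-in for one mapping document."""
    mapping_id: int
    terms: dict  # CSV column header -> standard vocabulary URI
    files: set = field(default_factory=set)  # unique file ids seen with this mapping

    @property
    def times_seen(self):
        return len(self.files)


class MappingStore:
    """In-memory sketch of the mappings collection (no MongoDB required)."""

    def __init__(self):
        self._mappings = {}
        self._ids = count(1)

    def add(self, terms):
        mapping_id = next(self._ids)
        self._mappings[mapping_id] = Mapping(mapping_id, dict(terms))
        return mapping_id

    def record_file(self, mapping_id, file_id):
        # Using a set means the same file is only counted once.
        self._mappings[mapping_id].files.add(file_id)

    def search(self, headers):
        """Return (mapping_id, accuracy, times_seen) tuples, best match first."""
        if not headers:
            return []
        results = []
        for m in self._mappings.values():
            tagged = sum(1 for h in headers if h in m.terms)
            if tagged:
                results.append((m.mapping_id, tagged / len(headers), m.times_seen))
        # Highest accuracy first; break ties by how often the mapping was seen.
        results.sort(key=lambda r: (r[1], r[2]), reverse=True)
        return results
```

Here `record_file` also covers the "which files match which mapping" idea; in MongoDB the same effect could be obtained with an `$addToSet` update on a files array in each mapping document.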
Semantic Annotation Service (SAS)
- http://ecgs.ncsa.illinois.edu/SAS.html
- We should build a simpler version of this as a Flask application storing info in MongoDB
Datapoints Extractor (DPE)

Geostreaming Data Framework
- Store and visualize datapoints
- https://geodashboard.ncsa.illinois.edu/
- Geostreaming API (GSAPI)

~~Update https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-csv to store more information~~ (Decided: Won't Do.)
- ~~which column has which header~~
- ~~include column number and label, for example (3, "temperature")~~
Develop Variables Mapping Service (VMS)
- Simple Flask app with a MongoDB back end
Variable Annotation Extractor (VAE)
- An extension of the CSV extractor (extractors-csv) that queries the VMS and stores standard names in metadata
  - It should support adding multiple mappings to the metadata
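To illustrate "multiple mappings added to metadata", the sketch below only assembles a metadata payload from several candidate mappings; it does not call the VMS or Clowder, whose endpoints are not specified here. The candidate shape (`mapping_id`, `accuracy`, `terms`), the payload keys, and the `max_candidates` cap are all hypothetical choices for illustration.

```python
import json


def build_variable_metadata(headers, candidates, max_candidates=3):
    """Build a metadata payload listing several candidate mappings for one CSV file.

    `candidates` is assumed to be a list of dicts with hypothetical keys
    "mapping_id", "accuracy" and "terms" (header -> vocabulary URI), e.g. as
    returned by a VMS search. The payload keys are illustrative only.
    """
    ranked = sorted(candidates, key=lambda c: c["accuracy"], reverse=True)
    entries = []
    for cand in ranked[:max_candidates]:
        standard_names = {h: cand["terms"][h] for h in headers if h in cand["terms"]}
        entries.append({
            "mapping_id": cand["mapping_id"],
            "accuracy": cand["accuracy"],
            "standard_names": standard_names,
            # Columns this mapping could not tag, kept for later review.
            "untagged_columns": [h for h in headers if h not in cand["terms"]],
        })
    return {"candidate_mappings": entries}


if __name__ == "__main__":
    headers = ["temp", "ph", "site"]
    # Placeholder URIs standing in for standard vocabulary terms.
    candidates = [
        {"mapping_id": 2, "accuracy": 1 / 3,
         "terms": {"temp": "http://example.org/vocab/temperature"}},
        {"mapping_id": 1, "accuracy": 2 / 3,
         "terms": {"temp": "http://example.org/vocab/temperature",
                   "ph": "http://example.org/vocab/pH"}},
    ]
    print(json.dumps(build_variable_metadata(headers, candidates), indent=2))
```

Keeping every candidate, ranked by accuracy, lets a later frontend (standalone client or Clowder widget) let a user confirm the right mapping instead of the extractor committing to one.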
Figure out where the frontend should be
- Standalone client
- Clowder add metadata widget