Goals

  • CSV files uploaded to Clowder are annotated with information about the variables contained within the file using standard vocabularies.
  • This metadata, together with metadata about the location or sensor attached to a dataset, is used to automatically ingest data into the Geostreaming API.

Components

  • Clowder
    • Dataset is annotated with sensor information
      • Reuse the existing relationship between dataset and sensor
      • Or add the sensor information as metadata on the dataset
  • Variable Annotation Extractor (VAE)
    • Annotate files with entries from standard vocabularies
      • For example, column 3 contains the term http://odm2/precipitation
      • Multiple mappings can be provided, each with its own likelihood
        • For example, if only 9 out of 10 columns match a prior mapping, the likelihood is 90%
        • Or the likelihood is the percentage of files seen with this type of mapping
  • Variables Mapping Service (VMS)
    • Tracks mappings between strings (column headers) and standard vocabularies (URI terms)
  • Semantic Annotation Service (SAS)
  • Datapoints Extractor (DPE)
    • Creates datapoints in the Geostreaming API based on rows in the CSV input file
    • Requires mapping from Variable Annotation Extractor
    • Site information as metadata on dataset
  • Geostreaming Data Framework
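
The column-based likelihood described for the VAE can be sketched as follows. This is a minimal illustration, not the extractor's actual code: the function name mapping_likelihood, the header normalization (trim and lowercase), and the example.org vocabulary URIs are all assumptions made for the example.

```python
def mapping_likelihood(headers, prior_mapping):
    """Return the fraction of column headers found in a prior mapping.

    `prior_mapping` is a hypothetical dict of normalized header -> term URI.
    """
    if not headers:
        return 0.0
    matched = sum(1 for h in headers if h.strip().lower() in prior_mapping)
    return matched / len(headers)


# Hypothetical prior mapping that covers 9 of the 10 columns below.
prior = {f"var{i}": f"http://example.org/vocab/var{i}" for i in range(9)}
headers = [f"var{i}" for i in range(10)]  # "var9" has no known mapping

likelihood = mapping_likelihood(headers, prior)
print(f"{likelihood:.0%}")
```

The file-based alternative (percentage of files seen with a mapping) would need counts kept by the VMS, which this sketch does not model.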

Workflow

  • File F1 (CSV) uploaded to dataset D1
  • VAE reads the column headers of F1
  • VAE requests matching mappings from the mapping service (VMS)
  • VAE adds metadata entries to file F1
  • DPE extracts datapoints from the CSV and adds them to the Geostreaming API (GSAPI)
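
The workflow steps above can be sketched end to end in a few functions. This is only an illustration under stated assumptions: the VMS is replaced by an in-memory dict, the metadata entry layout (column, label, term) is hypothetical, and nothing is sent to Clowder or the Geostreaming API.

```python
import csv
import io

# Hypothetical stand-in for the VMS: normalized header -> vocabulary term URI.
VMS_MAPPINGS = {"precipitation": "http://odm2/precipitation"}


def read_headers(csv_text):
    """VAE step: return the first CSV row as the list of column headers."""
    return next(csv.reader(io.StringIO(csv_text)))


def annotate(headers, mappings):
    """VAE step: build one metadata entry per column with a known mapping."""
    return [
        {"column": i, "label": h, "term": mappings[h.strip().lower()]}
        for i, h in enumerate(headers, start=1)
        if h.strip().lower() in mappings
    ]


def extract_datapoints(csv_text, annotations):
    """DPE step: turn each data row into a dict keyed by vocabulary term."""
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header row
    return [
        {a["term"]: row[a["column"] - 1] for a in annotations}
        for row in rows
    ]


# File F1 as it might be uploaded to dataset D1 (made-up sample values).
f1 = "time,site,precipitation\n2017-01-01,S1,0.4\n2017-01-02,S1,1.2\n"
annotations = annotate(read_headers(f1), VMS_MAPPINGS)
datapoints = extract_datapoints(f1, annotations)
```

In a real deployment the annotations would be written to F1 as Clowder metadata and the DPE would read them back from there, so the two extractors stay decoupled.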

Tasks

  • Update https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-csv to store more information
    • which column has which header
    • include column number and label, for example (3, "temperature")
  • Develop Variables Mapping Service (VMS)
    • Simple Flask app with a MongoDB back end
  • Variable Annotation Extractor (VAE)
    • An extension of extractors-csv that queries the VMS and stores standard names in metadata
      • We should support multiple mappings added to metadata
  • Figure out where the frontend should be
    • Standalone client
    • Clowder add metadata widget
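
For the first task, recording which column has which header could look roughly like the sketch below. It is not taken from extractors-csv: the function name column_headers is hypothetical, and numbering columns from 1 is an assumption chosen to match the (3, "temperature") example.

```python
import csv
import io


def column_headers(csv_text):
    """Return (column number, label) pairs, numbering columns from 1."""
    first_row = next(csv.reader(io.StringIO(csv_text)), [])
    return [(i, label.strip()) for i, label in enumerate(first_row, start=1)]


# Made-up sample file with three columns.
sample = "time,site,temperature\n2017-01-01,S1,21.5\n"
headers = column_headers(sample)
```

Storing pairs rather than a bare list of labels keeps the column position explicit, so later mappings can refer to a column even if two columns share the same label.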

