Raw data from LoggerNet is uploaded here:

http://imlczo.ncsa.illinois.edu/clowder/datasets/573a361ee4b0e63138ea0c24?space=570e7d67e4b0ed483bd2da27

The parser is set up to create a dataset for each instrument:

http://imlczo-dev.ncsa.illinois.edu/clowder/collection/57d8381be4b038a443080a61

The development site contains a subset of the data parsed into the Geostreams API and available in the Geodashboard.

For basic metadata about a sensor, call this API endpoint with the sensor name; for example, for "Li7500" call:

http://imlczo-dev.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=Li7500

Or search the full list of sensors:

http://imlczo-dev.ncsa.illinois.edu/clowder/api/geostreams/sensors
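As a minimal sketch, both queries can be made with Python's requests library. The endpoint URLs come from above; the JSON field names used in the loop are assumptions about the Geostreams payload, so adjust them to match the actual response:

    import requests

    BASE = "http://imlczo-dev.ncsa.illinois.edu/clowder/api/geostreams"

    # Basic metadata for a single sensor, looked up by name.
    resp = requests.get(BASE + "/sensors", params={"sensor_name": "Li7500"})
    resp.raise_for_status()
    print(resp.json())

    # The full list of sensors registered in the Geostreams API.
    resp = requests.get(BASE + "/sensors")
    resp.raise_for_status()
    for sensor in resp.json():
        # "name" is an assumed field; records are GeoJSON-style features.
        print(sensor.get("name"))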

List of parameter definitions:

https://docs.google.com/spreadsheets/d/1S5KZcsGX-0BmmYKVZ3DAPTes_OS6uZU079R7_xaFBSw/edit#gid=375472075

Sensors List

Note: This data will come and go as we work on the parsers.

Current Parser Implementation

The parser checks Clowder for new files from LoggerNet and proceeds as follows:

  • In the Flux Tower Data project space, check the Flux Tower Raw Files dataset and get all the files in it.
    • This should be an extractor that monitors new files in this dataset.
  • Each incoming file contains data for more than 100 measurements, so the parser maps the columns in the file to the instruments that created them and writes a CSV file for each instrument (see the first sketch after this list). The CSV files go into a Collection called Flux Tower Instruments.
  • Inside Flux Tower Instruments, we find or create a Dataset with the same name as the instrument.
  • We find or create a Geostreams Sensor with the same name as the instrument, creating a stream if necessary (see the second sketch after this list).
  • In the Dataset, we upload the CSV file.
  • Before we create the datapoint, we record the metadata. This includes the URL of the original file, the derived file (the CSV file we created), and the new Dataset for this instrument.
  • We check whether the datapoint already exists.
    • Currently this is a very expensive operation and should be skipped for batch ingestion.
  • We create each of the datapoints and record their IDs.
  • We go back to the Instrument's Dataset and record Metadata on the Derived File that shows the unique parameters in the datapoints, a link to each datapoint, and the original file used to create the CSV file.
  • We go back to the original file in the Flux Tower Raw Files dataset and record metadata showing the Derived File we created.
  • The original raw file should always stay intact and unmodified.
  • Any files or data created from the original file will have a paper trail so that the process can be reproduced.
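The first sketch below illustrates the column-mapping step. The COLUMN_TO_INSTRUMENT table and its entries are hypothetical stand-ins for the real mapping (documented in the parameter-definitions spreadsheet above), and the code assumes the raw file is plain CSV with the timestamp in the first column; actual LoggerNet output may carry extra header rows that need skipping.

    import csv
    from collections import defaultdict

    # Hypothetical mapping from LoggerNet column name to the instrument
    # that produced it; the real mapping has 100+ entries.
    COLUMN_TO_INSTRUMENT = {
        "CO2_density": "Li7500",
        "H2O_density": "Li7500",
        "Ux": "CSAT3",
        "Uy": "CSAT3",
    }

    def split_by_instrument(raw_path):
        """Write one CSV per instrument from a single raw LoggerNet file."""
        with open(raw_path, newline="") as f:
            reader = csv.DictReader(f)
            time_col = reader.fieldnames[0]  # assume column 0 is the timestamp
            cols_by_instrument = defaultdict(list)
            for col in reader.fieldnames[1:]:
                if col in COLUMN_TO_INSTRUMENT:
                    cols_by_instrument[COLUMN_TO_INSTRUMENT[col]].append(col)
            rows = list(reader)

        outputs = {}
        for instrument, cols in cols_by_instrument.items():
            out_path = instrument + ".csv"
            with open(out_path, "w", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=[time_col] + cols)
                writer.writeheader()
                for row in rows:
                    writer.writerow({c: row[c] for c in [time_col] + cols})
            outputs[instrument] = out_path
        return outputs

The original raw file is only ever read, never rewritten, which keeps the first rule above intact.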
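The second sketch covers the Geostreams side: finding or creating a sensor and posting datapoints. The sensor search mirrors the endpoint shown earlier; the POST paths, the payload fields, and the API_KEY placeholder are assumptions about the Geostreams API rather than a confirmed implementation, and the provenance-metadata steps described above are omitted.

    import requests

    BASE = "http://imlczo-dev.ncsa.illinois.edu/clowder/api/geostreams"
    KEY = {"key": "API_KEY"}  # placeholder for a real Clowder API key

    def find_or_create_sensor(name, geometry):
        """Return the sensor with this name, creating it if none exists."""
        found = requests.get(BASE + "/sensors",
                             params=dict(sensor_name=name, **KEY)).json()
        if found:
            return found[0]
        # Payload shape is assumed: a GeoJSON feature with a name property.
        body = {"name": name, "type": "Feature", "geometry": geometry,
                "properties": {"name": name}}
        resp = requests.post(BASE + "/sensors", params=KEY, json=body)
        resp.raise_for_status()
        return resp.json()

    def create_datapoint(stream_id, geometry, time, properties):
        """Post one datapoint to a stream and return the API response."""
        body = {
            "stream_id": str(stream_id),
            "start_time": time,         # ISO 8601, e.g. "2016-09-13T00:00:00Z"
            "end_time": time,
            "type": "Feature",
            "geometry": geometry,
            "properties": properties,   # e.g. {"CO2_density": 14.7}
        }
        resp = requests.post(BASE + "/datapoints", params=KEY, json=body)
        resp.raise_for_status()
        return resp.json()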