The Geostreams Data Framework provides data management capabilities and web application interfaces for the management and visualization of geostreaming data.

To maximize flexibility in supporting heterogeneous data sources, the framework includes four components:

  1. a geostreams web service API to store and serve the normalized data;
  2. a geodashboard web application providing web interfaces to visualize, interact with, and retrieve the data;
  3. data parsing software libraries, written in Python, that normalize data from different sources into one common schema; and
  4. Clowder, a web-based data management system to store, curate, and analyze raw files and associated metadata.

The four components interact to provide pre-processing, cleaning, and visualization of geospatial earth science time series data such as water health data. The raw data from various sources are ingested into the geo-temporal web service API using a variety of data parsers. The parsers organize raw data into an information model composed of three main entities: sensors, streams, and datapoints. The geo-temporal API web service provides methods to query the ingested data by different software clients, including the geodashboard web application.
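As an illustration of that information model, a parser might normalize each raw reading into nested sensor/stream/datapoint records along these lines (the class and field names here are assumptions for illustration, not the framework's actual schema):

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the sensor/stream/datapoint model;
# field names are illustrative, not the framework's actual schema.

@dataclass
class Datapoint:
    start_time: str   # ISO 8601 timestamp of the measurement
    properties: dict  # measured parameters, e.g. {"temperature": 4.2}

@dataclass
class Stream:
    name: str
    datapoints: List[Datapoint] = field(default_factory=list)

@dataclass
class Sensor:
    name: str
    geometry: dict    # GeoJSON point locating the sensor
    streams: List[Stream] = field(default_factory=list)

def parse_row(sensor: Sensor, stream_name: str, timestamp: str, values: dict) -> None:
    """Append one normalized raw reading to the matching stream,
    creating the stream on first sight."""
    for stream in sensor.streams:
        if stream.name == stream_name:
            break
    else:
        stream = Stream(name=stream_name)
        sensor.streams.append(stream)
    stream.datapoints.append(Datapoint(start_time=timestamp, properties=values))

# Example: a hypothetical buoy reporting one water-quality reading
buoy = Sensor(name="buoy-45", geometry={"type": "Point", "coordinates": [-87.0, 42.3]})
parse_row(buoy, "water-quality", "2014-06-01T12:00:00Z", {"temperature": 4.2})
```

Different parsers can target different raw formats while all producing this one common structure, which is what lets a single query API serve heterogeneous sources.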

Projects currently using and developing the software:

Source code: 


Figure 1: Great Lakes GeoDashboard and Data Views

While this approach works well for static data, it does not easily scale to accommodate dynamic data sizes or heterogeneous data types. The current approach has the following limitations:

  1. Each new data type requires new data ingestion code (a parser) and new file types for the client.
  2. The current solution does not allow users to search, filter, or refine data queries to meet specific needs.
  3. As file sizes increase, download and display times slow.

To address these issues, NCSA, working with Sea Grant, defined Phase II. The first objective of Phase II is to address the scalability and extensibility deficiencies by refactoring the back end of the dashboard to: (1) host all data directly from a database that supports geospatial and temporal queries, (2) create standard RESTful interfaces to the database so that web applications and client scripts can access the data, and (3) provide a mechanism to upload new data types and allow the user to map the new data parameters to the existing model. Phase II will also extend the data and visualization capabilities of the GeoDashboard to:
  1. Incorporate additional data sources that include:
    1. Biological data (phytoplankton, zooplankton, benthos);
    2. Near-shore habitat data;
    3. Sea Bird and Triaxis sensors; and
    4. Data from other government and state agencies, including STORET and USGS data and basin/watershed boundary files.
  2. Create additional data visualizations for unique data sets (e.g., a depth-graph view for Sea Bird data and a fly-through visualization of Triaxis and other water quality and biological data in Google Earth using KML).
  3. Create a Great Lakes data archive and provide an archive view leveraging the NCSA Medici project, which enables upload, simple automated analytics, indexing, searching, and tagging.
  4. Establish a server hosting environment that will scale with the community needs as data requirements grow.
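To illustrate what such RESTful access could look like from a client script, the helper below builds a datapoint query URL with temporal and sensor filters. The endpoint path and parameter names are assumptions for illustration, not the actual API:

```python
from urllib.parse import urlencode

def build_datapoints_query(base_url, since=None, until=None, sensor_id=None):
    """Build a hypothetical datapoints query URL; the /api/geostreams/
    datapoints path and the parameter names are illustrative only."""
    params = {}
    if since:
        params["since"] = since          # earliest timestamp to include
    if until:
        params["until"] = until          # latest timestamp to include
    if sensor_id is not None:
        params["sensor_id"] = sensor_id  # restrict to one sensor
    query = urlencode(params)
    return f"{base_url}/api/geostreams/datapoints" + (f"?{query}" if query else "")

url = build_datapoints_query("https://example.org", since="2014-01-01", sensor_id=42)
```

Because the filtering happens server-side in the database, a client downloads only the slice of data it needs, which directly addresses limitations 2 and 3 above.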

The second objective of Phase II is to develop algorithms for the Triaxis and Sea Bird data that enable anomaly detection and adaptive observation; the algorithms would be implemented in the dashboard system in Phase III. The anomaly detection algorithms will enable more efficient quality control of the Triaxis and Sea Bird data by suggesting which data in these large datasets most need human review because they deviate significantly from trends in historical data. Researchers will test existing algorithms with these datasets and then extend them to enable anomaly detection on image data, which has not been attempted previously. These anomaly detection algorithms will also be used to explore the potential for adaptive observation with the Triaxis data. Such an approach would enable the ship towing the Triaxis instrumentation to receive near-real-time assessment of interesting anomalies in the data ("events") that may warrant further data collection, such as a developing algal bloom or high nutrient or sediment fluxes from the shore or rivers. It would also enable synchronous timing of grab sample collection (e.g., for phosphorus) during these events.
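One simple form such anomaly detection could take is flagging readings that deviate strongly from the historical mean. This is a minimal sketch only, assuming a plain standard-deviation threshold; the project's actual algorithms are not specified here:

```python
import statistics

def flag_anomalies(history, new_readings, threshold=3.0):
    """Return readings more than `threshold` standard deviations from
    the historical mean -- a minimal stand-in for the anomaly
    detection described above."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return [x for x in new_readings if abs(x - mean) > threshold * stdev]

# Historical concentrations, then a new run containing one spike ("event")
history = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 1.15]
events = flag_anomalies(history, [1.1, 8.5, 1.0])  # -> [8.5]
```

Applied in near real time aboard the tow ship, the flagged values would be the candidate "events" warranting further data collection or a synchronized grab sample.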

The collaboration between the EPA, Sea Grant, and NCSA/UIUC researchers will enable a novel data archive, informatics, and visualization environment that will be readily available to the public and scientific communities through a web browser. The enhancements to the system will be based on lessons learned from Phase I and on continual feedback from the EPA and Sea Grant team members and outreach activities.
