- This line was added.
- This line was removed.
- Formatting was changed.
The objective of this project is to construct a service that will allow for past and present, un-curated data to be utilized by science while simultaneously demonstrating the novel science that can be conducted from such data. The proposed effort will focus on the large distributed and heterogeneous bodies of past and present un-curated data. This data is often referred to in the scientific community as "long-tail data": data that would have great value to science if its contents were readily accessible. The proposed framework will be made up of two re-purposable cyberinfrastructure building blocks referred to as a Data Access Proxy (DAP) and Data Tilling Service (DTS). These building blocks will be developed and tested in the context of three use cases that will advance science in geoscience, biology, engineering, and social science. The DAP will aim to enable a new era of applications that are agnostic to file formats through the use of a tool called a Software Server which itself will serve as a workflow tool to access functionality within 3rd party applications. By chaining together open/save operations within arbitrary software the DAP will provide a consistent means of gaining access to content stored across the large numbers of file formats that plague long tail data. The DTS will utilize the DAP to access data contents and will serve to index unstructured data sources (i.e. instrument data or data without text metadata). Building off of the Versus content based comparison framework and the Medici extraction services for auto-curation the DTS will assign content specific identifiers to untagged data allowing one to search collections of such data.
The intellectual merit of this work lies in the proposed solution which does not attempt to construct a single piece of software that magically understands all data, but instead aims at utilizing every possible source of automatable help already in existence in a robust and provenance preserving manner to create a service that can deal with as much of this data as possible. This proverbial “super mutt” of software, or Brown Dog, will serve as a low level data infrastructure to interface with digital data contents and through its capabilities enable a new era of science and applications at large. The broader impact of this work is in its potential to serve not just the scientific community but the general public as a DNS for data. Ultimately the goal is to move civilization towards an era where a user’s access to data is not limited by a file’s format or un-curated collections.
- Project Information and Resources
Children Display page Project Information and Resources
Children Display page Documentation
- Use Cases
Children Display page Use Cases
- Discussions / Under Development
- Alpha Beta Information Loss
- Beta Release
- Brown Dog API (Proposed Changes)
- Chrome Extension for DTS
- CI-BER Testbed
- Data and Tools
- Data Fetching Service & Data Movement Overall
- Extractors v2
- Fence User Quotas
- JSON-LD Support
- Ideas for Cyberintegrator for BrownDog
- LIDAR Tools
- Live Examples
- Metadata Encoding
- Metadata Representation
- Metadata Schemas
- Moving Computation Towards Data
- Obtaining Extractors Information- Heartbeats approach
- Polyglot Refactoring for Dockerizing SSes
- Polyglot Refactoring based on Clowder Architecture
- Polyglot / SoftwareServer Documentation
- REST Endpoint Consistency
- Relationships between Files
- Request a type of data be supported by the DAP and/or DTS
- Story Boards
- Tool Catalog
- Tool (or service) to generate a River Profile
- VM Elasticity
- Miscellaneous Topics
Recent Space Activity