Clowder: The Brown Dog component responsible for extracting novel, often higher-level, data from file contents (e.g. metadata, tags, signatures, and other derived products) in order to index, compare, and further analyze collections of data through a broadly accessible REST API. Built on top of Clowder (formerly Medici 2.0) and utilizing Versus for content-based comparisons, the DTS is a highly distributed and extensible service that extracts information from data by which to index, compare, and further analyze collections of data through a broadly accessible REST API. Clowder is a web-based research data management system designed to support multiple research domains and the diverse data types used across those domains.  In addition to data sharing and organizational functionality, it contains three major extension points: preprocessing, processing, and previewing of data.  When new data is added to the system, whether via the web front-end or through the REST API, preprocessing, which serves as a form of autocuration, is off-loaded to cloud-based extraction services that analyze the data's contents to extract appropriate data and metadata. These extractors, triggered based on the type of the data, analyze its contents to tag it (e.g. flood basins found in images, trees in LIDAR) and/or create lightweight web-accessible previews of large files (e.g. an image pyramid), allowing users to examine and compare the contents of one or more datasets. Complemented by a number of features supporting community-based social curation, this combined raw and derived metadata is presented to the user in the Clowder web interface and used to navigate stored collections.
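
The type-triggered extraction described above can be illustrated with a minimal sketch. It is not Clowder's actual API: the registry `EXTRACTORS`, the `extractor` decorator, `run_extractors`, and the two toy extractors are hypothetical names chosen for illustration, and everything runs in one process rather than as cloud-based services.

```python
from collections import defaultdict

# Hypothetical registry mapping a MIME type to the extractors that handle it.
EXTRACTORS = defaultdict(list)


def extractor(mime_type):
    """Register the decorated function as an extractor for one MIME type."""
    def register(func):
        EXTRACTORS[mime_type].append(func)
        return func
    return register


@extractor("text/plain")
def word_count(content):
    # Derived metadata: number of whitespace-separated tokens.
    return {"word_count": len(content.split())}


@extractor("text/plain")
def line_count(content):
    # Derived metadata: number of lines.
    return {"line_count": len(content.splitlines())}


def run_extractors(mime_type, content):
    """Run every extractor registered for this type and merge their metadata."""
    metadata = {}
    for func in EXTRACTORS.get(mime_type, []):
        metadata.update(func(content))
    return metadata


if __name__ == "__main__":
    print(run_extractors("text/plain", b"flood basin\nfound here"))
```

A type with no registered extractor simply yields empty metadata, which mirrors the idea that extractors are only triggered for data types they understand.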

Data Conversion: A transformation on digital data that largely preserves the entirety of the data.  An example in the case of Brown Dog would be a transformation of a file from one 3D file format to another 3D file format.  Because file formats typically vary slightly, and the transformations themselves can be imperfect, some information can be lost.  However, the intent is for the resulting data to be as intact as possible.  Conversions allow one to access data more easily when the original format is not understood or is difficult to work with.  This is analogous to translating between languages.
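
As a rough illustration of a conversion that preserves most, but not all, of the data, the sketch below writes a toy in-memory mesh as OBJ-style `v` and `f` lines. The input dictionary layout and the function name `convert_mesh_to_obj` are assumptions made for this example, not a Brown Dog interface. Any field the target representation cannot hold is reported as lost rather than silently dropped.

```python
def convert_mesh_to_obj(mesh):
    """Convert a toy mesh dict to OBJ-style text.

    Returns (text, lost_fields). Only "vertices" and "faces" are carried
    over; every other key in the input is reported as lost information.
    """
    lines = []
    for x, y, z in mesh["vertices"]:
        lines.append(f"v {x} {y} {z}")
    for face in mesh["faces"]:
        # OBJ face indices are 1-based; the toy mesh uses 0-based indices.
        lines.append("f " + " ".join(str(i + 1) for i in face))
    lost_fields = sorted(k for k in mesh if k not in ("vertices", "faces"))
    return "\n".join(lines) + "\n", lost_fields


if __name__ == "__main__":
    triangle = {
        "vertices": [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
        "faces": [(0, 1, 2)],
        "vertex_colors": [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    }
    text, lost = convert_mesh_to_obj(triangle)
    print(text)
    print("lost:", lost)
```

The geometry survives the conversion while the per-vertex colors do not, which is the kind of small information loss the definition refers to.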

...

Unstructured Data: Data that does not have a pre-defined data model or is not organized in a pre-defined manner.  Unstructured data can be text based but can also involve sensor data or data that quantifies some physical object or phenomenon (e.g. images, video, audio, 3D models).  Such data is typically difficult to interpret using traditional computer programs.  Images are a good example of this.  To a computer, an image is nothing more than an array of numbers representing pixel intensities or colors.  Though images are extremely informative to us as human beings, for a computer to make any use of them some form of pre-processing must be run.  An example would be to use computer vision to recognize faces within the image and then output their locations as numerical values along with a textual tag identifying these areas as faces.  With information such as this, a computer is more readily able to carry out a search or other process involving the contents of such data.
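
The sketch below illustrates the idea of turning an array of pixel values into searchable information. It does not perform face recognition: a simple brightness threshold stands in for a real computer vision step, and the function name `tag_bright_region`, the tag string, and the threshold value are all assumptions made for this example.

```python
def tag_bright_region(image, threshold=200):
    """Return a textual tag plus the bounding box of pixels >= threshold.

    `image` is a grayscale image given as a list of rows of intensities.
    Returns None when no pixel reaches the threshold.
    """
    coords = [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value >= threshold
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return {
        "tag": "bright_region",
        "top": min(rows),
        "left": min(cols),
        "bottom": max(rows),
        "right": max(cols),
    }


if __name__ == "__main__":
    # To the computer this image is only numbers.
    image = [
        [0, 0, 0, 0],
        [0, 255, 250, 0],
        [0, 240, 0, 0],
        [0, 0, 0, 0],
    ]
    print(tag_bright_region(image))
```

Once the region is expressed as a tag and numeric coordinates, it can be indexed and searched like any other structured metadata.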

 

...

Clowder Definitions