Collection: A user-defined group of datasets.
Comment: High-level information associated with a file or dataset, left by users.
Content Based Retrieval: A means of indexing collections of data in which items are indexed by signatures rather than by text or keywords. Users query the collection with example files to retrieve files with similar contents.
Content Management System: A system used to store, manage, and curate collections of files and datasets.
Data Extraction: A transformation that creates new data from the given data. An example would be the execution of analysis code on an image file's contents to determine whether a face is in the image. Clowder uses extractions to automatically generate metadata, signatures, and previews from a file's contents, and to give users a means of finding, relating, and using data that might otherwise be difficult to work with.
Dataset: A group of files that are strongly tied together through some defined relationship or corresponding metadata, and that cannot be represented by the individual files alone.
Extractor: A tool that takes a file, a section of a file, a dataset, or a collection as input and, through some analysis of the contents, produces higher-level information (e.g. metadata) or another derived product (e.g. a preview) to aid users in searching and organizing data, whether automatically or manually.
File: The lowest-level unit of information that can be tracked. This is a file from a file system.
Metadata: Simply, data about data. Available on datasets and individual files.
Preview: A special representation of a dataset or file used by a previewer to visualize information about the dataset or file on the web. Often used to provide a smaller version of a dataset or file when bandwidth is a consideration.
Section: A subset of a file's contents (e.g. a sub-image, a line from a document, or a frame from a video). A section is tied to a file.
Signature: A typically numerical representation of some semantic aspect of a file's contents. This can be thought of as a hash of the file's contents. Various means of generating these signatures are typically available, each focusing on a different aspect of a file's data (e.g. color distributions in an image vs. edge distributions). Signatures are used in content based retrieval to index data and to find data similar to a given example.
Space: A group of collections, datasets, and files with defined user access rights.
Tag: A short string, e.g. one or two words, associated with a file or dataset and used to categorize or index its contents.
Technical Metadata: Automatically generated metadata produced by the system via extractors.
User Metadata: Metadata associated with a file or dataset, entered by a human user.
Versus: A framework for decomposing content based comparisons into reusable parts that can be mixed and matched to meet a variety of user needs when content based indexing and retrieval is a viable means of allowing users to search a collection of data.
Versus Metadata: Signatures, typically numerical in nature, generated by Versus to represent some semantic aspect of a file's contents. Used for content based retrieval.
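
As a rough illustration of how the Signature and Content Based Retrieval entries fit together, the following Python sketch indexes items by a toy signature and answers a query by example. It is not Clowder or Versus code: the names `histogram_signature`, `distance`, and `SignatureIndex`, the 4-bin grayscale histogram, the Euclidean distance, and the sample file names are all assumptions chosen for brevity.

```python
from math import sqrt


def histogram_signature(pixels, bins=4):
    """Toy signature (an assumption): normalized histogram of 0-255 gray values."""
    if not pixels:
        raise ValueError("cannot compute a signature for empty contents")
    counts = [0] * bins
    for value in pixels:
        # Map a 0-255 value onto one of `bins` equal-width buckets.
        counts[min(value * bins // 256, bins - 1)] += 1
    return [count / len(pixels) for count in counts]


def distance(a, b):
    """Euclidean distance between two signatures of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SignatureIndex:
    """Hypothetical in-memory index that stores one signature per file."""

    def __init__(self):
        self._entries = []  # list of (file name, signature) pairs

    def add(self, name, pixels):
        self._entries.append((name, histogram_signature(pixels)))

    def query(self, example_pixels, k=1):
        """Return the names of the k files most similar to the example."""
        target = histogram_signature(example_pixels)
        ranked = sorted(self._entries, key=lambda entry: distance(entry[1], target))
        return [name for name, _ in ranked[:k]]


index = SignatureIndex()
index.add("dark.png", [10, 20, 30, 40])
index.add("bright.png", [200, 220, 240, 250])
index.add("mixed.png", [10, 100, 150, 250])

# Query by example: a mostly dark image should match "dark.png" first.
result = index.query([5, 15, 25, 60], k=2)
print(result)  # ['dark.png', 'mixed.png']
```

A linear scan over all stored signatures is used here only to keep the sketch short. A different signature function (for example, one based on edge distributions) could be substituted without changing the index, which is the kind of mix-and-match reuse the Versus entry describes.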