
Brown Dog (BD) provides two data transformation services: the Data Access Proxy (DAP), which converts data between formats, and the Data Tilling Service (DTS), which extracts metadata from data content. Both are exposed through a set of consistent, programmable REST APIs. Both services are also frameworks that are extensible by design. DAP, built on top of the Polyglot framework, and DTS, built on top of the Clowder framework, let users across research domains contribute new BD tools and make them available to other users. A BD tool is a Brown Dog script or program that wraps a piece of code, software, library, or web service and exposes its data format conversion, metadata extraction, or analysis capabilities. The tool is customized to fit into the ecosystem of either the Polyglot framework (data format conversions) or the Clowder framework (metadata extractions), and is made available through the DAP/DTS APIs.

Within the Polyglot framework, a tool is also known as a converter, to emphasize its data format conversion capabilities. A converter resides within a software server, which hosts the third-party software or library that provides the conversion functionality. The software server is the component that automates the use of that code's conversion capabilities through the converter. A converter is a wrapper script, written in any scripting language such as bash, R, or Python, around an existing piece of code to leverage its conversion capability.
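As an illustration of the wrapper idea, the following is a minimal sketch of a converter written in Python. It is not the Polyglot converter interface: the function name `convert`, its parameters, and the convention that the wrapped tool takes an input path and an output path as its last two arguments are all assumptions made for this example. The actual conventions are described in the bd-template README.

```python
import subprocess
from pathlib import Path


def convert(tool_command, input_path, output_format, output_dir):
    """Run an existing command-line tool to convert one file.

    tool_command is the wrapped program and its fixed arguments, given as a
    list of strings. This sketch assumes (hypothetically) that the tool
    accepts the input path and the output path as its final two arguments.
    """
    input_path = Path(input_path)
    # Name the output after the input, with the requested format as extension.
    output_path = Path(output_dir) / f"{input_path.stem}.{output_format}"

    # Delegate the real work to the wrapped tool; raise if it exits non-zero.
    subprocess.run(
        [*tool_command, str(input_path), str(output_path)],
        check=True,
    )

    if not output_path.exists():
        raise RuntimeError(f"tool did not produce {output_path}")
    return output_path
```

The wrapper itself contains no conversion logic. It only maps a request (input file, target format) onto a call to the existing software, which is what allows a software server to drive many different tools in the same way.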

Similarly, within the Clowder framework a tool is known as an extractor, to emphasize the extraction of metadata from data content. An extractor is a program that extracts metadata from a file's content, for example by analyzing the content and tagging the file according to some specific classification or criteria. It resides in a distributed environment, such as the cloud, as an extraction service and listens to a message queue for extraction requests from the DTS API. The extraction process is triggered based on specific file types, and the extracted metadata is made available through the DTS API.
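The request flow described above can be sketched roughly as follows. This is not the Clowder extractor API: Python's in-process `queue.Queue` stands in for the real message queue, and the message fields (`file_id`, `mime_type`, `path`), the `SUPPORTED_TYPES` set, and the line/word-count metadata are hypothetical choices made only to keep the example self-contained.

```python
import queue
from pathlib import Path

# File types this hypothetical extractor responds to.
SUPPORTED_TYPES = {"text/plain"}


def extract_metadata(path):
    """Derive simple metadata from a text file's content."""
    text = Path(path).read_text(encoding="utf-8")
    return {"lines": len(text.splitlines()), "words": len(text.split())}


def process_requests(requests):
    """Handle every pending extraction request and collect the results.

    requests is a queue.Queue of dicts; it is a stand-in for the message
    queue that a deployed extractor would listen to.
    """
    results = []
    while True:
        try:
            message = requests.get_nowait()
        except queue.Empty:
            break  # no more pending requests
        # Extraction is triggered only for file types this extractor handles.
        if message["mime_type"] not in SUPPORTED_TYPES:
            continue
        results.append(
            {
                "file_id": message["file_id"],
                "metadata": extract_metadata(message["path"]),
            }
        )
    return results
```

A deployed extractor would block waiting for new messages and send its results back to the service rather than returning a list, but the core pattern is the same: receive a request, check the file type, extract, and publish the metadata.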

In the BD framework, extractors and converters are the extensible units and can be contributed by researchers across different research domains. To enable contributions from the research community, a Tools Catalogue was designed and implemented, where users can register their BD tools and share them with other users.

How can I contribute to BD data transformation services?

  1. Identify your tool's capabilities: extraction/analysis or conversion.
  2. Decide whether your tool fits best as a converter within DAP or as an extractor within DTS. Look at some existing examples.
  3. Download the bd-template project from the NCSA open-source repository.
  4. Follow the README instructions to launch a development environment.
  5. Follow the README instructions to write a new extractor/converter that uses your tool, thereby turning it into a BD tool.
  6. Once it is tested in the local development environment, push the code to the open-source repository.
  7. Register the new BD tool in the Tools Catalogue. This involves providing a description, a link to the code repository, a Dockerfile, and input and output files. Submit it for admin approval.
  8. On approval, you can share the tool with others.

 
