...

The Brown Dog environment includes several services that work together to deliver the Brown Dog API: the Brown Dog Clowder web application, which hosts the API; a RabbitMQ message broker; and extractor tools running on separate hosts. Brown Dog extractors are message-based network services that process data files in response to RabbitMQ messages. The client-facing Brown Dog API receives data from a client and passes messages to the extractor message topic. Any extractors subscribed to the topic receive the message and decide for themselves whether they can process the data, usually by looking at the MIME type. When an extractor starts or finishes working on a file, it posts status and results back onto a message queue. All communication between the Brown Dog servers and the extractor tools is handled by messaging, except for ??? ANY EXCEPTIONS ???
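The per-extractor decision described above can be sketched as follows. This is an illustrative sketch only, not the actual extractor API: the message shape, the `mime_type` field, and the supported-type set are assumptions for illustration.

```python
# Hypothetical MIME types this example extractor accepts (assumption).
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg"}

def can_process(message):
    """Each subscribed extractor decides for itself whether it can handle
    the data, usually by looking at the MIME type in the message."""
    return message.get("mime_type") in SUPPORTED_MIME_TYPES

def handle_message(message):
    """Sketch of the extractor lifecycle: post a status when starting,
    process the file, then post status and results back when finished."""
    if not can_process(message):
        return None  # another extractor on the topic may handle it
    statuses = ["STARTED"]
    # ... fetch and process the file referenced by the message ...
    result = {"example_key": "example_value"}  # placeholder metadata
    statuses.append("DONE")
    return {"statuses": statuses, "result": result}
```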

...

  1. Install prerequisite software. Installation methods depend on your operating system:
    1. VirtualBox (or an equivalent Docker-compatible virtualization environment)
    2. Docker
    3. Python and PIP
  2. Clone this extractor template project from the repository, substituting your extractor project name:

    Code Block
    languagebash
    git clone https://opensource.ncsa.illinois.edu/bitbucket/scm/bd/bd-extractor-templates.git <extractor project name>
  3. Create and activate a Python virtual environment for your new project:
    1. Code Block
      languagebash
      cd <extractor project name>
      virtualenv .
      source bin/activate
      pip install docker-compose

      NOTE: To activate this Python virtual environment in the future, repeat the "source bin/activate" step.

  4. Start up the Extractor runtime environment using Docker Compose:
    1. TODO: docker-compose.yml for extractor runtime environment

      Code Block
      languagebash
      docker-compose up
      ./tests.py
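
The compose file referenced in the TODO above is not yet included here. As a rough illustration only, a compose file for this kind of runtime environment typically defines a RabbitMQ broker and the extractor container; the service names, image tags, ports, and environment variable below are assumptions, not the actual template contents.

```yaml
# Illustrative sketch only -- not the real docker-compose.yml for this template.
version: "2"
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"    # AMQP messaging
      - "15672:15672"  # management UI
  extractor:
    build: .
    depends_on:
      - rabbitmq
    environment:
      # Hypothetical variable pointing the extractor at the broker.
      - RABBITMQ_URI=amqp://guest:guest@rabbitmq:5672/%2F
```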

...

Contribute the Extractor Tool to the Brown Dog Service

JSON-LD Metadata Requirements

The DTS returns extracted metadata in the form of JSON-LD, with a graph representing the output of each extractor tool. JSON-LD scopes all data to namespaces, which keeps the various tools from colliding on the same keys in their results. For example, here is a response that includes just one extractor graph in JSON-LD:

Code Block
languagejs
[
  {
    "@context": {
      "bd": "http://dts.ncsa.illinois.edu:9000/metadata/#",
      "@vocab" : "http://dts.ncsa.illinois.edu:9000/extractors/ncsa.cv.caltech101"
    },
    "bd:created_at": "Mon Mar 07 09:30:14 CST 2016",
    "bd:agent": {
      "@type": "cat:extractor",
      "bd:name": "ncsa.cv.caltech101",
      "@id": "http://dts.ncsa.illinois.edu:9000/api/extractors/ncsa.cv.caltech101"
    },
    "bd:content": {
      "@id": "https://dts-dev.ncsa.illinois.edu/files/938373748293",
      "basic_caltech101_score": [
        "-0.813741"
      ],
      "basic_caltech101_category": [
        "BACKGROUND_Google"
      ]
    }
  }
]

 

If your metadata is a simple key/value map, without deeper nesting or ordered elements such as arrays, then submitting a plain JSON dictionary is fine. Your metadata will be converted to JSON-LD on the server side and tagged with your extractor as the software agent. All values returned by your extractor will fall in a namespace particular to your extractor.
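The server-side wrapping described above can be sketched as follows. This is a hypothetical illustration of the transformation, not the actual DTS implementation; the function name is invented, and the URLs and extractor name are copied from the sample response above.

```python
def wrap_as_jsonld(extractor_name, file_id, metadata):
    """Hypothetical sketch: wrap a plain key/value metadata dict in a
    JSON-LD graph entry, with the extractor recorded as the agent."""
    content = {"@id": file_id}
    content.update(metadata)  # plain keys are scoped by @vocab below
    return {
        "@context": {
            "bd": "http://dts.ncsa.illinois.edu:9000/metadata/#",
            # @vocab puts the extractor's keys in its own namespace.
            "@vocab": "http://dts.ncsa.illinois.edu:9000/extractors/" + extractor_name,
        },
        "bd:agent": {
            "@type": "cat:extractor",
            "bd:name": extractor_name,
        },
        "bd:content": content,
    }

doc = wrap_as_jsonld(
    "ncsa.cv.caltech101",
    "https://dts-dev.ncsa.illinois.edu/files/938373748293",
    {"basic_caltech101_score": ["-0.813741"]},
)
```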

Other Extractor Guidelines

  • Handling failure gracefully
  • How to use Key/Value pairs
  • Different kinds of output data
  • JSON-LD