...
- Install prerequisite software. The install methods will depend on your operating system:
- VirtualBox (or an equivalent Docker-compatible virtualization environment)
- Docker
- Python and PIP
- Clone this extractor template project from the repository, substituting your extractor project name:

    git clone https://opensource.ncsa.illinois.edu/bitbucket/scm/bd/bd-templates.git <extractor project name>

- Create and activate a Python virtual environment for your new project:

    cd <extractor project name>/bd-extractor-templates
    virtualenv .
    source bin/activate
    pip install docker-compose

NOTE: To activate this Python virtual environment in the future, repeat the "source bin/activate" step.
- Start up the Extractor runtime environment using Docker Compose:
TODO: docker-compose.yml for extractor runtime environment

    docker-compose up
    ./tests.py

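Before working through the steps above, it can help to confirm that the prerequisite tools are reachable from your shell. The following is a minimal sketch, assuming Python 3.3 or later (for `shutil.which`). The command names in `PREREQUISITES` and the helper `missing_tools` are illustrative assumptions rather than part of the template project; for example, `VBoxManage` only applies if you use VirtualBox.

```python
import shutil

# Command names are assumptions; adjust them to match how each tool
# is installed on your operating system.
PREREQUISITES = ["VBoxManage", "docker", "python", "pip"]


def missing_tools(tools, which=shutil.which):
    """Return the subset of tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(PREREQUISITES)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All prerequisite commands were found.")
```

A tool reported as missing may still be installed under a different command name, so treat the output as a hint rather than a definitive check.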
...
Contribute the Extractor Tool to the Brown Dog Service
JSON-LD Metadata Requirements
The DTS returns extracted metadata in the form of JSON-LD, with a graph representing the output of each extractor tool. JSON-LD scopes all data to namespaces, which help keep the various tools from using the same keys for results. For example, here is a response that includes just one extractor graph in JSON-LD:

    [
      {
        "@context": {
          "bd": "http://dts.ncsa.illinois.edu:9000/metadata/#",
          "@vocab": "http://dts.ncsa.illinois.edu:9000/extractors/ncsa.cv.caltech101"
        },
        "bd:created_at": "Mon Mar 07 09:30:14 CST 2016",
        "bd:agent": {
          "@type": "cat:extractor",
          "bd:name": "ncsa.cv.caltech101",
          "@id": "http://dts.ncsa.illinois.edu:9000/api/extractors/ncsa.cv.caltech101"
        },
        "bd:content": {
          "@id": "https://dts-dev.ncsa.illinois.edu/files/938373748293",
          "basic_caltech101_score": [
            "-0.813741"
          ],
          "basic_caltech101_category": [
            "BACKGROUND_Google"
          ]
        }
      }
    ]

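As a rough illustration of how a client might read a response of this shape, the sketch below groups the `bd:content` values by extractor name. It uses plain dictionary access instead of a JSON-LD processor, the helper name `content_by_extractor` is hypothetical, and the embedded response is an abbreviated copy of the example above.

```python
import json

# Abbreviated copy of the example response shown above.
RESPONSE = """
[
  {
    "@context": {
      "bd": "http://dts.ncsa.illinois.edu:9000/metadata/#",
      "@vocab": "http://dts.ncsa.illinois.edu:9000/extractors/ncsa.cv.caltech101"
    },
    "bd:agent": {"bd:name": "ncsa.cv.caltech101"},
    "bd:content": {
      "@id": "https://dts-dev.ncsa.illinois.edu/files/938373748293",
      "basic_caltech101_score": ["-0.813741"],
      "basic_caltech101_category": ["BACKGROUND_Google"]
    }
  }
]
"""


def content_by_extractor(graphs):
    """Map each extractor name to its metadata, dropping JSON-LD keywords."""
    results = {}
    for graph in graphs:
        name = graph["bd:agent"]["bd:name"]
        content = graph.get("bd:content", {})
        # Keys starting with "@" (such as "@id") are JSON-LD keywords,
        # not values produced by the extractor.
        results[name] = {k: v for k, v in content.items() if not k.startswith("@")}
    return results


extracted = content_by_extractor(json.loads(RESPONSE))
print(extracted["ncsa.cv.caltech101"]["basic_caltech101_category"])
```

Because each graph carries its own agent, results from several extractors can be kept apart even when they happen to use similar key names.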
If your metadata is a simple key/value map, with no nested objects and no ordered elements such as arrays, you can submit it as a plain JSON dictionary. The server converts your metadata to JSON-LD and tags it with your extractor as the software agent. All values returned by your extractor fall in a namespace particular to your extractor.
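One way to decide whether a plain dictionary is enough is to check that every value is a scalar. The sketch below is only an illustration of that check: the helper `is_flat_key_value_map` and the sample keys are hypothetical, and the submission call itself is not shown because it is not described here.

```python
import json


def is_flat_key_value_map(metadata):
    """True if metadata is a dict of string keys to scalar values only."""
    scalar_types = (str, int, float, bool, type(None))
    return isinstance(metadata, dict) and all(
        isinstance(key, str) and isinstance(value, scalar_types)
        for key, value in metadata.items()
    )


# Hypothetical extractor results.
flat = {"category": "BACKGROUND_Google", "score": -0.813741}
nested = {"scores": [-0.81, 0.12]}

print(is_flat_key_value_map(flat))    # True
print(is_flat_key_value_map(nested))  # False

# A flat map serializes directly to a plain JSON dictionary.
print(json.dumps(flat, sort_keys=True))
```

Metadata that fails this check, such as the `nested` example with its array, is a candidate for the JSON-LD form instead.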
Other Extractor Guidelines
- Handling failure gracefully
- How to use Key/Value pairs
- Different kinds of output data
- JSON-LD