This page outlines an approach for Brown Dog to become the definitive source for Box Skills curated by the scientific community.
Architecture
When a Skill is registered with a Box account, the invocation URL is provided. This URL will resolve to an endpoint in Fence.
Extractor Invocation
The RabbitMQ message for invoking an extractor will be changed to add a new property, source, which can be set to:
- clowder
- box
- dataverse
- etc...
The pyclowder library will use this source property to determine how to download the file and to update metadata. The extractor may use the source property to determine how to format the metadata.
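As a rough illustration of the message change, the sketch below builds and reads a JSON message body that carries the source property. The "id" field name and the fallback to clowder when the property is missing are assumptions made for this example, not part of the existing message format.

```python
import json


def build_message(file_id, source="clowder"):
    """Serialize a hypothetical invocation message that carries a source."""
    # "id" is an assumed field name; only "source" comes from this page.
    return json.dumps({"id": file_id, "source": source})


def read_source(body):
    """Return the source of a message, assuming clowder when it is absent."""
    message = json.loads(body)
    # Assumption: older messages without the property came from Clowder.
    return message.get("source", "clowder")
```

Defaulting to clowder would let extractors keep handling messages produced before the property existed.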
Tools Catalog
The Brown Dog Tools Catalog will be the source of extractors curated by the scientific community. Extractors will be categorized into Organizations, and into Repositories within each Organization. This pattern matches GitHub and DockerHub. The organizations will reflect scientific communities, which will be responsible for curating the extractors in their repos.
Different versions and configurations of extractors can be specified by the use of tags.
The tools catalog will rely on an underlying Git repo for storing extractor_info documents and keeping track of versions, issues, branches, and pull requests. It will download the extractor_info.json file to populate information on the page.
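A minimal sketch of how the catalog might turn a downloaded extractor_info.json document into the values shown on a page. The field names used here (name, version, description) are assumptions about the document's contents, and the placeholder defaults are hypothetical.

```python
import json


def summarize_extractor_info(text):
    """Pick out a few display fields from an extractor_info.json document."""
    info = json.loads(text)
    # Field names are assumed; missing fields fall back to placeholders.
    return {
        "name": info.get("name", "(unnamed)"),
        "version": info.get("version", "(unknown)"),
        "description": info.get("description", ""),
    }
```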
The catalog will furnish Box enterprise admins with URLs that expose the tool as a skill. Initially, they will have to copy and paste the URL. Once Box exposes management of Skills through an API, this can be further automated.
Stories
Here are some initial stories to help us implement this vision:
Skills Workflow
Endpoints for Box Skills
Add unsecured endpoints to Fence for skills notifications in the form of /skills/repo/extractor?tag=tag
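To make the URL shape concrete, here is a small sketch that splits a path of the form /skills/repo/extractor?tag=tag into its parts. The function name is hypothetical, and treating a missing tag as None is an assumption; Fence itself may handle this differently.

```python
from urllib.parse import parse_qs, urlsplit


def parse_skill_url(url):
    """Split /skills/<repo>/<extractor>?tag=<tag> into (repo, extractor, tag)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 3 or segments[0] != "skills":
        raise ValueError(f"not a skills URL: {url}")
    _, repo, extractor = segments
    # Assumption: the tag is optional and reported as None when absent.
    tag = parse_qs(parts.query).get("tag", [None])[0]
    return repo, extractor, tag
```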
Source property for RabbitMQ message
Add a new property to the RabbitMQ message that contains the source for the invocation (clowder, box, dataverse...)
Route Box Skills Invocations to correct Extractor
Translate the repo and extractor name to a queue name. Translate the tag into a routing key. Implement bindings to enforce routing messages by tools catalog tag.
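The translation described above could look roughly like the following. The dotted queue naming scheme, the "latest" default routing key, and the DirectBindings class are all assumptions for illustration; the class is an in-memory stand-in for exchange bindings, not RabbitMQ client code.

```python
def queue_name(repo, extractor):
    """Derive a queue name from repo and extractor (assumed naming scheme)."""
    return f"{repo}.{extractor}"


def routing_key(tag=None):
    """Use the tag as the routing key; 'latest' is an assumed default."""
    return tag if tag else "latest"


class DirectBindings:
    """In-memory model of bindings keyed by (queue name, routing key)."""

    def __init__(self):
        self._bindings = {}

    def bind(self, queue, key, consumer):
        self._bindings[(queue, key)] = consumer

    def route(self, repo, extractor, tag=None):
        key = (queue_name(repo, extractor), routing_key(tag))
        if key not in self._bindings:
            raise LookupError(f"no binding for {key}")
        return self._bindings[key]
```

Keying the binding on both values means an invocation for one tag is never delivered to an extractor deployed under a different tag.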
Pyclowder downloads file from Box
Extend Pyclowder to observe the source property in the message and use the Box SDK to download the file locally to the Docker container. Retain the existing functionality for Clowder-sourced files.
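One possible shape for this is a dispatch on the source property, sketched below. This is not Pyclowder's actual code: download_file, the downloaders mapping, and the "id" field are hypothetical, and the real Clowder and Box SDK download calls are left to the handlers that a caller supplies.

```python
def download_file(message, downloaders, dest_dir):
    """Choose a download handler based on the message's source property."""
    # Assumption: messages without a source are treated as Clowder-sourced,
    # which retains the existing behaviour.
    source = message.get("source", "clowder")
    try:
        handler = downloaders[source]
    except KeyError:
        raise ValueError(f"unsupported source: {source}") from None
    # Each handler takes (file_id, dest_dir) and returns the local path.
    return handler(message["id"], dest_dir)
```

The same lookup could be reused for metadata uploads, with a second mapping from source to upload handler.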
Pyclowder uploads metadata to Box
Extend Pyclowder to make the files.upload_metadata method respect the source property and use the Box SDK to upload metadata to a Box file. Retain the existing functionality for Clowder-sourced files.
Tools Catalog
Hello, Tools Catalog
Create a new Play 2.6 app based on the Clusterman code base.
GitHub Social Auth
Configure Silhouette with GitHub Social Auth
Organization Page
Configure a MongoDB Collection with basic organization data (the name and the GitHub URL)
Display a basic organization page that includes the list of all of the repos owned by that organization
Repository Page
User can click on a repository link from the Organization page and see information about that repository
Link to BDFiddle to try out tool
Repo page shows a link to BDFiddle where the user can try out a tool
Tags
Repo page interrogates GitHub for list of tags and displays them
Initial Skills
Prioritise these, port them to the new Pyclowder, and deploy them to the Tools Catalog:
- Langid
- DBPedia
- Census From Cell
- Handwritten Decimals
- Killed Photos
- Mean Grey
- Faces
- Eyes
- Profiles
- Closeups
- NLTK Summary
- Stanford CoreNLP
- Tesseract
- Tika
- Versus
- VLFeat