This document sets out a revised roadmap for Brown Dog and Clowder based on ongoing discussions related to the Tools Catalog, deployment, and queue bindings. The basic idea is to emphasize the Tools Catalog as a global resource, much like Docker Hub. This will be achieved by making it easy to deploy tools from the Catalog to local instances of Clowder, Polyglot, or indeed Brown Dog.

By doing this, we will encourage researchers to put their tools into the central catalog and spread usage of tools throughout the community. Just as with Docker Hub, we propose the addition of private repos in the Tools Catalog to allow researchers to put their private Tools in the catalog for use in their team without having to disclose confidential software.

Tools Catalog as Global Resource

We wish to encourage the research community to use the Brown Dog Tools Catalog as a central resource for all tools. This means creating conditions where researchers do not need to run their own local Tools Catalog. By doing this, we make sure there is a large community of Tools and users.

Exploring the Tools Catalog

As the number of tools in the catalog grows, we will need to provide better ways to explore the catalog and find tools that meet a researcher's needs.

Here are some ideas to meet this need:

Directory of Tool Categories

Create a meaningful directory of categories that tools can be assigned to. Allow browsing of the catalog via this directory. Create a process where users can suggest new categories.

User Reviews

Create a community around each tool by allowing authenticated users to post reviews and ratings for the tool. Show ratings and review metrics on tool search results.

Related Tools...

Create a relatedness measure that surfaces tools similar to the one the user is viewing. This could be based on groups of tools people frequently deploy together, or on keyword matching. The idea is to expose users to new tools they may find useful.
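
As one possible reading of "tools people frequently deploy together", the sketch below ranks tools by the Jaccard similarity of the sets of clusters that deploy them. The function name, the tool references, and the shape of the deployment data are all assumptions made for illustration; the roadmap does not prescribe a particular measure.

```python
from collections import defaultdict


def related_tools(deployments, tool, top_n=3):
    """Rank other tools by how often they are deployed alongside `tool`.

    `deployments` maps a cluster name to the set of tool references deployed
    there (a hypothetical data shape). The score is the Jaccard similarity of
    the two tools' cluster sets.
    """
    clusters_by_tool = defaultdict(set)
    for cluster, tools in deployments.items():
        for name in tools:
            clusters_by_tool[name].add(cluster)

    target = clusters_by_tool.get(tool, set())
    scores = []
    for other, clusters in clusters_by_tool.items():
        if other == tool:
            continue
        union = target | clusters
        score = len(target & clusters) / len(union) if union else 0.0
        if score > 0:
            scores.append((other, score))

    # Highest score first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical sample data: three clusters and the tools each one runs.
sample = {
    "cluster-a": {"demo/ocr", "demo/image-resize"},
    "cluster-b": {"demo/ocr", "demo/image-resize", "demo/audio-transcode"},
    "cluster-c": {"demo/audio-transcode"},
}
print(related_tools(sample, "demo/ocr"))
```

Keyword matching could be blended into the same score later; co-deployment alone has the advantage of needing no curation by tool authors.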

Try it out...

Move some of the BD-Fiddle functionality to the tool page so users can try out the tool with a sample of their own data.

New Level to Tool Hierarchy: Repo

To make managing a global collection of tools easier, we should add a new level to the tools hierarchy: Repo. Repo names have to be globally unique, but tool names only have to be unique within the repo.

Repos are managed by one or more Brown Dog accounts. It is the responsibility of those accounts to review and publish tools and scripts within their repo.
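
The naming and ownership rules above can be sketched as a small in-memory model. The `Catalog` class, its method names, and the `repo/tool` reference it returns are hypothetical and only illustrate the two uniqueness scopes; this is not the Tools Catalog's actual data model.

```python
class Catalog:
    """Toy model: repo names are globally unique, tool names unique per repo."""

    def __init__(self):
        # repo name -> {"owners": set of accounts, "tools": set of tool names}
        self._repos = {}

    def create_repo(self, repo, owner):
        if repo in self._repos:
            raise ValueError(f"repo name already taken: {repo}")
        self._repos[repo] = {"owners": {owner}, "tools": set()}

    def publish_tool(self, repo, tool, account):
        entry = self._repos[repo]
        # Only the accounts managing a repo may publish into it.
        if account not in entry["owners"]:
            raise PermissionError(f"{account} does not manage {repo}")
        if tool in entry["tools"]:
            raise ValueError(f"{repo} already has a tool named {tool}")
        entry["tools"].add(tool)
        return f"{repo}/{tool}"


catalog = Catalog()
catalog.create_repo("imagemagick", owner="alice")
catalog.create_repo("ffmpeg", owner="bob")
# The same tool name is fine in two different repos.
print(catalog.publish_tool("imagemagick", "convert", "alice"))
print(catalog.publish_tool("ffmpeg", "convert", "bob"))
```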

Private Repos

By default Repos will be public, meaning any unauthenticated user can browse the repo and see those tools in the catalog. Some researchers may wish to share tools in their team, but not make them public. In order to support them with the central tools catalog, we will allow researchers to create private repos where they can manage access. 

Private repo owners can manage which Brown Dog accounts can have access.
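
A minimal sketch of the visibility rule, assuming a repo records its owners plus a list of accounts that have been granted access. The `Repo` class and its field and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    name: str
    owners: set
    private: bool = False
    readers: set = field(default_factory=set)

    def grant(self, owner, account):
        """Let `account` browse this repo; only an owner may grant access."""
        if owner not in self.owners:
            raise PermissionError(f"{owner} does not manage {self.name}")
        self.readers.add(account)

    def can_browse(self, account=None):
        """Public repos are open to everyone, including unauthenticated users."""
        if not self.private:
            return True
        return account is not None and (
            account in self.owners or account in self.readers
        )


lab = Repo("smith-lab", owners={"alice"}, private=True)
lab.grant("alice", "bob")
print(lab.can_browse("bob"))   # bob was granted access
print(lab.can_browse())        # unauthenticated visitor
```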

Remove Level from Tool Hierarchy: Script

The distinction between tool and script is not helpful and makes the catalog more cumbersome to navigate. Let's make everything a tool. We can use Repos to relate scripts to each other. For example, there could be an ImageMagick repo with all of the related tools, or else the tools could simply be related by virtue of the category. More discussion needed on this.

Deployment is a Local Concern

Once we make the Tools Catalog a global resource we need to make it easy for a local administrator to choose a tool and deploy it to their cluster. This makes it easy for researchers to try out tools without needing their own Brown Dog installation. It also helps to generate a global community around the central tools catalog.

Each service will have its own deployment workflow. We'll attempt to capture the most popular ones.

Deploy to a local Brown Dog Cluster

Groups deploying the entire Brown Dog suite can use a new Cluster Management tool to manage deployments and queue bindings across the entire cluster of Clowder and Polyglot. This means building a new cluster management tool for Brown Dog.

We will move all of the deploy functionality out of the Tools Catalog and into this cluster management tool. It will rely on the Tools Catalog REST API to retrieve data about tools, and interact with configured clusters to:

  1. Deploy tools to a cluster
  2. Amend min/max instance settings
  3. Pause a tool
  4. Delete a tool
  5. Manage queue bindings
  6. Refresh tool versions
  7. View deployed tools and some metrics
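
The operations listed above could be grouped behind a single interface. The sketch below is an in-memory stand-in only: `ClusterManager`, `Deployment`, and every method name are hypothetical, and the `catalog_versions` dictionary substitutes for data that a real tool would fetch from the Tools Catalog REST API before acting on an actual cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    version: str
    min_instances: int = 1
    max_instances: int = 1
    paused: bool = False
    queues: set = field(default_factory=set)


class ClusterManager:
    def __init__(self, catalog_versions):
        # Assumption: maps a tool reference to its latest catalog version.
        self.catalog_versions = catalog_versions
        self.deployed = {}

    def deploy(self, ref, min_instances=1, max_instances=1):
        self.deployed[ref] = Deployment(self.catalog_versions[ref])
        self.scale(ref, min_instances, max_instances)

    def scale(self, ref, min_instances, max_instances):
        if not 0 <= min_instances <= max_instances:
            raise ValueError("need 0 <= min_instances <= max_instances")
        self.deployed[ref].min_instances = min_instances
        self.deployed[ref].max_instances = max_instances

    def pause(self, ref):
        self.deployed[ref].paused = True

    def delete(self, ref):
        del self.deployed[ref]

    def bind_queue(self, ref, queue):
        self.deployed[ref].queues.add(queue)

    def refresh(self, ref):
        # Pick up whatever version the catalog currently reports.
        self.deployed[ref].version = self.catalog_versions[ref]

    def view(self):
        return {
            ref: {"version": d.version, "paused": d.paused,
                  "instances": (d.min_instances, d.max_instances),
                  "queues": sorted(d.queues)}
            for ref, d in self.deployed.items()
        }


manager = ClusterManager({"imagemagick/convert": "1.0"})
manager.deploy("imagemagick/convert", min_instances=1, max_instances=4)
manager.bind_queue("imagemagick/convert", "image.png")
print(manager.view())
```

Keeping the catalog lookup separate from cluster state mirrors the split proposed here: the catalog describes tools, while deployment state lives with the cluster.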

Supporting Local Clowder Deployments Across Projects

We will make deploying to a local Clowder as easy as installing a plug-in in IntelliJ. Users can copy a repo/tool reference from the Tools Catalog and paste it into the deploy form in their Clowder UI.

This will mean migrating Clowder to a Docker Swarm deployment and adding a deployment page to the UI.

We will add queue binding functionality to the Clowder UI as well.
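
The repo/tool reference pasted into the deploy form needs to be validated before anything is deployed. The sketch below assumes a `repo/tool` syntax with an optional `:version` suffix, by analogy with Docker Hub image references; the exact syntax, the allowed characters, and the `parse_tool_ref` name are assumptions, since the roadmap does not define the format.

```python
import re
from typing import NamedTuple, Optional


class ToolRef(NamedTuple):
    repo: str
    tool: str
    version: Optional[str]


# Assumed syntax: repo/tool or repo/tool:version, lowercase names.
_REF = re.compile(
    r"(?P<repo>[a-z0-9][a-z0-9_-]*)"
    r"/(?P<tool>[a-z0-9][a-z0-9._-]*)"
    r"(?::(?P<version>[A-Za-z0-9._-]+))?"
)


def parse_tool_ref(text):
    """Split a pasted reference into repo, tool, and optional version."""
    match = _REF.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a repo/tool reference: {text!r}")
    return ToolRef(match["repo"], match["tool"], match["version"])


print(parse_tool_ref("imagemagick/convert:1.2"))
print(parse_tool_ref(" imagemagick/convert "))
```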

1 Comment

  1. Benjamin Galewsky, looks great!  Can you add something about publishing tools?  Are you familiar with this concept and things like Zenodo?  Efforts like NDS, RDA, and the Big Data Hubs are trying to push for data and software to be first-class citizens in terms of academic credit, something a researcher can put on their CV towards tenure (in addition to just paper publications).  The idea is that these are important artifacts of scientific research that are largely lost today, for a number of reasons, and we need to save them.  Things like Zenodo do this by associating a Digital Object Identifier (DOI) with things like a GitHub repo, allowing others to cite the software by citing the DOI.  Can we add a "Publish" button to each tool that connects to Zenodo (or something like it) to get a DOI that is then displayed with the tool?  I would like to use this to 1) help drive this movement, and 2) add another benefit to adding a tool, specifically something that can be added to a CV (it's a BD tool with DOI X and it's called 100 times each month by the community, etc.).