Thoughts on generalizing what we currently call "Labs Workbench" into a general platform that can support multiple distinct use cases.

Potential Use Cases

NDS Labs Workbench

...

  • Custom catalog of tools supported for the environment as well as user-defined catalogs
  • User accounts that can be created via the API without requiring registration (for example, shared authentication with Clowder for TERRA-REF)
  • Authentication that ties to existing systems (e.g., Shibboleth, OAuth)
  • Long-term stable and scalable resources. Ability to add/remove nodes as needed.
  • Ability to terminate long-running containers to reclaim resources
  • Custom documentation and branding, although the UI itself may be optional
  • Ability to mount data stored on remote systems (e.g., ROGER) as read-only and possibly read-write scratch space
  • Ability to add data to a running container, retrieved from a remote system?
  • Clear REST API to
    • List tools; list environments for a user; launch tools; stop tools;
  • Security/TLS/vulnerability assessment

 

Platform for the development and deployment of research data portals

Another use case, really a re-purposing of the platform, is to support the development and deployment of research data portals (aka the Zuhone case). In this case, we would offer something like Workbench to develop and test services, with the ability to "push" or "publish" them, although that step is still a bit unclear.

...

  • Ability to develop data portal using common tools (e.g., development environments or portal toolkits)
  • Ability to deploy (host) data portal services "near" datasets (e.g., ythub)
  • Security (TLS)
  • Monitoring (Nagios)
  • Custom DNS entries (gcmc.hub.yt)
  • Optional authentication into portal services (i.e., the ability to restrict end-user access to a service)

Other: Working with Whole Tale

Whole Tale will also support launching Jupyter and R notebooks, but is more focused on 1) bringing data to the container and 2) reproducibility. Datasets are registered via Girder. Authentication is handled via Globus Auth. Users' home directories will be backed by iRODS. WT will be deployed at NCSA and TACC. Containers will be time-constrained and launched at the site with the right data. A key component is the Data Management System, which handles caching data locally and exposing it to the container via a FUSE filesystem (and therefore handles permissions). They are hoping to leverage Labs Workbench – or at least Kubernetes – for container management.

  • Is there a way that WT can leverage Workbench to launch remote containers? At the very least, relying on a Kubernetes federation?

Other: CyVerse (work in progress)

Another case coming out of the Phenome conference is the possibility of using Workbench to provide R/Jupyter support for CyVerse:

"I am very interested in setting up the CyVerse DataStore with iRODS on the Workbench. CyVerse has been talking for months about integrating Jupyter and RStudio into our ecosystem. The Labs workbench appears to be just the sort of thing we (or at least, I) need."

The CyVerse Data Store supports iRODS iCommands, FUSE, or an API (http://www.cyverse.org/data-store). We can envision several approaches: 1) Workbench mounts the CyVerse data directly; 2) Workbench mounts data via iRODS; 3) Workbench retrieves data via API (see the sketch after the requirements list below).

Requirements might include:
  • Ability to install Labs Workbench at CyVerse, or
  • Ability to use the existing Labs Workbench to access CyVerse data
  • Start an R or Jupyter container that can access the CyVerse Data Store via iRODS
    • Data mounted directly (local install at CyVerse)
    • Data transferred via iRODS
  • Ability to handle CyVerse authentication
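
As a rough sketch of approach 2 (data transferred via iRODS), the following uses the python-irodsclient package; the host, port, and zone are the connection settings CyVerse publishes for its Data Store, while the credentials, file paths, and the /home/jovyan/work target directory (a common Jupyter image convention) are placeholders, not anything our platform currently does:

    # Sketch: pull a file from the CyVerse Data Store into a running notebook
    # container via iRODS (python-irodsclient). Credentials and paths are
    # placeholders; host/port/zone follow CyVerse's published settings.
    from irods.session import iRODSSession

    with iRODSSession(host="data.cyverse.org", port=1247, zone="iplant",
                      user="cyverse_username", password="cyverse_password") as session:
        obj = session.data_objects.get("/iplant/home/cyverse_username/example.csv")
        with obj.open("r") as src, open("/home/jovyan/work/example.csv", "wb") as dst:
            dst.write(src.read())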

Other: Collaborative Development Cloud (work in progress)

One issue that has come up recently on the KnowEnG UI development is the need for TLS-protected development instances with basic auth in front. Since we offer a slew of development environments with built-in TLS and basic auth, this seemed like a natural fit.

We also offer Jenkins CI. ARI has already set this up for some of the KnowEnG folks, but we could help other similar teams gain experience with setting up their own CI, and even with testing applications that they develop from within Labs. I played around over the weekend and discovered that there are also several GitLab and Atlassian suite (JIRA + Confluence) images floating around that might be usable from within Labs.

Given the above, we have the potential to offer the following powerful combination of tools for any team of collaborating developers:

  • Private Source Control (via GitLab)
  • Project Documentation / Internal Development Wiki (via Confluence)
  • Ticketing workflow system (via JIRA)
  • Continuous Integration (via Jenkins CI)
  • Development Environments for several popular languages (via Cloud9 and friends - with the potential to add more)

True, you could outsource any one of these (Atlassian provides the first three), but Labs is the only place I can think of where you could get them all! (wink)

Pros:

  • development-in-a-box: give teams all the tools they need to succeed right away
  • no need to remember 10 different URLs (if development teams shared a project in Labs) - access all of your development tools from one place!
  • automatic TLS with basic auth protecting all services (albeit self-signed, unless you have a CA)
  • quickly spin up new team members without spending a week installing dependencies and preparing environments

Cons:

  • full disclosure: I made this use case up... I have no idea if this is a real need that is unmet
  • storage is flaky, and hosting a source-code repository or ticket backlog directly violates my original ideology
    • "DO NOT store anything critical on Workbench. Storage is volatile and may go away at any point - save a hard copy externally."
  • requires that any service developed be runnable from within Labs, or else testing your code becomes more difficult than on a VM
    • currently: this would require that all services run from Labs (as well as, by extension, all things developed in Labs) be available via Docker Hub, which may be too public for KnowEnG / ARI's current licensing needs

Other: Workflow Orchestration (work in progress)

See JIRA ticket NDS-664.

Another need that has come up on the KnowEnG project is the ability to run a cluster of compute resources for scheduling analysis jobs. These jobs come in the form of a DAG (directed acyclic graph) and are effectively Docker containers with dependencies. Since the API server already contains much of the logic to talk with etcd and Kubernetes, it might not be so difficult to extend Workbench to run these types of analysis jobs.

Our "spec" architecture is already set up to handle running dependent containers and ensuring that they are running before continuing on to the next containers in the chain. If we were to add a flag (i.e. type == "job") to the specs, that could signal to the API server to run a job, instead of a service/rc/ingress, and to wait for the job to be "Completed" before running the next dependency.

I created a simple example of a Job spec YAML on raw Kubernetes just to see how a multi-container job would run and be scheduled. Apparently multiple Jobs can be scheduled at once, containing multiple containers. Each container within the Job will run sequentially (in the order listed in the spec).
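
For comparison, here is a minimal sketch of submitting such a Job programmatically, assuming the official kubernetes Python client and kubeconfig access to the cluster; the Job name, image, command, and namespace are placeholders, not part of our deployment:

    # Minimal sketch using the official kubernetes Python client; the Job name,
    # image, and namespace are placeholders.
    from kubernetes import client, config

    config.load_kube_config()  # assumes kubeconfig access to the target cluster
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="example-dag-step"),
        spec=client.V1JobSpec(
            backoff_limit=0,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="step",
                        image="busybox",
                        command=["sh", "-c", "echo running one DAG step"],
                    )],
                ),
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)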

I still need to ask for real-life examples of both a simple and a complex DAG to gather more details and create a more realistic prototype. We had previously discussed investigating Kubernetes to handle the scheduling, but we decided to look into BD2K's cwltoil framework instead.

Pros:

  • seems relatively small-effort to extend Labs in this way
  • more control over the scheduler than with raw Kubernetes, with direct access to the developers (ourselves)
  • we offer a user interface, which toil and Kubernetes do not (aside from the Mesos / Kubernetes dashboards, which are fairly limited)

Cons:

  • BD2K created cwltoil, and KnowEnG is a product of BD2K, so we miss out on a political win by using Labs
  • toil was created for exactly this purpose: scalable DAG / CWL jobs
  • toil would allow us to run jobs using the CGC's CWL system
  • still some kinks in our platform (actual bugs, storage issues, commercial cloud deployment is not yet formalized, etc.)

Current features/components

...

The deploy-tools image assumes that you are deploying CoreOS instances. This choice is arbitrary, but there are many assumptions in the deploy-tools component that are bound to the OS choice. Different providers make different OS decisions: Kubernetes seems to lean toward Fedora and Debian, GCE itself uses Debian, Azure uses Ubuntu, etc. This may not be important if we can rely on the Kubernetes deployments provided by each commercial cloud provider.

...

Other storage includes mounted volumes for /var/lib/docker and /var/lib/kubelet.

Dedicated etcd

We no longer rely on the Kubernetes etcd service; instead, we provide our own instance that runs within the cluster.

SMTP Server / Relay

We now provide an in-cluster SMTP relay that can be configured with Google credentials, which makes it simple to send verification, approval, and support e-mails from a Google account.
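
For reference, the upstream hop the relay performs is essentially equivalent to a standard authenticated Gmail SMTP submission, sketched below; the addresses and app password are placeholders, and the relay's own in-cluster configuration format is not shown here:

    # Roughly what the relay does upstream: authenticated submission through
    # Gmail's SMTP endpoint. Addresses and the app password are placeholders.
    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "workbench-support@example.org"
    msg["To"] = "new-user@example.org"
    msg["Subject"] = "Workbench account approved"
    msg.set_content("Your Labs Workbench account has been approved.")

    with smtplib.SMTP("smtp.gmail.com", 587) as smtp:
        smtp.starttls()
        smtp.login("relay-account@gmail.com", "google-app-password")
        smtp.send_message(msg)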

REST API Server/CLI

Labs Workbench provides a thin REST interface over Kubernetes. Basic operations include: authentication, account management (register, approve, deny, delete), service management (add/update/remove), application instance management (add/update/remove/start/stop/logs), and console access. The primary purpose of the REST API is to support the Angular Web UI. The API depends on the Kubernetes API, etcd, Gluster (for shared volume support), and SMTP.
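
As a hypothetical sketch of how an external system might drive these operations (the endpoint paths and payload fields below are illustrative assumptions, not the current API contract):

    # Hypothetical client sketch; endpoint paths and payloads are assumptions,
    # not the documented Workbench REST API.
    import requests

    BASE = "https://workbench.example.org/api"

    # Authenticate and reuse the returned token on subsequent calls.
    token = requests.post(f"{BASE}/authenticate",
                          json={"username": "alice", "password": "secret"}).json()["token"]
    auth = {"Authorization": f"Bearer {token}"}

    services = requests.get(f"{BASE}/services", headers=auth).json()  # catalog of tools
    stacks = requests.get(f"{BASE}/stacks", headers=auth).json()      # a user's application instances
    requests.post(f"{BASE}/start/mystack", headers=auth)              # start an instance
    logs = requests.get(f"{BASE}/logs/mystack", headers=auth).text    # fetch instance logs
    requests.post(f"{BASE}/stop/mystack", headers=auth)               # stop an instance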

Web UI

The Web UI is a monolithic AngularJS application that interfaces with the REST API.

...

The Labs Workbench system deployed via deploy-tools includes a local Docker cache to minimize network traffic for image pulls.

Private Docker registry

The Labs Workbench system deployed via deploy-tools includes a private Docker registry to privately share images within your cluster without needing to share them out to Docker Hub.

  • This will need to be tested

Automated Testing 

The Angular Web UI includes a facility for executing automated Selenium smoke tests.
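
In spirit, each smoke test boils down to something like the following; the URL and title check are placeholders, and the actual suite bundled with the Web UI is considerably more thorough:

    # Minimal Selenium smoke-test sketch; URL and title check are placeholders.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get("https://workbench.example.org/")
        assert "Workbench" in driver.title, "landing page failed to load"
    finally:
        driver.quit()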

...

  • Deployment process needs to be generalized to support more environments than OpenStack and likely more OSes than CoreOS
  • Volume/storage will not work on cloud providers
  • Potentially allow for Web UI customization
  • Better custom catalog support
  • Confirm ingress (including DNS/TLS) support with Commercial providers
  • We should work toward making some of the above components optional, to reduce the minimal deployment size

Other thoughts

  • TERRA-REF case:
    • We can imagine a couple of cases. First, TERRA-REF as a full install with a system catalog and user catalogs. Second, TERRA-REF as a user of the current system with a custom catalog and no individual user namespaces. We could also have a TERRA-REF data provider to get data into containers.
  • CyVerse case:
    • Similarly, we can imagine a CyVerse user launching notebooks in the current system with a CyVerse data provider (FUSE, iRODS, etc.)
    • Or a full install of Workbench operated by CyVerse