Thoughts on generalizing workbench as Project X based on recent discussions.
One of the clearest proven uses of the platform is for education and training purposes. Labs Workbench was used for:
Each environment is unique, but there are a few basic requirements:
We can also envision the platform working as a replacement for the TERRA-REF toolserver or as a DataDNS analysis service. In this case, the requirements are:
Another use case, really a re-purposing of the platform, is to support the development and deployment of research data portals – aka, the Zuhone case. Requirements include:
We currently have two methods of deploying the Labs Workbench service: 1) ndslabs-startup (single node) and 2) deploy-tools (multi-node OpenStack).
The ndslabs-startup tool provides a set of scripts to deploy NDS Labs services to a single VM, intended primarily for development and testing. The deployment is incomplete (no shared storage, NRPE, LMA, or backup), but adding these services would be a minor effort. Minikube was considered as an alternative, but it is problematic when run on a VM in OpenStack and would require additional investigation.
The deploy-tools image provides a set of Ansible plays designed specifically to support the provisioning and deployment of a Kubernetes cluster on OpenStack, with hard dependencies on CoreOS and GlusterFS. It's unclear whether this can be replaced by OpenStack Heat. Deploy-tools has three parts: 1) OpenStack provisioning, 2) Kubernetes install, and 3) Labs components install. The OpenStack provisioning step uses the OpenStack API and Ansible's OpenStack support to provision instances and volumes. The Kubernetes install is based on the community contrib/ansible tools with very minor local modifications. The Labs components install primarily deploys Kubernetes objects.
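As a rough illustration, the third phase (Labs components install) amounts to applying a set of Kubernetes manifests from a control host. This is a hedged sketch only; the host group, paths, and file names below are hypothetical, not the actual deploy-tools layout:

```yaml
# Hypothetical excerpt of phase 3 (Labs components install).
# Host group and manifest paths are illustrative assumptions.
- hosts: kube-master
  tasks:
    - name: Deploy Workbench components as Kubernetes objects
      command: kubectl apply -f /opt/workbench/manifests/{{ item }}
      with_items:
        - apiserver.yaml
        - webui.yaml
        - ingress-controller.yaml
```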
For commercial cloud providers, we cannot use our current deployment process. Fortunately, these providers already offer managed Kubernetes cluster provisioning: AWS, Azure, and GCE.
The deploy-tools image assumes CoreOS. This choice is somewhat arbitrary, but many assumptions in the deploy-tools component are bound to the OS choice. Different providers make different OS decisions: Kubernetes seems to lean toward Fedora and Debian, GCE itself uses Debian, Azure uses Ubuntu, and so on. This may not matter if we can rely on the Kubernetes deployment provided by each commercial cloud provider.
The Labs Workbench system assumes Docker, but there are other container options. Kubernetes also supports rkt. This is something we've discussed but never explored.
Labs Workbench relies heavily on Kubernetes itself. The API server integrates directly with the Kubernetes API. Of all basic requirements, this seems to be one that's unlikely to change.
Labs Workbench uses a custom GlusterFS solution for shared storage. A single Gluster volume is provisioned (4 GlusterFS servers) and mounted on each host. The shared volume is accessed by containers via hostPath.
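In a pod spec, this pattern looks roughly like the following (a sketch only; the mount paths are hypothetical, not the actual Workbench configuration):

```yaml
# Hypothetical pod spec fragment: the Gluster volume is mounted on every
# host (assumed here at /volumes/global) and exposed via hostPath.
spec:
  volumes:
    - name: shared
      hostPath:
        path: /volumes/global
  containers:
    - name: app
      image: example/app
      volumeMounts:
        - name: shared
          mountPath: /home/user
```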
This approach was necessary due to lack of support for persistent volume claims on OpenStack. For commercial cloud providers, we'll need to rethink this approach. We could use a single volume claim (one giant shared disk), a volume claim per user, or a volume claim per application. Each approach has benefits and weaknesses; for example, with a cloud provider you don't want a giant provisioned disk sitting mostly unused. The per-user approach may be better.
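For illustration, a per-user claim against a cloud provider's dynamic provisioner might look like this (claim name, storage class, and size are all assumptions for the sketch):

```yaml
# Hypothetical per-user PersistentVolumeClaim; the storage class name
# and requested size would be provider- and policy-specific.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workbench-home-jdoe
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```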
Other storage includes mounted volumes for /var/lib/docker and /var/lib/kubelet.
Labs Workbench provides a thin REST interface over Kubernetes. Basic operations include: authentication, account management (register, approve, deny, delete), service management (add/update/remove), application instance management (add/update/remove/start/stop/logs), console access. The primary purpose of the REST API is to support the Angular Web UI. The API depends on Kubernetes API, etcd, Gluster for shared volume support, and SMTP support.
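Illustratively, that REST surface covers operations along these lines (the paths below are invented for this sketch, not the actual API routes):

```
POST   /api/authenticate           # obtain a session token
POST   /api/register               # request a new account
PUT    /api/accounts/{id}/approve  # admin approves or denies an account
GET    /api/services               # list catalog services
POST   /api/services               # add or update a service spec
POST   /api/instances              # add an application instance
PUT    /api/instances/{id}/start   # start or stop an instance
GET    /api/instances/{id}/logs    # retrieve instance logs
GET    /api/console?id={id}        # interactive console access
```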
The Web UI is an AngularJS application that interfaces with the REST API.
Labs Workbench provides the ability to support custom application catalogs via GitHub. Eventually, it may be nice to provide a more user-friendly method for adding and removing services.
Labs Workbench relies on the Kubernetes contrib Nginx ingress controller (reverse proxy) to provide access to running services, including authentication. We've made only minor modifications to some of the configuration options.
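For example, access to a running user service might be expressed as an Ingress with an external-auth annotation in the style the contrib controller supports (hostnames, service names, and the auth endpoint below are placeholders, not our actual configuration):

```yaml
# Hypothetical Ingress for one user application; the auth-url annotation
# delegates authentication to an assumed Workbench auth endpoint.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: user-app
  annotations:
    ingress.kubernetes.io/auth-url: "https://workbench.example.org/auth"
spec:
  rules:
    - host: app.workbench.example.org
      http:
        paths:
          - path: /
            backend:
              serviceName: user-app
              servicePort: 80
```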
We know that GCE uses a version of the Nginx controller, but it's unclear whether it's the same as the version we use.
A backup container is provided to backup Gluster volumes, etcd, and Kubernetes configs. This is tightly coupled to the Workbench architecture. The backup server is hosted at SDSC. We should be able to generalize this solution, if needed.
A Nagios NRPE image is provided to support monitoring instances with some Kubernetes support. We also use the contrib addons (Grafana, etc), deployed as standard services.
Commercial cloud providers provide their own monitoring tools, e.g., GCE Monitoring.
The Labs Workbench system deployed via deploy-tools includes a local Docker cache to minimize network traffic for image pulls.
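One common way to implement such a cache (assumed here; deploy-tools may do it differently) is a registry pull-through mirror, with each node's Docker daemon pointed at it via /etc/docker/daemon.json (the mirror hostname is a placeholder):

```json
{
  "registry-mirrors": ["http://registry-cache.workbench.local:5000"]
}
```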
The Angular Web UI includes a facility for executing automated Selenium smoke tests.