
Thoughts on generalizing Labs Workbench as Project X, based on recent discussions.

Use Cases

Education and training

One of the clearest proven uses of the platform is for education and training purposes. Labs Workbench was used for:

  • IASSIST 2016 for a workshop on integrating Dataverse and iRODS
  • NDSC6 for a workshop on the development of Docker containers
  • Phenome 2017 for a workshop on how to use the TERRA-REF reference data platform and tools
  • Planned iSchool pilot for data curation educators
  • Possible deployment by Big Data Hub on commercial provider, such as Microsoft Azure, AWS, GCE

Each environment is unique, but there are a few basic requirements:

  • Custom catalog only of the tools needed for the training environment
  • User accounts that can be created without requiring registration (e.g., batch import)
  • Authentication that ties to existing systems (e.g., Shibboleth, OAuth)
  • Short-term scalable resources (e.g., 40 users, 4 hours) as well as longer-term stable resources (e.g., 11 weeks, 24x7, maintenance windows allowed)
  • Custom documentation and branding/skinning
  • Custom data, API keys, etc. accessible by users
  • Configurable quotas (not one-size-fits-all)
  • Ability to deploy a dedicated environment, scale it up, and tear it down. At the end of the workshop/semester, access can be revoked.
  • Ability to backup/download data
  • Ability to deploy system under a variety of architectures
  • Ability to host/manage system at NDSC/SDSC/TACC
  • Security/TLS/vulnerability assessment
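As a sketch of the batch-import requirement above, a hypothetical importer could take a CSV of users and create accounts without interactive registration, with per-user quotas. The CSV format, field names, and `import_users` helper are all assumptions for illustration, not an existing Workbench API:

```python
import csv
import io
import secrets

def import_users(csv_text):
    """Create account records from a CSV batch (hypothetical format:
    username,email,quota_gb) without requiring user registration."""
    accounts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        accounts.append({
            "username": row["username"],
            "email": row["email"],
            # Per-user quota rather than one-size-fits-all; default is arbitrary.
            "quota_gb": int(row.get("quota_gb", 10)),
            # Generated initial credential; user would reset on first login.
            "password": secrets.token_urlsafe(12),
        })
    return accounts

batch = "username,email,quota_gb\nalice,alice@example.edu,20\nbob,bob@example.edu,5\n"
accounts = import_users(batch)
```

At workshop scale (e.g., 40 users), the same batch file could drive teardown at the end of the event: revoke exactly the accounts that were imported.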

Scalable analysis environment

We can also envision the platform working as a replacement for the TERRA-REF toolserver or as a DataDNS analysis service. In this case, the requirements are:

  • Custom catalog of tools supported for the environment.
  • User accounts that can be created without requiring registration (API)
  • Authentication that ties to existing systems (e.g., Shibboleth, OAuth)
  • Long-term stable and scalable resources. Ability to add/remove nodes as needed.
  • Ability to terminate long-running containers to reclaim resources
  • Custom documentation and branding, although the UI itself may be optional
  • Ability to mount data stored on remote systems (e.g., ROGER) as read-only and possibly read-write scratch space
  • Ability to add data to a running container, retrieved from a remote system?
  • Clear REST API to: list tools; list environments for a user; launch tools; stop tools
  • Security/TLS/vulnerability assessment
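The REST API requirement above could be enumerated as a small surface of operations. The following sketch maps each operation to an HTTP method and path; all paths and names are hypothetical, not a confirmed Labs Workbench API:

```python
class WorkbenchAPI:
    """Sketch of the proposed REST surface (paths are assumptions)."""

    def list_tools(self):
        # Catalog of tools supported for this environment.
        return ("GET", "/api/tools")

    def list_environments(self, user):
        # Environments belonging to a single user.
        return ("GET", f"/api/users/{user}/environments")

    def launch_tool(self, tool_id):
        # Start an instance of a tool from the catalog.
        return ("POST", f"/api/tools/{tool_id}/launch")

    def stop_tool(self, instance_id):
        # Terminate a long-running container to reclaim resources.
        return ("POST", f"/api/instances/{instance_id}/stop")

api = WorkbenchAPI()
```

Since the UI may be optional in this use case, a clean API like this would be the primary interface for services such as the TERRA-REF toolserver.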

Platform for the development and deployment of research data portals

Another use case, really a re-purposing of the platform, is to support the development and deployment of research data portals – aka, the Zuhone case. Requirements include:

  • Ability to develop data portal using common tools. 
  • Ability to deploy data portal services "near" datasets (e.g., ythub).

General requirements

  • Backup
  • Monitoring

 

Current features/components

Deployment (OpenStack)

We currently have two methods of deploying the NDS Labs service: 1) ndslabs-startup (single node) and 2) deploy-tools (multi-node OpenStack).

The ndslabs-startup repo provides a set of scripts to deploy NDS Labs services to a single VM. This is intended primarily for development and testing. The deployment is incomplete (no shared storage, NRPE, LMA, or backup), but adding these services would be a minor effort.

The deploy-tools repo/image provides a set of Ansible scripts specifically to support the provisioning and deployment of a Kubernetes cluster on OpenStack, with hard dependencies on CoreOS and GlusterFS. It's unclear whether this can be replaced by openstack-heat.

For commercial cloud providers, we cannot use our current deployment process. Fortunately, these providers can already provision Kubernetes clusters: AWS, Azure, and GCE.

Minikube was considered as an option, but it is problematic when running on a VM.

CoreOS (Operating system)

The current system relies on CoreOS. This choice is somewhat arbitrary, but many assumptions in the deploy-tools component are bound to the OS choice. Different providers make different OS decisions: Kubernetes seems to lean toward Fedora and Debian, GCE itself uses Debian, Azure uses Ubuntu, etc.

Docker (Container)

The Labs Workbench system assumes Docker, but there are other container options. Kubernetes also supports rkt. This is something we've never explored.

Orchestration (Kubernetes)

Labs Workbench relies heavily on Kubernetes itself. The API server integrates directly with the Kubernetes API. Of all basic requirements, this seems to be one that's unlikely to change.

GlusterFS (Storage)

Labs Workbench uses a custom GlusterFS solution for shared storage. A single Gluster volume is provisioned (4 GFS servers) and mounted on each host. The shared volume is accessed via hostPath by containers.

This approach was necessary due to the lack of support for persistent volume claims on OpenStack. For commercial cloud providers, we'll need to rethink this approach. We can have a single volume claim (one giant shared disk), a volume claim per user, or a volume claim per application. There are benefits and weaknesses to each approach. For example, on a cloud provider, you don't want a giant provisioned disk sitting mostly unused. The per-account approach may be better.
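To illustrate the per-account option, the following sketch builds a standard Kubernetes PersistentVolumeClaim manifest per user, so each user's storage is provisioned (and billed) individually. The naming convention and sizes are assumptions; only the PVC schema itself is standard Kubernetes:

```python
def user_pvc(username, size_gb):
    """Build a Kubernetes PersistentVolumeClaim manifest for one user
    (per-account storage; claim name convention is hypothetical)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"home-{username}"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # Request only what this user's quota calls for, instead of
            # pre-provisioning one giant shared disk.
            "resources": {"requests": {"storage": f"{size_gb}Gi"}},
        },
    }

claim = user_pvc("alice", 20)
```

On a provider with dynamic provisioning, submitting such a claim would allocate backing storage on demand, which avoids the idle-capacity problem of the single-shared-disk approach.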

REST API Server

Web UI

Ingress Controller

Backup

Monitoring 

Application Catalog

Development/Analysis Environments

Phone home

  • CoreOS
  • Docker
  • Kubernetes
  • Gluster
  • Deploy tools/NDS Labs Startup
  • API Server
  • Web UI
  • Ingress controller
  • Backup
  • Monitoring/NRPE
  • Specs
  • Development environments
  • Docker registry cache

 

 
