
Prototype status:

  • Working nginx LB with kubernetes ingress controller integration
  • LB runs under kubernetes as a system-service
  • Instructions/test harnesses in
  • The LB is unopinionated – it works at the system level with any K8s service, as long as the service conforms to the standard K8s network model. The requirements below are specific to the NDSLabs test-drive/workbench, but the LB is general-purpose and supports test-drive/workbench as long as they are standard K8s services (assumed to be true).
  • Tested with vhost and path routing (basic testing, not thorough)
  • Ingress interface based on K8s 1.2.0-alpha release - needs update
  • Vhost/path routing verified

Tasks required for Production Deployments:

  1. Test LB prototype with test-drive interfaces/specs – path-based for Odum
    1. Jira: NDS-239
  2. Update Go dependencies/ingress API to the current production release of Kubernetes.
    The prototype is currently based on 1.2.0-alpha; as of 160502 the current release is 1.2.3. We should evaluate the diff between 1.2.3 and 1.3.0-alpha and pick appropriately for the future.
    1. Jira: NDS-240
  3. Update the load balancer build – go build produces a static binary. The build should produce an image from alpine with net-tools and the single static binary.
    Info on golang:onbuild is here: https://hub.docker.com/_/golang/
    1. Jira: NDS-241
  4. Addressing startup
    1. Label the LB node so that the LB pod deploys there, and add anti-affinity to the label/scheduler/system to avoid scheduling other pods on the LB node – i.e., the ingress-lb should always be the only thing running on the LB node (see the sketch after this list).
    2. Jira: NDS-242

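A minimal sketch of item 4.1, assuming a hypothetical "ndslabs-role=lb" node label (not the actual manifest): the LB replication controller is pinned to the labeled node with a nodeSelector. Keeping all other pods off that node would additionally require a taint or scheduler-level policy, which is not shown here.

    # Label the chosen node first, e.g.: kubectl label node <lb-node> ndslabs-role=lb
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: nginx-ingress-lb
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: nginx-ingress-lb
        spec:
          nodeSelector:
            ndslabs-role: lb                  # hypothetical label; schedules the LB onto the LB node only
          containers:
          - name: nginx-ingress-lb
            image: nginx-ingress-controller   # placeholder image; actual image/version per NDS-240/NDS-241
            ports:
            - containerPort: 80
              hostPort: 80
            - containerPort: 443
              hostPort: 443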
 

Background

The NDS Labs "Workbench" service (under development) provides NDSC stakeholders with the ability to quickly launch and explore a variety of data management tools. Users select from a list of available services, configure them, and deploy instances of them. Workbench "services" are composed of a set of integrated Docker containers, all deployed on a Kubernetes cluster.

In the following screenshot, the user has started a single instance of Dataverse, which includes containers running Glassfish, PostgreSQL, Solr, Rserve, and TwoRavens (Apache + R). (The user is attached to a Kubernetes namespace and can start instances of multiple different services):

[Screenshot: a running Dataverse instance and its containers in the workbench]

Currently, remote access to running services is implemented using the Kubernetes "NodePort" mechanism. In essence, a given service (e.g., a webserver) is mapped to a cluster-wide port in some configured range (default 30000-32767). Remote users access the running service on the specified port. In the above screenshot, the Dataverse web interface is accessible via http://141.142.210.130:30233. This solution has worked for development purposes but is 1) not scalable and 2) difficult to secure. We are exploring options for providing scalable and secure access to running services in the NDS Labs workbench for 10s-100s of users working with multiple instances of services, i.e. hundreds of service endpoints.
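For reference, the NodePort mechanism described above corresponds to a service definition roughly like the following (the service name, namespace, and ports are illustrative only):

    # Illustrative NodePort service: the nodePort is a cluster-wide port in the
    # configured range (default 30000-32767) that remote users connect to directly.
    apiVersion: v1
    kind: Service
    metadata:
      name: dataverse-glassfish    # hypothetical service name
      namespace: demo              # hypothetical project namespace
    spec:
      type: NodePort
      selector:
        app: dataverse-glassfish
      ports:
      - port: 8080                 # port the service exposes inside the cluster
        nodePort: 30233            # e.g., reachable as http://141.142.210.130:30233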

Requirements

Use case: The workbench user (project administrator) configures a service via the workbench. Once configured, external endpoints are accessible via TLS/SSL.

  • Ability for the user to securely access NDS Labs workbench services, which include web-based HTTP and TCP services.
  • Service endpoints are secured using TLS
  • Services will be hosted on a cluster at "labs.nationaldataservice.org"

Options

Dynamic DNS-based solution

...

  • Special handling for NDS Labs workbench API server and GUI requests, including CORS support
  • Resilient to failure

Options:

...

  • Wildcard DNS entry
  • NDS-local DNS service with ability to update dynamically based on deployed services
  • Wildcard SSL certificate

...

Options compared (Description / Pro / Con):

Path based

  • Description: Services accessed via URL + path, for example labs.nds.org/namespace/dataverse
  • Pro: Single DNS entry for labs.nds.org; single SSL certificate; simple
  • Con: Only supports HTTP-based services (e.g., cannot support remote access to an iRODS server via iCommands on port 1247). Requires that every deployed service support a context, or the load balancer must rewrite requests.

Port based

  • Description: Services accessed via URL + port, for example labs.nds.org:33333
  • Pro: Single DNS entry for labs.nds.org

...

    Single SSL certificate; simple
  • Con: Requires use of non-standard ports. Possible port collisions if services are stopped and started across projects (i.e., I stop my stack and the port is freed; you start your stack and are assigned my port; my users now access your service). Only scales to the number of available ports.

CNAME

  • Description: Services accessed via CNAME URL + path or port, for example project.labs.nds.org/dataverse or project.labs.nds.org:port
  • Pro: One DNS entry, IP address, and certificate for each project (or possibly a wildcard cert). Supports both HTTP and TCP services.

...

  • Con: Complex setup – requires wildcard DNS support and hosting a local DNS server. Port collisions are only within a project. Requires an IP address per project. Requires a DNS/CNAME request to neteng.

 

Requirements

  • When a new project is created, if the admin anticipates needing remote access to non-HTTP services, a static IP address and CNAME are assigned to the project.
  • The load balancer routes requests to services configured in Kubernetes. This means that the LB must be namespace- and service-aware – which means monitoring etcd or the Kubernetes API for changes.
  • When a new HTTP service is added, the load balancer config is updated to proxy via path:
    • If the project has no CNAME, paths are in the form labs.nds.org/namespace/serviceId
    • If the project has a CNAME, paths are in the form namespace.labs.nds.org/serviceId
  • When a new TCP service is added, the load balancer config is updated to proxy via port – only if the project has a CNAME/IP:
    • namespace.labs.nds.org:port
  • For the GUI and API, paths are labs.nds.org/ and labs.nds.org/api respectively (see the sketch after this list).
  • The load balancer must be resilient – if restarted, the previous configuration is maintained. Possibly run in a failover configuration.
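To make the GUI/API routing requirement concrete, a path-based ingress rule along the following lines could map / and /api to the two backends. This is a sketch only; the backend service names and ports are assumptions, not the actual manifests.

    # Sketch: path routing for the workbench GUI and API server on labs.nds.org
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: ndslabs-frontend                   # hypothetical name
    spec:
      rules:
      - host: labs.nds.org
        http:
          paths:
          - path: /api
            backend:
              serviceName: ndslabs-apiserver   # hypothetical API server service
              servicePort: 8083                # hypothetical port
          - path: /
            backend:
              serviceName: ndslabs-webui       # hypothetical GUI service
              servicePort: 80                  # hypothetical port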

Preliminary Design

Based on the prototype, we will move forward with the Kubernetes ingress-based nginx load balancer model. The current version from the Kubernetes contrib repo works based on preliminary tests.

  • Load balancer node: A VM node will serve as the dedicated load-balancer node and run the Nginx LB replication controller using node labels
  • Nginx ingress controller: The nginx ingress controller is deployed as a replication controller
  • DNS:
    • "A" record points to load balancer node (e.g., test.ndslabs.org A 141.142.210.172)
    • Per-cluster wildcard CNAME (e.g., "*.test.ndslabs.org. CNAME test.ndslabs.org.")
  • Per-service Ingress resource:  
    • For each exposed service endpoint, an ingress rule will be created 
      • host: <stack-service-id>-<namespace>.ndslabs.org
      • path: "/"
      • backend:
        • serviceName: <service name>
        • servicePort: <service port>
    • These resources will be created/updated/deleted with the associated service
    • The <service> value in the host will be the stack service ID (e.g., srz4wj-clowder); a combined sketch of such an ingress rule and the TCP ConfigMap follows this list
  • Endpoints:
    • For single-node and development instances, use NodePort
    • For multi-node cluster, use LB
  • TLS: 
    • Wildcard certificate for each cluster (*.test.ndslabs.org)
  • TCP support:
    • The nginx controller supports access to TCP services using the ConfigMap resource. ConfigMap is simply a map of keys/values that contains the exposed port and the namespace/service:port. We will need to update the ConfigMap when services are added and removed.  We will also need to handle assignment of ports. Unfortunately, the port assignments appear to be system-wide.  It might be nice if we could assign ports within a host (i.e., in the Ingress rules), but this isn't possible today.
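A combined sketch of the per-service ingress rule and the TCP ConfigMap described above, using the srz4wj-clowder example. The secret name follows the per-project "<namespace>-tls-secret" convention from the API Server section below; the backend service name, ports, and ConfigMap name are assumptions.

    # Sketch: host-based ingress rule for one exposed stack service endpoint.
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: srz4wj-clowder                      # one resource per exposed service endpoint
      namespace: srz4wj                         # hypothetical project namespace
    spec:
      tls:
      - hosts:
        - srz4wj-clowder.test.ndslabs.org
        secretName: srz4wj-tls-secret           # per-project copy of the wildcard cert
      rules:
      - host: srz4wj-clowder.test.ndslabs.org
        http:
          paths:
          - path: /
            backend:
              serviceName: clowder              # hypothetical Kubernetes service name
              servicePort: 9000                 # hypothetical service port
    ---
    # Sketch: ConfigMap for TCP services – key is the exposed (system-wide) port,
    # value is namespace/service:port. The ConfigMap name must match whatever the
    # nginx controller is configured to watch.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: tcp-services                        # hypothetical name
      namespace: default
    data:
      "1247": "srz4wj/irods:1247"               # hypothetical iRODS endpoint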

API Server

With the implementation of NDS-260, the NDSLabs API server has been modified to support the following:

  • A system-wide secret named "ndslabs-tls-secret" in the default namespace that contains the TLS certificate and key. If this secret exists, the associated TLS cert/key are copied to per-project secrets ("<namespace>-tls-secret") during PostProject
  • At startup, the API Server supports two "Ingress" types: NodePort or LoadBalancer, configurable in apiserver.conf or, for the docker image, via the INGRESS environment variable.
  • If Ingress=LoadBalancer
    • An ingress resource is created for every service with access=external during the stack startup
    • The ingress resource consists of 
      • TLS via named secret
      • Host in the form <stack-service-id>.domain
      • Endpoint objects contain the ingress host
  • If Ingress=NodePort
    • Ingress rules are not created for stack services
    • Endpoint objects retain the previous NodePort and protocol information
  • The API Server has also been changed to support a configurable Domain, set in apiserver.conf or via the DOMAIN environment variable in the docker image. This domain is used to construct the ingress rules.
  • Pods have been modified to include ndslabs-role=compute (NDS-264)
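For illustration, the system-wide secret and the API server's ingress/domain configuration could look roughly like the following. This assumes the conventional tls.crt/tls.key secret keys; the image name, pod name, and domain value are placeholders, not the actual deployment.

    # Sketch: system-wide TLS secret that is copied into per-project
    # "<namespace>-tls-secret" secrets during PostProject.
    apiVersion: v1
    kind: Secret
    metadata:
      name: ndslabs-tls-secret
      namespace: default
    data:
      tls.crt: <base64-encoded wildcard certificate>   # placeholder
      tls.key: <base64-encoded private key>            # placeholder
    ---
    # Sketch: running the API server image with the INGRESS and DOMAIN
    # environment variables mentioned above (pod spec fragment only).
    apiVersion: v1
    kind: Pod
    metadata:
      name: ndslabs-apiserver        # hypothetical name
    spec:
      containers:
      - name: apiserver
        image: ndslabs/apiserver     # hypothetical image
        env:
        - name: INGRESS
          value: "LoadBalancer"      # or "NodePort"
        - name: DOMAIN
          value: "test.ndslabs.org"  # used to construct ingress hosts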

 

 

...

  • Already supported

...