
Prototype status:

  • Working nginx LB with Kubernetes ingress controller integration
  • LB runs under Kubernetes as a system service
  • Instructions/test harnesses in
  • The LB is unopinionated: it works at the system level with any K8s service, as long as the service conforms to the standard K8s network model. The requirements below are specific to NDSLabs test-drive/workbench, but the LB is general-purpose and supports test-drive/workbench provided they are standard K8s services (assumed to be true).
  • Vhost and path routing tested and verified (basic testing, not thorough)
  • Ingress interface is based on the K8s 1.2.0-alpha release and needs an update

Tasks required for Production Deployments:

  1. Test the LB prototype against the test-drive interfaces/specs (path-based for Odum)
    1. NDS-239
  2. Update Go dependencies and the ingress API to the current production release of Kubernetes.
    The prototype is currently based on 1.2.0-alpha; as of 2016-05-02 the current release is 1.2.3. We should evaluate the diff between 1.2.3 and 1.3.0-alpha and pick appropriately going forward.
    1. NDS-240
  3. Update the load balancer build. go build produces a static binary; the build should produce an image from Alpine with net-tools and the single static binary.
    Info on the golang:onbuild images is here: https://hub.docker.com/_/golang/
    1. NDS-241
  4. Address startup
    1. Label the LB node so that the LB pod is scheduled there, and add anti-affinity to the label/scheduler/system to avoid scheduling other pods on that node; i.e., the ingress-lb should always be the only thing running on the LB node (see the sketch after this list).
    2. NDS-242
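
A minimal sketch of the node-labeling approach for task 4, assuming Kubernetes 1.2-era resources. The label key/value (ndslabs-role=loadbalancer), namespace, image tag, and default-backend argument are illustrative rather than decided values, and keeping all other pods off the LB node would additionally require scheduler configuration or labeling the remaining nodes for compute workloads.

```yaml
# Label the dedicated LB node (label key/value are illustrative):
#   kubectl label nodes <lb-node-name> ndslabs-role=loadbalancer
#
# Minimal replication controller that pins the nginx ingress controller to
# the labeled node via nodeSelector; image tag and arguments are placeholders.
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-ingress-lb
  namespace: kube-system
spec:
  replicas: 1
  selector:
    app: nginx-ingress-lb
  template:
    metadata:
      labels:
        app: nginx-ingress-lb
    spec:
      nodeSelector:
        ndslabs-role: loadbalancer     # schedule only on the labeled LB node
      containers:
      - name: nginx-ingress-lb
        image: gcr.io/google_containers/nginx-ingress-controller:0.8.3  # placeholder tag
        args:
        - /nginx-ingress-controller
        - --default-backend-service=kube-system/default-http-backend   # assumed to exist
        ports:
        - containerPort: 80
          hostPort: 80
        - containerPort: 443
          hostPort: 443
```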

 

Background

The NDS Labs "Workbench" service provides NDSC stakeholders with the ability to quickly launch and explore a variety of data management tools. Users select from a list of available services to configure and deploy instances of them. Workbench "services" are composed of a set of integrated Docker containers all deployed on a Kubernetes cluster. 

In the following screenshot, the user has started a single instance of Dataverse, which includes containers running Glassfish, PostgreSQL, Solr, Rserve, and TwoRavens (Apache + R). The user is attached to a Kubernetes namespace and can start instances of multiple different services:

 

Currently, remote access to running services is implemented using the Kubernetes "NodePort" mechanism. In essence, a given service (e.g., a webserver) is mapped to a cluster-wide port in some configured range (default 30000-32767). Remote users access the running service on the specified port. In the above screenshot, the Dataverse web interface is accessible via http://141.142.210.130:30233. This solution has worked for development purposes but is 1) not scalable and 2) difficult to secure. We are exploring options for a scalable and secure way to provide access to running services in the NDS Labs workbench for tens to hundreds of users working with multiple instances of services, i.e., hundreds of service endpoints.
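
For illustration, a NodePort service of the kind described above might look like the following; the service name, namespace, selector, and ports are invented for this sketch and are not taken from the actual workbench specs.

```yaml
# Minimal sketch of how a stack service is exposed today via NodePort.
apiVersion: v1
kind: Service
metadata:
  name: dataverse
  namespace: srz4wj          # hypothetical project namespace
spec:
  type: NodePort
  selector:
    app: dataverse
  ports:
  - port: 8080               # port inside the cluster
    targetPort: 8080         # container port
    nodePort: 30233          # cluster-wide port from the 30000-32767 range
```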

Requirements

Use case: The workbench user (project administrator) configures a service via the workbench. Once configured, external endpoints are accessible via TLS/SSL.

  • Ability for the user to securely access NDS Labs workbench services, which include web-based HTTP and TCP services.
  • Service endpoints are secured using TLS
  • Special handling for NDS Labs workbench API server and GUI requests, including CORS support
  • Resilient to failure

Options:

  • Path based
    • Description: Services are accessed via URL + path (e.g., labs.nds.org/namespace/dataverse)
    • Pros: Single DNS entry for labs.nds.org; single SSL certificate; simple
    • Cons: Only supports HTTP-based services; requires that every deployed service support a context path, or the load balancer must rewrite requests
  • Port based
    • Description: Services are accessed via URL + port (e.g., labs.nds.org:33333)
    • Pros: Single DNS entry for labs.nds.org; single SSL certificate; simple
    • Cons: Requires non-standard ports; possible port collisions when services are stopped and started across projects (i.e., I stop my stack and the port is freed; you start your stack and are assigned my port, so my users now access your service); only scales to the number of available ports
  • CNAME
    • Description: Services are accessed via CNAME URL + path or port (e.g., project.labs.nds.org/dataverse or project.labs.nds.org:port)
    • Pros: One DNS entry, IP address, and certificate per project (or possibly a wildcard certificate); supports both HTTP and TCP services; port collisions are limited to a single project
    • Cons: Requires an IP address per project; requires a DNS/CNAME request to neteng

 

Requirements

  • When a new project is created, if the admin anticipates needing remote access to non-HTTP services, a static IP address and CNAME are assigned to the project.
  • The load balancer routes requests to services configured in Kubernetes. This means the LB must be namespace- and service-aware, which requires monitoring etcd or the Kubernetes API for changes.
  • When a new HTTP service is added, load balancer config is updated to proxy via path
    • If no CNAME
      • paths are in the form: labs.nds.org/namespace/serviceId
    • If CNAME
      • paths are in the form namespace.labs.nds.org/serviceId
  • When a new TCP service is added, load balancer config is updated to proxy via port – only if project has CNAME/IP:
    • namespace.labs.nds.org:port
  • For the GUI and API, paths are labs.nds.org/ and labs.nds.org/api, respectively
  • The load balancer must be resilient: if restarted, the previous configuration is maintained, possibly via a failover configuration.

Preliminary Design

Based on the prototype, we will move forward with the Kubernetes ingress-based nginx load balancer model. The current version from the Kubernetes contrib repo works in preliminary tests.

  • Load balancer node: A VM node will serve as the dedicated load-balancer node and run the Nginx LB replication controller using node labels
  • Nginx ingress controller: The nginx ingress controller is deployed as a replication controller
  • DNS:
    • "A" record points to load balancer node (e.g., test.ndslabs.org A 141.142.210.172)
    • Per-cluster wildcard CNAME (e.g., "*.test.ndslabs.org. CNAME test.ndslabs.org")
  • Per-service Ingress resource (see the example manifest after this list):
    • For each exposed service endpoint, an ingress rule will be created 
      • host: <stack-service-id>-<namespace>.ndslabs.org
      • path: "/"
      • backend:
        • serviceName: <service name>
        • servicePort: <service port>
    • These resources will be created/updated/deleted with the associated service
    • The <service> value in the host will be the stack service ID (e.g., srz4wj-clowder)
  • Endpoints:
    • For single-node and development instances, use NodePort
    • For multi-node cluster, use LB
  • TLS: 
    • Wildcard certificate for each cluster (*.test.ndslabs.org)
  • TCP support:
    • The nginx controller supports access to TCP services using the ConfigMap resource. The ConfigMap is simply a map of keys/values that maps an exposed port to namespace/service:port. We will need to update this ConfigMap as services are added and removed, and we will also need to handle port assignment. Unfortunately, the port assignments appear to be system-wide; it would be nice to assign ports within a host (i.e., in the Ingress rules), but that isn't possible today. A sketch of the ConfigMap follows this list.
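
A sketch of the per-service Ingress resource outlined above, using the Kubernetes 1.2-era extensions/v1beta1 API. The resource name, namespace, host, TLS secret name, and service port are illustrative; the actual host value would follow the stack-service-id/namespace convention described in the rules above.

```yaml
# Sketch of a per-service Ingress resource; all names, the host, the TLS
# secret, and the port are illustrative.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: srz4wj-clowder
  namespace: srz4wj
spec:
  tls:
  - secretName: ndslabs-tls                  # wildcard cert/key for *.test.ndslabs.org
  rules:
  - host: srz4wj-clowder.test.ndslabs.org    # host built from the stack service ID
    http:
      paths:
      - path: /
        backend:
          serviceName: clowder
          servicePort: 9000
```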
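
For TCP services, a sketch of the ConfigMap consumed by the contrib nginx controller: each key is a port exposed system-wide on the load balancer and each value is the namespace/service:port it forwards to. The ConfigMap name, namespace, and entries are illustrative, and the controller is pointed at this ConfigMap via a command-line argument.

```yaml
# Sketch of the TCP services ConfigMap: key = port exposed on the LB,
# value = namespace/service:port. Entries are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: kube-system
data:
  "5432": srz4wj/postgres:5432    # expose the project's PostgreSQL on LB port 5432
```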

API Server

The API server will need to know the following (a hypothetical configuration sketch follows the list):

  • Path to TLS cert/key
  • Cluster domain name
  • Whether to use NodePort or LoadBalancer
  • Label for compute nodes (NDS-264)
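
A hypothetical sketch of how these settings might be provided to the API server; the key names and layout below are invented for illustration and do not represent an agreed-upon configuration format.

```yaml
# Hypothetical API server settings covering the items above; key names are
# invented for illustration only.
ingress:
  tls_cert: /etc/ndslabs/certs/wildcard.crt   # path to TLS cert
  tls_key: /etc/ndslabs/certs/wildcard.key    # path to TLS key
  domain: test.ndslabs.org                    # cluster domain name
  access: LoadBalancer                        # or NodePort for single-node/dev clusters
  node_label: ndslabs-role-compute            # label identifying compute nodes (NDS-264)
```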

 
