Overview

This page documents the preliminary requirements and design of the NDS Labs Workbench resource limit strategy. The NDS Labs Workbench is based on Kubernetes and GlusterFS, so we'll leverage their internal resource limit and enforcement mechanisms. Kubernetes supports limits and quotas on compute resources (CPUs and memory) as well as object counts (services, replication controllers, pods, secrets, etc).  GlusterFS provides mechanisms to enforce storage quotas through explicit quotas or allocated volume size.

Why specify and limit resources?

There are several reasons to enforce resource limits in the NDS Labs Workbench:

  1. The Kubernetes system requires Pod resource specifications for effective scheduling. To enforce those specifications, resource quotas must be enabled system-wide.  (Note: it would be possible to grant all projects full access to all system resources if desired, but this would lead to competition for resources.)
  2. We want to prevent individual NDS Labs Workbench projects from exhausting all of the system resources, introducing instability in other projects. This means enforcing quotas on projects.

Enforcement options

We've discussed the following options for enforcing quotas:

Enforcement point    | Description | Pros | Cons
During configuration | Do not let the user add a stack if running the stack would exceed limits | Easy to understand | We can't know actual resource usage in advance; user might need to delete existing stacks to start new stacks
Before launch        | Allow the user to configure the stack, but not start the stack if it would exceed limits | Allows user to have configured stacks and decide which to run; easy to understand | We can't know actual resource usage in advance
Launch/runtime       | Allow the user to launch the stack; if resources aren't available during or after launch, the stack enters an error state | Constrains resources based on actual runtime requirements | Services may fail unexpectedly

Since it isn't possible to know exact usage requirements before launch, we have decided to enforce quotas during runtime.  If launching the stack exceeds quotas, the stack will fail to launch and the user will be notified.  

Kubernetes: Limit ranges, Requests, and ResourceQuotas

Kubernetes defines three concepts related to the implementation and enforcement of resource limits: resource quotas, limit ranges, and requests. Resource quotas are enabled at the namespace level and define hard limits on the resources available to the namespace. Once enabled, all Pods must specify requests for each resource covered by the quota.  If no request is specified, then Pod creation will fail (403 Forbidden).

Default values can be specified at the namespace level using LimitRanges.  If a LimitRange is specified on the namespace, then Pods without explicit requests will have their resource requests set to the default limit values.

Individual Pods can specify limits (upper bounds on resource consumption) and requests (the amount of a resource the Pod is guaranteed for scheduling). By default, Pods run with unbounded CPU and memory limits. These quotas are independent of actual cluster capacity.

Resource quotas and limits must be enabled during Kubernetes cluster startup by adding the LimitRanger and ResourceQuota admission controllers to the API server's --admission-control flag.
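For example, enabling these controllers looks roughly like the following (a sketch; the full plugin list and remaining flags depend on the deployment, and are omitted here):

```shell
# Enable quota/limit enforcement via admission controllers.
# Other required kube-apiserver flags (etcd, service CIDR, etc.) omitted.
kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger,ResourceQuota
```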

Resource Limits

Resource | Implementation
Memory   | Kubernetes ResourceQuotas, Limits, Requests
CPU      | Kubernetes ResourceQuotas, Limits, Requests
Objects  | Kubernetes ResourceQuotas
Storage  | Either hard limits on allocated resources (e.g., users are limited to a fixed volume size) or GlusterFS directory quotas
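For the storage row, GlusterFS directory quotas can be enforced with the standard quota CLI. A sketch, assuming a shared volume named "global" and a per-project directory (both names are assumptions for illustration):

```shell
# Enable quota enforcement on the volume, then cap a project directory.
gluster volume quota global enable
gluster volume quota global limit-usage /projects/demo 10GB

# Show configured limits and current usage per directory.
gluster volume quota global list
```

The alternative, per-project volumes created at a fixed size, needs no quota configuration at all.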

Note units: https://github.com/kubernetes/kubernetes/blob/master/docs/design/resources.md

  • CPUs are in milli-core units (1 core = 1000m).  The default value of cpu=1 is "best effort"
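To make the units concrete, here is a small helper that normalizes a CPU quantity to millicores. This is a sketch: it handles only integer cores and the "m" suffix, not the full Kubernetes quantity grammar (which also allows decimals like "0.5").

```shell
# Normalize a Kubernetes CPU quantity to millicores.
# "250m" -> 250, "2" -> 2000. Decimal cores are not handled.
to_millicores() {
  q="$1"
  case "$q" in
    *m) echo "${q%m}" ;;            # already millicores: strip the suffix
    *)  echo "$(( q * 1000 ))" ;;   # whole cores: 1 core = 1000m
  esac
}

to_millicores 250m   # 250
to_millicores 2      # 2000
```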

Usage information

  • CPU and Memory usage are available via the Grafana API, which is included in the Kubernetes contrib/ansible deployment
  • Storage usage information is available via GlusterFS

Requirements

  1. Cluster administrator can specify memory, CPU, and storage quotas at the project level
  2. Cluster administrator can specify default limits for Pods system-wide
  3. Cluster administrator can update quotas
  4. Quotas/limits are deleted when the project is deleted
  5. Project administrator can view project quotas
  6. Project administrator can view memory, CPU, storage usage
  7. Service developer can implement resource requests for Pods
  8. Service developer can run a tool to capture/estimate resource requirements for a service
  9. Cluster ops can monitor cluster-wide resource utilization
  10. Test services for exercising limits
    1. Exceed memory limits, exceed CPU limits, exceed Object limits
    2. Both internally (single pod/container) and adding a resource that exceeds
    3. What happens if Pod killed due to OOM?
  11. Will need to have better error handling for failure conditions – what happens when the stack fails to deploy because a service fails to start?
  12. The API server will monitor all Pods to track changes to status and notify site and project administrators.

Implementation

Project Limits

During the POST or PUT of a project, ResourceQuota and LimitRange objects are created for the namespace. The Project object has been modified to include resourceLimits.

  • cpuMax:  Maximum CPU for this project
  • cpuDefault: Default CPU for Pods without resource requests for this project
  • memMax: Maximum total memory for this project
  • memDefault: Default memory for Pods without resource requests for this project
  • storageQuota: Storage allocation

{
    "id": "demo",
    "name": "demo project",
    "description": "demo project description",
    "namespace": "demo",
    "password": "12345",
    "resourceLimits": {
        "cpuMax": "2",
        "cpuDefault": "1",
        "memMax": "8Gi",
        "memDefault": "100Mi",
        "storageQuota": "10Gb"
    }
}


The above project specification results in the following ResourceQuota and LimitRange objects created for the namespace:

{
    "kind": "ResourceQuota",
    "apiVersion": "v1",
    "metadata": {
        "name": "quota"
    },
    "spec": {
        "hard": {
            "cpu": "2",
            "memory": "8Gi"
        }
    }
}
{
    "kind": "LimitRange",
    "apiVersion": "v1",
    "metadata": {
        "name": "limits"
    },
    "spec": {
        "limits": [
            {
                "type": "Container",
                "default": {
                    "cpu": "1",
                    "memory": "100Mi"
                }
            }
        ]
    }
}
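The API server creates these objects directly through the Kubernetes REST API; the equivalent manual steps, assuming the two objects above are saved to files, would be:

```shell
# Create the quota and defaults in the project's namespace, then verify.
kubectl create -f quota.json --namespace=demo
kubectl create -f limits.json --namespace=demo
kubectl describe quota quota --namespace=demo
```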


Resource requests by services

The service developer will be required to specify resource requirements for their services. Project administrators will be able to override these values during service configuration. 

Individual Pods have both requests and limits. Pods will be terminated when memory limits are reached and may be terminated if they exceed CPU limits.  

The following example specification illustrates service-level resource limits and requests. These translate directly into Pod limits and requests: 

{
    "label": "Clowder",
    "resources": {
        "limits": [
            {
                "cpu": "250m",
                "memory": "1G"
            }
        ],
        "requests": [
            {
                "cpu": "100m",
                "memory": "128Mi"
            }
        ]
    }
}
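For reference, a sketch of how these values land in the Kubernetes container spec. Note that in the container spec, limits and requests are maps rather than arrays (the container name here is an assumption):

```json
{
    "name": "clowder",
    "resources": {
        "limits":   { "cpu": "250m", "memory": "1G" },
        "requests": { "cpu": "100m", "memory": "128Mi" }
    }
}
```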


Estimating resource requirements

Requiring service developers to provide resource requests and limits means that we need to provide tools to enable them to estimate limits.  We should also provide some level of logging to indicate when services are nearing or exceeding limits, or failing.

To estimate resource requirements for a service, service developers may use the following tools:

  • docker stats
  • sysdig/csysdig
  • /sys/fs/cgroup/memory/
  • /sys/fs/cgroup/memory/memory.usage_in_bytes
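A sketch of using these tools to observe a single container's memory footprint (cgroup v1 paths, as used by Docker in this era; `<container-id>` is a placeholder):

```shell
# One-shot snapshot of CPU/memory usage for a running container.
docker stats --no-stream <container-id>

# Raw cgroup counters for the same container.
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.limit_in_bytes
```

Running the service under a realistic workload and recording the peak of usage_in_bytes gives a reasonable starting point for the memory request.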

The system should track when services near or exceed limits and notify the user.  This will require constant monitoring of running containers with associated alerts. The API server will start a monitoring process to keep track of the status of all running pods system-wide.

When things fail

Once limits are enforced, services fail during creation or runtime if they reach the specified limits. Kubernetes generally puts such pods into the pending state, waiting for resources to be freed so the pod can be scheduled. We have several test cases:

  • Pod exceeds available memory during create
  • Pod exceeds available CPU during create
  • Pod exceeds available memory at runtime (out of memory error)
  • Pod exceeds available CPU during runtime

Unfortunately, Kubernetes does not provide a single mechanism for monitoring resource status. Each resource type (service, replication controller, pod) has a "watch" interface. Additionally, status information is available via Kubernetes events.


Watching resource changes

Implementing resource limits requires that we handle unexpected failures, which means monitoring Kubernetes events for resource status changes.  For example, if the user launches a stack that exceeds their memory allocation, the Pod will fail to start, hanging in a "Pending" state until memory is freed.  While the Pod creation succeeds, the Pod never enters the ready state.

Kubernetes events are available via the "watch" methods. We are specifically interested in Services, Replication Controllers, Pods, and Events:

  •  localhost:8080/api/v1/watch/pods
  •  localhost:8080/api/v1/watch/services
  •  localhost:8080/api/v1/watch/replicationcontrollers
  •  localhost:8080/api/v1/watch/events
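Each of these endpoints returns a long-lived HTTP response whose body is a stream of JSON objects, one per event. A sketch using curl against the insecure localhost API port:

```shell
# Stream Pod watch events; each line is a JSON object with a "type" field
# (ADDED, MODIFIED, DELETED) and an "object" field holding the Pod.
curl -s http://localhost:8080/api/v1/watch/pods

# Restrict the stream to a single project's namespace (here, "demo").
curl -s http://localhost:8080/api/v1/watch/namespaces/demo/pods
```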

The "start stack" process launches services, replication controllers, and pods. 

During initialization, the NDSLabs API server will start a set of threads to monitor the various event channels and update stack status accordingly, asynchronously. 


Pod Events

Type     | Phase   | Ready | Stack service status
ADDED    |         |       | starting
DELETED  |         |       | stopped
MODIFIED | running | true  | started
MODIFIED | pending |       | starting
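The Pod-event mapping above can be sketched as a small function. This is illustrative only: real code would extract the event type, phase, and readiness from the watch stream's JSON rather than take them as arguments.

```shell
# Map a Pod watch event to a stack service status, per the table above.
pod_event_status() {
  event_type="$1"; phase="$2"; ready="$3"
  case "$event_type" in
    ADDED)   echo "starting" ;;
    DELETED) echo "stopped" ;;
    MODIFIED)
      if [ "$phase" = "Running" ] && [ "$ready" = "true" ]; then
        echo "started"
      elif [ "$phase" = "Pending" ]; then
        echo "starting"
      fi ;;
  esac
}

pod_event_status ADDED "" ""             # starting
pod_event_status MODIFIED Running true   # started
```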


Events

Involved Object        | Type    | Reason                                   | Stack service status | Notes
Pod                    | Normal  |                                          | (unchanged)          | Any status message is set on the StackService, but otherwise the status is unchanged for "Normal" events.
Pod                    | Warning | Unhealthy, MissingClusterDNS, FailedSync | <ignored>            | These events all occur during Pod startup and are ignored.
Pod                    | Warning | All other                                | error                | Examples include BackOff, FailedScheduling
Replication Controller | Warning |                                          | error                |


Examples:

Case                      | Reason      | Message
Test - invalid image      | Failed      | Failed to pull image "xyzzy": Error: image library/xyzzy not found
Test - invalid image      | BackOff     | Back-off pulling image
Test - memory hog         | (containerStatuses.state.terminated.reason = "OOMKilled") |
Test - too big            | FailedCreate | Exceeded quota: quota


Final points:

  • When all stack services have status=ready, then the stack status will be "started"
  • When all pods have been deleted, then the stack status will be "stopped"
  • If a stack service is in error, the stack status is "error"

Open issue:

  • One problem we'll need to address is a Pod that remains in the pending state for a long time. Kubernetes does not stop such pods; it simply waits for resources to be freed. Ideally, we'd watch the event logs for specific errors and put the pod and stack into an error state. In the start and stop stack methods, instead of polling for the pods to be in a ready state, we can simply poll for the stack service to be in the ready state.


Quality of Service

Kubernetes does not yet have QoS features. 
