Overview

This page documents the preliminary requirements and design of the NDS Labs Workbench resource limit strategy. The NDS Labs Workbench is based on Kubernetes and GlusterFS, so we'll leverage their internal resource limit and enforcement mechanisms. Kubernetes supports limits and quotas on compute resources (CPUs and memory) as well as object counts (services, replication controllers, pods, secrets, etc).  GlusterFS provides mechanisms to enforce storage quotas through explicit quotas or allocated volume size.

Why specify and limit resources?

There are several reasons to enforce resource limits in the NDS Labs Workbench:

  1. The Kubernetes system requires Pod resource specifications for effective scheduling. To enforce those specifications, resource quotas must be enabled system-wide.  (Note: it would be possible to grant all projects full access to all system resources if desired, but this would lead to competition for resources.)
  2. We want to prevent individual NDS Labs Workbench projects from exhausting all of the system resources, introducing instability in other projects. This means enforcing quotas on projects.

Enforcement options

We've discussed the following options for enforcing quotas:

Enforcement point    | Description | Pros | Cons
During configuration | Do not let the user add a stack if running the stack would exceed limits | Easy to understand | We can't know actual resource usage in advance; user might need to delete existing stacks to start new stacks
Before launch        | Allow the user to configure the stack, but not start the stack if it would exceed limits | Allows user to have configured stacks and decide which to run; easy to understand | We can't know actual resource usage in advance
Launch/runtime       | Allow the user to launch the stack; if resources aren't available during or after launch, the stack enters an error state | Constrains resources based on actual runtime requirements | Services may fail unexpectedly

Since it isn't possible to know exact usage requirements before launch, we have decided to enforce quotas during runtime.  If launching the stack exceeds quotas, the stack will fail to launch and the user will be notified.  

Kubernetes: Limit ranges, Requests, and ResourceQuotas

Kubernetes defines three concepts related to the implementation and enforcement of resource limits: resource quotas, limit ranges, and requests. Resource quotas are enabled at the namespace level and define hard limits on the resources available to the namespace. Once enabled, all Pods must specify requests for each resource covered by the quota.  If no request is specified, then Pod creation will fail (403 Forbidden).

Default values can be specified at the namespace level using LimitRanges.  If a LimitRange is specified on the namespace, then Pods without explicit requests will have their resource requests set to the default limit values.

Individual Pods can specify limits (upper bounds on resource consumption) and requests (the amount of a resource the Pod is guaranteed for scheduling). By default, Pods run with unbounded CPU and memory limits. These quotas are independent of actual cluster capacity.

Resource quotas and limits must be enabled during Kubernetes cluster startup by adding the LimitRanger and ResourceQuota admission controllers to the API server's --admission-control flag.
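For example, enabling these controllers looks roughly like the following (a sketch; the full plugin list and remaining flags depend on the deployment, and are omitted here):

```shell
# Enable quota/limit enforcement via admission controllers.
# Other required kube-apiserver flags (etcd, service CIDR, etc.) omitted.
kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger,ResourceQuota
```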

Resource Limits

Resource | Implementation
Memory   | Kubernetes ResourceQuotas, Limits, Requests
CPU      | Kubernetes ResourceQuotas, Limits, Requests
Objects  | Kubernetes ResourceQuotas
Storage  | Either hard limits on allocated resources (e.g., users are limited to a fixed volume size) or GlusterFS directory quotas
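For the storage row, GlusterFS directory quotas can be enforced with the standard quota CLI. A sketch, assuming a shared volume named "global" and a per-project directory (both names are assumptions for illustration):

```shell
# Enable quota enforcement on the volume, then cap a project directory.
gluster volume quota global enable
gluster volume quota global limit-usage /projects/demo 10GB

# Show configured limits and current usage per directory.
gluster volume quota global list
```

The alternative, per-project volumes created at a fixed size, needs no quota configuration at all.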

Note units: https://github.com/kubernetes/kubernetes/blob/master/docs/design/resources.md

  • CPUs are in milli-core units (1 core = 1000m).  The default value of cpu=1 is "best effort"
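To make the units concrete, here is a small helper that normalizes a CPU quantity to millicores. This is a sketch: it handles only integer cores and the "m" suffix, not the full Kubernetes quantity grammar (which also allows decimals like "0.5").

```shell
# Normalize a Kubernetes CPU quantity to millicores.
# "250m" -> 250, "2" -> 2000. Decimal cores are not handled.
to_millicores() {
  q="$1"
  case "$q" in
    *m) echo "${q%m}" ;;            # already millicores: strip the suffix
    *)  echo "$(( q * 1000 ))" ;;   # whole cores: 1 core = 1000m
  esac
}

to_millicores 250m   # 250
to_millicores 2      # 2000
```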

Usage information

  • CPU and Memory usage are available via the Grafana API, which is included in the Kubernetes contrib/ansible deployment
  • Storage usage information is available via GlusterFS

Requirements

  1. Cluster administrator can specify memory, CPU, and storage quotas at the project level
  2. Cluster administrator can specify default limits for Pods system-wide
  3. Cluster administrator can update quotas
  4. Quotas/limits are deleted when the project is deleted
  5. Project administrator can view project quotas
  6. Project administrator can view memory, CPU, storage usage
  7. Service developer can implement resource requests for Pods
  8. Service developer can run a tool to capture/estimate resource requirements for a service
  9. Cluster ops can monitor cluster-wide resource utilization
  10. Test services for exercising limits
    1. Exceed memory limits, exceed CPU limits, exceed Object limits
    2. Both internally (single pod/container) and adding a resource that exceeds
    3. What happens if Pod killed due to OOM?
  11. Will need to have better error handling for failure conditions – what happens when the stack fails to deploy because a service fails to start?
  12. The API server will monitor all Pods to track changes to status and notify site and project administrators.

Implementation

Project Limits

During the POST or PUT of a project, ResourceQuota and LimitRange objects are created for the namespace. The Project object has been modified to include resourceLimits.

  • cpuMax:  Maximum CPU for this project
  • cpuDefault: Default CPU for Pods without resource requests for this project
  • memMax: Maximum total memory for this project
  • memDefault: Default memory for Pods without resource requests for this project
  • storageQuota: Storage allocation

{
    "id": "demo",
    "name": "demo project",
    "description": "demo project description",
    "namespace": "demo",
    "password": "12345",
    "resourceLimits": {
        "cpuMax": "2",
        "cpuDefault": "1",
        "memMax": "8Gi",
        "memDefault": "100Mi",
        "storageQuota": "10Gb"
    }
}


The above project specification results in the following ResourceQuota and LimitRange objects created for the namespace:

{
    "kind": "ResourceQuota",
    "apiVersion": "v1",
    "metadata": {
        "name": "quota"
    },
    "spec": {
        "hard": {
            "cpu": "2",
            "memory": "8Gi"
        }
    }
}
{
    "kind": "LimitRange",
    "apiVersion": "v1",
    "metadata": {
        "name": "limits"
    },
    "spec": {
        "limits": [
            {
                "type": "Container",
                "default": {
                    "cpu": "1",
                    "memory": "100Mi"
                }
            }
        ]
    }
}
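The API server creates these objects directly through the Kubernetes REST API; the equivalent manual steps, assuming the two objects above are saved to files, would be:

```shell
# Create the quota and defaults in the project's namespace, then verify.
kubectl create -f quota.json --namespace=demo
kubectl create -f limits.json --namespace=demo
kubectl describe quota quota --namespace=demo
```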


Resource requests by services

The service developer will be required to specify resource requirements for their services. Project administrators will be able to override these values during service configuration. 

Individual Pods have both requests and limits. Pods will be terminated when memory limits are reached and may be terminated if they exceed CPU limits.  

The following example specification illustrates service-level resource limits and requests. These translate directly into Pod limits and requests: 

{
    "label": "Clowder",
    "resources": {
        "limits": [
            {
                "cpu": "250m",
                "memory": "1G"
            }
        ],
        "requests": [
            {
                "cpu": "100m",
                "memory": "128Mi"
            }
        ]
    }
}
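For reference, a sketch of how these values land in the Kubernetes container spec. Note that in the container spec, limits and requests are maps rather than arrays (the container name here is an assumption):

```json
{
    "name": "clowder",
    "resources": {
        "limits":   { "cpu": "250m", "memory": "1G" },
        "requests": { "cpu": "100m", "memory": "128Mi" }
    }
}
```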


Estimating resource requirements

Requiring service developers to provide resource requests and limits means that we need to provide tools to enable them to estimate limits.  We should also provide some level of logging to indicate when services are nearing or exceeding limits, or failing.

To estimate resource requirements for a service, service developers may use the following tools:

  • docker stats
  • sysdig/csysdig
  • /sys/fs/cgroup/memory/
  • /sys/fs/cgroup/memory/memory.usage_in_bytes
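A sketch of using these tools to observe a single container's memory footprint (cgroup v1 paths, as used by Docker in this era; `<container-id>` is a placeholder):

```shell
# One-shot snapshot of CPU/memory usage for a running container.
docker stats --no-stream <container-id>

# Raw cgroup counters for the same container.
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.limit_in_bytes
```

Running the service under a realistic workload and recording the peak of usage_in_bytes gives a reasonable starting point for the memory request.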

The system should track when services near or exceed limits and notify the user.  This will require constant monitoring of running containers with associated alerts. The API server will start a monitoring process to keep track of the status of all running pods system-wide.

When things fail

Once limits are enforced, services fail during creation or runtime if they reach the specified limits. Kubernetes generally puts such pods into the pending state, waiting for resources to be freed so the pod can be scheduled. We have several test cases:

  • Pod exceeds available memory during create
  • Pod exceeds available CPU during create
  • Pod exceeds available memory at runtime (out of memory error)
  • Pod exceeds available CPU during runtime

Unfortunately, Kubernetes does not provide a single mechanism for monitoring resource status. Each resource type (service, replication controller, pod) has a "watch" interface. Additionally, status information is available via Kubernetes events.


Watching resource changes

Implementing resource limits requires that we handle unexpected failures, which means monitoring Kubernetes events for resource status changes.  For example, if the user launches a stack that exceeds their memory allocation, the Pod will fail to start, hanging in a "Pending" state until memory is freed.  While the Pod creation succeeds, the Pod never enters the ready state.

Kubernetes events are available via the "watch" methods. We are specifically interested in Services, Replication Controllers, Pods, and Events:

  •  localhost:8080/api/v1/watch/pods
  •  localhost:8080/api/v1/watch/services
  •  localhost:8080/api/v1/watch/replicationcontrollers
  •  localhost:8080/api/v1/watch/events
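Each of these endpoints returns a long-lived HTTP response whose body is a stream of JSON objects, one per event. A sketch using curl against the insecure localhost API port:

```shell
# Stream Pod watch events; each line is a JSON object with a "type" field
# (ADDED, MODIFIED, DELETED) and an "object" field holding the Pod.
curl -s http://localhost:8080/api/v1/watch/pods

# Restrict the stream to a single project's namespace (here, "demo").
curl -s http://localhost:8080/api/v1/watch/namespaces/demo/pods
```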

The "start stack" process launches services, replication controllers, and pods. 

During initialization, the NDSLabs API server will start a set of threads to monitor the various event channels and update stack status accordingly, asynchronously. 


Pod Events

Type     | Phase   | Ready | Stack service status
ADDED    |         |       | starting
DELETED  |         |       | stopped
MODIFIED | running | true  | started
MODIFIED | pending |       | starting
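The Pod-event mapping above can be sketched as a small function. This is illustrative only: real code would extract the event type, phase, and readiness from the watch stream's JSON rather than take them as arguments.

```shell
# Map a Pod watch event to a stack service status, per the table above.
pod_event_status() {
  event_type="$1"; phase="$2"; ready="$3"
  case "$event_type" in
    ADDED)   echo "starting" ;;
    DELETED) echo "stopped" ;;
    MODIFIED)
      if [ "$phase" = "Running" ] && [ "$ready" = "true" ]; then
        echo "started"
      elif [ "$phase" = "Pending" ]; then
        echo "starting"
      fi ;;
  esac
}

pod_event_status ADDED "" ""             # starting
pod_event_status MODIFIED Running true   # started
```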


Events

Involved Object        | Type    | Reason                                   | Stack service status | Notes
Pod                    | Normal  |                                          | (unchanged)          | Any status message is set on the StackService, but otherwise the status is unchanged for "Normal" events.
Pod                    | Warning | Unhealthy, MissingClusterDNS, FailedSync | <ignored>            | These events all occur during Pod startup and are ignored.
Pod                    | Warning | All other                                | error                | Examples include BackOff, FailedScheduling
Replication Controller | Warning |                                          | error                |


Examples:

Case                      | Reason      | Message
Test - invalid image      | Failed      | Failed to pull image "xyzzy": Error: image library/xyzzy not found
Test - invalid image      | BackOff     | Back-off pulling image
Test - memory hog         | (containerStatuses.state.terminated.reason = "OOMKilled") |
Test - too big            | FailedCreate | Exceeded quota: quota


Final points:

  • When all stack services have status=ready, then the stack status will be "started"
  • When all pods have been deleted, then the stack status will be "stopped"
  • If a stack service is in error, the stack status is "error"

Open issue:

  • One problem we'll need to address is a Pod that remains in the pending state for a long time. Kubernetes does not stop such pods; it simply waits for resources to be freed. Ideally, we'd watch the event logs for specific errors and put the pod and stack into an error state. In the start and stop stack methods, instead of polling for the pods to be in a ready state, we can simply poll for the stack service to be in the ready state.


Quality of Service

Kubernetes does not yet have QoS features. 
