Overview

Rook is an open-source storage orchestrator that runs distributed storage (Ceph) on top of Kubernetes, and is only supported on Kubernetes 1.7 or higher.

...

Source code is also available here: https://github.com/rook/rook

Prerequisites

Minimum Version: Kubernetes v1.7 or higher is supported by Rook.

...

You will also need to set up RBAC, and ensure that the Flex volume plugin has been configured.

Set the dataDirHostPath

If you are using dataDirHostPath to persist Rook data on Kubernetes hosts, make sure your host has at least 5GB of space available on the specified path.
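
A quick way to check this on each host (assuming the common default path of /var/lib/rook; adjust if you point dataDirHostPath elsewhere):

Code Block
languagebash
# Check free space on the filesystem that will back dataDirHostPath
# (/var/lib/rook is the default used in the upstream examples).
df -h /var/lib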

Setting up RBAC

On Kubernetes 1.7+, you will need to configure Rook to use RBAC appropriately.

See https://rook.github.io/docs/rook/master/rbac.html

Flex Volume Configuration

The Rook agent requires setup as a Flex volume plugin to manage the storage attachments in your cluster. See the Flex Volume Configuration topic to configure your Kubernetes deployment to load the Rook volume plugin.
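
If your distribution runs the kubelet with a non-default Flexvolume directory, the Rook agent needs to be pointed at the same path (see the linked topic for the exact setting). As a purely illustrative check of what your kubelet is using:

Code Block
languagebash
# The kubelet's Flexvolume search path is set by --volume-plugin-dir; the
# upstream default is /usr/libexec/kubernetes/kubelet-plugins/volume/exec/.
ps aux | grep '[k]ubelet' | grep -o -- '--volume-plugin-dir=[^ ]*'
ls /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ 2>/dev/null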

Getting Started

Now that we've examined each of the pieces, let's zoom out and see what we can do with the whole cluster.

For the quickest quick start, check out the Rook QuickStart guide: https://rook.github.io/docs/rook/master/quickstart.html

Getting Started without an Existing Kubernetes cluster

The easiest way to deploy a new Kubernetes cluster with Rook support on OpenStack (Nebula / SDSC) is to use the https://github.com/nds-org/kubeadm-terraform repository.

This may work for other cloud providers as well, but has not yet been thoroughly tested.

Getting Started on an Existing Kubernetes cluster

If you’re feeling lucky, a simple Rook cluster can be created with the following kubectl commands. For the more detailed install, skip to the next section to deploy the Rook operator.

...
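
(The commands themselves are elided above; as a rough sketch of the v0.6/v0.7-era flow, assuming the example manifests from the Rook release are on disk and using the file names referenced later on this page:)

Code Block
languagebash
# Deploy the operator + agents, then declare a cluster.
kubectl create -f rook-operator.yaml
kubectl create -f rook-cluster.yaml

# Watch the cluster pods come up
kubectl -n rook get pod -w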

For a more detailed look at the deployment process, see below.

Deploy the Rook Operator

The first step is to deploy the Rook system components, which include the Rook agent running on each node in your cluster as well as the Rook operator pod.

...

You can also deploy the operator with the Rook Helm Chart.
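
If you prefer Helm, the install looked roughly like this in the v0.7 era (the chart repository name and URL below are from memory, so double-check them against the chart's own documentation):

Code Block
languagebash
# Add the v0.7-era Rook chart repo and install the operator (Helm 2 syntax).
# Add --namespace / --version flags as appropriate for your setup.
helm repo add rook-alpha https://charts.rook.io/alpha
helm install rook-alpha/rook --name rook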

Restart Kubelet (Kubernetes 1.7.x only)

For versions of Kubernetes prior to 1.8, the Kubelet process on all nodes will require a restart after the Rook operator and Rook agents have been deployed. As part of their initial setup, the Rook agents deploy and configure a Flexvolume plugin in order to integrate with Kubernetes’ volume controller framework. In Kubernetes v1.8+, the dynamic Flexvolume plugin discovery will find and initialize our plugin, but in older versions of Kubernetes a manual restart of the Kubelet will be required.
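
On a systemd-based host that typically just means:

Code Block
languagebash
# Run on every node after the operator and agents are deployed (K8s 1.7.x only).
sudo systemctl restart kubelet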

Create a Rook Cluster

Now that the Rook operator and agent pods are running, we can create the Rook cluster. For the cluster to survive reboots, make sure you set the dataDirHostPath property. For more settings, see the documentation on configuring the cluster.
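
A minimal sketch of the cluster definition, based on the v0.7-era rook.io/v1alpha1 examples (field names are worth verifying against the rook-cluster.yaml that ships with your release):

Code Block
languagebash
# Declare a Rook cluster in the "rook" namespace; dataDirHostPath is what
# lets the cluster configuration survive node reboots.
kubectl create -f - <<'EOF'
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook
  namespace: rook
spec:
  dataDirHostPath: /var/lib/rook
  storage:
    useAllNodes: true
    useAllDevices: false
EOF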

...

Code Block
languagebash
$ kubectl -n rook get pod
NAME                              READY     STATUS    RESTARTS   AGE
rook-ceph-mgr0-1279756402-wc4vt   1/1       Running   0          5m
rook-ceph-mon0-jflt5              1/1       Running   0          6m
rook-ceph-mon1-wkc8p              1/1       Running   0          6m
rook-ceph-mon2-p31dj              1/1       Running   0          6m
rook-ceph-osd-0h6nb               1/1       Running   0          5m

Monitoring Your Rook Cluster

A glimpse into setting up Prometheus for monitoring Rook: https://rook.github.io/docs/rook/master/monitoring.html

Advanced Configuration

Advanced Configuration options are also documented here: https://rook.github.io/docs/rook/master/advanced-configuration.html

Debugging

For common issues, see https://github.com/rook/rook/blob/master/Documentation/common-issues.md

For more help debugging, see https://github.com/rook/rook/blob/master/Documentation/toolbox.md

Cluster Teardown

See https://rook.github.io/docs/rook/master/teardown.html for thorough steps on destroying / cleaning up your Rook cluster.

Components

Rook is composed of a number of smaller microservices that run on different nodes in your Kubernetes cluster:

  • The Rook Operator + API
  • Ceph Managers / Monitors / OSDs
  • Rook Agents

The Rook Operator

The Rook operator is a simple container that has everything needed to bootstrap and monitor the storage cluster.

...

The Rook operator also creates the Rook agents as a daemonset, which runs a pod on each node. 

Ceph Managers / Monitors / OSDs

The operator will start and monitor Ceph monitor pods and a daemonset for the OSDs, which together provide basic Reliable Autonomic Distributed Object Store (RADOS) storage.

...

Ceph monitors (aka "Ceph mons") will be started or failed over when necessary, and other adjustments are made as the cluster grows or shrinks. 

Rook Agents

Each agent is a pod deployed on a different Kubernetes node, which configures a Flexvolume plugin that integrates with Kubernetes’ volume controller framework.

The agent handles all storage operations required on the node, such as attaching network storage devices, mounting volumes, and formatting the filesystem.

Storage

Rook provides three types of storage to the Kubernetes cluster:

  • Block Storage: Mount storage to a single pod
  • Object Storage: Expose an S3 API to the storage cluster for applications to put and get data that is accessible from inside or outside the Kubernetes cluster
  • Shared File System: Mount a file system that can be shared across multiple pods

Custom Resource Definitions

Rook also allows you to create and manage your storage cluster through custom resource definitions (CRDs). Each type of resource has its own CRD defined.

  • Cluster: A Rook cluster provides the basis of the storage platform to serve block, object stores, and shared file systems.
  • Pool: A pool manages the backing store for a block store. Pools are also used internally by object and file stores (see the sketch below this list).
  • Object Store: An object store exposes storage with an S3-compatible interface.
  • File System: A file system provides shared storage for multiple Kubernetes pods.
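
To make the Pool resource concrete, here is a hedged sketch in the same rook.io/v1alpha1 style as the Filesystem example later on this page (the name and size are illustrative, not taken from this deployment):

Code Block
languagebash
# Declare a replicated pool named "replicapool" in the rook namespace.
kubectl create -f - <<'EOF'
apiVersion: rook.io/v1alpha1
kind: Pool
metadata:
  name: replicapool
  namespace: rook
spec:
  replicated:
    size: 3
EOF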

Shared Storage Example

Shamelessly stolen from https://rook.github.io/docs/rook/master/filesystem.html

Prerequisites

This guide assumes you have created a Rook cluster as explained in the main Kubernetes guide.

Multiple File Systems Not Supported

By default only one shared file system can be created with Rook. Multiple file system support in Ceph is still considered experimental and can be enabled with the environment variable ROOK_ALLOW_MULTIPLE_FILESYSTEMS defined in rook-operator.yaml.

Please refer to the CephFS experimental features page for more information.
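
One way to set that variable without editing rook-operator.yaml by hand is kubectl set env; the deployment and namespace names below are the v0.7-era defaults and may differ in your install:

Code Block
languagebash
# Enable experimental multi-filesystem support on the operator
# (deployment/namespace names are assumptions; adjust to your deployment).
kubectl -n rook-system set env deployment/rook-operator ROOK_ALLOW_MULTIPLE_FILESYSTEMS=true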

Create the File System

Create the file system by specifying the desired settings for the metadata pool, data pools, and metadata server in the Filesystem CRD. In this example we create the metadata pool with replication of three and a single data pool with erasure coding. For more options, see the documentation on creating shared file systems.

...

Code Block
languagebash
$ ceph status                                                                                                                                              
  ...
  services:
    mds: myfs-1/1/1 up {[myfs:0]=mzw58b=up:active}, 1 up:standby-replay

Consume the Shared File System: Busybox + NGINX Example

As an example, we will start the kube-registry pod with the shared file system as the backing store. Save the following spec as kube-registry.yaml:

...

NOTE: I had to explicitly specify clusterName in the YAML above... newer versions of Rook will fall back to clusterNamespace
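
For reference, the part of the pod spec that note is about looks roughly like this. This is a minimal sketch assuming the v0.7-era rook.io/rook Flexvolume driver; the driver name, option keys, image, and mount path are illustrative rather than copied from the elided spec above:

Code Block
languagebash
# Minimal pod that mounts the shared filesystem via the Rook Flexvolume driver.
kubectl create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: rook-fs-test
spec:
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: myfs
      mountPath: /mnt/myfs
  volumes:
  - name: myfs
    flexVolume:
      driver: rook.io/rook
      fsType: ceph
      options:
        fsName: myfs
        clusterNamespace: rook
        clusterName: rook   # had to be set explicitly on the version used here
EOF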

Kernel Version Requirement

If the Rook cluster has more than one filesystem and the application pod is scheduled to a node with kernel version older than 4.7, inconsistent results may arise since kernels older than 4.7 do not support specifying filesystem namespaces.

Testing Shared Storage

After creating our above example, we should now have 2 pods each with 2 containers running on 2 separate nodes:

...

You have just set up your first shared filesystem under Rook!
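
A quick way to convince yourself the mount really is shared (pod names, namespace, and mount path are placeholders for whatever you deployed above):

Code Block
languagebash
# Write a file through one pod and read it back through the other.
kubectl -n <namespace> exec <pod-a> -- sh -c 'echo hello-from-a > <mount-path>/shared-test'
kubectl -n <namespace> exec <pod-b> -- cat <mount-path>/shared-test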

Under the Hood

For more information on the low-level processes involved in the above example, see https://github.com/rook/rook/blob/master/design/filesystem.md

...

The directories section is supposed to list the paths that will be included in the storage cluster. (Note that using two directories on the same physical device can cause a negative performance impact.)

Investigating Storage directories

Checking the logs for one of the Rook agents, we can see a success message that shows us where the data really lives:

...

Obviously this is not where we want the shared filesystem data stored long-term, so I'll need to figure out why these files are persisted into /var/lib/kubelet and not into the directories specified in the Cluster configuration.

Digging Deeper into dataDirHostPath

Checking the /var/lib/rook directory, we see a few sub-directories:

...

As you can see, these metadata files do not appear to be readable on disk and would likely need to be un-mangled by Rook to properly perform a backup.

Checking the kubelet logs...

Digging into the systemd journal for the kubelet, we can see it's complaining about the volume configuration:

...

Sadly, even setting this value explicitly did not fix my immediate issue.
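
(For anyone retracing these steps, the kubelet's journal can be followed on each node with something like the following.)

Code Block
languagebash
# Tail the kubelet's systemd journal, filtering for volume-related messages.
journalctl -u kubelet -f | grep -i volume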

Hacking Terraform

At this point, I decided to start hacking the Terraform deployment to get Rook working to the level we'll need for Workbench.

...

  • Rook has been upgraded from v0.6.2 to v0.7.1, in the helm install and in rook-cluster.yaml
  • Expanded the storage section to include a nodes subsection - this specifies which machines / directories should be part of the storage cluster (see the sketch below this list)
  • Turned off useAllNodes
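
Roughly what the edited rook-cluster.yaml looked like after those changes (the node name is a placeholder, and the field names follow the v0.7-era cluster spec from memory, so treat this as a sketch rather than a verbatim copy):

Code Block
languagebash
# Pin storage to explicit nodes/directories instead of useAllNodes, and pick
# the OSD backing store type; written to a file so it can replace the
# rook-cluster.yaml that terraform deploys.
cat > rook-cluster.yaml <<'EOF'
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook
  namespace: rook
spec:
  dataDirHostPath: /var/lib/rook
  storage:
    useAllNodes: false
    useAllDevices: false
    storeConfig:
      storeType: bluestore        # the experiments below also try "filestore"
    nodes:
    - name: workbench-node-0      # must match the Kubernetes node name
      directories:
      - path: /volb
EOF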

Checking the rook-operator logs...

Now, with the new version of Rook up and running, I attempted to make a filesystem as before. This time, however, no pods were spawned following my filesystem's creation.

...

Changing /vol_b to /volb solved this problem - this must be adjusted both in the deploy-rook.sh script above and in the bootstrap-rook.sh script alongside it.

Now we're getting somewhere...

After changing the volume path and redeploying (again), the myfs pods were spawned after creating the filesystem in Kubernetes, as they should be:

...

  • Upgrading to Rook 0.7.1
  • The adjustments to the directories configuration in rook-cluster.yaml are now writing data to the correct drive (/volb), but that drive may be improperly formatted for use with Rook

Narrowing it down...

I adjusted my rook-cluster.yaml to include only the changes to the directories configuration, and to use storeType: filestore instead of bluestore.

...

Confirmed by this GitHub issue: https://github.com/rook/rook/issues/1604

Back to bluestore...

Switching back to storeType: bluestore on Rook v0.6.2 with the correct nodes/directories configuration:

...

I have noticed that a cluster with pods in an error state such as this one will fail to terraform destroy (the operation never completes, even after waiting 15+ minutes).

Resolution

After poring over the docs and GitHub issues and tediously reading the source code, we found a concerning comment in a GitHub issue: https://github.com/rook/rook/issues/1220#issuecomment-343342515

...

Code Block
languageyml
titlerook-filesystem.yaml
apiVersion: rook.io/v1alpha1
kind: Filesystem
metadata:
  name: myfs
  namespace: rook
spec:
  metadataPool:
    replicated:
      size: 2
  dataPools:
    - erasureCoded:
        dataChunks: 2
        codingChunks: 1
  metadataServer:
    activeCount: 1
    activeStandby: true

Recovering from backup

This feature is currently in the planning stages: https://github.com/rook/rook/issues/1552

Unofficial Python script for creating / restoring backups from Rook: https://gitlab.com/costrouc/kubernetes-rook-backup

Edge Cases and Quirks

There are many pitfalls here, particularly around what I perceive as the fragility of the shared filesystem.

DO NOT delete the filesystem before shutting down all of the pods consuming it

Deleting the shared filesystem out from under the pod will confuse the kubelet, and prevent it from being able to properly unmount and terminate your containers.

...

This will hopefully be improved in later versions of Kubernetes (1.9+?)

You must follow these cleanup steps before terraform destroy will work

Expanding on the above topic, terraform destroy will hang on destroying your cluster if you fail to clean up your filesystems properly:

...

WARNING: this seems like a very tenuous/tedious process... I am hoping that later versions of terraform/rook will improve the stability of cleanup under these scenarios. Perhaps we could expand their cleanup to first drain all nodes of their running pods (if this is not already the case). That still would not cover the case of a user deleting the filesystem before a running pod that is consuming it - in that case the pods fail to terminate indefinitely, which I think is what leads terraform to fail.
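
The general shape of the cleanup, using names from the examples above (exact resource names depend on your deployment, and the filesystem resource plural/group may vary by Rook version):

Code Block
languagebash
# 1. Stop everything that is consuming the shared filesystem first.
kubectl delete -f kube-registry.yaml

# 2. Then delete the filesystem itself (resource name per your Rook version).
kubectl -n rook delete filesystems.rook.io myfs

# 3. Only then tear down the infrastructure.
terraform destroy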

Kill (or hide) a hanging pod

There are a few ways to kill a pod with fire:

...

Only use this as a last resort on test clusters, and NEVER use --grace-period=0  on a production cluster.
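
For reference, the nuclear option referred to above looks like this (pod name and namespace are placeholders):

Code Block
languagebash
# Force-delete a stuck pod without waiting for graceful termination.
# Last resort only - the warning above applies.
kubectl -n <namespace> delete pod <pod-name> --force --grace-period=0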

Cleaning up failed runs of terraform destroy

Here is a quick checklist of the items that you will need to clean up manually if you are unable to terraform destroy your cluster:

...