Getting Started

Clone the GlusterFS repo containing the necessary Kubernetes specs:

git clone https://github.com/nds-org/gluster.git
cd gluster/

Server Setup

Create the gluster-server DaemonSet using kubectl:

kubectl create -f kubernetes/gluster-server-ds.yaml

This spec runs the ndslabs/gluster container in "server mode" on Kubernetes nodes labeled with ndslabs-role=storage.

Once all of the server containers are up, we must join them into a trusted storage pool using the gluster CLI.

The steps below only need to be performed from inside a single glusterfs-server container.

Alternative: Raw Docker

docker run --name=gfs --net=host --pid=host --privileged -v /dev:/dev -v <ABSOLUTE_PATH_TO_SHARED_DATA>:/var/glfs -v /run:/run -v /:/media/host -it -d gluster:local

Getting into a Server Container

Using kubectl, exec into one of the GlusterFS servers:

core@willis8-k8-test-1 ~ $ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
coffee-rc-4u3pb              1/1       Running   0          12d       192.168.100.65
coffee-rc-5m4t6              1/1       Running   0          12d       192.168.100.65
default-http-backend-y98iw   1/1       Running   0          22h       192.168.100.64
glusterfs-server-hh5rm       1/1       Running   0          5d        192.168.100.156
glusterfs-server-zoefs       1/1       Running   0          5d        192.168.100.89
ndslabs-apiserver-zqgj8      1/1       Running   0          1d        192.168.100.66
ndslabs-gui-p0hjh            1/1       Running   0          23h       192.168.100.66
nginx-ilb-rc-x853y           1/1       Running   0          6d        192.168.100.64
tea-rc-8saiu                 1/1       Running   0          12d       192.168.100.65
tea-rc-t403k                 1/1       Running   0          12d       192.168.100.65
core@willis8-k8-test-1 ~ $ kubectl exec -it glusterfs-server-zoefs -- bash

Take note of all node IPs that are running glusterfs-server pods. You will need these IPs to finish configuring GlusterFS.
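If you would rather not copy the IPs by hand, they can be pulled out of the same listing. This is a minimal sketch that assumes the column layout shown above, where NODE is the last column (newer kubectl versions append extra columns, so adjust the field if yours differ). pod_nodes is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the NODE column for every pod whose name starts
# with the given prefix. Reads `kubectl get pods -o wide` output on stdin and
# assumes NODE is the last column, as in the listing above.
pod_nodes() {
  prefix="$1"
  awk -v p="$prefix" 'NR > 1 && index($1, p) == 1 { print $NF }' | sort -u
}

# Usage (on the Kubernetes master):
#   kubectl get pods -o wide | pod_nodes glusterfs-server-
```

The same helper works for the client pods later on by passing glusterfs-client- as the prefix.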

Peer Probe

Once inside of the gluster server container, perform a peer probe on all other gluster nodes.

Do not probe the host's own IP.

For example, since we are executing from 192.168.100.89, we must probe our other storage node:

root@willis-k8-test-gluster:/# gluster peer probe 192.168.100.156
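With more than two storage nodes, the probes can be generated from the list of IPs noted earlier. This sketch assumes you pass this container's own IP as the first argument, followed by every storage node IP. probe_peers is a hypothetical helper that only prints the commands, so you can review them before piping them to sh:

```shell
#!/bin/sh
# Hypothetical helper: print one `gluster peer probe` command for every
# storage node IP except our own (the first argument). Nothing is executed.
# Usage: probe_peers SELF_IP NODE_IP [NODE_IP...]
probe_peers() {
  self_ip="$1"
  shift
  for ip in "$@"; do
    # Never probe the host's own IP.
    if [ "$ip" = "$self_ip" ]; then
      continue
    fi
    printf 'gluster peer probe %s\n' "$ip"
  done
}

# Usage (inside the glusterfs-server container on 192.168.100.89):
#   probe_peers 192.168.100.89 192.168.100.89 192.168.100.156 | sh
#   gluster peer status
```

Running gluster peer status afterwards lists the peers this node now knows about.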

Create Volume

Ansible has already created the placeholder directories for the bricks, so we just need to create and start a Gluster volume that points to the brick directory on each node.

This is done using gluster volume create, as outlined below:

root@willis-k8-test-gluster:/# gluster volume create ndslabs transport tcp 192.168.100.89:/var/glfs/brick0 192.168.100.156:/var/glfs/ndslabs/brick0

NOTE: Our Ansible playbook mounts GlusterFS bricks at /media/brick0. We will need to update this in the future to be consistent throughout.
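For clusters with more nodes, the brick list can be assembled from the node IPs. This sketch assumes every node uses the same brick path (unlike the example above, where the two paths differ). volume_create_cmd is a hypothetical helper that prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper: print a `gluster volume create` command for VOLUME
# with one brick per node IP, all at the same BRICK_PATH.
# Usage: volume_create_cmd VOLUME BRICK_PATH IP [IP...]
volume_create_cmd() {
  volume="$1"
  brick_path="$2"
  shift 2
  cmd="gluster volume create $volume transport tcp"
  for ip in "$@"; do
    cmd="$cmd $ip:$brick_path"
  done
  printf '%s\n' "$cmd"
}

# Usage (inside the server container):
#   volume_create_cmd ndslabs /media/brick0/brick/ndslabs 192.168.100.89 192.168.100.156 | sh
```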

To confirm that the volume was created successfully, run the following commands; your new volume should appear in the list:

root@willis-k8-test-gluster:/# gluster volume list
ndslabs
root@willis-k8-test-gluster:/# gluster volume status
Volume ndslabs is not started
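To perform the same check from a script rather than by eye, a minimal sketch follows. volume_listed is a hypothetical helper name; it matches whole lines only, so a volume named ndslabs-old does not count as ndslabs:

```shell
#!/bin/sh
# Hypothetical helper: succeed if VOLUME appears as a whole line in
# `gluster volume list` output read from stdin.
# Usage: gluster volume list | volume_listed VOLUME
volume_listed() {
  grep -qxF -- "$1"
}

# Usage (inside the server container):
#   gluster volume list | volume_listed ndslabs && echo "ndslabs exists"
```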

Reusing a Volume

If the brick directories were previously part of a volume that is no longer accessible, add force to the end of your volume create command to make GlusterFS reuse them:

root@willis-k8-test-gluster:/# gluster volume create ndslabs transport tcp 192.168.100.89:/media/brick0/brick/ndslabs 192.168.100.156:/media/brick0/brick/ndslabs
volume create: ndslabs: failed: /media/brick0/brick/ndslabs is already part of a volume
root@willis-k8-test-gluster:/# gluster volume create ndslabs transport tcp 192.168.100.89:/media/brick0/brick/ndslabs 192.168.100.156:/media/brick0/brick/ndslabs force
volume create: ndslabs: success: please start the volume to access data

The alternative is to delete and recreate the brick directory. Note that this permanently deletes everything stored in it:

root@willis-k8-test-gluster:/# rm -rf /path/to/brick0
root@willis-k8-test-gluster:/# mkdir -p /path/to/brick0
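Because rm -rf on a mistyped path is destructive, you may want a guard around this step. This is a minimal sketch, assuming you know the directory under which your bricks live (for example /media/brick0). recreate_brick_dir is a hypothetical helper that refuses to act unless the brick directory lies strictly below that root:

```shell
#!/bin/sh
# Hypothetical helper: delete and recreate a brick directory, refusing to act
# unless BRICK_DIR lies strictly below ALLOWED_ROOT.
# WARNING: everything stored in BRICK_DIR is permanently deleted.
# Usage: recreate_brick_dir ALLOWED_ROOT BRICK_DIR
recreate_brick_dir() {
  root="${1%/}"
  dir="${2%/}"
  if [ -z "$root" ]; then
    echo "refusing to use '/' or an empty path as the allowed root" >&2
    return 1
  fi
  case "$dir" in
    */..|*/../*)
      echo "refusing path containing '..': $2" >&2
      return 1 ;;
    "$root"/?*)
      ;; # strictly below the allowed root: OK
    *)
      echo "refusing to touch '$2' (not under '$1')" >&2
      return 1 ;;
  esac
  rm -rf -- "$dir" && mkdir -p -- "$dir"
}

# Usage (inside the server container):
#   recreate_brick_dir /media/brick0 /media/brick0/brick/ndslabs
```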

Start Volume

Now that we have created our volume, we must start it in order for clients to mount it:

root@willis-k8-test-gluster:/# gluster volume start ndslabs
volume start: ndslabs: success

Our volume is now being served out to the cluster over NFS, and we are ready for our clients to mount the volume.
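Before moving on to the clients, you can confirm the volume state from a script. gluster volume info prints a Status line (see the full output in the Adding a Brick section). This is a minimal sketch; volume_started is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if `gluster volume info VOLUME` output read
# from stdin reports "Status: Started".
volume_started() {
  grep -q '^Status: Started$'
}

# Usage (inside the server container):
#   gluster volume info ndslabs | volume_started && echo "ndslabs is started"
```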

Adding a Brick

Suppose we have a simple replicated Gluster volume with 2 bricks that is running low on space, and we want to expand its storage:

# On the host node, via SSH
core@workshop1-node1 ~ $ df
Filesystem             1K-blocks     Used Available Use% Mounted on
devtmpfs                16460056        0  16460056   0% /dev
tmpfs                   16476132        0  16476132   0% /dev/shm
tmpfs                   16476132     1872  16474260   1% /run
tmpfs                   16476132        0  16476132   0% /sys/fs/cgroup
/dev/vda9               38216204   256716  36301140   1% /
/dev/mapper/usr          1007760   639352    316392  67% /usr
tmpfs                   16476132    17140  16458992   1% /tmp
tmpfs                   16476132        0  16476132   0% /media
/dev/vda1                 130798    39292     91506  31% /boot
/dev/vda6                 110576       64    101340   1% /usr/share/oem
/dev/vdb                41922560  6023596  35898964  15% /var/lib/docker
/dev/vdc                10475520   626268   9849252   6% /media/storage
/dev/vdd               104806400 49157880  55648520  47% /media/brick0
192.168.100.122:global 104806400 87618944  17187456  84% /var/glfs/global
tmpfs                    3295224        0   3295224   0% /run/user/500
/dev/vde               209612800    32928 209579872   1% /media/brick1


# Inside of the GLFS server pod
root@workshop1-node1:/# gluster volume info global
 
Volume Name: global
Type: Replicate
Volume ID: ca59a98e-c959-454e-8ac3-9082b0ed2856
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.100.122:/media/brick0/brick
Brick2: 192.168.100.116:/media/brick0/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
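The "Number of Bricks: 1 x 2 = 2" line determines how many bricks each add-brick call must supply: the second factor is the replica count, so here bricks must be added in multiples of 2 (the first failed attempt further down shows what happens otherwise). A minimal sketch that extracts it; replica_count is a hypothetical helper, and it prints nothing for layouts without an "A x B = C" line, such as a plain distributed volume:

```shell
#!/bin/sh
# Hypothetical helper: print the replica count (the second factor of the
# "Number of Bricks: A x B = C" line) from `gluster volume info` on stdin.
# Prints nothing if the line is not in that form.
replica_count() {
  sed -n 's/^Number of Bricks: [0-9][0-9]* x \([0-9][0-9]*\) = [0-9][0-9]*$/\1/p'
}

# Usage (inside the server container):
#   gluster volume info global | replica_count
```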

Provision and attach a new OpenStack volume to your existing instance, then format it with XFS:

core@workshop1-node1 ~ $ sudo mkfs -t xfs /dev/vde
meta-data=/dev/vde               isize=256    agcount=4, agsize=13107200 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=52428800, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=25600, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

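Next, create a systemd mount unit (a *.mount file) like the one below. systemd requires the file name to match the mount path, with the leading slash dropped and the remaining slashes replaced by dashes, so a unit for /media/brick1 must be named media-brick1.mount: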

$ vi media-brick1.mount
[Unit]
Description=Mount OS_DEVICE on MOUNT_PATH
After=local-fs.target

[Mount]
What=OS_DEVICE
Where=MOUNT_PATH
Type=FS_TYPE
Options=noatime

[Install]
WantedBy=multi-user.target

where:

OS_DEVICE is the block device of your new raw volume (e.g. /dev/vde)
MOUNT_PATH is the target path where the volume should be mounted (e.g. /media/brick1)
FS_TYPE is the filesystem the new volume was formatted with (e.g. xfs)
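Filling in the placeholders by hand is error-prone when you repeat this on every server. The sketch below renders the template above and derives the unit file name. render_mount_unit and mount_unit_name are hypothetical helpers; the name conversion only handles paths made of letters, digits and slashes, and for anything else systemd-escape --path --suffix=mount PATH is the reliable tool:

```shell
#!/bin/sh
# Hypothetical helper: print a systemd mount unit for OS_DEVICE on
# MOUNT_PATH using filesystem FS_TYPE, following the template above.
# Usage: render_mount_unit OS_DEVICE MOUNT_PATH FS_TYPE
render_mount_unit() {
  cat <<EOF
[Unit]
Description=Mount $1 on $2
After=local-fs.target

[Mount]
What=$1
Where=$2
Type=$3
Options=noatime

[Install]
WantedBy=multi-user.target
EOF
}

# Hypothetical helper: derive the unit file name for a mount path,
# e.g. /media/brick1 -> media-brick1.mount. Only handles paths made of
# letters, digits and "/"; otherwise use `systemd-escape --path --suffix=mount`.
mount_unit_name() {
  name="$(printf '%s' "$1" | sed 's|^/*||; s|/*$||; s|/|-|g')"
  printf '%s.mount\n' "$name"
}

# Usage (on the host):
#   render_mount_unit /dev/vde /media/brick1 xfs > "$(mount_unit_name /media/brick1)"
```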

Place this file in /etc/systemd/system/.

Finally, reload systemd, then enable and start the unit so that the volume is mounted on CoreOS now and remounted after every reboot:

sudo mv media-brick1.mount /etc/systemd/system/media-brick1.mount
sudo systemctl daemon-reload
sudo systemctl unmask media-brick1.mount
sudo systemctl enable media-brick1.mount
sudo systemctl start media-brick1.mount
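To confirm the brick is actually mounted with the expected filesystem before handing it to GlusterFS, you can inspect /proc/mounts. This is a minimal sketch; verify_mount is a hypothetical helper and assumes the standard /proc/mounts field order (device, mount point, filesystem type, options, dump, pass):

```shell
#!/bin/sh
# Hypothetical helper: succeed if MOUNT_PATH is mounted with filesystem
# FS_TYPE, reading /proc/mounts-formatted lines on stdin.
# Usage: verify_mount MOUNT_PATH FS_TYPE < /proc/mounts
verify_mount() {
  awk -v path="$1" -v type="$2" \
    '$2 == path && $3 == type { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage (on the host, after starting the unit):
#   verify_mount /media/brick1 xfs < /proc/mounts && echo "brick1 is mounted"
```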

You will need to perform the above steps on each of your GLFS servers before continuing.

Now you'll need to exec into one of the GLFS server pods and perform the following:

# Peer probe the other IP in the cluster (gluster service IP also seems to work)
$ gluster peer probe 10.254.202.236
peer probe: success. Host 192.168.100.1 port 24007 already in peer list


# This one fails because we did not include our new brick's second replica
$ gluster volume add-brick global 192.168.100.122:/media/brick1
volume add-brick: failed: Incorrect number of bricks supplied 1 with count 2


# This one fails because we need a sub-directory of the mount point
$ gluster volume add-brick global 192.168.100.122:/media/brick1 192.168.100.116:/media/brick1
volume add-brick: failed: The brick 192.168.100.116:/media/brick1 is a mount point. Please create a sub-directory under the mount point and use that as the brick directory. Or use 'force' at the end of the command if you want to override this behavior.


# This one works! :D
$ gluster volume add-brick global 192.168.100.122:/media/brick1/brick 192.168.100.116:/media/brick1/brick
volume add-brick: success
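The working command follows the two rules exposed by the failed attempts: the number of bricks must be a multiple of the replica count, and each brick must be a sub-directory of its mount point. This sketch builds such a command; add_brick_cmd is a hypothetical helper, and the brick sub-directory name "brick" is assumed from the example above:

```shell
#!/bin/sh
# Hypothetical helper: print a `gluster volume add-brick` command that adds
# one brick per host at MOUNT_PATH/brick (a sub-directory, because GlusterFS
# rejects a bare mount point without 'force'). The number of hosts must be
# a multiple of REPLICA.
# Usage: add_brick_cmd VOLUME REPLICA MOUNT_PATH HOST [HOST...]
add_brick_cmd() {
  volume="$1"
  replica="$2"
  mount_path="${3%/}"
  shift 3
  if ! [ "$replica" -gt 0 ] 2>/dev/null; then
    echo "replica count must be a positive integer" >&2
    return 1
  fi
  if [ $(( $# % $replica )) -ne 0 ]; then
    echo "need a multiple of $replica bricks, got $#" >&2
    return 1
  fi
  cmd="gluster volume add-brick $volume"
  for host in "$@"; do
    cmd="$cmd $host:$mount_path/brick"
  done
  printf '%s\n' "$cmd"
}

# Usage (inside the server container):
#   add_brick_cmd global 2 /media/brick1 192.168.100.122 192.168.100.116 | sh
```

Adding bricks does not move existing data onto them; to spread existing files across the new bricks, GlusterFS provides gluster volume rebalance global start (progress is shown by gluster volume rebalance global status).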

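Our new bricks have now been added to the existing volume. On the host, the global mount now reports 314419200 1K-blocks (about 300 GiB), which is the capacity of the original brick0 replica pair (104806400) plus the new brick1 pair (209612800), and the volume type has changed from Replicate to Distributed-Replicate: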

core@workshop1-node1 ~ $ df
Filesystem             1K-blocks     Used Available Use% Mounted on
devtmpfs                16460056        0  16460056   0% /dev
tmpfs                   16476132        0  16476132   0% /dev/shm
tmpfs                   16476132     1792  16474340   1% /run
tmpfs                   16476132        0  16476132   0% /sys/fs/cgroup
/dev/vda9               38216204   256736  36301120   1% /
/dev/mapper/usr          1007760   639352    316392  67% /usr
tmpfs                   16476132    17140  16458992   1% /tmp
tmpfs                   16476132        0  16476132   0% /media
/dev/vda1                 130798    39292     91506  31% /boot
/dev/vda6                 110576       64    101340   1% /usr/share/oem
/dev/vdb                41922560  6023732  35898828  15% /var/lib/docker
/dev/vdc                10475520   626360   9849160   6% /media/storage
/dev/vdd               104806400 49157820  55648580  47% /media/brick0
192.168.100.122:global 314419200 49191424 265227776  16% /var/glfs/global
tmpfs                    3295224        0   3295224   0% /run/user/500
/dev/vde               209612800    33088 209579712   1% /media/brick1


root@workshop1-node1:/# gluster volume info global
 
Volume Name: global
Type: Distributed-Replicate
Volume ID: ca59a98e-c959-454e-8ac3-9082b0ed2856
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 192.168.100.122:/media/brick0/brick
Brick2: 192.168.100.116:/media/brick0/brick
Brick3: 192.168.100.122:/media/brick1/brick
Brick4: 192.168.100.116:/media/brick1/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on

Client Setup

Create the gluster-client DaemonSet using kubectl:

kubectl create -f kubernetes/gluster-client-ds.yaml

This spec runs the ndslabs/gluster container in "client mode" on Kubernetes nodes labeled with ndslabs-role=compute.

Once each client container starts, it mounts the GlusterFS volume on its compute host using NFS.

Testing

Once the clients are online, we can run a simple test of GlusterFS to ensure that it is correctly serving and synchronizing the volume.

From the Kubernetes master, run the following command to see which nodes are running the glusterfs-client containers:

core@willis8-k8-test-1 ~ $ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
coffee-rc-4u3pb              1/1       Running   0          12d       192.168.100.65
coffee-rc-5m4t6              1/1       Running   0          12d       192.168.100.65
default-http-backend-y98iw   1/1       Running   0          23h       192.168.100.64
glusterfs-client-4hm9y       1/1       Running   0          5d        192.168.100.65
glusterfs-client-6c12y       1/1       Running   0          5d        192.168.100.66
glusterfs-server-hh5rm       1/1       Running   0          5d        192.168.100.156
glusterfs-server-zoefs       1/1       Running   0          5d        192.168.100.89
ndslabs-apiserver-zqgj8      1/1       Running   0          1d        192.168.100.66
ndslabs-gui-p0hjh            1/1       Running   0          23h       192.168.100.66
nginx-ilb-rc-x853y           1/1       Running   0          6d        192.168.100.64
tea-rc-8saiu                 1/1       Running   0          12d       192.168.100.65
tea-rc-t403k                 1/1       Running   0          12d       192.168.100.65

Open two SSH sessions, one to each compute node (in this case, 192.168.100.65 and 192.168.100.66).

First Session

In one SSH session, run a BusyBox container with our shared volume mounted:

docker run -v /var/glfs:/var/glfs --rm -it busybox

Inside of the BusyBox container, create a test file:

echo "testing!" > /var/glfs/ndslabs/test.file

Second Session

On the other machine, map the same directory into a BusyBox container to verify that the change made on the first host is visible:

docker run -v /var/glfs:/var/glfs --rm -it busybox

Running an ls on /var/glfs/ndslabs/ should show the test file created on the other node:

ls -al /var/glfs/ndslabs

This confirms that each node can mount the volume via NFS, map that mount into containers, and let those containers read or modify the shared data.
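A fixed string like "testing!" cannot tell a fresh write apart from a file left over by an earlier run. The sketch below writes a unique token instead; write_probe and check_probe are hypothetical helpers that use only commands BusyBox provides (uname, date, cat), and the token format is an arbitrary choice:

```shell
#!/bin/sh
# Hypothetical helper: write a unique token into DIR/test.file and print it.
write_probe() {
  token="$(uname -n)-$$-$(date +%s)"
  printf '%s\n' "$token" > "$1/test.file" && printf '%s\n' "$token"
}

# Hypothetical helper: succeed if DIR/test.file contains exactly TOKEN.
check_probe() {
  [ "$(cat "$1/test.file" 2>/dev/null)" = "$2" ]
}

# Usage:
#   first node:  write_probe /var/glfs/ndslabs
#   second node: check_probe /var/glfs/ndslabs "<token printed on first node>" && echo ok
```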
