Much of this material was pulled from the Kubernetes docs: http://kubernetes.io/docs/admin/cluster-management/#maintenance-on-a-node

The purpose of this wiki page is to house the results of NDS-691.

OpenStack Nodes

From NDS-728:

On several occasions, we've had nodes that just won't reboot (e.g., corrupt disk image). There are two approaches to resolving this problem:

Option 1: re-run ansible

  • Shut down the nodes and rename them to node-dead, or delete them.
  • Detach the volumes, but do not delete them.
  • Re-run the ansible openstack-provision and k8s-install playbooks (see the sketch below).
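
A rough sketch of the re-provision step, assuming the playbooks are named openstack-provision.yml and k8s-install.yml (adjust the names and inventory path to match the actual deploy repository):

# re-create the missing instance(s), then re-install Kubernetes on them
ansible-playbook -i inventory/openstack openstack-provision.yml
ansible-playbook -i inventory/openstack k8s-install.yml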

Option 2: clone a healthy node

  • Shut down the nodes and rename them to node-dead, or delete them.
  • In OpenStack, make a snapshot of a good node of a similar type
  • Create a new instance from the snapshot
  • Change the instance name
  • Edit /etc/kubelet/kubelet.config and change the node name to the correct name
  • Re-attach the volumes (see the sketch below)
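
The same flow with the OpenStack CLI looks roughly like this (the server, image, flavor, network, and volume names below are placeholders):

# snapshot a healthy node of a similar type
openstack server image create --name node2-snapshot node2
# boot a replacement instance from that snapshot
openstack server create --image node2-snapshot --flavor m1.medium --network cluster-net node1
# re-attach the data volume(s) once the instance is up
openstack server add volume node1 node1-data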

Drain + Cordon

Drain automatically cordons the node, meaning the scheduler will no longer place any new pods there.

Drain also evicts any pods currently running on that node, effectively clearing it out for a reboot and forcing the scheduler to temporarily put everything elsewhere.
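
For example (the node name is a placeholder; --ignore-daemonsets keeps drain from refusing to proceed because of the daemonset-managed pods on the gfs nodes):

kubectl cordon node1      # mark the node unschedulable, leave existing pods alone
kubectl drain node1 --ignore-daemonsets      # cordon + evict everything else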

  • Master: Not applicable - no pods run on master
  • LMA: Cordon makes node unschedulable, drain kills all pods
  • GFS: Not applicable to gfs nodes, as they only run daemonsets
  • LoadBal: Cordon makes node unschedulable, drain kills all pods - this causes them to restart elsewhere
  • Compute: Cordon makes node unschedulable, drain kills all pods - this causes them to restart elsewhere

Reboot (optional)

Rebooting a node automatically restarts all Kubernetes system services (apiserver / controller-manager / scheduler on the master, kubelet on all other node types).
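
A typical sequence, assuming SSH access to the node (the node name is a placeholder):

kubectl drain node1 --ignore-daemonsets
ssh node1 sudo reboot
kubectl get nodes -w      # wait for the node to report Ready again before uncordoning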

  • Master: Workloads on the other nodes continued to run uninterrupted. I suspect that outstanding / new API requests are discarded until the master is back online. The Kubernetes services (apiserver, controller-manager, and scheduler) automatically started back up and recovered.
  • LMA: The node automatically righted itself and influxdb recovered historical profiling data, but saved dashboard changes were lost. A volume mounted into grafana might fix this - filed NDS-699 to investigate.
  • GFS: Everything seemed to recover just fine; daemonset containers were automatically restarted.
  • LoadBal: Everything seemed to recover just fine; containers that were running were restarted (supposedly only if the node comes back within 5 minutes - otherwise the pods are rescheduled elsewhere).
  • Compute: Everything seemed to recover just fine; containers that were running were restarted (supposedly only if the node comes back within 5 minutes - otherwise the pods are rescheduled elsewhere).

Uncordon

Uncordon will put your node back into a "schedulable" state, allowing the scheduler to run pods there again.
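
For example (the node name is a placeholder):

kubectl uncordon node1
kubectl get nodes      # the node should now show Ready without SchedulingDisabled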

  • Master: Not applicable - no pods run on master
  • LMA: Uncordon makes node schedulable once again
  • GFS: Not applicable to gfs nodes, as they only run daemonsets
  • LoadBal: Uncordon makes node schedulable once again
  • Compute: Uncordon makes node schedulable once again

Pending Pods

When a node is uncordoned, pods in a Pending state should automatically be scheduled onto the newly schedulable node, unless some other error is preventing them from being rescheduled there.
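
A quick way to watch this happen (the pod name is a placeholder):

kubectl get pods --all-namespaces | grep Pending
kubectl describe pod mypod      # the Events section explains why a pod is still Pending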

  • Master: Not applicable - no pods run on master
  • LMA: ReplicaSets / ReplicationControllers will automatically recreate pods after a minute or so
  • GFS: Not applicable to gfs nodes, as they only run daemonsets
  • LoadBal: Once a suitable node was found, the containers were automatically restarted about a minute later
  • Compute: Pending pods are scheduled appropriately as nodes become schedulable

Resource Constraint

This part seemed sort of secondary and difficult (at least for me) to test, verify, or recreate.

I used a tool called stress to generate CPU load, I/O / disk traffic, and memory pressure on a node.

It can be installed on Debian using the following command:

apt-get -qq update && apt-get -qq install stress 
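
An example invocation (the exact worker counts and durations used for these tests are not recorded here):

# 4 CPU workers, 2 I/O workers, and 2 workers each allocating 1 GB of memory, for 5 minutes
stress --cpu 4 --io 2 --vm 2 --vm-bytes 1024M --timeout 300s
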
  • Master: stress seemed to have little effect on the master; more investigation may be needed.
  • LMA: Grafana stops responding while the load issues occur, but works again once the load subsides. Node profiling data may be lost during the outage period.
  • GFS: Seemingly no effect, though node profiling data may be lost during the outage period. I was able to read data off of one brick while stressing both of the gfs nodes serving its files. With the new sharding configuration, if 2 gfs nodes (housing the same brick) are stressed simultaneously, new files will go to the other (available) brick(s). Occasionally, after long periods of heavy load, the kubelet can die and must be manually restarted to correct the issue.
  • LoadBal: Requests through the loadbalancer hang slightly, but still seem to return correctly. Grafana hangs while the loadbal experiences issues, but works again once the load subsides. Node profiling data may be lost during the outage period.
  • Compute: Seemingly no effect, though running services will obviously hang while the load issues occur. Node profiling data may be lost during the outage period.

FAQ

Does anything strange happen with the pods?

  • Master: Not applicable - no pods run on master
  • LMA: Seemingly nothing; pods did not even report any restarts, and uptime remained intact
  • GFS: DaemonSets will restart their pods once the node comes back up
  • LoadBal: Pods that were running restart their containers once the node comes back up
  • Compute: Pods that were running restart their containers once the node comes back up

Does anything strange happen with the kube services?

  • Master: Kubernetes services seemed to recover gracefully without user intervention
  • LMA: Kubelet recovered gracefully without user intervention
  • GFS: Kubelet recovered gracefully without user intervention
  • LoadBal: Kubelet recovered gracefully without user intervention
  • Compute: Kubelet recovered gracefully without user intervention

Were there any other strange observations or open questions to be aware of?

  1. Master
    • Q: What happens to NRPE on master? Is this running as a service? (it should be)
      • A: NDS-600 included the --restart=unless-stopped flag, so this container should automatically restart with the node unless the nrpe container has been explicitly stopped for node maintenance (a quick way to verify the restart policy is sketched after this list)
  2. LMA
    • Observation: LMA services encountered the usual "503" (then "502", then "504") issue, but started working again after a couple of minutes (see NDS-635)
  3. GFS
    • Seems super stable - a full copy of the data was sitting at /media/brick0, so I wonder if the new configuration is providing some form of 4-way replication? Half of the files are on gfs1 + gfs2 and the other half on gfs3 + gfs4, so you can lose one node from each pair and still access all of your data
  4. LoadBal
    • Since we already mount the GLFS client on the loadbal (even though nothing there currently uses it), we might consider running the apiserver and/or the angular-ui from this node, since it has virtually no load besides ingress web traffic
  5. Compute
    • Everything seems to be running fairly smoothly
    • Some odd behavior occurred that I have been unable to reproduce: the docker daemon went down on the gfs nodes and needed to be restarted via SSH + systemctl
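
A quick way to check the restart policy mentioned in the NRPE question above (the container name nrpe is a guess):

docker inspect -f '{{ .HostConfig.RestartPolicy.Name }}' nrpe      # should print unless-stopped
docker update --restart=unless-stopped nrpe      # set it after the fact if needed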

 
