Much of this material was pulled from the Kubernetes docs: http://kubernetes.io/docs/admin/cluster-management/#maintenance-on-a-node

The purpose of this wiki page is to house the results of NDS-691.

OpenStack Nodes

From NDS-728:

On several occasions, we've had nodes that just won't reboot (e.g., corrupt disk image). There are two approaches to resolving this problem:

Option 1: re-run ansible

  • Shut down the nodes and rename them to node-dead, or delete them.
  • Detach the volumes, but do not delete them.
  • Re-run the ansible openstack-provision and k8s-install playbooks (see the sketch below).
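
A rough sketch of the re-provision step, assuming the playbooks are named openstack-provision.yml and k8s-install.yml (adjust the names and inventory path to match the actual deploy repository):

# re-create the missing instance(s), then re-install Kubernetes on them
ansible-playbook -i inventory/openstack openstack-provision.yml
ansible-playbook -i inventory/openstack k8s-install.yml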

Option 2: clone a healthy node

  • Shut down the nodes and rename them to node-dead, or delete them.
  • In OpenStack, make a snapshot of a good node of a similar type
  • Create a new instance from the snapshot
  • Change the instance name
  • Edit /etc/kubelet/kubelet.config and change the node name to the correct name
  • Re-attach the volumes (see the sketch below)
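
The same flow with the OpenStack CLI looks roughly like this (the server, image, flavor, network, and volume names below are placeholders):

# snapshot a healthy node of a similar type
openstack server image create --name node2-snapshot node2
# boot a replacement instance from that snapshot
openstack server create --image node2-snapshot --flavor m1.medium --network cluster-net node1
# re-attach the data volume(s) once the instance is up
openstack server add volume node1 node1-data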

Drain + Cordon

Drain automatically cordons the node, meaning the scheduler will no longer place any new pods there.

Drain also evicts any pods currently running on that node, effectively clearing it out for a reboot and forcing the scheduler to temporarily put everything elsewhere.
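
For example (the node name is a placeholder; --ignore-daemonsets keeps drain from refusing to proceed because of the daemonset-managed pods on the gfs nodes):

kubectl cordon node1      # mark the node unschedulable, leave existing pods alone
kubectl drain node1 --ignore-daemonsets      # cordon + evict everything else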

  • Master: Not applicable - no pods run on master
  • LMA: Cordon makes node unschedulable, drain kills all pods
  • GFS: Not applicable to gfs nodes, as they only run daemonsets
  • LoadBal: Cordon makes node unschedulable, drain kills all pods - this causes them to restart elsewhere
  • Compute: Cordon makes node unschedulable, drain kills all pods - this causes them to restart elsewhere

Reboot (optional)

Rebooting a node automatically restarts all Kubernetes system services (apiserver / controller-manager / scheduler on the master, kubelet on all other node types).
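
A typical sequence, assuming SSH access to the node (the node name is a placeholder):

kubectl drain node1 --ignore-daemonsets
ssh node1 sudo reboot
kubectl get nodes -w      # wait for the node to report Ready again before uncordoning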

  • Master: Workloads on the other nodes continued to run uninterrupted. I suspect that outstanding / new API requests are discarded until the master is back online. The Kubernetes services (apiserver, controller-manager, and scheduler) automatically started back up and recovered.
  • LMA: The node automatically righted itself and influxdb recovered historical profiling data, but saved dashboard changes were lost. A volume mounted into grafana might fix this - filed NDS-699 to investigate.
  • GFS: Everything seemed to recover just fine; daemonset containers were automatically restarted.
  • LoadBal: Everything seemed to recover just fine; containers that were running were restarted (supposedly only if the node comes back within 5 minutes - otherwise the pods are rescheduled elsewhere).
  • Compute: Everything seemed to recover just fine; containers that were running were restarted (supposedly only if the node comes back within 5 minutes - otherwise the pods are rescheduled elsewhere).

Uncordon

Uncordon will put your node back into a "schedulable" state, allowing the scheduler to run pods there again.
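
For example (the node name is a placeholder):

kubectl uncordon node1
kubectl get nodes      # the node should now show Ready without SchedulingDisabled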

  • Master: Not applicable - no pods run on master
  • LMA: Uncordon makes node schedulable once again
  • GFS: Not applicable to gfs nodes, as they only run daemonsets
  • LoadBal: Uncordon makes node schedulable once again
  • Compute: Uncordon makes node schedulable once again

Pending Pods

When a node is uncordoned, pods in a Pending state should automatically be scheduled onto the newly schedulable node, unless some other error is preventing them from being rescheduled there.
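
A quick way to watch this happen (the pod name is a placeholder):

kubectl get pods --all-namespaces | grep Pending
kubectl describe pod mypod      # the Events section explains why a pod is still Pending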

  • Master: Not applicable - no pods run on master
  • LMA: ReplicaSets / ReplicationControllers will automatically recreate pods after a minute or so
  • GFS: Not applicable to gfs nodes, as they only run daemonsets
  • LoadBal: Once a suitable node was found, the containers were automatically restarted about a minute later
  • Compute: Pending pods are scheduled appropriately as nodes become schedulable

Resource Constraint

This part seemed sort of secondary and difficult (at least for me) to test, verify, or recreate.

I used a tool called stress to generate CPU load, I/O / disk traffic, and memory pressure on a node.

It can be installed on Debian using the following command:

apt-get -qq update && apt-get -qq install stress 
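
An example invocation (the exact worker counts and durations used for these tests are not recorded here):

# 4 CPU workers, 2 I/O workers, and 2 workers each allocating 1 GB of memory, for 5 minutes
stress --cpu 4 --io 2 --vm 2 --vm-bytes 1024M --timeout 300s
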
  • Master: stress seemed to have little effect on the master; more investigation may be needed.
  • LMA: Grafana stops responding while the load issues occur, but works again once the load subsides. Node profiling data may be lost during the outage period.
  • GFS: Seemingly no effect, though node profiling data may be lost during the outage period. I was able to read data off of one brick while stressing both of the gfs nodes serving its files. With the new sharding configuration, if 2 gfs nodes (housing the same brick) are stressed simultaneously, new files will go to the other (available) brick(s). Occasionally, after long periods of heavy load, the kubelet can die and must be manually restarted to correct the issue.
  • LoadBal: Requests through the loadbalancer hang slightly, but still seem to return correctly. Grafana hangs while the loadbal experiences issues, but works again once the load subsides. Node profiling data may be lost during the outage period.
  • Compute: Seemingly no effect, though running services will obviously hang while the load issues occur. Node profiling data may be lost during the outage period.

FAQ

Does anything strange happen with the pods?

  • Master: Not applicable - no pods run on master
  • LMA: Seemingly nothing; pods did not even report any restarts, and uptime remained intact
  • GFS: DaemonSets will restart their pods once the node comes back up
  • LoadBal: Pods that were running restart their containers once the node comes back up
  • Compute: Pods that were running restart their containers once the node comes back up

Does anything strange happen with the kube services?

  • Master: Kubernetes services seemed to recover gracefully without user intervention
  • LMA: Kubelet recovered gracefully without user intervention
  • GFS: Kubelet recovered gracefully without user intervention
  • LoadBal: Kubelet recovered gracefully without user intervention
  • Compute: Kubelet recovered gracefully without user intervention

Were there any other strange observations or open questions to be aware of?

  1. Master
    • Q: What happens to NRPE on master? Is this running as a service? (it should be)
      • A: NDS-600 included the --restart=unless-stopped flag, so this container should automatically restart with the node unless the nrpe container has been explicitly stopped for node maintenance (a quick way to verify the restart policy is sketched after this list)
  2. LMA
    • Observation: LMA services encountered the usual "503" (then "502", then "504") issue, but started working again after a couple of minutes (see NDS-635)
  3. GFS
    • Seems super stable - a full copy of the data was sitting at /media/brick0, so I wonder if the new configuration is providing some form of 4-way replication? Half of the files are on gfs1 + gfs2 and the other half on gfs3 + gfs4, so you can lose one node from each pair and still access all of your data
  4. LoadBal
    • Since we already mount the GLFS client on the loadbal (even though nothing there currently uses it), we might consider running the apiserver and/or the angular-ui from this node, since it has virtually no load besides ingress web traffic
  5. Compute
    • Everything seems to be running fairly smoothly
    • Some odd behavior occurred that I have been unable to reproduce: the docker daemon went down on the gfs nodes and needed to be restarted via SSH + systemctl
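
A quick way to check the restart policy mentioned in the NRPE question above (the container name nrpe is a guess):

docker inspect -f '{{ .HostConfig.RestartPolicy.Name }}' nrpe      # should print unless-stopped
docker update --restart=unless-stopped nrpe      # set it after the fact if needed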

 
