...

From

Jira

On several occasions, we've had nodes that fail to reboot (e.g., because of a corrupt disk image). There are two approaches to resolving this problem:

Option 1: Re-run Ansible

Option 2: Replace the node with a copy of a good node:

Shut down the node, then rename it to node-dead or delete it.
In OpenStack, make a snapshot of a good node (of a similar type).
Create a new instance from the snapshot.
Change the instance name.
Edit /etc/kubelet/kubelet.config and change the name to the correct name.
Reattach any previously attached OpenStack volumes to the instance.
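
The OpenStack side of Option 2 could be scripted roughly as follows. This is a minimal sketch, not our actual tooling: the function name `replacement_commands`, the `<good node>-snapshot` naming, and the flavor, network, and volume arguments are all assumptions for illustration. It only builds and prints the `openstack` CLI calls so they can be reviewed before anyone runs them. It creates the new instance directly under the correct name rather than renaming it afterwards, and it assumes the volumes are already detached from the old instance. The kubelet config edit still has to be done by hand on the new instance.

```python
import shlex


def replacement_commands(node, good_node, flavor, network, volumes):
    """Build the openstack CLI calls for replacing `node`, in order.

    All arguments are hypothetical placeholders; adjust them to the
    real server, flavor, network, and volume names.
    """
    snapshot = f"{good_node}-snapshot"  # assumed naming scheme
    cmds = [
        # Shut down the broken node and rename it out of the way.
        ["openstack", "server", "stop", node],
        ["openstack", "server", "set", "--name", f"{node}-dead", node],
        # Snapshot a good node of a similar type.
        ["openstack", "server", "image", "create", "--name", snapshot, good_node],
        # Create the replacement from the snapshot, under the correct name.
        ["openstack", "server", "create", "--image", snapshot,
         "--flavor", flavor, "--network", network, node],
    ]
    # Reattach volumes (assumes they are detached from the old instance).
    cmds += [["openstack", "server", "add", "volume", node, v] for v in volumes]
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in replacement_commands("node-3", "node-1", "m1.large", "k8s-net", ["vol-a"]):
        print(shlex.join(cmd))
```

Printing first is deliberate: a typo in a server name here can stop or rename the wrong instance, so it is safer to review the commands and paste them in one at a time.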

Drain + Cordon

Draining a node automatically cordons it first, meaning the scheduler will no longer run any new pods there. Drain then evicts the pods that are already running on the node.
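
As an illustration, a small wrapper around `kubectl drain` and `kubectl uncordon` might look like the sketch below. The helper names (`drain_command`, `uncordon_command`, `run`) and the node name are assumptions, and whether `--ignore-daemonsets` is wanted depends on what runs on the cluster. By default the wrapper only prints the command (a dry run) instead of executing it.

```python
import subprocess


def drain_command(node, ignore_daemonsets=True):
    """Build a `kubectl drain` call; drain cordons the node itself."""
    cmd = ["kubectl", "drain", node]
    if ignore_daemonsets:
        # Without this flag, drain refuses to proceed if DaemonSet pods exist.
        cmd.append("--ignore-daemonsets")
    return cmd


def uncordon_command(node):
    """Build the call that makes the node schedulable again."""
    return ["kubectl", "uncordon", node]


def run(cmd, dry_run=True):
    """Print the command in dry-run mode, otherwise execute it."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    run(drain_command("node-3"))     # hypothetical node name
    run(uncordon_command("node-3"))  # once the node is healthy again
```

A drained node stays cordoned until it is explicitly uncordoned, so the second call is needed before the scheduler will place pods there again.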

...
