...
On several occasions, we've had nodes that just won't reboot (e.g., corrupt disk image). There are two approaches to resolving this problem:
Option 1: re-run ansible
- Shut down the nodes, then rename them to node-dead or delete them.
- Detach the volumes, but do not delete them.
- Re-run ansible openstack-provision and k8s-install
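The OpenStack side of Option 1 can be sketched as a small command builder. This is only an illustration, not the team's actual tooling: the function name `retire_node_commands` and the example IDs are hypothetical, and it assumes the standard `openstack` CLI is installed and authenticated. Passing the server ID rather than its name keeps the later commands valid after the rename. The Ansible re-run is left out because the exact invocation is not given here.

```python
import shlex


def retire_node_commands(server_id, volume_ids, dead_name="node-dead"):
    """Build the openstack CLI commands that retire a node that won't reboot.

    server_id and volume_ids are placeholders supplied by the caller.
    Volumes are only detached from the server, never deleted.
    """
    cmds = [
        # Stop the broken instance.
        ["openstack", "server", "stop", server_id],
        # Rename it so the original name is free for the replacement.
        ["openstack", "server", "set", "--name", dead_name, server_id],
    ]
    for volume_id in volume_ids:
        # Detach each volume; the volume itself is kept.
        cmds.append(["openstack", "server", "remove", "volume", server_id, volume_id])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in retire_node_commands("SERVER_ID", ["VOLUME_ID"]):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything against a live project.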
Option 2: clone a healthy node from a snapshot
- Shut down the nodes, then rename them to node-dead or delete them.
- In OpenStack, make a snapshot of a healthy node of a similar type
- Create a new instance from the snapshot
- Change the instance name
- Edit /etc/kubelet/kubelet.config and change the node name to the correct name
- Reattach any previously attached OpenStack volumes to the instance
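The snapshot, create, and reattach steps of Option 2 might look roughly like this with the standard `openstack` CLI. It is a sketch under assumptions: `clone_node_commands` and all example names are hypothetical, the new instance is given its final name at creation time instead of being renamed afterwards, and many clouds also require a `--network` (and possibly a key pair or security group) on `server create`, which is omitted here. Editing the kubelet config still has to happen on the new instance itself.

```python
import shlex


def clone_node_commands(good_server, snapshot_name, flavor, new_name, volume_ids):
    """Build openstack CLI commands that clone a healthy node from a snapshot.

    All arguments are placeholders supplied by the caller.
    """
    cmds = [
        # Snapshot the healthy node into a new image.
        ["openstack", "server", "image", "create", "--name", snapshot_name, good_server],
        # Boot the replacement from that image under the correct name.
        ["openstack", "server", "create", "--image", snapshot_name,
         "--flavor", flavor, new_name],
    ]
    for volume_id in volume_ids:
        # Reattach the volumes that belonged to the dead node.
        cmds.append(["openstack", "server", "add", "volume", new_name, volume_id])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in clone_node_commands("GOOD_NODE", "SNAPSHOT_NAME", "FLAVOR",
                                   "NEW_NODE_NAME", ["VOLUME_ID"]):
        print(shlex.join(cmd))
```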
Drain + Cordon
Drain automatically cordons the node, meaning the scheduler will no longer place any new pods there.
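As a rough illustration, the drain and its reverse can be expressed with `kubectl`. The helper names below are hypothetical. The sketch assumes `--ignore-daemonsets` is wanted, since `kubectl drain` normally refuses to proceed when DaemonSet-managed pods are present on the node; other flags may be needed depending on the workloads.

```python
import shlex


def drain_command(node, ignore_daemonsets=True):
    """Build a kubectl drain command; drain cordons the node first."""
    cmd = ["kubectl", "drain", node]
    if ignore_daemonsets:
        # Skip DaemonSet-managed pods, which drain cannot evict.
        cmd.append("--ignore-daemonsets")
    return cmd


def uncordon_command(node):
    """Build the command that makes the node schedulable again."""
    return ["kubectl", "uncordon", node]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print(shlex.join(drain_command("NODE_NAME")))
    print(shlex.join(uncordon_command("NODE_NAME")))
```

A drained node stays cordoned until it is explicitly uncordoned, so the second command is the one to run once the node is healthy again.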
...