From Jira issue NDS-728:

On several occasions, we've had nodes that simply won't reboot (e.g., because of a corrupt disk image). There are two approaches to resolving this problem:

Option 1: Re-run Ansible

  • Shut down the affected nodes, then either rename them to node-dead or delete them.
  • Detach their volumes, but do not delete the volumes.
  • Re-run Ansible openstack-provision and k8s-install.
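
The steps above can be sketched as a short Python helper that only builds the command lines, so they can be reviewed before anything is run. This is a sketch under assumptions, not the team's actual tooling: the function name, the playbook file names (openstack-provision.yml, k8s-install.yml) and the inventory path are hypothetical, and the server is referenced by its OpenStack ID so the reference stays valid after the rename.

```python
import shlex


def option1_commands(server_id, volumes, inventory, dead_name="node-dead"):
    """Build the CLI calls for Option 1, in the order listed above."""
    cmds = [
        ["openstack", "server", "stop", server_id],
        ["openstack", "server", "set", "--name", dead_name, server_id],
    ]
    # Detach only: the volumes themselves are kept for the rebuilt node.
    for vol in volumes:
        cmds.append(["openstack", "server", "remove", "volume", server_id, vol])
    # ASSUMPTION: playbook file names are guesses; use the real repository layout.
    for playbook in ("openstack-provision.yml", "k8s-install.yml"):
        cmds.append(["ansible-playbook", "-i", inventory, playbook])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in option1_commands("SERVER_ID", ["VOLUME_ID"], "inventory/hosts"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to confirm that only detach operations, and no volume deletions, are in the list.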

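Option 2: Clone a healthy node from a snapshot

  • Shut down the affected nodes, then either rename them to node-dead or delete them.
  • In OpenStack, take a snapshot of a healthy node of a similar type.
  • Create a new instance from the snapshot.
  • Change the instance name to the name of the node being replaced.
  • Edit /etc/kubelet/kubelet.config and change the node name to the correct name.
  • Re-attach the volumes.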
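
A rough Python sketch of the snapshot route follows. Several things here are assumptions rather than facts from this page: the function names, the snapshot naming scheme, the flavor and network ID parameters, and above all the idea that the cloned kubelet config contains the donor node's name as plain text that can be substituted. The instance is created directly under its final name, which folds the rename step into the create call.

```python
import re


def clone_commands(good_node, new_name, flavor, net_id):
    """Build the calls that snapshot a healthy node and boot a copy of it."""
    image = f"{good_node}-snapshot"  # ASSUMPTION: arbitrary snapshot name
    return [
        ["openstack", "server", "image", "create", "--name", image, good_node],
        ["openstack", "server", "create", "--image", image,
         "--flavor", flavor, "--nic", f"net-id={net_id}", new_name],
    ]


def attach_commands(new_name, volumes):
    """Build the re-attach calls; run these last, after the config edit."""
    return [["openstack", "server", "add", "volume", new_name, vol]
            for vol in volumes]


def rename_in_kubelet_config(text, old_name, new_name):
    """Replace whole occurrences of the donor node name in the config text."""
    # Lookarounds stop "node-2" from matching inside "node-20".
    pattern = r"(?<![\w-])" + re.escape(old_name) + r"(?![\w-])"
    return re.sub(pattern, lambda _match: new_name, text)
```

The rename helper works on a string, so it can be tried on a copy of /etc/kubelet/kubelet.config before the real file is touched.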

Drain + Cordon

Draining a node automatically cordons it first: the node is marked unschedulable, so the scheduler will not place any new pods there while the existing pods are evicted.
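
As an illustration, the kubectl calls involved can be built like this in Python. The function names are hypothetical, and the --ignore-daemonsets flag is an assumption about what a typical cluster needs, since drain otherwise refuses to proceed when DaemonSet-managed pods are present. Drain does not undo the cordon afterwards, so an uncordon call is included for when the node is healthy again.

```python
def drain_commands(node):
    """Build the calls to drain a node and inspect its cordon state."""
    return [
        # Drain cordons the node first, then evicts its pods.
        ["kubectl", "drain", node, "--ignore-daemonsets"],
        # Reports whether the node is marked unschedulable.
        ["kubectl", "get", "node", node,
         "-o", "jsonpath={.spec.unschedulable}"],
    ]


def uncordon_command(node):
    """Build the call that makes the node schedulable again."""
    return ["kubectl", "uncordon", node]
```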

...