Task
Resolution: Fixed
Normal
NDS Sprint 16, NDS Sprint 17
For each scenario below, test and document what happens on each kind of node (compute, loadbal, gfs, master, etc.):
- Reboot node
- Cordon/drain node
- Bring node back online
- Pod in pending state
- Node not responding:
  - Hung node due to resource constraint (pegged CPU, out of memory, out of disk, etc.)
  - Paused node
  - Dead kubelet (this is apparently caused by resource constraints)
- Unschedulable node
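
The scenario list above can be turned into a checklist so that no node kind is skipped. The following is only a sketch: the helper names `build_test_matrix` and `kubectl_commands` are hypothetical, and mapping "unschedulable node" to `kubectl cordon` is an assumption about how that state would be produced. It only builds argument lists and does not run anything; scenarios that need host-level action (reboot, pausing the VM, stopping kubelet) are left without commands.

```python
from itertools import product

# Node kinds and scenarios are copied from the list in this ticket.
NODE_KINDS = ["compute", "loadbal", "gfs", "master"]
SCENARIOS = [
    "reboot node",
    "cordon/drain node",
    "bring node back online",
    "pod in pending state",
    "hung node (resource constraint)",
    "paused node",
    "dead kubelet",
    "unschedulable node",
]


def build_test_matrix(kinds=NODE_KINDS, scenarios=SCENARIOS):
    """Return every (node kind, scenario) pair that needs a documented result."""
    return list(product(kinds, scenarios))


def kubectl_commands(scenario, node):
    """Return kubectl argument lists for scenarios kubectl can drive directly.

    Nothing is executed here. Scenarios that need host-level action
    (reboot, pause, killing kubelet) return an empty list.
    """
    if scenario == "cordon/drain node":
        return [
            ["kubectl", "cordon", node],
            ["kubectl", "drain", node, "--ignore-daemonsets"],
        ]
    if scenario == "unschedulable node":
        # Assumption: cordoning is how the node is made unschedulable.
        return [["kubectl", "cordon", node]]
    if scenario == "bring node back online":
        return [["kubectl", "uncordon", node]]
    if scenario == "pod in pending state":
        return [[
            "kubectl", "get", "pods", "--all-namespaces",
            "--field-selector=status.phase=Pending",
        ]]
    return []
```

Each pair from `build_test_matrix()` would get one entry in the write-up; the returned command lists could be passed to `subprocess.run` once the target node name is known.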
Be sure to take note of:
- What happens to running pods?
  - Read-only pods
  - Read-write pods
  - Is some manual step needed?
  - Do they recover automatically? (reboot => ok)
- What happens to kube services?
  - Do they fail?
  - Do they recover?
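
To answer the questions above consistently, it may help to capture node and pod state before and after each test. The sketch below assumes the JSON from `kubectl get node NAME -o json` and `kubectl get pods -o json` has already been parsed into dictionaries; the function names and the sample node `node-1` are hypothetical.

```python
import json


def summarize_node(node):
    """Summarize a parsed Node object: Ready condition and cordon state."""
    conditions = node.get("status", {}).get("conditions", [])
    # The Ready condition's status is "True", "False", or "Unknown".
    ready = next(
        (c["status"] for c in conditions if c.get("type") == "Ready"),
        "Unknown",
    )
    return {
        "name": node.get("metadata", {}).get("name", ""),
        "ready": ready,
        # spec.unschedulable is set to true when a node is cordoned.
        "unschedulable": bool(node.get("spec", {}).get("unschedulable", False)),
    }


def pending_pods(pod_list):
    """Return names of pods whose phase is Pending in a parsed pod list."""
    return [
        p["metadata"]["name"]
        for p in pod_list.get("items", [])
        if p.get("status", {}).get("phase") == "Pending"
    ]


# Hypothetical sample: a cordoned node whose kubelet has stopped reporting.
sample_node = json.loads("""
{
  "metadata": {"name": "node-1"},
  "spec": {"unschedulable": true},
  "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}
}
""")
summary = summarize_node(sample_node)
```

Comparing the summaries taken before the test, during the failure, and after recovery gives a simple record of whether pods and services came back on their own or needed a manual step.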