Type: Task
Resolution: Fixed
Priority: Normal
Fix Version/s: Labs Workbench - Beta
Affects Version/s: None
Component/s: None
Labels: None

Sprint: NDS Sprint 16, NDS Sprint 17

For each kind of node (compute, loadbal, gfs, master, etc.), test and document what happens in each of the following scenarios:

Reboot node
Cordon/drain node
Bring node back online
Pod in pending state
Node not responding:
- Hung node due to a resource constraint (pegged CPU, out of memory, out of disk, etc.)
- Paused node
- Dead kubelet (apparently caused by resource constraints)
- Unschedulable node
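As a rough aid for working through the list above, the scenarios can be crossed with the node kinds to produce a checklist, with one row per combination to fill in during testing. The sketch below is only an illustration: the function names and the row layout are hypothetical, the node kinds are just the four named in this ticket (the "etc" suggests there may be more), and the cordon/drain sequence assumes the standard `kubectl cordon`, `kubectl drain --ignore-daemonsets`, and `kubectl uncordon` commands are appropriate for this cluster.

```python
from itertools import product

# Node kinds and scenarios copied from the ticket text. The ticket says
# "etc", so this list of node kinds is probably incomplete.
NODE_KINDS = ["compute", "loadbal", "gfs", "master"]
SCENARIOS = [
    "reboot node",
    "cordon/drain node",
    "bring node back online",
    "pod in pending state",
    "node not responding: hung (cpu, memory, disk)",
    "node not responding: paused",
    "node not responding: dead kubelet",
    "node not responding: unschedulable",
]


def build_matrix(kinds=NODE_KINDS, scenarios=SCENARIOS):
    """Return one empty checklist row per (node kind, scenario) pair."""
    return [
        {"node_kind": kind, "scenario": scenario, "observed": None}
        for kind, scenario in product(kinds, scenarios)
    ]


def drain_cycle_commands(node_name):
    """Build (but do not run) the commands for a cordon/drain/uncordon cycle.

    Assumption: --ignore-daemonsets is acceptable here; other drain flags
    may be needed depending on what runs on the node.
    """
    return [
        ["kubectl", "cordon", node_name],
        ["kubectl", "drain", node_name, "--ignore-daemonsets"],
        ["kubectl", "uncordon", node_name],
    ]


if __name__ == "__main__":
    for row in build_matrix():
        print(f"[ ] {row['node_kind']:8s} {row['scenario']}")
```

The commands are returned as argument lists rather than executed, so they can be reviewed first and then passed to `subprocess.run` once the target node name is confirmed.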

Be sure to take note of:

What happens to running pods?
- Read-only pods
- Read-write pods
- Is some manual step needed?
- Do they recover automatically? (reboot => ok)
What happens to kube services?
- Do they fail?
- Do they recover?
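One possible way to record what happens to running pods is to capture `kubectl get pods -o json` before and after each scenario and compare pod phases. The sketch below assumes that approach; the helper names are hypothetical, and it relies only on the `items[].metadata.name` and `items[].status.phase` fields of the pod list output.

```python
import json


def phase_by_pod(pod_list_json):
    """Map pod name to phase from `kubectl get pods -o json` output."""
    doc = json.loads(pod_list_json)
    return {
        item["metadata"]["name"]: item.get("status", {}).get("phase", "Unknown")
        for item in doc.get("items", [])
    }


def changed_pods(before, after):
    """Return {pod name: (old phase, new phase)} for pods whose phase differs.

    A pod missing from one snapshot is reported with None on that side.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes
```

Pods recreated by a controller usually come back under a new name, so a recovered pod may appear here as one pod disappearing and another appearing rather than as a single phase change.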

has to be done before: NDS-686 Discuss plan for redeploying the beta cluster (Closed)

is related to: NDS-926 More: what happens when bad things happen (Open)

relates to: NDS-346 Determine issues when CoreOS rolling-update is enabled (Closed)

Assignee: Sara Lambert

Reporter: Craig Willis

Votes: 0

Watchers: 2

Created: 10/Nov/16 3:29 PM

Updated: 01/Jun/17 10:37 AM

Resolved: 16/Dec/16 11:17 AM

Estimated: 2d

Remaining: 1d 2h

Logged: 6h