2022/05/23 Update plan
Groups of cluster-prod
nodes:
(1) descut.cosmology.illinois.edu 141.142.161.46
(1) descut4.cosmology.illinois.edu 141.142.161.56
(2) descut5.cosmology.illinois.edu 141.142.161.57
(2) descut6.cosmology.illinois.edu 141.142.161.58
( ) desgpu1.cosmology.illinois.edu 141.142.161.18
(3) desrelease.cosmology.illinois.edu 141.142.161.51
(3) *descut3.ncsa.illinois.edu 141.142.161.48
(3) *descut2.cosmology.illinois.edu 141.142.161.54
Note: desgpu1, listed with an empty group marker ( ), was already updated and so is skipped.
Update procedure:
- Drain group (1). Update and reboot.
- Uncordon group (1), since draining leaves the nodes marked unschedulable.
- Cordon groups (2) and (3).
- Drain group (2). Update and reboot. Pods will primarily respawn on updated nodes in group (1).
- Uncordon group (2).
- Drain group (3). Update and reboot. Pods will primarily respawn on updated nodes in groups (1) and (2).
- Uncordon group (3).
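The procedure above can be sketched as a small Python script that prints the kubectl commands in order. This is a minimal sketch, not the exact commands used: it assumes the Kubernetes node names match the hostnames in the node list, the drain flags are an assumption, and the update-and-reboot step is left as a placeholder comment because it depends on the host OS. Since draining leaves a node cordoned, the sketch uncordons every group after its reboot, including group (1).

```python
# Grouping copied from the node list above. Using the full hostnames as
# Kubernetes node names is an assumption.
GROUPS = {
    1: ["descut.cosmology.illinois.edu", "descut4.cosmology.illinois.edu"],
    2: ["descut5.cosmology.illinois.edu", "descut6.cosmology.illinois.edu"],
    3: [
        "desrelease.cosmology.illinois.edu",
        "descut3.ncsa.illinois.edu",
        "descut2.cosmology.illinois.edu",
    ],
}

# Assumed flags; adjust to the workloads actually running on the cluster.
DRAIN_FLAGS = "--ignore-daemonsets --delete-emptydir-data"


def rollout_commands(groups):
    """Return the ordered command list for a group-by-group update."""
    order = sorted(groups)
    commands = []
    for position, group in enumerate(order):
        # Drain evicts the pods and marks each node unschedulable.
        for node in groups[group]:
            commands.append(f"kubectl drain {node} {DRAIN_FLAGS}")
        commands.append(f"# update and reboot group ({group}) here")
        # Make the freshly updated nodes schedulable again.
        for node in groups[group]:
            commands.append(f"kubectl uncordon {node}")
        if position == 0:
            # Once the first group is back, cordon the remaining groups so
            # evicted pods land only on nodes that are already updated.
            for later in order[1:]:
                for node in groups[later]:
                    commands.append(f"kubectl cordon {node}")
    return commands


if __name__ == "__main__":
    print("\n".join(rollout_commands(GROUPS)))
```

Printing the list first, rather than executing it, allows the plan to be reviewed before anything is run against cluster-prod.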
Add desgpu2 back to the cluster
The GPU-capable node desgpu2 was offline for a while, but it is ready to rejoin the cluster.
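Once desgpu2 has rejoined, one way to confirm it can take pods is to inspect its Node object. The sketch below is illustrative only: it assumes the node is registered under the name desgpu2, and the sample JSON is a hypothetical, trimmed stand-in for the output of `kubectl get node desgpu2 -o json`, not real cluster output.

```python
import json


def node_is_schedulable(node):
    """True if a Node object reports Ready=True and is not cordoned."""
    conditions = node.get("status", {}).get("conditions", [])
    ready = any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )
    cordoned = node.get("spec", {}).get("unschedulable", False)
    return ready and not cordoned


# Hypothetical sample; in practice, load the JSON printed by kubectl.
sample = json.loads("""
{
  "metadata": {"name": "desgpu2"},
  "spec": {},
  "status": {"conditions": [{"type": "Ready", "status": "True"}]}
}
""")

if __name__ == "__main__":
    print(node_is_schedulable(sample))
```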
cluster-dev
nodes:
Update and reboot without coordination.