2022/05/23 Update plan

Groups of cluster-prod nodes:

(1) descut.cosmology.illinois.edu        141.142.161.46
(1) descut4.cosmology.illinois.edu       141.142.161.56
(2) descut5.cosmology.illinois.edu       141.142.161.57
(2) descut6.cosmology.illinois.edu       141.142.161.58
( ) desgpu1.cosmology.illinois.edu       141.142.161.18
(3) desrelease.cosmology.illinois.edu    141.142.161.51
(3) *descut3.ncsa.illinois.edu           141.142.161.48
(3) *descut2.cosmology.illinois.edu      141.142.161.54

desgpu1, marked ( ), was already updated and so is skipped.

Update procedure:

  1. Drain group (1). Update and reboot.
  2. Uncordon group (1), since draining leaves its nodes cordoned. Cordon groups (2) and (3).
  3. Drain group (2). Update and reboot. Pods will primarily respawn on updated nodes in group (1).
  4. Uncordon group (2).
  5. Drain group (3). Update and reboot. Pods will primarily respawn on updated nodes in groups (1) and (2).
  6. Uncordon group (3).
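
The procedure above can be sketched as a small script that only prints the kubectl commands in order and runs nothing. This is a rough illustration, not the commands actually used: the drain flags are assumptions, it is assumed that kubectl knows the nodes by the fully qualified names listed above, and the update and reboot step is left as a manual marker because this plan does not specify how it is done. Because drain leaves a node cordoned, the sketch uncordons every group after its reboot, including group (1). The names GROUPS, DRAIN_FLAGS, and build_plan are hypothetical.

```python
from typing import Dict, List

# Node groups copied from the list above. Whether kubectl knows the nodes by
# these fully qualified names is an assumption; check `kubectl get nodes`.
GROUPS: Dict[int, List[str]] = {
    1: ["descut.cosmology.illinois.edu", "descut4.cosmology.illinois.edu"],
    2: ["descut5.cosmology.illinois.edu", "descut6.cosmology.illinois.edu"],
    3: [
        "desrelease.cosmology.illinois.edu",
        "descut3.ncsa.illinois.edu",
        "descut2.cosmology.illinois.edu",
    ],
}

# Assumed drain flags; adjust them to the workloads actually running.
DRAIN_FLAGS = ["--ignore-daemonsets", "--delete-emptydir-data"]


def build_plan(groups: Dict[int, List[str]]) -> List[str]:
    """Return the ordered list of commands for the rolling update."""
    plan: List[str] = []
    order = sorted(groups)
    first, later = order[0], order[1:]

    def update_group(group: int) -> None:
        # Drain (which also cordons), update by hand, then uncordon.
        for node in groups[group]:
            plan.append(" ".join(["kubectl", "drain", node] + DRAIN_FLAGS))
        for node in groups[group]:
            plan.append(f"# manual: update and reboot {node}")
        for node in groups[group]:
            plan.append(f"kubectl uncordon {node}")

    update_group(first)

    # Cordon the not-yet-updated groups so that evicted pods land on
    # nodes that have already been updated.
    for group in later:
        for node in groups[group]:
            plan.append(f"kubectl cordon {node}")

    for group in later:
        update_group(group)

    return plan


if __name__ == "__main__":
    for line in build_plan(GROUPS):
        print(line)
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; each line can be reviewed and then pasted into a shell one group at a time.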

Add desgpu2 back to the cluster:

The GPU-capable node desgpu2 was offline for a while, but it is ready to rejoin the cluster.

cluster-dev nodes:

Update and reboot without coordination.