2022/05/23 Update plan

Groups of cluster-prod nodes:

(1) descut.cosmology.illinois.edu
(1) descut4.cosmology.illinois.edu
(2) descut5.cosmology.illinois.edu
(2) descut6.cosmology.illinois.edu
( ) desgpu1.cosmology.illinois.edu
(3) desrelease.cosmology.illinois.edu
(3) *descut3.ncsa.illinois.edu 
(3) *descut2.cosmology.illinois.edu

desgpu1, marked ( ), was already updated and so is skipped.

Update procedure:

  1. Drain group (1). Update and reboot, then uncordon group (1).
  2. Cordon groups (2) and (3).
  3. Drain group (2). Update and reboot. Pods will primarily respawn on updated nodes in group (1).
  4. Uncordon group (2).
  5. Drain group (3). Update and reboot. Pods will primarily respawn on updated nodes in groups (1) and (2).
  6. Uncordon group (3).
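
The procedure above can be sketched as a small Python script that only prints the kubectl commands in order, so they can be reviewed before anyone runs them. This is a rough sketch under several assumptions, not the exact commands used: it assumes the Kubernetes node names match the hostnames in the node list, the drain flags are a guess at what the workloads need, and the names GROUPS, DRAIN_FLAGS, kubectl, and build_plan are made up for illustration. It cordons groups (2) and (3), the groups still waiting for their update at that point. The update and reboot themselves are left as manual steps.

```python
# Sketch: print the kubectl commands implied by the update procedure.
# Nothing is executed here; updates and reboots remain manual steps.

# Assumption: Kubernetes node names match the hostnames in the node list.
GROUPS = {
    1: ["descut.cosmology.illinois.edu", "descut4.cosmology.illinois.edu"],
    2: ["descut5.cosmology.illinois.edu", "descut6.cosmology.illinois.edu"],
    3: [
        "desrelease.cosmology.illinois.edu",
        "descut3.ncsa.illinois.edu",
        "descut2.cosmology.illinois.edu",
    ],
}

# Assumed drain flags; adjust to the workloads running on the cluster.
DRAIN_FLAGS = ["--ignore-daemonsets", "--delete-emptydir-data"]


def kubectl(action, nodes, flags=()):
    """Return one 'kubectl <action> <node> [flags]' string per node."""
    return [" ".join(["kubectl", action, node, *flags]) for node in nodes]


def build_plan(groups):
    """Return the ordered list of commands and manual-step markers."""
    plan = []

    # Step 1: drain group (1), then update and reboot it by hand.
    plan += kubectl("drain", groups[1], DRAIN_FLAGS)
    plan.append("# MANUAL: update and reboot group (1)")
    # 'kubectl drain' leaves nodes cordoned, so group (1) is uncordoned
    # here; otherwise step 3 could not place pods on it.
    plan += kubectl("uncordon", groups[1])

    # Step 2: keep new pods off the nodes that are not yet updated.
    plan += kubectl("cordon", groups[2] + groups[3])

    # Steps 3-6: drain, update, reboot, and uncordon groups (2) and (3).
    for g in (2, 3):
        plan += kubectl("drain", groups[g], DRAIN_FLAGS)
        plan.append(f"# MANUAL: update and reboot group ({g})")
        plan += kubectl("uncordon", groups[g])

    return plan


if __name__ == "__main__":
    print("\n".join(build_plan(GROUPS)))
```

Printing the plan instead of running it keeps the operator in control of the pace: each group's drain can be confirmed complete (for example with kubectl get pods -A -o wide) before its nodes are rebooted.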

Add desgpu2 back to the cluster:

The GPU-capable node desgpu2 was offline for a while, but it is ready to rejoin the cluster.

cluster-dev nodes:

Update and reboot without coordination.