-
Task
-
Resolution: Won't Do
-
Normal
-
None
-
Labs Workbench - Beta
-
None
A running production cluster should be able to be shut down gracefully and restarted, returning to a fully operational state with no loss of user or system data, services, or applications. This supports planned IaaS service upgrades and maintenance.
- Determine the shutdown order of user stacks, NDS services, system services, and nodes.
- Develop a script to bring the cluster down cleanly, saving any data needed for restart.
- Implement the shutdown procedure as an Ansible playbook or script.
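The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual procedure: the tier order simply follows the order the groups are listed in this ticket, starting back up in reverse order is an assumption, and every name here (`SHUTDOWN_TIERS`, `plan_shutdown`, `plan_startup`, `save_state`, `load_state`, and the inventory keys) is hypothetical. It only plans the order and records it for restart; actually stopping services would be left to the playbook or script.

```python
import json

# Assumed tier order (hypothetical): the ticket lists these groups but the
# real shutdown order still has to be determined.
SHUTDOWN_TIERS = ["user_stacks", "nds_services", "system_services", "nodes"]


def plan_shutdown(inventory):
    """Return (tier, name) pairs in the order they would be stopped."""
    steps = []
    for tier in SHUTDOWN_TIERS:
        # Sort names so the plan is deterministic from run to run.
        for name in sorted(inventory.get(tier, [])):
            steps.append((tier, name))
    return steps


def plan_startup(shutdown_steps):
    """Assume startup is the exact reverse of shutdown."""
    return list(reversed(shutdown_steps))


def save_state(shutdown_steps, path):
    """Persist the executed shutdown plan so a restart can replay it."""
    with open(path, "w") as fh:
        json.dump({"shutdown_steps": shutdown_steps}, fh, indent=2)


def load_state(path):
    """Load a saved plan; JSON stores tuples as lists, so convert back."""
    with open(path) as fh:
        return [tuple(step) for step in json.load(fh)["shutdown_steps"]]
```

A caller would build the inventory from whatever source of truth the cluster has, run `plan_shutdown`, save the result before stopping anything, and later feed `load_state` into `plan_startup` to bring everything back.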
Related to rolling updates (NDS-346): services or stacks that cannot survive a single-node reboot are likely to have the same or similar issues with a full-system shutdown and restart.