-
Story
-
Resolution: Unresolved
-
Normal
-
None
-
Labs Workbench - Beta
-
None
-
NDS Sprint 42, NDS Sprint 43
The Workbench public beta has been online at SDSC for almost a year. Since launch we have made changes that let us reduce this cluster's resource footprint:
- reduced the required number of GLFS node replicas from 4 to 2
- nodes can now carry multiple labels, which makes more efficient use of resources
The following resources are currently provisioned at SDSC:
Instance Name | Flavor | vCPUs (M) | RAM Size (GB) | Root Disk Size (GB) |
---|---|---|---|---|
workbench-master1 | m1.large | 2 | 8 | 20 |
workbench-loadbal | m1.large | 2 | 8 | 20 |
workbench-lma | r1.xlarge | 4 | 32 | 20 |
workbench-node1 | r1.xlarge | 4 | 32 | 20 |
workbench-node2 | r1.xlarge | 4 | 32 | 20 |
workbench-gfs1 | r1.large | 2 | 16 | 20 |
workbench-gfs2 | r1.large | 2 | 16 | 20 |
workbench-gfs3 | r1.large | 2 | 16 | 20 |
workbench-gfs4 | r1.large | 2 | 16 | 20 |
Total | — | 24 | 176 | 180 |
Planned changes:
- Cut the 4 dedicated GLFS instances down to 2
- Remove the dedicated LMA node (it is only running Grafana)
After reprovision:
Instance Name | Flavor | vCPUs (M) | RAM Size (GB) | Root Disk Size (GB) |
---|---|---|---|---|
workbench-master1 | m1.large | 2 | 8 | 20 |
workbench-loadbal | m1.large | 2 | 8 | 20 |
workbench-node1 | r1.xlarge | 4 | 32 | 20 |
workbench-node2 | r1.xlarge | 4 | 32 | 20 |
workbench-gfs1 | r1.large | 2 | 16 | 20 |
workbench-gfs2 | r1.large | 2 | 16 | 20 |
Total | — | 16 | 112 | 120 |
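The totals in both tables can be cross-checked with a short script. This is only a sketch: the per-instance figures are copied from the tables above, while the names `CURRENT`, `REMOVED`, and `totals` are made up for illustration and are not part of any deploy tooling.

```python
# Instance inventory copied from the "currently provisioned" table:
# name -> (vCPUs, RAM in GB, root disk in GB)
CURRENT = {
    "workbench-master1": (2, 8, 20),
    "workbench-loadbal": (2, 8, 20),
    "workbench-lma": (4, 32, 20),
    "workbench-node1": (4, 32, 20),
    "workbench-node2": (4, 32, 20),
    "workbench-gfs1": (2, 16, 20),
    "workbench-gfs2": (2, 16, 20),
    "workbench-gfs3": (2, 16, 20),
    "workbench-gfs4": (2, 16, 20),
}

# Instances dropped by the planned changes (LMA node and two GLFS instances).
REMOVED = {"workbench-lma", "workbench-gfs3", "workbench-gfs4"}


def totals(instances):
    """Sum vCPUs, RAM, and root disk across a name -> (vcpu, ram, disk) mapping."""
    vcpus = sum(spec[0] for spec in instances.values())
    ram = sum(spec[1] for spec in instances.values())
    disk = sum(spec[2] for spec in instances.values())
    return (vcpus, ram, disk)


planned = {name: spec for name, spec in CURRENT.items() if name not in REMOVED}

before = totals(CURRENT)
after = totals(planned)
savings = tuple(b - a for b, a in zip(before, after))

print("before:", before)
print("after: ", after)
print("saved: ", savings)
```

Adding or removing entries in `REMOVED` makes it easy to evaluate the options listed for discussion, such as dropping the dedicated loadbal node.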
For discussion:
- Remove dedicated loadbal node?
- Resize compute nodes?
- Dedicated etcd node?
- Are there any other considerations that might have been missed?
Assume that GLFS data and running stacks will be purged. See also https://opensource.ncsa.illinois.edu/confluence/display/NDS/Beta+release+communication
Completion criteria:
- Send downtime announcement
- Back up etcd (ndslabs user data)
- Deploy new cluster based on above specs
- Integration tests pass
- Send resume announcement
- Tear down old cluster
This ticket is complete when the above reprovision has been discussed and executed.
- depends on
  - NDS-1125 API server errors and restarts while trying to shutdown inactive service (Open)
  - NDS-1133 Stack trace starting standard Docker application (Open)
  - NDS-1168 API server stack trace + crash + restart when starting toolmanager (Open)
  - NDS-1212 Clicking endpoint link creates onslaught of check_token calls (Open)
  - NDS-1130 Frequent 500 errors from /accounts/{account-id} (Open)
  - NDS-1213 Flannel subnet changes break networking (Resolved)
  - NDS-998 Intermittent etcd timeouts from apiserver (Resolved)
  - NDS-1199 MTU problems deploying at SDSC (Resolved)
  - NDS-1200 Deploy tools bug with conditional register (Resolved)
  - NDS-1201 Deploy tools has wrong ingress configuration (Resolved)
- duplicates
  - NDS-1173 Deploy 1.1 to beta (Closed)
- is related to
  - NDS-833 Fix problem with flavors when combining node function (Open)