Type: Story
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: Labs Workbench - Beta
Component/s: Infrastructure, Workbench Beta
Labels: None
Sprint: NDS Sprint 42, NDS Sprint 43

The Workbench public beta has been online at SDSC for almost a year. Since launch, we have made changes that let us reduce this cluster's resource footprint:

Reduced the required number of GLFS node replicas from 4 to 2
Nodes can now carry multiple labels, which makes more efficient use of resources

The following resources are currently provisioned at SDSC:

Instance Name	Flavor	vCPUs (M)	RAM Size (GB)	Root Disk Size (GB)
workbench-master1	m1.large	2	8	20
workbench-loadbal	m1.large	2	8	20
workbench-lma	r1.xlarge	4	32	20
workbench-node1	r1.xlarge	4	32	20
workbench-node2	r1.xlarge	4	32	20
workbench-gfs1	r1.large	2	16	20
workbench-gfs2	r1.large	2	16	20
workbench-gfs3	r1.large	2	16	20
workbench-gfs4	r1.large	2	16	20
Total	—	24	176	180

Planned changes:

Cut the 4 dedicated GLFS instances down to 2
Remove the dedicated LMA node (it is only running Grafana)

After reprovision:

Instance Name	Flavor	vCPUs (M)	RAM Size (GB)	Root Disk Size (GB)
workbench-master1	m1.large	2	8	20
workbench-loadbal	m1.large	2	8	20
workbench-node1	r1.xlarge	4	32	20
workbench-node2	r1.xlarge	4	32	20
workbench-gfs1	r1.large	2	16	20
workbench-gfs2	r1.large	2	16	20
Total	—	16	112	120
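
As a sanity check on the totals in both tables, here is a minimal sketch that recomputes them from the per-instance figures. The `Instance` class and the `totals` and `savings` helpers are hypothetical names introduced for illustration only. The set of removed instances is inferred from the "After reprovision" table, which keeps gfs1 and gfs2.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Instance:
    name: str
    flavor: str
    vcpus: int
    ram_gb: int
    disk_gb: int


# Instances currently provisioned at SDSC, copied from the first table.
CURRENT = [
    Instance("workbench-master1", "m1.large", 2, 8, 20),
    Instance("workbench-loadbal", "m1.large", 2, 8, 20),
    Instance("workbench-lma", "r1.xlarge", 4, 32, 20),
    Instance("workbench-node1", "r1.xlarge", 4, 32, 20),
    Instance("workbench-node2", "r1.xlarge", 4, 32, 20),
    Instance("workbench-gfs1", "r1.large", 2, 16, 20),
    Instance("workbench-gfs2", "r1.large", 2, 16, 20),
    Instance("workbench-gfs3", "r1.large", 2, 16, 20),
    Instance("workbench-gfs4", "r1.large", 2, 16, 20),
]

# Instances dropped by the planned changes: the LMA node and two GLFS nodes.
# Assumption: gfs3 and gfs4 are the ones removed, as the second table implies.
REMOVED = {"workbench-lma", "workbench-gfs3", "workbench-gfs4"}

PLANNED = [inst for inst in CURRENT if inst.name not in REMOVED]


def totals(instances: Iterable[Instance]) -> Tuple[int, int, int]:
    """Return (vCPUs, RAM in GB, root disk in GB) summed over the instances."""
    items = list(instances)
    return (
        sum(i.vcpus for i in items),
        sum(i.ram_gb for i in items),
        sum(i.disk_gb for i in items),
    )


def savings(before: Iterable[Instance], after: Iterable[Instance]) -> Tuple[int, ...]:
    """Return the per-resource difference between two sets of instances."""
    return tuple(b - a for b, a in zip(totals(before), totals(after)))


if __name__ == "__main__":
    print("current:", totals(CURRENT))
    print("planned:", totals(PLANNED))
    print("savings:", savings(CURRENT, PLANNED))
```

With the figures from the tables, this reports a reduction of 8 vCPUs, 64 GB of RAM, and 60 GB of root disk.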

For discussion:

Remove dedicated loadbal node?
Resize compute nodes?
Dedicated etcd node?
Are there any other considerations that might have been missed?

Assume that GLFS and all running stacks will be purged. See also https://opensource.ncsa.illinois.edu/confluence/display/NDS/Beta+release+communication

Completion criteria:

Send downtime announcement
Back up etcd (ndslabs user data)
Deploy new cluster based on above specs
Integration tests pass
Send resume announcement
Tear down old cluster

This ticket is complete when the reprovisioning described above has been discussed and executed.

depends on

NDS-1125 API server errors and restarts while trying to shutdown inactive service (Open)
NDS-1133 Stack trace starting standard Docker application (Open)
NDS-1168 API server stack trace + crash + restart when starting toolmanager (Open)
NDS-1212 Clicking endpoint link creates onslaught of check_token calls (Open)
NDS-1130 Frequent 500 errors from /accounts/{account-id} (Open)
NDS-1213 Flannel subnet changes break networking (Resolved)
NDS-998 Intermittent etcd timeouts from apiserver (Resolved)
NDS-1199 MTU problems deploying at SDSC (Resolved)
NDS-1200 Deploy tools bug with conditional register (Resolved)
NDS-1201 Deploy tools has wrong ingress configuration (Resolved)

duplicates

NDS-1173 Deploy 1.1 to beta (Closed)

is related to

NDS-833 Fix problem with flavors when combining node function (Open)


Assignee: Sara Lambert
Reporter: Sara Lambert
Votes: 0
Watchers: 3
Created: 25/Oct/17 4:31 PM
Updated: 30/Apr/18 1:03 PM
