National Data Service / NDS-1071

Reprovisioning the Workbench public beta

    • Type: Story
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • Labs Workbench - Beta
    • None
    • NDS Sprint 42, NDS Sprint 43

      The Workbench public beta has been online at SDSC for almost a year. Since then we have made changes that let us reduce the cluster's resource footprint:

      • Reduced the required number of GLFS node replicas from 4 to 2
      • Nodes can now carry multiple labels, which makes more efficient use of resources

      The following resources are currently provisioned at SDSC:

      Instance Name      Flavor     vCPUs (M)  RAM Size (GB)  Root Disk Size (GB)
      workbench-master1  m1.large   2          8              20
      workbench-loadbal  m1.large   2          8              20
      workbench-lma      r1.xlarge  4          32             20
      workbench-node1    r1.xlarge  4          32             20
      workbench-node2    r1.xlarge  4          32             20
      workbench-gfs1     r1.large   2          16             20
      workbench-gfs2     r1.large   2          16             20
      workbench-gfs3     r1.large   2          16             20
      workbench-gfs4     r1.large   2          16             20
      Total                         24         176            180

      Planned changes:

      • Cut the 4 dedicated GLFS instances down to 2
      • Remove the dedicated LMA node (it is only running Grafana)

      After reprovisioning:

      Instance Name      Flavor     vCPUs (M)  RAM Size (GB)  Root Disk Size (GB)
      workbench-master1  m1.large   2          8              20
      workbench-loadbal  m1.large   2          8              20
      workbench-node1    r1.xlarge  4          32             20
      workbench-node2    r1.xlarge  4          32             20
      workbench-gfs1     r1.large   2          16             20
      workbench-gfs2     r1.large   2          16             20
      Total                         16         112            120
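
      The totals in both tables, and the savings from the planned changes, can be checked with a short script. This is only a sketch: the flavor sizes and instance names are copied from the tables, while the identifiers FLAVORS, CURRENT, PLANNED, REMOVED and totals are illustrative names, not part of any Workbench tooling.

```python
# Per-flavor resources as listed in the tables:
# (vCPUs, RAM in GB, root disk in GB).
FLAVORS = {
    "m1.large": (2, 8, 20),
    "r1.large": (2, 16, 20),
    "r1.xlarge": (4, 32, 20),
}

# Instances currently provisioned at SDSC (instance name -> flavor).
CURRENT = {
    "workbench-master1": "m1.large",
    "workbench-loadbal": "m1.large",
    "workbench-lma": "r1.xlarge",
    "workbench-node1": "r1.xlarge",
    "workbench-node2": "r1.xlarge",
    "workbench-gfs1": "r1.large",
    "workbench-gfs2": "r1.large",
    "workbench-gfs3": "r1.large",
    "workbench-gfs4": "r1.large",
}

# Planned changes: drop the LMA node and two of the four GLFS instances.
REMOVED = {"workbench-lma", "workbench-gfs3", "workbench-gfs4"}
PLANNED = {name: flavor for name, flavor in CURRENT.items() if name not in REMOVED}


def totals(instances):
    """Sum vCPUs, RAM (GB) and root disk (GB) over a name -> flavor mapping."""
    vcpus = ram = disk = 0
    for flavor in instances.values():
        flavor_vcpus, flavor_ram, flavor_disk = FLAVORS[flavor]
        vcpus += flavor_vcpus
        ram += flavor_ram
        disk += flavor_disk
    return vcpus, ram, disk


before = totals(CURRENT)
after = totals(PLANNED)
saved = tuple(b - a for b, a in zip(before, after))

print("before:", before)  # (24, 176, 180)
print("after: ", after)   # (16, 112, 120)
print("saved: ", saved)   # (8, 64, 60)
```

      Under these numbers the reprovisioned cluster frees 8 vCPUs, 64 GB of RAM and 60 GB of root disk, which is a third of the vCPUs and root disk and a little over a third of the RAM.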

      For discussion:

      • Remove dedicated loadbal node?
      • Resize compute nodes?
      • Dedicated etcd node?
      • Are there any other considerations that might have been missed?

      Assume that the GLFS data and all running stacks will be purged. See also https://opensource.ncsa.illinois.edu/confluence/display/NDS/Beta+release+communication

      Completion criteria:

      • Send downtime announcement
      • Back up etcd (ndslabs user data)
      • Deploy new cluster based on above specs
      • Integration tests pass
      • Send resume announcement
      • Tear down old cluster
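
      The order of the criteria above matters: the old cluster should only be torn down once the new one has passed its integration tests. A minimal sketch of that gating follows. The function run_checklist and the placeholder actions are hypothetical and stand in for whatever manual or scripted steps are actually used.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_checklist(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, action in steps:
        if not action():
            print(f"FAILED: {name}; later steps were not run")
            break
        completed.append(name)
        print(f"done: {name}")
    return completed


def placeholder() -> bool:
    # Hypothetical stand-in: a real step would send the announcement,
    # take the backup, run the test suite, and so on.
    return True


STEPS: List[Step] = [
    ("Send downtime announcement", placeholder),
    ("Back up etcd (ndslabs user data)", placeholder),
    ("Deploy new cluster", placeholder),
    ("Integration tests pass", placeholder),
    ("Send resume announcement", placeholder),
    ("Tear down old cluster", placeholder),
]

completed = run_checklist(STEPS)
```

      Because the loop stops at the first failing step, a failed integration test run leaves the old cluster in place.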

      This ticket is complete when the reprovisioning described above has been discussed and executed.

              lambert8 Sara Lambert