  1. National Data Service
  2. NDS-1071

Reprovisioning the Workbench public beta

Details

    • Type: Story
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Labs Workbench - Beta
    • None
    • None
    • Sprint: NDS Sprint 42, NDS Sprint 43

    Description

      The Workbench public beta has been online at SDSC for almost a year. Since then, we have made changes that allow us to reduce this cluster's resource footprint:

      • Reduced the required number of GLFS node replicas from 4 to 2
      • Nodes can now have multiple labels, which makes more efficient use of resources

      The following resources are currently provisioned at SDSC:

      Instance Name      Flavor     vCPUs (M)  RAM Size (GB)  Root Disk Size (GB)
      workbench-master1  m1.large   2          8              20
      workbench-loadbal  m1.large   2          8              20
      workbench-lma      r1.xlarge  4          32             20
      workbench-node1    r1.xlarge  4          32             20
      workbench-node2    r1.xlarge  4          32             20
      workbench-gfs1     r1.large   2          16             20
      workbench-gfs2     r1.large   2          16             20
      workbench-gfs3     r1.large   2          16             20
      workbench-gfs4     r1.large   2          16             20
      Total                         24         176            180

      Planned changes:

      • Cut the 4 dedicated GLFS instances down to 2
      • Remove the dedicated LMA node, since it only runs Grafana

      After reprovision:

      Instance Name      Flavor     vCPUs (M)  RAM Size (GB)  Root Disk Size (GB)
      workbench-master1  m1.large   2          8              20
      workbench-loadbal  m1.large   2          8              20
      workbench-node1    r1.xlarge  4          32             20
      workbench-node2    r1.xlarge  4          32             20
      workbench-gfs1     r1.large   2          16             20
      workbench-gfs2     r1.large   2          16             20
      Total                         16         112            120
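
      As a quick cross-check of the two totals rows, the arithmetic can be sketched as below. The instance names, flavors, and sizes are copied from the tables in this ticket; the Instance class, the make() and totals() helpers, and the variable names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    flavor: str
    vcpus: int
    ram_gb: int
    root_disk_gb: int


# Flavor sizes (vCPUs, RAM GB, root disk GB) as listed in the ticket's tables.
FLAVORS = {
    "m1.large": (2, 8, 20),
    "r1.large": (2, 16, 20),
    "r1.xlarge": (4, 32, 20),
}


def make(name: str, flavor: str) -> Instance:
    """Hypothetical helper: build an Instance from its flavor."""
    vcpus, ram_gb, disk_gb = FLAVORS[flavor]
    return Instance(name, flavor, vcpus, ram_gb, disk_gb)


# Instances currently provisioned at SDSC.
CURRENT = [
    make("workbench-master1", "m1.large"),
    make("workbench-loadbal", "m1.large"),
    make("workbench-lma", "r1.xlarge"),
    make("workbench-node1", "r1.xlarge"),
    make("workbench-node2", "r1.xlarge"),
    make("workbench-gfs1", "r1.large"),
    make("workbench-gfs2", "r1.large"),
    make("workbench-gfs3", "r1.large"),
    make("workbench-gfs4", "r1.large"),
]

# Planned changes: drop the LMA node and two of the four GLFS instances.
REMOVED = {"workbench-lma", "workbench-gfs3", "workbench-gfs4"}
PLANNED = [i for i in CURRENT if i.name not in REMOVED]


def totals(instances):
    """Return (vCPUs, RAM GB, root disk GB) summed over the instances."""
    return (
        sum(i.vcpus for i in instances),
        sum(i.ram_gb for i in instances),
        sum(i.root_disk_gb for i in instances),
    )


current_totals = totals(CURRENT)
planned_totals = totals(PLANNED)
savings = tuple(c - p for c, p in zip(current_totals, planned_totals))

print("current:", current_totals)  # (24, 176, 180)
print("planned:", planned_totals)  # (16, 112, 120)
print("savings:", savings)         # (8, 64, 60)
```

      Under these numbers the reprovisioned cluster frees 8 vCPUs, 64 GB of RAM, and 60 GB of root disk. The options listed for discussion would change the totals further and are not modeled here.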

      For discussion:

      • Remove dedicated loadbal node?
      • Resize compute nodes?
      • Dedicated etcd node?
      • Are there any other considerations that might have been missed?

      Assume that GLFS data and running stacks will be purged. See also https://opensource.ncsa.illinois.edu/confluence/display/NDS/Beta+release+communication

      Completion criteria:

      • Send downtime announcement
      • Back up etcd (ndslabs user data)
      • Deploy new cluster based on above specs
      • Integration tests pass
      • Send resume announcement
      • Tear down old cluster
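
      The ordering of the criteria above matters: the old cluster should only be torn down after the new one has passed its integration tests. A minimal sketch of that sequencing follows, assuming each step can be represented as a callable that returns True on success. The step names come from this ticket; run_steps() and placeholder() are hypothetical stand-ins, not existing NDS tooling.

```python
from typing import Callable, List, Tuple

# Step names taken from the completion criteria, in the order they must run.
STEP_NAMES = [
    "Send downtime announcement",
    "Back up etcd (ndslabs user data)",
    "Deploy new cluster based on above specs",
    "Integration tests pass",
    "Send resume announcement",
    "Tear down old cluster",
]

Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that fails.

    Returns the names of the steps that completed, so a failure in
    (for example) the integration tests leaves the old cluster intact.
    """
    completed = []
    for name, action in steps:
        if not action():
            print(f"stopped at: {name}")
            break
        completed.append(name)
    return completed


def placeholder() -> bool:
    """Hypothetical stand-in for the real work behind a step."""
    return True


if __name__ == "__main__":
    done = run_steps([(name, placeholder) for name in STEP_NAMES])
    print(f"{len(done)} of {len(STEP_NAMES)} steps completed")
```

      Keeping the teardown as the final step means a failed deploy or test run can be recovered from by resuming service on the old cluster.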

      This ticket is complete when the reprovisioning described above has been discussed and executed.

              People

                lambert8 Michael Lambert