Overview

This page houses the results of the manual load testing performed on the NDS Labs Workbench (Beta).

Objective

  1. Generate load on the system for a given number of users
  2. Monitor the system's resource utilization using Grafana
    • This will give us a benchmark of the expected "load" on the cluster
  3. Collect user feedback on the general usability of the system under the desired load conditions
    • This will tell us whether user-perceived performance has degraded due to stress on the system
  4. Note how node additions / removals affect resource constraints, and to what degree
    • If the system's resources become constrained, add a node to the cluster to alleviate the constraint
    • If the system has far more resources than it needs, remove a node from the cluster to simulate a downed node
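
Before changing the cluster in step 4, the effect of adding or removing a node on cluster-wide utilization can be estimated with simple arithmetic. This is a minimal sketch that assumes identical nodes and a workload that redistributes evenly after the change; the helper name `utilization_after_resize` and the 4-node example are hypothetical (the real node count is in the linked inventory).

```python
def utilization_after_resize(percent_used: float, nodes: int, delta: int) -> float:
    """Estimate cluster-wide utilization (%) after adding or removing nodes.

    Assumes (hypothetically) that all nodes are identical and that the same
    workload is spread evenly over the new node count, so total demand stays
    constant while total capacity scales with the number of nodes.
    """
    new_nodes = nodes + delta
    if new_nodes < 1:
        raise ValueError("cluster must keep at least one node")
    return percent_used * nodes / new_nodes


if __name__ == "__main__":
    # Hypothetical 4-node cluster at 10% memory that loses one node.
    print(round(utilization_after_resize(10.0, 4, -1), 2))  # 13.33
    # The same cluster gaining a node instead.
    print(utilization_after_resize(10.0, 4, +1))  # 8.0
```

Because Kubernetes schedules pods onto individual nodes, one node can still saturate while this cluster-wide average looks healthy, so per-node metrics remain the better measure during the test.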

Current Cluster Configuration

See inventory at: https://github.com/nds-org/ndslabs-deploy-tools/commit/d8d8ef30dac74b1fe84185c7abc6136516d60e7b

Resulting Actions

  • 1 hour of group testing
  • 1 hour of writing up new issues

Phase 1: Labs Workbench + Management

Workbench Version

  • 1.0.5

Participants

  • Craig Willis
  • David Raila
  • Mike Lambert

Measurement Utilities

Results

Prognosis

So far, aside from a few minor issues, everything is running super smoothly.

Peak usage was measured at:

  • 6% cluster memory usage
  • 3-4% cluster CPU usage

Nearly every available service was started at some point during roughly two hours of testing, and only 2 or 3 services encountered the notorious "no data available" problem, including:

  • pyCharm
  • Jenkins

Overall, this is fantastic news for the stability of the platform. The testing did, however, bring to light several issues that will need to be addressed.

Resulting Actions

Higher priority:

  • NDS-464
  • NDS-640
  • NDS-621
  • NDS-648
  • NDS-173

Lower priority:

  • NDS-646
  • NDS-647
  • NDS-645
  • NDS-644
  • NDS-649

Phase 2: Bug Party

Workbench Version

  • 1.0.6

Participants

  • Craig Willis
  • David Raila
  • Mike Lambert
  • Michal
  • Jing
  • Sandeep
  • Qiyue
  • Marcus

Measurement Utilities

Results

  • Michal: No indication of which fields are required for registration
    • New ticket: NDS-661
  • Michal: Needs guidance on what to do first (i.e., a Quickstart)
    • See NDS-485
  • David: Recommend whitelisting our site for pop-ups / disabling pop-up blocking; can we detect this and make a recommendation to users without the correct settings?
    • New ticket: NDS-662
  • Michal: Couldn't sign up for DSpace (address in use)
    • This is a more general problem with any service that generates admin credentials; the user should be directed to the Config page
    • See NDS-560
  • Jing: Docker image name validation is incomplete
    • Underscore should be among the accepted characters
    • New ticket: NDS-663
  • Jing: No indication of required fields during spec creation?
    • See NDS-661
  • Mike: Saw a failure adding Sufia once; the next attempt added it properly
    • Was not able to reliably recreate, and no error message was given; will file a ticket if I see it again
  • Jing: Custom service failed to start
  • Qiyue: No indication of which fields are required for registration
    • See NDS-661
  • Qiyue: How do we use different versions, for example Cloud9 Java7 vs. Cloud9 Java8?
    • New ticket: NDS-665
  • Qiyue: What is the storage quota? (Answer: 20GB)
    • See NDS-201
  • Jing: Redis is missing an info link
  • Marcus: NDS Confluence went down; as a result, icons could not load
    • See NDS-591
  • Jing: Error messages are confusing; need to translate the error messages (or document them)
    • New ticket: NDS-666
  • Michal: Would it be better to have pre-populated instances?
    • This would be nice, but may be difficult to handle programmatically in a general way
  • Qiyue: Any plans to support Fortran?
    • New ticket: NDS-667
  • Mike: Kibana caused the following Nagios alerts to come from the LMA node:
    • "workbench-lma/Load is WARNING:"
    • "WARNING - load average: 8.94, 7.92, 6.52"
    • New ticket: NDS-668
  • Jing: Order of top menu – Catalog then Applications?
    • New ticket: NDS-669
  • Michal: Can I use this framework to compare Monte Carlo simulations?
    • See NDS-664
  • David: Green/red bars are too big, or other parts of the application UI are too small.
    • I would be happy to look over any UI mockups that you would be willing to provide
  • David: Stopped "X" is confusing – thought it was delete
    • New ticket: NDS-670
  • Sandeep: Need a better way of differentiating user versus system specs (the little icon isn't readily apparent)
  • Sandeep: Help pages on the wiki aren't great; they should be part of the application
    • See NDS-485
  • Marcus: Not sure what to do (quickstart/tutorial)
    • See NDS-485
  • Marcus: Documentation isn't clear
    • See NDS-485
  • Marcus: Can I use this to launch Jupyter notebooks for BrownDog users?
    • Labs Workbench is intended for testing and development; publicly accessible services with real users are highly discouraged
    • That being said, using Workbench to spin up personal notebooks for users' own private analysis would be highly encouraged
  • Craig: iRODS problems (multiple volumes; CloudBrowser Zone)
    • See NDS-654
  • Craig: Multiple port problem
    • See NDS-655

Prognosis

Aside from a slew of UX problems, the platform itself performed rather well!

Usage from 8 users peaked at:

  • ~10% Memory
  • ~6% CPU

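This suggests that we should be able to support our target of 50 users with ample headroom: at the observed per-user rate, 50 users would peak at roughly 62.5% memory and 37.5% CPU.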

Optimistically, assuming that Gluster doesn't fall over and that usage scales fairly linearly with the number of users, these results suggest that we might be able to support upwards of 60 or 70 simultaneous users on the Beta cluster without needing to resize it.
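
The linear extrapolation can be written out as a small calculation. This is a minimal sketch that assumes peak usage grows in proportion to the number of users, based on the 8-user Phase 2 peaks (~10% memory, ~6% CPU); the function names and the 80% headroom target are hypothetical, and Gluster behavior is not modeled at all.

```python
# Peak cluster utilization observed in Phase 2 with 8 concurrent users (percent).
OBSERVED_USERS = 8
OBSERVED_PEAK = {"memory": 10.0, "cpu": 6.0}


def projected_usage(users: int) -> dict:
    """Project peak utilization (%) per resource, assuming it is proportional to user count."""
    return {res: pct / OBSERVED_USERS * users for res, pct in OBSERVED_PEAK.items()}


def max_users(threshold_pct: float) -> int:
    """Largest user count whose projected memory and CPU both stay at or below threshold_pct."""
    worst_per_user = max(pct / OBSERVED_USERS for pct in OBSERVED_PEAK.values())
    return int(threshold_pct // worst_per_user)


if __name__ == "__main__":
    print(projected_usage(50))  # {'memory': 62.5, 'cpu': 37.5}
    print(max_users(80.0))      # 64 (80% is a hypothetical headroom target)
```

Phase 1's 6% memory peak with only three participants hints at a sizable fixed baseline; if so, the true per-user cost is lower than this proportional model assumes, which makes the projection on the conservative side.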

Resulting Actions

  • NDS-201
  • NDS-560
  • NDS-591
  • NDS-485
  • NDS-654
  • NDS-655
  • NDS-661
  • NDS-662
  • NDS-663
  • NDS-664
  • NDS-665
  • NDS-666
  • NDS-667
  • NDS-668
  • NDS-669
  • NDS-670