Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: None
Affects Version/s: Labs Workbench - Beta
Component/s: EarthCube Workbench, iSchool Workbench, TERRAREF Workbench
Labels: None
Sprint: NDS Sprint 30

We saw this with ETK2017 and, later, with the EarthCube Workbench instance: filling up the disk is the worst failure one of these clusters can hit, and the hardest to recover from without a full reprovision or migration.

We should discuss strategies for deployment that will prevent us from forgetting to mount such a volume.

We should also make sure that /ndslabs/data is housed on this new volume.
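As a rough sketch of how a forgotten mount could be caught automatically, the Python below checks whether a path sits on its own mount point or has fallen through to the root filesystem. The function names are hypothetical, and reading /proc/mounts assumes a Linux host; nothing here is part of ndslabs-startup or deploy-tools.

```python
import os


def mount_point_for(path, mount_points):
    """Return the longest mount point that contains path ("/" if none match)."""
    path = os.path.normpath(path)
    best = "/"
    for mp in mount_points:
        mp = os.path.normpath(mp)
        # Match the mount point itself or anything below it, but not
        # sibling directories that merely share a name prefix.
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if len(mp) > len(best):
                best = mp
    return best


def read_mount_points(mounts_file="/proc/mounts"):
    """Read mount points from a Linux mounts table (second column of each row)."""
    with open(mounts_file) as f:
        rows = (line.split() for line in f)
        return [fields[1] for fields in rows if len(fields) >= 2]


def paths_on_root_volume(paths, mount_points):
    """Return the paths that are NOT backed by a dedicated mount."""
    return [p for p in paths if mount_point_for(p, mount_points) == "/"]
```

Calling `paths_on_root_volume(["/var/lib/docker", "/ndslabs/data"], read_mount_points())` on a node would return whichever of the two paths still lives on the root disk; an empty list means both are on a separately mounted volume.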

Ultimately, we should do at least one of the following:

Formalize and thoroughly document the single-node deployment via ndslabs-startup, automating whenever possible, to prevent us from accidentally skipping steps
Think of a way to deploy a gluster-free cluster, consisting of a single master and a single compute node
1. Attempt to use deploy-tools as-is with a gluster-less inventory to produce such a cluster
2. Write a new playbook to deploy such a cluster

The big checklist of things to include in the documentation:

Request / verify TLS certs
Verify MTU settings
Verify volume sizes
- ALWAYS attach a large data volume for /var/lib/docker and cluster data
Import or mount existing data / configuration
Verify NGINX config
- for example, max body size
Ensure that custom default backend is deployed (Ansible does not do this yet)
Enable NAGIOS / LMA
Disable Logging via ElasticSearch
Double-check node labels
Create accounts
Double-check that a basic auth secret exists for all accounts
Verify service specs
- for example, is everything present? are sensible limits set for all specs?
Cache service images
Smoke test
Write some tutorials / examples for usage of the new instance
- for example, is there anything new that is specific to this instance? new services, new features, etc
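The "Verify volume sizes" item is one of the easier ones to automate. A minimal sketch, assuming a 100 GiB minimum and an 80% usage ceiling (both thresholds are placeholders, not values agreed on in this ticket) and using hypothetical function names:

```python
import shutil

GIB = 1024 ** 3


def volume_problems(total, used, min_total_gib, max_used_fraction):
    """Return a list of human-readable problems for one volume (empty if OK)."""
    problems = []
    if total < min_total_gib * GIB:
        problems.append("volume is smaller than %d GiB" % min_total_gib)
    if total > 0 and used / total > max_used_fraction:
        problems.append("volume is more than %d%% full" % round(max_used_fraction * 100))
    return problems


def check_path(path, min_total_gib=100, max_used_fraction=0.8):
    """Check the volume backing path; thresholds are assumed defaults."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return volume_problems(usage.total, usage.used, min_total_gib, max_used_fraction)
```

Running `check_path("/var/lib/docker")` during a smoke test, or periodically from monitoring, would flag an undersized or nearly full volume before the disk actually fills.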

This ticket is complete when the above has been discussed, formalized, presented in a digestible fashion, and automated wherever possible. This ticket might need to be broken down into smaller tasks.

Is related to: NDS-985 Single-node installations flake out in high-load scenarios (Resolved)

Assignee: Sara Lambert

Reporter: Sara Lambert

Votes: 0

Watchers: 2

Created: 03/Aug/17 11:37 AM

Updated: 25/Aug/17 4:15 PM

Resolved: 25/Aug/17 4:15 PM

Estimated: 1d 4h

Remaining: 1d 3h 30m

Logged: 30m
