...
Date/Time | What happened | How was it resolved |
---|---|---|
12/16/2019 | Disk space errors | truncated large log files |
12/4/2019 | Disk space errors | ?? likely truncated large log files |
9/9/2019 | Disk space errors on lma | ?? likely truncated large log files / deleted registry cache, possibly purged log/monitoring data |
7/11/2019 | Pod exceeding restart threshold | Killed pod to reset restart count |
6/17/2019 | SDSC Maintenance | Brief network outage, then everything automatically came back up |
6/13/2019 | Disk space errors on gfs4 | ?? likely truncated large log files / deleted registry cache |
6/5/2019 | Disk space errors on node1 | Registry was consuming all disk on node1; likely deleted registry cache. NRPE daemonset wouldn't run on all nodes. Will run on 7/8. Got it working on node1, then node2 fell off. Manually started NRPE for now. |
5/7/2019 | SDSC Maintenance | Brief network outage, then everything automatically came back up |
4/4/2019 | Disk space errors on lma | ?? likely truncated large log files / deleted registry cache, possibly purged log/monitoring data |
3/29/2019 | Pod exceeding restart threshold | ?? likely killed pod to reset restart count |
3/1/2019 | Disk space errors | Kube registry and GLFS client pods were using ~1.5 GB each. MISTAKE: Deleted pods from master to clear old log files. Found that Docker doesn't actually release space from deleted resources until the daemon is restarted, so resolving this required restarting the Docker daemon on gfs2 after deleting the pods. See https://github.com/moby/moby/issues/21925 for details. In the future, truncating huge log files with the following method is preferred:
|
...