...

Date/Time | What happened | How was it resolved
9/9/2019 | Disk space errors (lma) | ?? Likely truncated large log files / deleted registry cache, possibly purged log/monitoring data
7/11/2019 | Pod exceeding restart threshold | Killed pod to reset restart count
6/17/2019 | SDSC Maintenance | Brief network outage, then everything automatically came back up
6/13/2019 | Disk space errors (gfs4) | ?? Likely truncated large log files / deleted registry cache
6/5/2019 | Disk space errors (node1) | Registry was consuming all disk on node1, likely deleted registry cache. NRPE daemonset wouldn't run on all nodes. Will run on 7/8. Got it working on node1, then node2 fell off. Manually started nrpe for now.
5/7/2019 | SDSC Maintenance | Brief network outage, then everything automatically came back up
4/4/2019 | Disk space errors (lma) | ?? Likely truncated large log files / deleted registry cache, possibly purged log/monitoring data
3/29/2019 | Pod exceeding restart threshold | ?? Likely killed pod to reset restart count
3/1/2019 | Disk space errors (gfs2) | Kube registry and GLFS client pods were using ~1.5GB each. MISTAKE: Deleted pods from master to clear old log files. Docker doesn't actually release space from deleted resources until the daemon is restarted, so resolving this required restarting the Docker daemon on gfs2 after deleting the pods.

See https://github.com/moby/moby/issues/21925

In the future, truncating huge log files with the following method is preferred:

echo " " > big-log-file.json
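
As a rough sketch of the same idea, the snippet below empties a file in place only when it exceeds a size limit. The helper name truncate_if_large, the 512 KiB limit, and the temporary demo file are all hypothetical and stand in for a real oversized log. It uses `: > file` rather than `echo " "`, which leaves the file at zero bytes instead of a space and a newline. Either way the file is never unlinked, so a process that still has it open keeps writing to the same file.

```shell
# Hypothetical helper: empty FILE in place when it is larger than LIMIT bytes.
truncate_if_large() {
  file=$1
  limit=$2
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$limit" ]; then
    : > "$file"    # truncates to zero bytes without unlinking the file
    echo "truncated $file (was $size bytes)"
  fi
}

# Demo on a throwaway file standing in for a large container log.
demo_log=$(mktemp)
dd if=/dev/zero of="$demo_log" bs=1024 count=1024 2>/dev/null   # 1 MiB of filler
size_before=$(wc -c < "$demo_log" | tr -d ' ')
truncate_if_large "$demo_log" 524288                             # limit: 512 KiB
size_after=$(wc -c < "$demo_log" | tr -d ' ')
echo "before=$size_before after=$size_after"
rm -f "$demo_log"
```

On a real node the helper would be pointed at the actual log path instead of the demo file; where those logs live depends on the Docker configuration and is not assumed here.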

...