National Data Service / NDS-692

Explore Prometheus/Nagios rules for monitoring Kubernetes


    • Type: Task
    • Resolution: Fixed
    • Priority: Normal
    • Labs Workbench - Beta
    • Sprint: NDS Sprint 18, NDS Sprint 19

      Things we want to know about:

      • Pod restarts/CrashLoopBackOff
      • Pods hung in Terminating (zombies)
      • Things stuck in Pending
      • Backtraces in dmesg (segfault, OOM)
      • Nodes in NotReady state (kubelet down)
      • OpenStack state (error)
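
      The pod and node conditions in the list above can be sketched as plain checks over Kubernetes API objects (for example, the items returned by `kubectl get pods -o json` or `kubectl get nodes -o json`). This is only an illustrative sketch, not the rules that came out of this task: the function names, the restart threshold, and the terminating grace window are assumptions, and the dmesg and OpenStack items are not covered here.

      ```python
      from datetime import datetime, timedelta, timezone

      # Both thresholds are hypothetical; tune them for the cluster.
      RESTART_THRESHOLD = 5
      TERMINATING_GRACE = timedelta(minutes=10)


      def pod_problems(pod, now):
          """Return problem labels for one pod object (a dict parsed from API JSON)."""
          problems = []
          meta = pod.get("metadata", {})
          status = pod.get("status", {})

          # CrashLoopBackOff appears as the waiting reason of a container;
          # otherwise fall back to a simple restart-count threshold.
          for cs in status.get("containerStatuses", []):
              waiting = cs.get("state", {}).get("waiting") or {}
              if waiting.get("reason") == "CrashLoopBackOff":
                  problems.append("CrashLoopBackOff")
              elif cs.get("restartCount", 0) >= RESTART_THRESHOLD:
                  problems.append("frequent-restarts")

          # A pod being deleted carries metadata.deletionTimestamp; if it is
          # still present well past that time, treat it as hung in Terminating.
          deleted = meta.get("deletionTimestamp")
          if deleted:
              deleted_at = datetime.strptime(deleted, "%Y-%m-%dT%H:%M:%SZ")
              deleted_at = deleted_at.replace(tzinfo=timezone.utc)
              if now - deleted_at > TERMINATING_GRACE:
                  problems.append("hung-terminating")
          elif status.get("phase") == "Pending":
              problems.append("pending")

          return problems


      def node_not_ready(node):
          """True when the node's Ready condition is anything other than "True"."""
          for cond in node.get("status", {}).get("conditions", []):
              if cond.get("type") == "Ready":
                  return cond.get("status") != "True"
          return True  # no Ready condition reported: assume not ready
      ```

      A monitoring check (a Nagios plugin, say) could run these functions over the item lists and exit non-zero when any problem label is returned. With Prometheus, the same conditions would more likely be written as alert rules over exported cluster metrics rather than evaluated in a script.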

              Assignee: Craig Willis (willis8)
              Reporter: Craig Willis (willis8)
              Votes: 0
              Watchers: 2


                  Original Estimate: 4h
                  Remaining Estimate: 0m
                  Time Spent: 4h