National Data Service / NDS-630

Change NRPE config to support custom limits

    • Type: Bug
    • Resolution: Fixed
    • Priority: Normal
    • Labs Workbench - Beta
    • Sprint: NDS Sprint 16

      After monitoring was started on the beta cluster, Nagios began complaining about too many processes on the compute nodes. By default, the NRPE total-processes check warns at 150 processes and reports critical at 200. Each of these nodes has more than 200 total processes.

      On node1, 244 of them are kernel processes such as kworker (100), ksoftirqd (24), cpuhp, migration, and watchdog. For these big nodes,

      ***** Nagios *****
      Notification Type: PROBLEM
      Service: Total processes
      Host: workbench-node2
      Address: 141.142.210.100
      State: CRITICAL
      Date/Time: Sat Oct 8 19:45:26 UTC 2016
      Additional Info:
      PROCS CRITICAL: 281 processes

      Temporary workaround: disable total_proc monitoring, which probably isn't good.
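
      As a rough illustration of what custom limits could look like, the sketch below renders an nrpe.cfg command definition for check_procs with per-node thresholds. The function name `total_procs_command`, the plugin directory, and the example limits of 350 and 400 are assumptions for illustration only; they are not taken from this ticket, and the plugin path varies by distribution.

```python
def total_procs_command(warn: int, crit: int,
                        plugin_dir: str = "/usr/lib/nagios/plugins") -> str:
    """Render an nrpe.cfg command line for check_procs with custom limits.

    plugin_dir is an assumed default; adjust it to wherever the Nagios
    plugins are installed on the node.
    """
    # check_procs expects the warning threshold to be below the critical one.
    if not 0 < warn < crit:
        raise ValueError("expected 0 < warn < crit")
    return (f"command[check_total_procs]={plugin_dir}/check_procs "
            f"-w {warn} -c {crit}")


if __name__ == "__main__":
    # Hypothetical limits chosen to sit above the 281 processes
    # reported for workbench-node2 in the notification above.
    print(total_procs_command(warn=350, crit=400))
```

      Passing 150 and 200 reproduces the default thresholds described above, so the same helper could serve both the small and the large nodes with different arguments.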

              Assignee: Craig Willis (willis8)
              Reporter: Craig Willis (willis8)

                  Original Estimate: 4h
                  Remaining Estimate: 1h
                  Time Spent: 3h