Bug
Resolution: Fixed
Normal
None
None
None
NDS Sprint 16
After we started monitoring the beta cluster, Nagios began complaining about too many processes on the compute nodes. By default, the Nagios NRPE total-process check warns at 150 processes and goes critical at 200. Each of these nodes runs more than 200 processes in total.
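As a rough illustration of those thresholds, the sketch below maps a total process count to a Nagios state. It assumes the simple "alert once the count exceeds the threshold" form used by the default NRPE total-process check (`check_procs -w 150 -c 200` in the sample nrpe.cfg). The function name `classify` is hypothetical and not part of Nagios.

```python
# Minimal sketch: map a total process count to a Nagios state, assuming the
# simple threshold form of the default NRPE check (-w 150 -c 200), where a
# state is raised only once the count goes above the threshold.

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(count: int, warn: int = 150, crit: int = 200) -> int:
    """Return the Nagios state for a total process count (hypothetical helper)."""
    if count > crit:
        return CRITICAL
    if count > warn:
        return WARNING
    return OK


if __name__ == "__main__":
    for n in (120, 180, 281):
        print(n, STATE_NAMES[classify(n)])
```

With these defaults, the 281 processes reported for workbench-node2 land in CRITICAL no matter how many of them are kernel threads.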
On node1, 244 of the processes are kernel threads, such as kworker (100), ksoftirqd (24), cpuhp, migration, and watchdog. For these big nodes,
***** Nagios *****
Notification Type: PROBLEM
Service: Total processes
Host: workbench-node2
Address: 141.142.210.100
State: CRITICAL
Date/Time: Sat Oct 8 19:45:26 UTC 2016
Additional Info:
PROCS CRITICAL: 281 processes
Temporary workaround: disable total_proc monitoring, although this is probably not a good long-term fix.
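To see how much of the total comes from kernel threads, as in the node1 breakdown above, one option is to scan /proc and test the Linux PF_KTHREAD bit (0x00200000) in the flags field (field 9) of each /proc/<pid>/stat line. This is a minimal, Linux-only sketch; the helper names `parse_stat`, `is_kernel_thread`, and `count_processes` are hypothetical and are not part of Nagios, NRPE, or check_procs.

```python
import os

# Per-task flag marking kernel threads (include/linux/sched.h).
PF_KTHREAD = 0x00200000


def parse_stat(stat_line: str) -> tuple[str, int]:
    """Return (comm, flags) from one /proc/<pid>/stat line (hypothetical helper).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' before reading the numeric fields.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1 : close_paren]
    rest = stat_line[close_paren + 1 :].split()
    # rest[0] is field 3 (state), so field 9 (flags) is rest[6].
    flags = int(rest[6])
    return comm, flags


def is_kernel_thread(stat_line: str) -> bool:
    """True if the task's flags include PF_KTHREAD."""
    return bool(parse_stat(stat_line)[1] & PF_KTHREAD)


def count_processes(proc_root: str = "/proc") -> tuple[int, int]:
    """Return (kernel_threads, other_processes) by scanning proc_root."""
    kernel = other = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # process exited between listdir() and open()
        if is_kernel_thread(line):
            kernel += 1
        else:
            other += 1
    return kernel, other


if __name__ == "__main__" and os.path.isdir("/proc"):
    kernel, other = count_processes()
    print(f"kernel threads: {kernel}, other processes: {other}")
```

Splitting the count this way would make it possible to alert on user-space processes only instead of turning the check off entirely; whether that is the right policy for these nodes is a separate decision.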