...
For NDS Labs, we'll do the following:
- Evaluate whether to use https://github.com/QuantumObject/docker-nagios
- Create Nagios server Docker image if docker-nagios is not acceptable, following the instructions in
- Create Nagios daemonset for NRPE following the instructions in
- Provision a VM at a remote site (TACC) to run the Nagios server
- Create a GitHub repository for Nagios configuration to maintain versioned, per-cluster monitoring configurations (starting with beta)
- Configure Nagios contacts
- Configure Nagios hosts for priority systems, including:
  - Ingress/Nginx
  - Web UI/API, including Kube API/etcd availability
  - Kube system services (GFS, LMA tools, etc.)
  - OpenStack
  - Backups
- NOTE: The Nagios server will not be able to access cluster servers directly, because they currently live on a private network; checks must go through the ingress load balancer. Direct monitoring is preferable where possible and is addressed by NDS-581.
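Checks for the priority systems listed above would typically run as Nagios plugins. The following is a minimal sketch, assuming the standard Nagios plugin convention: exit status 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, a single status line on stdout, and optional performance data after `|`. It probes one HTTP URL, such as an endpoint exposed through the ingress load balancer, given the private-network constraint noted above. The script name `check_url.py`, the functions `classify`, `probe_url`, and `main`, and the 1 s/5 s thresholds are all hypothetical. This is not an existing NDS Labs plugin, and the stock `check_http` plugin from the Monitoring Plugins package may already be sufficient.

```python
"""Minimal Nagios-style HTTP check (hypothetical sketch, not an NDS Labs plugin).

Assumptions: the URL and the warn/crit thresholds are hypothetical.
Exit codes follow the standard Nagios plugin convention.
"""
import time
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(status, elapsed, warn_s=1.0, crit_s=5.0):
    """Map an HTTP status code and response time (seconds) to a Nagios state."""
    if status is None or not 200 <= status < 400:
        return CRITICAL  # no answer, or an error status
    if elapsed >= crit_s:
        return CRITICAL
    if elapsed >= warn_s:
        return WARNING
    return OK


def probe_url(url, timeout=10.0, warn_s=1.0, crit_s=5.0):
    """Fetch url once and return (state, one-line Nagios status message)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except ValueError as err:  # malformed URL, e.g. missing scheme
        return UNKNOWN, f"UNKNOWN - bad URL {url!r}: {err}"
    except OSError as err:  # URLError, connection refused, timeouts
        return CRITICAL, f"CRITICAL - {url} unreachable: {err}"
    elapsed = time.monotonic() - start
    state = classify(status, elapsed, warn_s, crit_s)
    # Text after "|" is Nagios performance data.
    return state, f"{STATE_NAMES[state]} - HTTP {status} in {elapsed:.3f}s | time={elapsed:.3f}s"


def main(argv):
    """Print the status line and return the exit code Nagios expects."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_url.py URL")
        return UNKNOWN
    state, message = probe_url(argv[1])
    print(message)
    return state
```

A standalone script would end with `sys.exit(main(sys.argv))` so that Nagios receives the exit status. A Nagios command definition could then pass the target URL as `$ARG1$`. Keeping `classify` separate from the network call makes the threshold logic easy to test in isolation.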
Additionally, we will want to add health checks (healthz) to all system services.
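One common shape for such a health check is an HTTP `/healthz` endpoint. It returns 200 when every internal check passes and 503 otherwise, so a Kubernetes liveness/readiness probe (which treats 200-399 as success) or a Nagios HTTP check can poll it. The following is a minimal sketch using only Python's standard `http.server`. The helper names `make_healthz_handler` and `serve_healthz`, the check names, and port 8080 are hypothetical. Real NDS Labs services may be written in other languages or use their framework's own health-check support.

```python
"""Minimal /healthz endpoint sketch (hypothetical helpers, stdlib only).

Assumption: each service registers named check functions that return True
when that dependency is healthy.
"""
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_healthz_handler(checks):
    """Build a request handler that runs every check on GET /healthz."""

    class HealthzHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            failed = []
            for name, check in checks.items():
                try:
                    healthy = check()
                except Exception:
                    healthy = False  # a crashing check counts as unhealthy
                if not healthy:
                    failed.append(name)
            status = 503 if failed else 200
            body = ("failed: " + ", ".join(failed) if failed else "ok").encode()
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep probe traffic out of the service log

    return HealthzHandler


def serve_healthz(checks, host="0.0.0.0", port=8080):
    """Create (but do not start) a threaded HTTP server for /healthz."""
    return ThreadingHTTPServer((host, port), make_healthz_handler(checks))
```

For example, `serve_healthz({"etcd": lambda: True}).serve_forever()` would serve the endpoint in the foreground. A service would normally run it in a background thread next to its main work. Reporting the names of failed checks in the body gives the operator a quick hint without exposing detailed internals.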
...