As a result of the discussions surrounding NDS-405, we have a slightly better idea of how we would like to approach logging, monitoring, and alerts (LMA).
We now know:
- which infrastructure / services we will need to monitor: ingress, ui, api, gfs, kube-system, openstack, backups, etc.
- that we will need to run Qualys on every container running on nodes with a public IP: loadbalancer, skydns, etc.
- that we should run a healthz check on each service to detect when it stops working
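As a rough starting point for the prototype, a per-service healthz check could look something like the sketch below. It uses only the Python standard library. The `/healthz` path, the JSON response shape, and the `check_dependencies` helper are assumptions made for illustration, not something agreed in NDS-405; each real service would supply its own checks.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def check_dependencies():
    """Hypothetical placeholder: return a dict of check name -> bool.

    A real service would test its own dependencies here
    (database connection, mounted volume, upstream API, ...).
    """
    return {"self": True}


class HealthzHandler(BaseHTTPRequestHandler):
    """Answers GET /healthz with 200 when all checks pass, 503 otherwise."""

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        checks = check_dependencies()
        healthy = all(checks.values())
        body = json.dumps(
            {"status": "ok" if healthy else "unhealthy", "checks": checks}
        ).encode("utf-8")
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging so frequent probes do not flood the logs.
        pass


def start_healthz_server(host="127.0.0.1", port=0):
    """Start the health endpoint in a background thread and return the server.

    port=0 lets the OS pick a free port; read it from server.server_address.
    """
    server = ThreadingHTTPServer((host, port), HealthzHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `start_healthz_server(host="0.0.0.0", port=8080)` inside a service would expose the endpoint on port 8080 (an arbitrary example port). Whatever monitoring tool we settle on could then poll it and alert on any non-200 response.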
Now we just need to explore the tools themselves (Nagios, healthz, Kibana, Prometheus) and set up a prototype.
This ticket is complete when we have laid out how we plan to approach logging, monitoring, and (most importantly) alerts, and have filed any resulting work that we deem necessary as new JIRA tickets.