  1. National Data Service
  2. NDS-477

Explore the nuances of monitoring and alerts


    • NDS Sprint 12

      As a result of the discussions surrounding NDS-405, we have a somewhat clearer idea of how we would like to approach logging, monitoring, and alerts (LMA).

      We now know:

      1. which infrastructure and services we will need to monitor: ingress, ui, api, gfs, kube-system, openstack, backups, etc.
      2. that we will need to run Qualys on every container running on nodes with a public IP: loadbalancer, skydns, etc.
      3. that we should run a healthz check on each service to ensure things stay running smoothly
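
As a rough illustration of item 3, the sketch below polls a healthz URL for each service and reports the ones that fail. It is only a sketch under stated assumptions: the service names are taken from the list above, but the URLs, ports, and the `/healthz` path are hypothetical placeholders, and the helper names `probe_healthz` and `failing_services` are invented for this example rather than taken from any NDS code.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical mapping of service name -> healthz URL (placeholders, not real NDS endpoints).
SERVICES = {
    "ui": "http://localhost:8080/healthz",
    "api": "http://localhost:8081/healthz",
}


def probe_healthz(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, non-2xx status, or a malformed response.
        return False


def failing_services(services, probe=probe_healthz):
    """Return the sorted names of services whose healthz probe fails."""
    return sorted(name for name, url in services.items() if not probe(url))


if __name__ == "__main__":
    for name in failing_services(SERVICES):
        print(f"ALERT: {name} failed its healthz check")
```

The `probe` argument is injectable so the alerting logic can be exercised without a live service; in a real prototype the print statement would presumably be replaced by whichever alerting tool is chosen.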

      Now we just need to explore the tools themselves (Nagios, healthz, Kibana, Prometheus) and set up a prototype.

      This ticket is complete when we have laid out how we plan to approach logging, monitoring, and (most importantly) alerts and filed any resulting work that we deem necessary into new JIRA tickets.

              willis8 Craig Willis
              lambert8 Sara Lambert


                  Original Estimate: 4h
                  Remaining Estimate: 2h
                  Time Spent: 2h