  1. National Data Service
  2. NDS-477

Explore the nuances of monitoring and alerts

Details

    • NDS Sprint 12

    Description

      Following the discussions around NDS-405, we have a somewhat clearer idea of how we would like to approach logging, monitoring, and alerts (LMA).

      We now know:

      1. which infrastructure / services we will need to monitor - ingress, ui, api, gfs, kube-system, openstack, backups, etc.
      2. that we will need to run Qualys on every container running on nodes with a public IP - loadbalancer, skydns, etc.
      3. that we should run a healthz check on each service to ensure things stay running smoothly
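
      As a rough illustration of item 3, a healthz endpoint for a single service might look like the sketch below. It assumes a Python service and uses only the standard library; the `/healthz` path follows the common Kubernetes convention, while `health_status`, `make_handler`, the check names, and port 8080 are hypothetical and not part of any existing NDS code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_status(checks):
    """Run each named check and return (http_status, body_dict).

    `checks` maps a check name to a zero-argument callable that returns
    a truthy value when the dependency is healthy. A check that raises
    is counted as failed.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    healthy = all(results.values())
    return (200 if healthy else 503), {"healthy": healthy, "checks": results}


def make_handler(checks):
    """Build a request handler class that serves GET /healthz."""

    class HealthzHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            status, body = health_status(checks)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            # Keep probe traffic out of the service log.
            pass

    return HealthzHandler


# To serve it (hypothetical check and port):
#   checks = {"self": lambda: True}
#   HTTPServer(("0.0.0.0", 8080), make_handler(checks)).serve_forever()
```

      Returning 503 when any check fails lets a Kubernetes liveness or readiness probe, or an external monitor, treat the service as unhealthy without parsing the response body.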

      Now we just need to explore the tools themselves (nagios, healthz, kibana, prometheus) and set up a prototype.
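
      Before committing to one of these tools, the alerting side of a prototype could be approximated with a small poller. The sketch below is a stand-in for what nagios or prometheus would do, not a use of either: it probes healthz URLs and flags a service only after several consecutive failures. The function names and the threshold of 3 are assumptions made for illustration.

```python
import http.client
import urllib.request


def probe(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Covers connection errors, timeouts, non-2xx responses
        # (HTTPError is an OSError subclass), and malformed URLs.
        return False


def update_failures(failures, results):
    """Return new consecutive-failure counts after one polling round.

    `failures` maps service name to its current failure streak;
    `results` maps service name to the latest probe outcome.
    A successful probe resets the streak to zero.
    """
    return {
        name: 0 if ok else failures.get(name, 0) + 1
        for name, ok in results.items()
    }


def services_to_alert(failures, threshold=3):
    """Return the sorted names whose failure streak reached `threshold`."""
    return sorted(name for name, count in failures.items() if count >= threshold)
```

      A polling loop would call `probe` for each service (for example ingress, ui, and api), feed the results to `update_failures`, and send a notification for whatever `services_to_alert` returns. Requiring consecutive failures is one simple way to avoid alerting on a single dropped request.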

      This ticket is complete when we have laid out how we plan to approach logging, monitoring, and (most importantly) alerts and filed any resulting work that we deem necessary into new JIRA tickets.

            People

              willis8 Craig Willis
              lambert8 Sara Lambert

                Time Tracking

                  Original Estimate: 4h
                  Remaining Estimate: 2h
                  Time Spent: 2h
