  1. National Data Service
  2. NDS-477

Explore the nuances of monitoring and alerts

Details

    • NDS Sprint 12

    Description

      Following the discussions surrounding NDS-405, we have a slightly better idea of how we would like to approach logging, monitoring, and alerts (LMA).

      We now know:

      1. which infrastructure and services we need to monitor: ingress, ui, api, gfs, kube-system, openstack, backups, etc.
      2. that we need to run Qualys against every container running on a node with a public IP: loadbalancer, skydns, etc.
      3. that we should run a healthz check on each service to ensure things keep running smoothly
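
      To make the healthz idea in item 3 concrete, here is a minimal sketch of what such an endpoint could look like, using only the Python standard library. The `/healthz` path, the plain-text `ok` body, and the helper names `start_healthz_server` and `probe` are assumptions for illustration; the ticket does not say how each service will actually expose its check.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthzHandler(BaseHTTPRequestHandler):
    """Answers GET /healthz with 200 'ok'; any other path gets 404."""

    def do_GET(self):
        if self.path == "/healthz":
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def start_healthz_server(port=0):
    """Start the server in a background thread (port 0 = pick a free port)."""
    server = HTTPServer(("127.0.0.1", port), HealthzHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(host, port, path="/healthz", timeout=2):
    """Hypothetical helper: return (status, body) for one health check."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_healthz_server()
    print(probe("127.0.0.1", server.server_address[1]))
    server.shutdown()
    server.server_close()
```

      A real check would probably also verify the service's own dependencies before answering 200, but which dependencies matter is something this ticket still has to work out.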

      Now we just need to explore the tools themselves (nagios, healthz, kibana, prometheus) and set up a prototype.
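
      As a starting point for the prototype, the alerting side can be sketched independently of whichever tool we pick. The example below assumes a simple rule: alert only after a service fails several consecutive checks, and report recovery once it passes again. The `AlertTracker` class, the threshold of 3, and the event names are hypothetical and not taken from nagios, prometheus, or any other tool listed above.

```python
from collections import defaultdict


class AlertTracker:
    """Turns a stream of per-service check results into alert events."""

    def __init__(self, threshold=3):
        self.threshold = threshold          # consecutive failures before alerting
        self.failures = defaultdict(int)    # service name -> current failure streak
        self.alerting = set()               # services with an open alert

    def record(self, service, healthy):
        """Record one check result; return 'ALERT', 'RECOVERED', or None."""
        if healthy:
            self.failures[service] = 0
            if service in self.alerting:
                self.alerting.discard(service)
                return "RECOVERED"
            return None
        self.failures[service] += 1
        if self.failures[service] >= self.threshold and service not in self.alerting:
            self.alerting.add(service)
            return "ALERT"
        return None


if __name__ == "__main__":
    tracker = AlertTracker(threshold=3)
    # Simulated results for the "api" service: one pass, four failures, one pass.
    for healthy in [True, False, False, False, False, True]:
        event = tracker.record("api", healthy)
        if event:
            print(event, "api")
```

      Requiring a streak of failures avoids paging on a single dropped check, at the cost of a slower first alert; the right threshold per service is one of the things the prototype should help us decide.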

      This ticket is complete when we have laid out how we plan to approach logging, monitoring, and (most importantly) alerts and filed any resulting work that we deem necessary into new JIRA tickets.

              People

                willis8 Craig Willis
                lambert8 Sara Lambert
                Votes: 0
                Watchers: 3

                Dates

                  Created:
                  Updated:
                  Resolved:

                  Time Tracking

                    Estimated: 4h
                    Remaining: 2h
                    Logged: 2h
