Monitoring

Qualys (vulnerability)
- Load balancer, Nginx controller
Nagios
- Need to understand:
- Where does it run? AWS, TACC, ISDA instance
- Who gets notified?
- When does it run?
Kube tools/Prometheus
Log aggregation
Healthz on all services?
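
If every service did expose a health endpoint, a minimal version could look like the sketch below. It is an illustration only: the `/healthz` path, the JSON body, and the names `health_response` and `HealthzHandler` are assumptions, not anything decided in these notes. Any other path returns 404, which mirrors the default-backend behaviour mentioned under Priorities.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(path):
    """Map a request path to (status, content_type, body).

    Hypothetical convention: GET /healthz returns 200 with a small JSON
    document, everything else returns 404.
    """
    if path == "/healthz":
        body = json.dumps({"status": "ok"}).encode("utf-8")
        return 200, "application/json", body
    return 404, "text/plain", b"not found"


class HealthzHandler(BaseHTTPRequestHandler):
    """Serves the responses computed by health_response()."""

    def do_GET(self):
        status, content_type, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging; a real service would log instead.
        pass


def serve(port=8080):
    """Run the health endpoint until interrupted (port is an assumption)."""
    HTTPServer(("0.0.0.0", port), HealthzHandler).serve_forever()
```

A monitoring check (Nagios, a Kubernetes probe, or a script) would then only need to request `/healthz` and alert on anything other than a 200.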
Priorities
- Ingress - Nginx - using default backend 404
- Web UI / API (Kube API/Etcd availability)
- Kube system (GFS, etc.)
- OpenStack
- Backups

...

Where? AWS, BW, TACC
How? Some script/Job/rsync
When? Daily rolling
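
One way to read "daily rolling" is a fixed window of daily backups where anything older than the window is pruned. The sketch below shows only that pruning decision. The 7-day window and the function name `backups_to_delete` are assumptions for illustration; the notes do not specify a retention period.

```python
from datetime import date, timedelta


def backups_to_delete(backup_dates, today, keep_days=7):
    """Return the backup dates that fall outside the rolling window.

    With keep_days=7, backups dated today and the six days before it are
    kept; anything dated keep_days or more days before today is returned
    for deletion.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in backup_dates if d <= cutoff)


if __name__ == "__main__":
    today = date(2024, 1, 10)
    existing = [date(2024, 1, day) for day in range(1, 11)]
    for old in backups_to_delete(existing, today):
        print("would delete backup from", old.isoformat())
```

The script or job that copies data to the backup location could run this after each successful backup, so an old copy is removed only once a newer one exists.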
Questions
- Hot backup of DBs – backupz + sidecar
GFS backup options (depends on number of users)
- Snapshots + diffs
- Checkpointing
- Replication to another GFS/geolocation
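
To make the "snapshots + diffs" option concrete, the sketch below works at the plain file level: a snapshot is a manifest mapping each file path to a checksum, and a diff lists what was added, removed, or changed between two manifests. This is an illustration under stated assumptions, not a GFS feature; `build_manifest` and `diff_manifests` are hypothetical helper names.

```python
import hashlib
import os


def build_manifest(root):
    """Walk root and return {relative_path: sha256_hex} for every file."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as handle:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def diff_manifests(old, new):
    """Compare two manifests and list added, removed, and changed paths."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "changed": sorted(p for p in new if p in old and new[p] != old[p]),
    }
```

Only the paths under "added" and "changed" need to be copied for an incremental backup, which is why this approach scales with how much users change rather than with total volume size.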

...