Monitoring

  • Qualys (vulnerability scanning)
    • Load balancer, Nginx ingress controller
  • Nagios
    • Need to understand
    • Where? AWS, TACC, ISDA instance
    • Who gets notified?
    • When does it run?
  • Kube tools/Prometheus
  • Log aggregation
  • Healthz on all services?
  • Priorities
    • Ingress (Nginx): check via the default backend's 404 response
    • Web UI / API (Kube API/etcd availability)
    • Kube system (GFS, etc.)
    • OpenStack
    • Backups
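
The "Healthz on all services?" item and the priority list above could be covered by a small probe script. The following is a minimal sketch, not a chosen design: the function names, the 3-second timeout, and the idea of passing a per-service set of expected status codes are all assumptions. The expected-status parameter is there so that the ingress check can treat the default backend's 404 as a healthy answer.

```python
import urllib.error
import urllib.request


def status_ok(code, expected=frozenset({200})):
    """Return True if the HTTP status code is one we expect for this service."""
    return code in expected


def probe(url, expected=frozenset({200}), timeout=3.0):
    """Probe one endpoint and return (healthy, detail).

    Network failures are reported as unhealthy instead of raising.
    The timeout value is an assumption, not a requirement from the notes.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises on 4xx/5xx; the code may still be the expected one
        # (for example 404 from the ingress default backend).
        code = exc.code
    except OSError as exc:
        # Covers URLError, connection refused, and timeouts.
        return False, f"unreachable: {exc}"
    return status_ok(code, expected), f"HTTP {code}"


def failing(results):
    """Return the names of unhealthy services, in the order they were checked."""
    return [name for name, (ok, _detail) in results.items() if not ok]
```

A caller would build a dictionary in priority order, for example `{"ingress": probe(ingress_url, expected={404}), "api": probe(api_url)}` with hypothetical URLs, and pass `failing(...)` to whatever notification path is decided on.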

Backup/Disaster Recovery

  1. GFS, etcd: "best effort" for beta
  2. Cluster config (using kubectl)
  3. Deploy tools provisioning
  • Where? AWS, BW, TACC
  • How? Some script/Job/rsync
  • When? Daily, rolling
  • Open questions
    • Hot backup of DBs – backupz + sidecar?
  • GFS backup options (depends on number of users)
    • Snapshots + diffs
    • Checkpointing
    • Replication to another GFS/geolocation
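
The "daily rolling" schedule above implies a naming scheme and a retention rule. The sketch below shows one way that could look, assuming date-stamped archive names and a 7-day window; the prefix, the `.tar.gz` suffix, and the window length are illustrative assumptions, not decisions recorded in these notes.

```python
from datetime import date, timedelta


def backup_name(prefix, day):
    """Build a date-stamped archive name (naming scheme is an assumption)."""
    return f"{prefix}-{day.isoformat()}.tar.gz"


def expired(days, today, keep_days=7):
    """Return the backup dates that fall outside the rolling window.

    A backup is kept if it was taken within the last `keep_days` days,
    counting today. Everything older is returned, oldest first.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in days if d <= cutoff)
```

A daily job could call `expired(...)` on the dates of the archives it finds at the backup location and delete the matching files after the new backup has been written successfully.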

 

Performance Testing

 

Capacity Planning
