Monitoring

See NDS Labs Monitoring.

Backup/Disaster Recovery

  1. GFS and etcd: "best effort" for beta
  2. Cluster config (using kubectl)
  3. Deploy tools provisioning
  • Where? AWS, BW, TACC
  • How? Some script/Job/rsync
  • When? Daily rolling
  • Q
    • Hot backup of DBs – backupz + sidecar
  • GFS backup options (depend on the number of users)
    • Snapshots + diffs
    • Checkpointing
    • Replication to another GFS/geolocation
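As a rough illustration of the "daily rolling" idea above, the sketch below builds a kubectl command for dumping cluster config and works out which dated backup directories have aged out of the window. It is only a sketch under assumptions: the `backup-YYYY-MM-DD` naming, the 7-day window, and the resource list are hypothetical and not decided anywhere in these notes. Only `kubectl get ... --all-namespaces -o yaml` is standard kubectl usage.

```python
import datetime as dt

# Hypothetical settings; none of these values come from the notes above.
PREFIX = "backup-"
KEEP_DAYS = 7
RESOURCES = ["deployments", "services", "configmaps"]


def kubectl_export_cmd(resource):
    """Command that dumps one resource type from every namespace as YAML."""
    return ["kubectl", "get", resource, "--all-namespaces", "-o", "yaml"]


def backup_name(day):
    """Directory name for the backup taken on `day` (a datetime.date)."""
    return PREFIX + day.isoformat()


def expired(names, today, keep_days=KEEP_DAYS):
    """Return the backup directories dated more than `keep_days` before `today`."""
    cutoff = today - dt.timedelta(days=keep_days)
    old = []
    for name in names:
        if not name.startswith(PREFIX):
            continue  # not one of our backup directories; leave it alone
        try:
            day = dt.date.fromisoformat(name[len(PREFIX):])
        except ValueError:
            continue  # prefix matches but the date part does not parse
        if day < cutoff:
            old.append(name)
    return sorted(old)
```

A daily job could run each `kubectl_export_cmd(...)` into today's `backup_name(...)` directory, copy it to the remote site (for example with rsync), and then delete whatever `expired(...)` returns.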

 

Performance Testing

  • GFS
  • “iassist” redux
    • est. per-user quotas?
    • what do we need on day 1?
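One way to reason about per-user quotas and day-1 needs is a back-of-envelope storage estimate. The sketch below is an assumption-laden illustration, not a sizing decision: the replica count, headroom fraction, and brick size are hypothetical parameters, and the user count and quota are exactly the numbers these notes still list as open.

```python
import math


def raw_storage_gb(users, quota_gb, replicas=2, headroom=0.2):
    """Raw capacity (GB) needed if every user fills their quota.

    replicas and headroom are hypothetical defaults, not measured values.
    """
    usable = users * quota_gb
    return usable * replicas * (1 + headroom)


def bricks_needed(raw_gb, brick_gb):
    """Number of equally sized bricks required to hold raw_gb."""
    return math.ceil(raw_gb / brick_gb)
```

For example, `bricks_needed(raw_storage_gb(50, 10), 500)` estimates the brick count for 50 users at 10 GB each on 500 GB bricks; swapping in measured values once the beta workload is known would turn this into a real capacity check.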

Open Questions

  • How many beta users?
    • What is the workload?
  • By what performance metrics do we judge pass/fail?
  • How do we learn our limits?
    • Capacity planning / monitoring
  • What happens when we need to:
    • add GFS bricks?
    • add kubernetes nodes?
  • What constitutes a failure?
    • Dead node

Capacity Planning
