
...

Qualys is used by NCSA IT for vulnerability assessment and management. Qualys will require SSH access to any public-facing host or service, which will likely mean the load balancer host and the Nginx ingress controller container.

  • Create SSH keypair
  • Open SSH access to NCSA Qualys server (IP)
  • Create non-root user
  • Install Qualys client?
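One way to scope the SSH access in the list above is to accept the scanner's key only from the scanner's address, using the standard OpenSSH from="..." option in the Qualys user's authorized_keys file. The Python sketch below builds such a line. The scanner address is a hypothetical documentation placeholder because the real NCSA Qualys IP is not recorded here, and whether Qualys needs any further key options should be checked against the NCSA setup instructions.

```python
import ipaddress

# HYPOTHETICAL placeholder (TEST-NET-3 documentation range): the real NCSA
# Qualys scanner address is not recorded on this page.
QUALYS_SCANNER_IP = "203.0.113.10"

# Prefixes of common OpenSSH public key types (ssh-ed25519, ssh-rsa, ecdsa-..., sk-...).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def authorized_keys_entry(public_key: str, source_ip: str) -> str:
    """Return an authorized_keys line that accepts public_key only from source_ip."""
    # Raises ValueError on a malformed address, so a typo is caught early.
    addr = ipaddress.ip_address(source_ip)
    parts = public_key.split()
    if len(parts) < 2 or not parts[0].startswith(KEY_TYPE_PREFIXES):
        raise ValueError("expected an OpenSSH public key such as 'ssh-ed25519 AAAA... comment'")
    # from="..." is a standard OpenSSH authorized_keys option that limits
    # which client addresses may authenticate with this key.
    return f'from="{addr}" ' + " ".join(parts)


if __name__ == "__main__":
    # Placeholder key material; Security supplies the real public key.
    demo_key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPlaceholderOnly qualys@example"
    print(authorized_keys_entry(demo_key, QUALYS_SCANNER_IP))
```

The printed line would be appended to ~/.ssh/authorized_keys of the non-root Qualys user, so the key is refused from any other source address.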

Nagios

  • Nagios
    • Need to understand
    • Where? AWS, TACC, ISDA instance
    • Who gets notified?
    • When does it run?

Features:

...

 

NCSA Security has opened a ticket for this: https://jira.ncsa.illinois.edu/browse/SECOPS-340. We need to:

  • Provide a list of IPs that we want scanned (in general, they try to scan one system of each type)
  • Security will provide an SSH public key for logging in to the local Qualys user account
  • Instructions for setting up the Qualys user: https://wiki.ncsa.illinois.edu/pages/viewpage.action?pageId=41461115
  • Provide an email address for reports
  • We will also need to do this for public-facing containers (e.g., the Nginx ingress controller)
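Because Security generally scans one system of each type, the IP list for the ticket can be derived from a host inventory by keeping the first host of each role. A minimal Python sketch follows; the inventory, hostnames, roles, and addresses are all hypothetical placeholders, not the real NDS Labs hosts.

```python
import ipaddress

# HYPOTHETICAL inventory: (hostname, role, address). Names, roles and
# addresses are placeholders (TEST-NET-1), not the real NDS Labs hosts.
INVENTORY = [
    ("lb-1", "loadbalancer", "192.0.2.10"),
    ("master-1", "master", "192.0.2.20"),
    ("node-1", "compute", "192.0.2.31"),
    ("node-2", "compute", "192.0.2.32"),
    ("gfs-1", "storage", "192.0.2.41"),
    ("gfs-2", "storage", "192.0.2.42"),
]


def one_per_type(inventory):
    """Map each role to the address of the first host listed with that role."""
    picked = {}
    for _hostname, role, address in inventory:
        ipaddress.ip_address(address)  # raises ValueError on a bad address
        picked.setdefault(role, address)  # keep only the first host per role
    return picked


if __name__ == "__main__":
    # One line per system type, ready to paste into the ticket.
    for role, address in one_per_type(INVENTORY).items():
        print(f"{address}  # {role}")
```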

Nagios

Nagios is an open source monitoring system. In general, the Nagios server is installed in one location and the Nagios Remote Plugin Executor (NRPE) on each node to be monitored. Nagios provides public service monitoring through standard plugins (e.g., DNS, HTTP, SMTP). It provides private service monitoring through NRPE

...

(CPU, memory, disk,

...

etc).
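NRPE runs ordinary Nagios plugins on the monitored node, and each plugin reports its result through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of output, optionally followed by performance data after a "|". The Python sketch below follows that convention for a disk check. The 80%/90% thresholds are assumptions, and in practice the standard check_disk plugin already covers this case; the sketch only shows the contract a custom check would follow.

```python
import shutil
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_disk(path, used, total, warn_pct=80.0, crit_pct=90.0):
    """Map disk usage to a Nagios status code and a one-line plugin output."""
    if total <= 0:
        return UNKNOWN, f"DISK UNKNOWN - cannot read size of {path}"
    pct = 100.0 * used / total
    if pct >= crit_pct:
        code = CRITICAL
    elif pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    # Text before '|' is for humans; after it is perfdata: 'label'=value;warn;crit;min;max
    perf = f"'{path}'={pct:.1f}%;{warn_pct:g};{crit_pct:g};0;100"
    return code, f"DISK {LABELS[code]} - {path} {pct:.1f}% used | {perf}"


def main(path="/"):
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, line = evaluate_disk(path, usage.used, usage.total)
    print(line)
    return code


if __name__ == "__main__":
    # The exit status is what Nagios (via NRPE) actually interprets.
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "/"))
```

On the node, NRPE would expose such a script through a command definition in nrpe.cfg, for example command[check_root_disk]=/usr/local/lib/nagios/check_disk_sketch.py / (the command name and path here are hypothetical).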

For NDS Labs, we'll do the following:

 

Kube tools/Prometheus

 

  • Kube tools/Prometheus
  • Log aggregation
  • Healthz on all services?
  • Priorities
    • Ingress: Nginx, with the default backend returning 404
    • Web UI/API (including Kube API/Etcd availability)
    • Kube system (GFS, LMA tools, etc)
    • OpenStack
    • Backups
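For the first two priorities, a probe only needs to confirm that the expected HTTP status comes back: 404 from the Nginx default backend for a request that no ingress rule matches, and 200 from a health endpoint on the Web UI/API. A Python sketch using only the standard library and returning Nagios-style codes; the actual URLs to probe are not specified here and are left to the caller.

```python
import urllib.error
import urllib.request

OK, CRITICAL = 0, 2  # Nagios-style return codes


def probe(url, expected_status, timeout=5.0):
    """Fetch url and return (code, message); OK only if the status matches."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx still means the server answered; HTTPError carries the code.
        # (HTTPError subclasses URLError, so it must be caught first.)
        status = err.code
        err.close()
    except (urllib.error.URLError, OSError) as err:
        return CRITICAL, f"{url}: no response ({err})"
    if status == expected_status:
        return OK, f"{url}: got expected {status}"
    return CRITICAL, f"{url}: expected {expected_status}, got {status}"
```

For example, probe("http://LOADBALANCER/no-such-path", 404) would check the ingress, where LOADBALANCER is a placeholder for the public load balancer address.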

Additionally, we will want to add health checks (healthz) to all system services.
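A health check can be as small as an HTTP handler that answers GET /healthz with 200 when the service's own checks pass and 503 when they do not. The Python sketch below is a minimal illustration only; check_dependencies is a hypothetical hook and port 8080 is an arbitrary choice.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies():
    """HYPOTHETICAL hook: replace with real checks (e.g. etcd or database reachability)."""
    return {"self": True}


class HealthzHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        checks = check_dependencies()
        healthy = all(checks.values())
        body = json.dumps({"healthy": healthy, "checks": checks}).encode()
        # 200 when every check passes, 503 otherwise, so probes need only the status.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe traffic out of the logs


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthzHandler).serve_forever()
```

Kubernetes can then reuse the same endpoint through an httpGet liveness or readiness probe on path /healthz.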

Usage monitoring

We will use the Kubernetes addons, specifically ELK and Grafana, to monitor usage during the beta period.

Backup/Disaster Recovery

  1. GFS and Etcd ("best effort" for beta)
  2. Cluster config (using kubectl)
  3. Deploy tools provisioning
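Items 1 and 2 largely amount to periodically dumping Kubernetes objects with kubectl and taking an etcd snapshot. Below is a best-effort Python sketch of that. The namespace and resource lists are assumptions, GFS data is not covered, and for a real cluster etcdctl would also need its endpoint and TLS flags (plus ETCDCTL_API=3 on older etcd versions).

```python
import datetime
import subprocess

# ASSUMED lists; adjust to the namespaces and resource kinds actually in use.
NAMESPACES = ("default", "kube-system")
KINDS = ("deployments", "services", "configmaps", "ingresses")


def backup_commands(stamp, namespaces=NAMESPACES, kinds=KINDS):
    """Build (argv, output_file) pairs; nothing is executed here."""
    cmds = []
    for ns in namespaces:
        # kubectl accepts a comma-separated list of resource types.
        argv = ["kubectl", "get", ",".join(kinds), "-n", ns, "-o", "yaml"]
        cmds.append((argv, f"cluster-config-{ns}-{stamp}.yaml"))
    # etcd v3 snapshot; endpoint/TLS flags omitted in this sketch.
    cmds.append((["etcdctl", "snapshot", "save", f"etcd-{stamp}.db"], None))
    return cmds


def run(cmds):
    """Execute each command, writing stdout to its output file when one is given."""
    for argv, outfile in cmds:
        if outfile is None:
            subprocess.run(argv, check=True)
        else:
            with open(outfile, "w") as fh:
                subprocess.run(argv, check=True, stdout=fh)


if __name__ == "__main__":
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run(backup_commands(stamp))
```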

...