Attendees

  • Steve: LSST
  • Matias: DES
  • Justin: BRO
  • Rob: BD
  • Sinan: ??
  • Bill: Nebula
  • Kacper: WT
  • Ben: NDS
  • Craig: NDS
  • Mike: NDS

Terminology

  • Container: Virtualized process packaged as a Docker image. NOTE: everything is ephemeral
  • Ephemeral: If a container is killed, its ephemeral data is not recoverable
  • Node: A VM running containers in your Kubernetes cluster
  • Namespace: Allows related services to be isolated from unrelated services (this helps reduce naming collisions)
  • Pod: One or more containers running in a shared network (i.e. they can access each other on localhost)
  • ReplicationControllers / ReplicaSets / Deployments: Given an integer number of replicas, ensures that the desired number of Pod replicas are running at all times
  • DaemonSet: A Deployment that is expected to be run on every node in the cluster
  • Service: A cluster-local static IP that will round-robin your requests to matching Pod(s)
  • Selector: The field used to determine whether a Service should affect a Pod
  • Ingress: A list of one or more rules routing a host + path to a Service + port and providing SSL encryption
  • Job: A Pod that is expected to terminate gracefully (e.g. backups, batch processes, etc)
  • CronJob: A Job that is run on a predetermined schedule
  • Volumes: Non-ephemeral data within the container (i.e. will persist after Pod is deleted)
  • HostPath: A volume mounted into a container directly from the host FS
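A minimal sketch of how a Selector ties a Service to its Pods, assuming simple equality-based matching on labels. The pod names and labels are made up for illustration, and the round-robin rotation only models the behavior described in the Service definition above; how a real cluster distributes requests depends on how kube-proxy is configured.

```python
from itertools import cycle


def matches(selector, labels):
    """A Pod matches when every key/value pair in the selector is in its labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical Pods; only the labels matter for selection.
pods = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "db-1", "labels": {"app": "db"}},
]

# The Service's selector: it should affect only the "web" Pods.
selector = {"app": "web"}
backends = [pod["name"] for pod in pods if matches(selector, pod["labels"])]

# Round-robin over the matching Pods, one backend per incoming request.
rotation = cycle(backends)
picked = [next(rotation) for _ in range(4)]
print(picked)
```

Note that db-1 never receives a request: it carries no `app: web` label, so the selector does not match it.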

Features

  • In-cluster DNS and service discovery
  • Service discovery via environment variables (limited by namespace)
  • Resilience through multiple replicas
  • Load balancing through round-robin of multiple replicas
  • Ingress load balancer (reverse proxy) will map hostname/path to service/port
  • Several strategies for mounting volumes into Pods, some of which are cloud-dependent
  • Networking is handled by external framework like Flannel or Calico
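The two service discovery mechanisms listed above can be sketched as follows. Kubernetes injects `<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT` variables into Pods for Services in the same namespace, with the Service name upper-cased and dashes turned into underscores. The helper names, the `redis-master` Service, and the IP address are hypothetical, and `cluster.local` is only the default cluster domain, so it is an assumption for any particular cluster.

```python
import os


def service_env_names(service_name):
    """Return the env var names Kubernetes derives from a Service name."""
    prefix = service_name.upper().replace("-", "_")
    return f"{prefix}_SERVICE_HOST", f"{prefix}_SERVICE_PORT"


def discover(service_name, environ=os.environ):
    """Look up a Service's host and port from the environment, or None if absent."""
    host_var, port_var = service_env_names(service_name)
    host = environ.get(host_var)
    port = environ.get(port_var)
    if host is None or port is None:
        return None
    return host, int(port)


def dns_name(service_name, namespace, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name; cluster_domain is an assumed default."""
    return f"{service_name}.{namespace}.svc.{cluster_domain}"


# Simulated environment of a Pod that shares a namespace with "redis-master".
fake_env = {
    "REDIS_MASTER_SERVICE_HOST": "10.0.0.11",
    "REDIS_MASTER_SERVICE_PORT": "6379",
}
print(discover("redis-master", fake_env))
print(dns_name("redis-master", "default"))
```

The environment variables are only set for Services that existed when the Pod started, which is one reason DNS is usually the more convenient of the two.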

Ideas for Workbench

  • UI / API for job submission... make this friendlier?
    • DES has a nice interface for Celery job submission
  • Why not use NFS instead of GLFS?
    • DES uses PersistentVolumes backed by read-only NFS
  • Allow for SMB mounts on a per-user basis to cover permissions?
  • Use DaemonSet for pre-pulling images on all nodes
  • Allow users to launch jobs directly from within containers? (e.g. HTCondor, Spark, etc)
  • Auth + RBAC + Kubedash > our current UI... discuss this
  • Gluster client / kubectl security... can users just install gluster client and mount anything they want?
    • Look into volume security - our current GLFS is likely very insecure
  • UID mapping - not running as root vs actual proper permissions
    • Running as a "real user" does not necessarily address all security concerns
  • Image security - how do we determine whether an image is trusted?
    • docker history to see the layers involved in building the image?
    • public data vs secret data... public data likely leads to lax security
  • General security: protecting infrastructure vs protecting data
    • How will these security protocols affect performance?
    • The old "Anyone can access etcd from anywhere" problem
      • Flannel vs Calico - supposedly Calico has better network isolation features
  • Private registry doesn't seem to come with kubespray
    • It would be nice to have a place to push private images
  • OpenShift as a replacement for DES Labs?
  • Swarm vs Kubernetes
    • Kubernetes is a better "nanny" when it comes to watching services
  • Minio allows users to pull directly from S3
    • This would be more secure and likely less maintenance than an NFS-like approach
  • How to manage/limit user kubectl access?
    • Deploying for multi-tenancy is a pain
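One possible shape for the "DaemonSet for pre-pulling images" idea above, as a rough sketch rather than a tested deployment: each image is listed as an init container so that every node pulls it, and a small pause container keeps the Pod alive afterwards. The function name, the `image-prepull` name, and the example image references are hypothetical, and the `sh -c true` command assumes each image ships a shell.

```python
import json


def prepull_daemonset(images, name="image-prepull"):
    """Build a DaemonSet manifest whose init containers force an image pull per node."""
    labels = {"app": name}
    init_containers = [
        {
            "name": f"pull-{index}",
            "image": image,
            # Exit immediately; the pull itself is the point. Assumes a shell exists.
            "command": ["sh", "-c", "true"],
        }
        for index, image in enumerate(images)
    ]
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            # The selector must match the Pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "initContainers": init_containers,
                    # Long-running placeholder so the Pod stays in Running state.
                    "containers": [
                        {"name": "pause", "image": "registry.k8s.io/pause:3.9"}
                    ],
                },
            },
        },
    }


# Placeholder image references, not real Workbench images.
manifest = prepull_daemonset(
    ["example.org/workbench/jupyter:latest", "example.org/workbench/rstudio:latest"]
)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied with `kubectl apply -f`.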