This page builds on Gluster Alternatives and Cloud Provider Alternatives, but evaluates the options against the Whole Tale requirements.
Requirements
Use case: A user creates a new tale based on an existing read-only dataset. The notebook uses the data to produce new outputs. The result is published as a new tale (ideally the notebook is part of the permanent data for the tale, captured in the workspace).
From the Whole Tale project:
- Shared home directory
- When capturing a tale, we want an exact copy of the home directory at an instant in time
- Relates to provenance, capturing current state, can be published
- Fast
- POSIX?
- Versioning:
- Conceptually similar to object stores – when you modify a file, you create a new version while potentially maintaining the old one
- Relates to reproducibility, allowing pointers to immutable versions of data
- Mountable anywhere
- sshfs
- Notifications: may be implemented in FUSE
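The snapshot and versioning requirements above can be illustrated with a small sketch. It is not a Whole Tale implementation; it only assumes that a "version" of a home directory can be identified by hashing its contents, so that any change yields a new identifier while the old one still refers to the old state. The function names (`snapshot_manifest`, `version_id`, `demo`) and the file name `analysis.ipynb` are hypothetical.

```python
import hashlib
import os
import tempfile


def snapshot_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest


def version_id(manifest):
    """Derive one identifier for the whole snapshot from its sorted entries."""
    h = hashlib.sha256()
    for rel_path in sorted(manifest):
        h.update(rel_path.encode("utf-8"))
        h.update(b"\0")
        h.update(manifest[rel_path].encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()


def demo():
    """Snapshot a temporary 'home directory' before and after an edit."""
    with tempfile.TemporaryDirectory() as home:
        notebook = os.path.join(home, "analysis.ipynb")  # hypothetical file
        with open(notebook, "w") as fh:
            fh.write("v1")
        before = version_id(snapshot_manifest(home))
        with open(notebook, "w") as fh:
            fh.write("v2")
        after = version_id(snapshot_manifest(home))
    return before, after
```

A manifest like this records what a snapshot contained but does not store the data itself. Keeping the old file contents available is what a versioned object store or a copy-on-write filesystem would have to provide.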
Options
- minio
- NFS
- GlusterFS
- BTRFS/ZFS
- DataExpLab using ZFS for testing
- TACC Corral (GPFS)
Block v object storage
Notes
POSIX
Object storage
- Limited access functions (PUT/GET/DELETE/POST/HEAD)
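To make the contrast with POSIX concrete, here is a toy in-memory object store. It is a simplified sketch, not the API of any particular product: the class `ObjectStore` and the helper `append_via_rewrite` are hypothetical, POST is left out, and real services add features (range reads, multipart uploads) that are not modelled here. The point is that objects are read and written whole, so an operation as simple as appending to a file becomes a read-modify-write cycle.

```python
class ObjectStore:
    """Toy object store: flat keys, whole-object operations only."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # PUT replaces the entire object; there is no partial in-place write.
        self._objects[key] = bytes(data)

    def get(self, key):
        # GET returns the full object body (KeyError if the key is absent).
        return self._objects[key]

    def head(self, key):
        # HEAD returns metadata about the object without its body.
        return {"size": len(self._objects[key])}

    def delete(self, key):
        # DELETE removes the object; deleting a missing key is a no-op here.
        self._objects.pop(key, None)


def append_via_rewrite(store, key, extra):
    """Emulate a POSIX-style append: fetch the object, extend it, store it again."""
    store.put(key, store.get(key) + bytes(extra))
```

Note that keys such as `tale/out.csv` only look like paths. In this model there are no directories, just a flat namespace of key strings.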
Flocker:
- Documentation remains unavailable (https://clusterhq.com/flocker/introduction/)
Ceph
- Used by SDSC OpenStack
- Gluster v Ceph
- Ceph = object store
- Gluster = scale-out NAS and object store
- Both scale out linearly
- More Ceph v Gluster
- Gluster performs better at higher scales
- Majority of OpenStack implementations use Ceph
- Gluster is classic file-serving, second-tier storage
- Gluster = file storage with object capabilities; Ceph = object storage with block/file capabilities
Rook
- https://github.com/rook/rook
- Distributed storage orchestration for Kubernetes (1.6+), based on Ceph
minio
- S3 compatible API
- Used by Deis
- Example built using Gluster...
NFS
- Single point of failure
- NFSv2 and NFSv3 have host-based authentication (1). Access control through host and file/directory permissions only.
- NFSv4 has improved security via Kerberos and ACLs
- NFS Ganesha (user level NFS server)
GlusterFS
- Parallel network file storage system
- Good for large static files; immutable files
- Bad for lots of small files, which can result in split brain
- More complex backup/restore
- Performance degradation under certain load scenarios
- Hard to administer (see Nebula)
- Network authentication, POSIX ACLs
- Version 3.7 supports NFSv4 and pNFS
BTRFS
- Copy on write filesystem for Linux
- Used in one example with Minio
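Copy-on-write matters here because it makes point-in-time snapshots cheap, which matches the requirement to capture a home directory at an instant in time. The sketch below shows the idea only. It is not how BTRFS lays out data on disk, and the class `CowVolume` and its methods are hypothetical. It assumes whole-file blocks for brevity.

```python
class CowVolume:
    """Toy copy-on-write volume: snapshots share blocks until a file is rewritten."""

    def __init__(self):
        self._blocks = {}   # block id -> bytes, never modified once stored
        self._next_id = 0
        self.live = {}      # file name -> block id for the current state

    def write(self, name, data):
        # Never overwrite a stored block: allocate a new one and repoint the name.
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = bytes(data)
        self.live[name] = block_id

    def snapshot(self):
        # A snapshot copies only the name -> block table, not the data itself.
        return dict(self.live)

    def read(self, name, view=None):
        # Read from the live state, or from a snapshot table if one is given.
        table = self.live if view is None else view
        return self._blocks[table[name]]
```

Taking a snapshot and then rewriting a file leaves the snapshot pointing at the old block, so `read(name, snap)` still returns the earlier contents while `read(name)` returns the new ones.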
Lustre
Other
- Container Storage Interface
- https://github.com/cncf/landscape
References