...
- Centralized
- All datasets are registered with a central service that is responsible for resolving identifiers to locations and launching notebooks at those locations.
- Pros
- synchronization / authentication (see below) may be slightly easier to solve (users need only one account, on the central service)
- Can mix data from two sites: having information about data in one place allows users to compose freely.
- Cons
- single point of failure
- Local access to datasets at a site requires going through an external service
- Federated
- Each site has its own local stack but registers with a federation server for id → location resolution (see the resolution sketch below the diagram)
- Pros
- Users at each site can access data directly/launch notebooks without federation server
- Can use existing instances/services, such as hub.yt
- Cons
- synchronization (see below) is still an open question (is this an open broadcast? handshake? do we keep a record of nearest neighbors?)
- authentication (see below) and sharing credentials between sites becomes a more complex problem
- Can't mix data from two sites
[Gliffy Diagram]
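Regardless of which model we choose, both rely on the same basic contract: resolve a dataset identifier (DOI, URL, URN) to the site and site-local handle where a notebook can be launched. Below is a minimal sketch of that contract in Go; the route name (/resolve), the DatasetLocation fields, and the in-memory registry are all illustrative assumptions, not an existing API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// DatasetLocation is what the resolver returns for a registered identifier.
type DatasetLocation struct {
	Identifier string `json:"identifier"` // DOI, URL, or URN
	SiteURL    string `json:"siteUrl"`    // base URL of the site hosting the data
	FolderID   string `json:"folderId"`   // site-local handle (e.g. a Girder folderId)
}

// registry stands in for whatever backing store the resolver would use.
var registry = map[string]DatasetLocation{
	"doi:10.1234/example": {
		Identifier: "doi:10.1234/example",
		SiteURL:    "https://site-a.example.org",
		FolderID:   "abc123",
	},
}

// resolve handles GET /resolve?id=<identifier> and returns the location record.
func resolve(w http.ResponseWriter, r *http.Request) {
	id := r.URL.Query().Get("id")
	loc, ok := registry[id]
	if !ok {
		http.Error(w, "unknown identifier", http.StatusNotFound)
		return
	}
	json.NewEncoder(w).Encode(loc)
}

func main() {
	http.HandleFunc("/resolve", resolve)
	fmt.Println("resolver listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

In the centralized model this service owns the registry outright; in the federated model the registry is populated by per-site registration (see the Federate sketch further down).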
...
- Centralized
- Assuming that we use Girder as-is, the centralized model requires mounting each dataset filesystem via NFS/SSHFS for the initial metadata "ingest". This is only temporary and does not ingest the actual file data, but is awkward.
- We would need to use or extend the Girder API to support the remote repository request – resolving the dataset identifier (DOI, URL, URN) to the Girder folderId used to get the notebook (a client-side sketch follows this list)
- Requires a user account on the Girder instance to launch notebooks at each site
- Solves the Whole Tale problem of running remote docker containers.
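A rough sketch of the client side of that resolution step, assuming the extension above exposes something like a /api/v1/repository/lookup route (hypothetical; not an existing Girder endpoint). The Girder-Token header is how Girder authenticates REST calls, which is why the per-user account requirement above applies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
)

type lookupResponse struct {
	FolderID string `json:"folderId"`
}

// lookupFolder asks the (hypothetical) Girder extension to resolve an
// identifier to the folderId a notebook should be launched against.
func lookupFolder(girderBase, identifier, token string) (string, error) {
	u := fmt.Sprintf("%s/api/v1/repository/lookup?identifier=%s",
		girderBase, url.QueryEscape(identifier))
	req, err := http.NewRequest("GET", u, nil)
	if err != nil {
		return "", err
	}
	// Girder authenticates REST calls via the Girder-Token header, which is
	// why this model requires a user account on the central instance.
	req.Header.Set("Girder-Token", token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("lookup failed: %s", resp.Status)
	}
	var out lookupResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.FolderID, nil
}

func main() {
	folderID, err := lookupFolder("https://girder.example.org", "doi:10.1234/example", "GIRDER_TOKEN")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("launch notebook against folder", folderID)
}
```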
- Federated:
- New "Data DNS" component to handle registration and resolution of IDs to sites
- New "Federate" component at each site is needed to post data to the federation/Data DNS service
- Assumes local user accounts at each site – which means users can access the datasets without the federation server, but also means that there are unique user accounts at each site. Using tmpnb, we can't have a single guest user, since there's one notebook per user?
- In this model, we could use the hub.yt infrastructure as-is, with the addition of the "Federate" component. No need to copy or mount the DarkSky dataset.
- Doesn't solve the Whole Tale problem of running remote docker containers
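For illustration, the Federate agent could be as small as a periodic POST of the site's dataset identifiers to the Data DNS. The /register route, payload shape, and example identifiers are assumptions; how failures are retried ties into the synchronization options below.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// registration is the payload a site posts to the Data DNS service.
type registration struct {
	SiteURL  string   `json:"siteUrl"`
	Datasets []string `json:"datasets"` // identifiers (DOI/URL/URN) served locally
}

func register(dataDNS string, reg registration) error {
	body, err := json.Marshal(reg)
	if err != nil {
		return err
	}
	resp, err := http.Post(dataDNS+"/register", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("registration failed: %s", resp.Status)
	}
	return nil
}

func main() {
	reg := registration{
		SiteURL:  "https://hub.yt",
		Datasets: []string{"doi:10.1234/darksky"}, // placeholder identifier
	}
	if err := register("https://data-dns.example.org", reg); err != nil {
		fmt.Println("registration failed, will retry later:", err)
	}
}
```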
Synchronization options
- Sites push their status to the resolver API (see the push sketch after this list)
- Assumption: failures are retried after a reasonable period
- Pros
- Updates happen in real-time (no delay except network latency)
- Cons
- Congestion if many sites come online at precisely the same second
- More work for whatever we choose as the scheduler / orchestration system: if a site misses a scheduled push, we may need to pull it out of rotation
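A minimal sketch of the push model, assuming a /status endpoint on the resolver and a fixed interval that doubles as the retry period after a failed push (all placeholders):

```go
package main

import (
	"bytes"
	"log"
	"net/http"
	"time"
)

const (
	resolverURL  = "https://resolver.example.org/status"
	pushInterval = 5 * time.Minute // also the "reasonable period" before retrying a failure
)

// pushStatus sends a heartbeat for this site to the resolver.
func pushStatus(site string) error {
	payload := []byte(`{"site":"` + site + `","state":"up"}`)
	resp, err := http.Post(resolverURL, "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}

func main() {
	for {
		if err := pushStatus("https://site-a.example.org"); err != nil {
			// Failed pushes are simply retried on the next loop iteration.
			log.Println("push failed, retrying later:", err)
		}
		time.Sleep(pushInterval)
	}
}
```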
- Resolver service polls for each site's status
- Assumption: failures are silent, and retried on the next poll interval
- Pros
- We will know explicitly when sites are no longer available for launching tools
- Cons
- Time delay between polls means we could have stale data
- Threading nightmare: this is either one short-lived thread per site, or one giant thread looping through all sites (see the polling sketch after this list)
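Sketch of the poll model; in Go the per-site loop is a lightweight goroutine rather than an OS thread, which softens the threading concern somewhat. The site list, /healthz path, and interval are placeholders.

```go
package main

import (
	"log"
	"net/http"
	"time"
)

var sites = []string{"https://site-a.example.org", "https://site-b.example.org"}

const pollInterval = 1 * time.Minute

// pollSite checks one site's health endpoint forever; failures are silent
// and simply retried on the next interval.
func pollSite(site string) {
	for {
		resp, err := http.Get(site + "/healthz")
		if err != nil {
			log.Printf("site %s unreachable: %v", site, err)
		} else {
			if resp.StatusCode != http.StatusOK {
				// Mark the site as unavailable for launching tools.
				log.Printf("site %s unavailable: %s", site, resp.Status)
			}
			resp.Body.Close()
		}
		time.Sleep(pollInterval)
	}
}

func main() {
	// One lightweight goroutine per site, not one OS thread per site.
	for _, s := range sites {
		go pollSite(s)
	}
	select {} // block forever while the pollers run
}
```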
...
- leverage existing Labs/Kubernetes API for authentication and container orchestration / access across remote sites
- etcd.go / kube.go can likely take care of talking to the necessary APIs for us, perhaps with slight modification
- possibly extend Labs apiserver to include the functionality of delegating jobs to tmpnb and/or ToolManager agents?
- this leaves an open question: a single geo-distributed Kubernetes cluster, or one Kubernetes cluster per site, federated across all sites ("ubernetes")? (a rough REST sketch follows)
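If each site runs its own Kubernetes cluster, the extended Labs apiserver would ultimately be making calls like the one below against each remote cluster's API server (listing pods in a namespace via the standard Kubernetes REST API). The cluster URL, namespace, token handling, and TLS shortcut are assumptions for the sketch only.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"log"
	"net/http"
)

// listPods queries a remote site's Kubernetes API server for the pods in a
// namespace, authenticating with a bearer token (e.g. a service-account token).
func listPods(apiServer, namespace, token string) (string, error) {
	// TLS verification is skipped only to keep the sketch short; a real
	// deployment would trust each site's cluster CA instead.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	url := fmt.Sprintf("%s/api/v1/namespaces/%s/pods", apiServer, namespace)
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	pods, err := listPods("https://site-a.example.org:6443", "whole-tale", "SERVICE_ACCOUNT_TOKEN")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(pods)
}
```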
Storyboard for Demo Presentation
...