...
- Build some kind of quasi-auth scheme (similar to ndslabs) on top of the existing ToolManager
- Inherit Girder's auth scheme and solve the problem of sharing these "users" between sites
- Create a "guest" user at each site and use that to launch tools from remote sources
- NOTE: tmpnb only allows one notebook per user (per folder?), so anyone launching remotely would be sharing a notebook
- this is undesirable; ideally each request would launch a separate instance
- lingering questions: how do we get you back to the notebook if you lose the link, and how do we know which notebook is yours? a per-launch token (sketched below) could answer both
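A minimal sketch of that per-launch token idea; every name here is hypothetical, not an existing ToolManager API:

```python
# Hypothetical quasi-auth layer: each launch gets a fresh token that maps
# back to its own notebook, so instances are not shared and a lost link
# can be recovered. The in-memory dict stands in for real storage.
import secrets
import time

_sessions = {}  # token -> (notebook_url, created_at)

def launch_notebook(spawn_fn):
    """Spawn a notebook via spawn_fn (e.g. a call into tmpnb) and
    return a one-off token identifying this instance."""
    token = secrets.token_urlsafe(16)
    _sessions[token] = (spawn_fn(), time.time())
    return token

def lookup_notebook(token):
    """Answer 'which notebook is yours?' for a returning user."""
    entry = _sessions.get(token)
    return entry[0] if entry else None
```

Handing the token back in the launch response (or a cookie) would let a returning user be routed to the same instance.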
Inclinations: SC16 Demo
- transfer (if necessary) each dataset to existing cloud architecture - in progress?
- discover mount points for each large dataset within existing cloud architecture - in progress?
- spin up a Docker-enabled host and mount nearby datasets (NFS, direct mount, etc.) - in progress?
- using docker-compose, bring up the provided girder-dev stack on each Docker host - pending
- extend existing ToolManager to receive site metadata - in progress
- modify girder-dev to POST site metadata on startup (see the registration sketch after this list) - in progress
- extend existing ToolManager to delegate tmpnb jobs to remote Girder instances (see the delegation sketch after this list)
- wrap existing ToolManager in a simple auth mechanism
- could we import existing users from Girder via their API? probably not, for security reasons
- we could call back to Girder when sites push their metadata (assuming this can happen as Girder comes online)
- run a centralized ToolManager instance on Nebula for the purposes of the demo
- modify existing ToolManager UI to list the collections in each connected Girder instance (see the collection-listing sketch after this list)
- Add a "Launch Notebook" button next to each dataset where no notebook is running
- Add a "Stop Notebook" button next to each dataset where a notebook has been launched
Using the above, we would be able to show:
- Searching for data on compute-enabled systems (albeit across only the 3 datasets registered in the system), possibly linking back to the original data source
- Launching a Jupyter notebook next to each remote dataset without explicitly navigating to where that data is stored (i.e. the Girder UI)
- Bringing this same stack up next to your own data to make it searchable in our UI (we could even demonstrate this live, if it goes smoothly enough)
Inclinations: As a Long-Term Service
- leverage existing Labs/Kubernetes API for authentication and container orchestration / access across remote sites
- etcd.go / kube.go can likely handle talking to the necessary APIs for us, perhaps with slight modification
- possibly extend the Labs apiserver to delegate jobs to tmpnb and/or ToolManager agents?
- this leaves an open question: a single geodistributed Kubernetes cluster, or one Kubernetes cluster per site, federated across all sites ("ubernetes")? (the sketch below assumes per-site clusters)
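A sketch of the one-cluster-per-site option using the official Kubernetes Python client, where each site is a separate kubeconfig context; the context, namespace, and image names are placeholders:

```python
# Launch a notebook pod on whichever site's cluster the ToolManager picks.
from kubernetes import client, config

def launch_notebook_pod(site_context, name, image="jupyter/minimal-notebook"):
    # Each site == one kubeconfig context; federation would hide this step.
    config.load_kube_config(context=site_context)
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name=name, labels={"app": "toolmanager"}),
        spec=client.V1PodSpec(containers=[
            client.V1Container(
                name="notebook",
                image=image,
                ports=[client.V1ContainerPort(container_port=8888)],
            ),
        ]),
    )
    return client.CoreV1Api().create_namespaced_pod("default", pod)
```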
Storyboard for Demo Presentation
...