...

  1. Build some kind of quasi-auth scheme (similar to ndslabs) on top of the existing ToolManager
  2. Inherit Girder's auth scheme and solve the problem of sharing these "users" between sites
  3. Create a "guest" user at each site and use that to launch tools from remote sources
    • NOTE: tmpnb only allows one notebook per user (per folder?), so everyone launching remotely as the shared guest would end up in the same notebook
    • this is undesirable; ideally each request would launch a separate instance
    • lingering question: how do we get a user back to their notebook if they lose the link? how do we know which notebook is theirs?
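Option 1 above could be as small as a token registry wrapped around ToolManager launches. The sketch below is illustrative only: `QuasiAuth`, `bind_notebook`, and the notebook-id strings are hypothetical names, not the real ToolManager API. The token doubles as the answer to the "which notebook is yours?" question, since each remote launch gets its own token rather than sharing one guest account.

```python
import secrets

class QuasiAuth:
    """Minimal quasi-auth sketch (option 1): issue an opaque token per
    launch request and map each token to its own notebook, so concurrent
    remote launches never share a single tmpnb instance."""

    def __init__(self):
        self._sessions = {}  # token -> notebook id (None until launched)

    def issue_token(self):
        # Opaque, unguessable token handed back in the launch URL.
        token = secrets.token_urlsafe(16)
        self._sessions[token] = None
        return token

    def bind_notebook(self, token, notebook_id):
        # Called once tmpnb reports which container the request got.
        if token not in self._sessions:
            raise KeyError("unknown token")
        self._sessions[token] = notebook_id

    def lookup(self, token):
        # "How do we know which notebook is yours?" -- the token in the
        # URL is the only credential needed to find the instance again.
        return self._sessions.get(token)
```

This only answers the lost-link question if the user keeps the URL; pairing the token with a browser cookie (or a Girder callback, per the notes below) would be the next step.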

Inclinations: SC16 Demo

  • transfer (if necessary) each dataset to existing cloud architecture - in progress?
  • discover mount points for each large dataset within existing cloud architecture - in progress?
  • spin up a Docker-enabled host and mount nearby datasets (NFS, direct mount, etc.) - in progress?
  • using docker-compose, bring up the provided girder-dev stack on each Docker host - pending
  • extend existing ToolManager to receive site metadata - in progress
  • modify girder-dev to POST site metadata on startup - in progress
  • extend existing ToolManager to delegate tmpnb jobs to remote instances of Girder
  • wrap existing ToolManager in a simple auth mechanism
    • could we possibly import existing users from Girder using their API? probably not, due to security
    • we could call back to Girder when sites push their metadata (assuming this can be done as Girder comes online)
  • run a centralized ToolManager instance on Nebula for the purposes of the demo
  • modify existing ToolManager UI to list the collections in each connected Girder instance
    • Add a "Launch Notebook" button next to each dataset where no notebook is running
    • Add a "Stop Notebook" button next to each dataset where a notebook has been launched
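The site-registration steps above (girder-dev POSTing site metadata on startup, ToolManager receiving it) could look roughly like the sketch below. The `/sites` endpoint path, the payload fields, and the example URLs are assumptions for illustration, not the real ToolManager API.

```python
import json

# Example of the metadata a girder-dev instance might push on startup.
# All field names and values here are placeholders.
SITE_METADATA = {
    "site": "nebula-demo",
    "girder_url": "https://girder.example.org/api/v1",
    "datasets": ["dataset-a", "dataset-b"],
}

def build_registration_request(toolmanager_url, metadata):
    """Return the (url, body) pair for the startup POST to the central
    ToolManager; actually sending it is left to whatever HTTP client
    girder-dev already uses."""
    url = toolmanager_url.rstrip("/") + "/sites"  # assumed endpoint
    body = json.dumps(metadata).encode("utf-8")
    return url, body
```

On the ToolManager side, the matching handler would append each payload to a site registry, which is what the delegation step below consults.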

Using the above we would be able to show:

  • Searching for data across compute-enabled systems (albeit in a list of only 3 datasets registered in the system), possibly linking back to the original data source
  • Launching a Jupyter notebook next to each remote dataset without explicitly navigating to where that data is stored (i.e. the Girder UI)
  • Bringing this same stack up next to your own data to make it searchable in our UI (we could even demonstrate this live, if it goes smoothly enough)
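Launching a notebook "next to" a dataset implies the central ToolManager must route each request to whichever registered site holds that dataset. A minimal routing sketch, assuming site records mirror the metadata pushed at startup; the field names, URLs, and the spawn path are placeholders, not tmpnb's actual API.

```python
# In-memory site registry, as populated by the (assumed) /sites endpoint.
SITES = [
    {"site": "site-a",
     "tmpnb_url": "http://site-a.example.org:8000",
     "datasets": ["climate-model-output"]},
    {"site": "site-b",
     "tmpnb_url": "http://site-b.example.org:8000",
     "datasets": ["genomics-reads"]},
]

def route_launch(dataset, sites):
    """Return the tmpnb endpoint co-located with the dataset, or None
    if no registered site hosts it."""
    for site in sites:
        if dataset in site["datasets"]:
            # Launching here keeps the notebook next to the data, so the
            # demo never has to move the dataset to the user. The spawn
            # path is assumed for illustration.
            return site["tmpnb_url"] + "/spawn"
    return None
```

The UI buttons above ("Launch Notebook" / "Stop Notebook") would call this routing step and then track the returned instance per dataset.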

Inclinations: As a Long-Term Service

  • leverage existing Labs/Kubernetes API for authentication and container orchestration / access across remote sites
    • etcd.go / kube.go can likely take care of talking to the necessary APIs for us, maybe needing some slight modification
    • possibly extend Labs apiserver to include the functionality of delegating jobs to tmpnb and/or ToolManager agents?
    • this leaves an open question: a single geo-distributed Kubernetes cluster, or one Kubernetes cluster per site, federated across all sites ("ubernetes")?
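Under the single-cluster reading of that question, delegating a launch could reduce to submitting a notebook Pod pinned to the right site via a nodeSelector. A sketch of the manifest the Labs apiserver might build; the `site` node label, the image, and the naming scheme are assumptions for illustration.

```python
def notebook_pod(user, dataset, site):
    """Build a Pod manifest that schedules a Jupyter notebook onto nodes
    labeled for the site hosting the dataset. In a single geo-distributed
    cluster a per-site node label is enough to keep compute next to the
    data; under federation the same selector would instead be applied
    per member cluster."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"notebook-{user}-{dataset}",
            "labels": {"app": "notebook", "dataset": dataset},
        },
        "spec": {
            "nodeSelector": {"site": site},  # assumed node label
            "containers": [{
                "name": "jupyter",
                "image": "jupyter/minimal-notebook",
                "ports": [{"containerPort": 8888}],
            }],
        },
    }
```

This is the piece etcd.go / kube.go would submit on the ToolManager's behalf; the remaining work is mounting the dataset volume into the Pod, which the sketch omits.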

Storyboard for Demo Presentation

...