...
- On the left is the repository landing page for a dataset (Globus, SEAD, Dataverse) with a button/link to the "Job Submission" UI
- Job Submission UI is basically the Tool manager or Jupyter tmpnb
- At the top (faintly) is a registry that resolves a dataset ID (Data DOI, PID) URL to its location with a mountable path
- (There was some confusion whether this was the dataset URL or dataset DOI or other PID, but now it sounds like URL – see example below)
- On the right are the datasets at their locations (SDSC, NCSA)
- The user can launch a container (e.g., Jupyter) that mounts the datasets read-only and runs on a Docker-enabled host at each site.
- Todo list on the right:
- Data access at SDSC (we need a docker-enabled host that can mount the Norman dataset)
- Auth – how are we auth'ing users?
- Container orchestration – how are we launching/managing containers at each site?
- Analysis?
- BW → Condo : Copy the MHD dataset from Blue Waters to storage condo at NCSA
- Dataset metadata (Kenton)
- Resolution (registry) (Kyle)
...
- Copy MHD dataset to storage condo
- Docker-enabled hosts with access to each dataset (e.g., NFS) at SDSC, possibly in the yt DXL project, and in the NDS Labs project for MHD
- Decide whether to use/extend the existing Tool Manager, yt/tmpnb or Jupyter tmpnb (or something else)
- Define strategy for managing containers at each site
- Simple: "ssh docker run -v" or use the Docker API
- Harder: Use Kubernetes or Docker Swarm for container orchestration. For example, launch a jupyter container on a node with label "sdsc"
- Implement the resolution/registry
- Ability to register a data DOI or PID URL with some associated metadata – although these example datasets don't have data DOIs.
- Metadata would include site (SDSC, NCSA) and volume mount information for the dataset.
- The PRAGMA PID service looks possible at first glance, but may be too complex for what we're trying to do. It requires handle.net integration.
- Implement bookmarklet: There was discussion of providing a JavaScript bookmarklet to link a data DOI/PID to the "tool manager" service
- Authentication:
- TBD – how do we control who gets access, or is it open to the public?
- In the case of Clowder/Dataverse, all API requests include an API key
- Analysis:
- Need to get notebooks/code to demonstrate how to work with the MHD and Norman data.
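
As a rough illustration of the "simple" container strategy above (ssh plus `docker run -v`), the sketch below only builds the command line; it does not execute anything. The function name `build_launch_command`, the host `docker-host.example.org`, the container mount point `/data`, and the image `jupyter/base-notebook` are assumptions made for the example, not decisions from the meeting. The `:ro` suffix on the `-v` argument is what makes the dataset mount read-only.

```python
import shlex


def build_launch_command(ssh_host, dataset_path, image, mount_point="/data"):
    """Build the argv for launching a container on a remote Docker host over ssh.

    All argument values are supplied by the caller; nothing here is specific
    to SDSC or NCSA. The dataset is mounted read-only via the ':ro' suffix.
    """
    docker_cmd = [
        "docker", "run", "-d",                      # run detached
        "-v", f"{dataset_path}:{mount_point}:ro",   # read-only bind mount
        image,
    ]
    # ssh hands the remote command to a shell as one string, so quote each part.
    remote = " ".join(shlex.quote(part) for part in docker_cmd)
    return ["ssh", ssh_host, remote]


if __name__ == "__main__":
    # Hypothetical host and image; the path is the one from the resolution example.
    cmd = build_launch_command(
        "docker-host.example.org",
        "/scratch/mdf/publication_113",
        "jupyter/base-notebook",
    )
    print(cmd)
```

The returned list could be passed to `subprocess.run` once a host and an authentication approach are settled. The Docker API or an orchestrator (Kubernetes, Docker Swarm) would replace this step in the "harder" option.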
Example case for resolution (not a real dataset for SC16)
- A dataset has a Globus Publish landing page https://publish.globus.org/jspui/handle/ITEM/113
- This dataset has the URL
- This would map to Nebula:
- /scratch/mdf/publication_113
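
The resolution step in this example can be sketched as a small lookup table, assuming the registry is keyed on a URL and stores the site and mount path described under "Implement the resolution/registry". Because the dataset URL is not given above, the sketch uses the Globus Publish landing page URL as the key; that choice, along with the names `DatasetLocation`, `REGISTRY`, and `resolve`, is an assumption for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetLocation:
    site: str        # where the data lives, e.g. SDSC, NCSA, Nebula
    mount_path: str  # path a Docker-enabled host at that site can mount


# Hypothetical in-memory registry; a real service would persist these records.
# Keyed on the landing page URL only because the dataset URL is not known here.
REGISTRY = {
    "https://publish.globus.org/jspui/handle/ITEM/113": DatasetLocation(
        site="Nebula",
        mount_path="/scratch/mdf/publication_113",
    ),
}


def resolve(dataset_url):
    """Return the registered location for a dataset URL, or raise LookupError."""
    key = dataset_url.strip().rstrip("/")  # tolerate whitespace and a trailing slash
    try:
        return REGISTRY[key]
    except KeyError:
        raise LookupError(f"no registered location for {dataset_url!r}") from None


if __name__ == "__main__":
    loc = resolve("https://publish.globus.org/jspui/handle/ITEM/113")
    print(loc.site, loc.mount_path)
```

The "tool manager" could call something like `resolve` to decide which site should run the container and which path to mount into it.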