...

A cosmological N-body simulation designed to provide a quantitative and accessible model of the evolution of the large-scale Universe.

4. ... 
 

Miscellaneous

...

Design notes

[Image: photo of whiteboard from NDSC6]

  • On the left is the repository landing page for a dataset (Globus, SEAD, Dataverse) with a button/link to the "Job Submission" UI
  • Job Submission UI is basically the Tool manager or Jupyter tmpnb
  • At the top (faintly) is a registry that resolves a dataset ID (Data DOI, PID) to its location and a mountable path
  • On the right are the datasets at their locations (SDSC, NCSA)
  • The user can launch a container (e.g., Jupyter) that mounts the datasets read-only and runs on a Docker-enabled host at each site.
  • Todo list on the right:
    • Data access at SDSC (we need a Docker-enabled host that can mount the Norman dataset)
    • Authentication – how are we authenticating users?
    • Container orchestration – how are we launching/managing containers at each site?
    • Analysis?
    • BW → Condo : Copy the MHD dataset from Blue Waters to storage condo at NCSA
    • Dataset metadata (Kenton)
    • Resolution (registry) (Kyle)

 

Design Notes

Notes from discussion (Craig W, David R, Mike L) based on above whiteboard diagram:

[Gliffy diagram: sc16-box-diagram]

What we have:

  • "Tool Manager" demonstrated at NDSC6 
    • Angular UI over a very simple Python/Flask REST API. 
    • The REST API lets you get the list of supported tools and POST/PUT/DELETE instances of running tools.
    • Fronted with a basic NGINX proxy that routes traffic to the running container based on ID (e.g., http://toolserver/containerId/)
    • Data is retrieved via HTTP GET using repository-specific APIs. Only Clowder and Dataverse are supported.
    • Docker containers are managed via system calls to the docker executable.
  • ytHub/tmpnb:
    • The yt.hub team has extended Jupyter tmpnb to support volume mounts. They've created FUSE mounts for Girder.
  • PRAGMA PID service
    • Demonstrated at NDSC6; appears to allow attaching arbitrary metadata to a registered PID.
  • Analysis
    • For the Dark Sky dataset, we can use the notebook demonstrated by the yt team.

What we need to do:

  • Copy MHD dataset to storage condo
  • Docker-enabled hosts with access to each dataset (e.g., NFS) at SDSC, possibly in the yt DXL project, and in the NDS Labs project for MHD
  • Decide whether to use/extend the existing Tool Manager, yt/tmpnb, or Jupyter tmpnb (or something else)
  • Define strategy for managing containers at each site
    • Simple:  "ssh docker run -v" or use the Docker API
    • Harder: Use Kubernetes or Docker Swarm for container orchestration. For example, launch a Jupyter container on a node with label "sdsc"
  • Implement the resolution/registry
    • Ability to register a data DOI or PID with some associated metadata – although these example datasets don't have data DOIs.
    • Metadata would include site (SDSC, NCSA) and volume mount information for the dataset.
  • Implement bookmarklet: There was discussion of providing some bookmarklet JavaScript to link a data DOI/PID to the "tool manager" service
  • Authentication: 
    • TBD – how do we control who gets access, or is it open to the public?
    • In the case of Clowder/Dataverse, all API requests include an API key
  • Analysis:
    • Need to get notebooks/code to demonstrate how to work with the MHD and Norman data.
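The "simple" launch strategy above ("ssh docker run -v") can be sketched as command construction. Host name, image, and mount path below are placeholders, not real site configuration.

```python
# Sketch of the "simple" per-site launch path: build the ssh + docker run
# command that mounts a dataset read-only. All values here are placeholders.
def docker_run_cmd(host, image, dataset_mount, name):
    """Build ['ssh', host, 'docker', 'run', ...] for a remote Docker-enabled host."""
    return [
        "ssh", host,
        "docker", "run", "-d", "--name", name,
        "-v", f"{dataset_mount}:/data:ro",  # read-only dataset mount
        image,
    ]

# To actually launch, one would run e.g.:
#   subprocess.run(docker_run_cmd(...), check=True)
```

The "harder" path would replace this with Kubernetes/Swarm scheduling (e.g., a node selector on a "sdsc" label) instead of direct ssh.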
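At its simplest, the resolution/registry item above amounts to a lookup from a data DOI/PID to site and volume-mount metadata. This is an illustrative in-memory sketch, not the PRAGMA PID service API; the PID and paths are made up.

```python
# Illustrative registry mapping a data DOI/PID to the site and mount
# information needed to launch a tool near the data. Names are assumed.
_registry = {}


def register(pid, site, mount_path):
    """Register a PID with its site (e.g., SDSC, NCSA) and mountable path."""
    _registry[pid] = {"site": site, "mount": mount_path}


def resolve(pid):
    """Resolve a PID to its site/mount record, or None if unregistered."""
    return _registry.get(pid)
```

Usage would be, for example, register("pid:example-mhd", "NCSA", "/condo/mhd") at ingest time, then resolve(pid) at launch time to pick the site's Docker host and the -v mount argument.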