http://sc16.supercomputing.org/ - Wednesday, Nov. 16th, 3:00pm MST

...

...

1. MHD Turbulence in Core-Collapse Supernovae
Authors: Philipp Moesta (pmoesta@berkeley.edu), Christian Ott (cott@tapir.caltech.edu)

Citation: Mösta, P., Ott, C. D., Radice, D., Roberts, L. F., Schnetter, E., & Haas, R. (2015). A large-scale dynamo and magnetoturbulence in rapidly rotating core-collapse supernovae. Nature, 528(7582), 376–379. http://dx.doi.org/10.1038/nature15755
Paper URL: http://www.nature.com/nature/journal/v528/n7582/full/nature15755.html
Paper DOI: dx.doi.org/10.1038/nature15755
Data Citation: ??
Data URL: https://go-bluewaters.ncsa.illinois.edu/globus-app/transfer?origin_id=8fc2bb2a-9712-11e5-9991-22000b96db58&origin_path=%2F
Data DOI: http://dx.doi.org/doi:10.21970/N9RP4P
Data Location: Blue Waters (/projects/sciteam/jr6/share/)
Size: 205 TB
Code & Tools: Einstein Toolkit, see this page for list of available vis tools for this format
Jupyter Notebook: ??

The dataset is a series of snapshots in time from 4 ultra-high-resolution 3D magnetohydrodynamic simulations of rapidly rotating stellar core-collapse. The 3D domain for all simulations is in quadrant symmetry with dimensions 0 < x,y < 66.5 km, -66.5 km < z < 66.5 km. It covers the newly born neutron star and its shear layer with a uniform resolution. The simulations were performed at 4 different resolutions [500 m, 200 m, 100 m, 50 m]. There are a total of 350 snapshots over the simulated time of 10 ms, with 10 variables capturing the state of the magnetofluid. For the highest-resolution simulation, a single 3D output variable for a single time is ~26 GB in size. The entire dataset is ~205 TB in size. The highest-resolution simulation used 60 million CPU hours on Blue Waters. The dataset may be used to analyze the turbulent state of the fluid and to perform analysis going beyond the published results in Nature doi:10.1038/nature15755.
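
As a quick way to see what one of these snapshots contains, the sketch below assumes the output is plain HDF5 (Einstein Toolkit/Carpet-style) and uses a made-up filename; check the vis-tools page linked above for the real file layout and readers.

    # Minimal sketch: list every dataset (variable) stored in one snapshot file.
    # "rho.file_0.h5" is a placeholder name, not the actual naming scheme.
    import h5py

    with h5py.File("rho.file_0.h5", "r") as f:
        def show(name, obj):
            # Print each HDF5 dataset with its shape and dtype.
            if isinstance(obj, h5py.Dataset):
                print(name, obj.shape, obj.dtype)
        f.visititems(show)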

2. Probing the Ultraviolet Luminosity Function of the Earliest Galaxies with the Renaissance Simulations 
Authors: Brian O'Shea (oshea@msu.edu), John Wise, Hao Xu, Michael Norman

Citation: O'Shea, B. W., Wise, J. H., Xu, H., & Norman, M. L. (2015). Probing the Ultraviolet Luminosity Function of the Earliest Galaxies with the Renaissance Simulations. The Astrophysical Journal Letters, 807(1), L12.
Paper URL: http://iopscience.iop.org/article/10.1088/2041-8205/807/1/L12/meta;jsessionid=40CF566DDA56AD74A99FE108F573F445.c1.iopscience.cld.iop.org
Paper DOI: dx.doi.org/10.1088/2041-8205/807/1/L12
Data URL: 
Data Citation: ??
Data DOI: ??
Size: 89 TB
Code & Tools: Enzo
Jupyter Notebook: http://yt-project.org/docs/dev/cookbook/cosmological_analysis.html

In this paper, we present the first results from the Renaissance Simulations, a suite of extremely high-resolution and physics-rich AMR calculations of high-redshift galaxy formation performed on the Blue Waters supercomputer. These simulations contain hundreds of well-resolved galaxies at z ~ 25–8, and make several novel, testable predictions. Most critically, we show that the ultraviolet luminosity function of our simulated galaxies is consistent with observations of high-z galaxy populations at the bright end of the luminosity function (M1600 ≲ -17), but at lower luminosities is essentially flat rather than rising steeply, as has been inferred by Schechter function fits to high-z observations, and has a clearly defined lower limit in UV luminosity. This behavior of the luminosity function is due to two factors: (i) the strong dependence of the star formation rate (SFR) on halo virial mass in our simulated galaxy population, with lower-mass halos having systematically lower SFRs and thus lower UV luminosities; and (ii) the fact that halos with virial masses below ~2 x 10^8 M☉ do not universally contain stars, with the fraction of halos containing stars dropping to zero at ~7 x 10^6 M☉. Finally, we show that the brightest of our simulated galaxies may be visible to current and future ultra-deep space-based surveys, particularly if lensed regions are chosen for observation.
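
The yt cookbook linked above covers this kind of analysis in detail; as a minimal, hypothetical sketch (the output name is a placeholder, not a real Renaissance output), loading an Enzo dataset and projecting gas density might look like this:

    # Minimal sketch: load an Enzo output with yt and save a density projection.
    import yt

    ds = yt.load("DD0046/DD0046")   # placeholder Enzo output name
    p = yt.ProjectionPlot(ds, "z", ("gas", "density"), weight_field=("gas", "density"))
    p.save("density_projection.png")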

3. Dark Sky Simulation
Authors: Michael Warren, Alexander Friedland, Daniel Holz, Samuel Skillman, Paul Sutter, Matthew Turk (mjturk@illinois.edu), Risa Wechsler

Citation: Warren, M. S., Friedland, A., Holz, D. E., Skillman, S. W., Sutter, P. M., Turk, M. J., & Wechsler, R. H. (2014). Dark Sky Simulations Collaboration. Zenodo. https://doi.org/10.5281/zenodo.10777
Paper URL: https://zenodo.org/record/10777#.V_VvKtwcK1M, https://arxiv.org/abs/1407.2600
Paper DOI: http://dx.doi.org/10.5281/zenodo.10777
Data URL: 
Data DOI: https://doi.org/10.5281/zenodo.10777 (although this is classified as a report in Zenodo, the authors intended this to be the DOI for the dataset)
Size: 31 TB
Code & Tools: https://bitbucket.org/darkskysims/darksky_tour/
Jupyter Notebook: https://girder.hub.yt/#user/570bd8fc2f2b14000176822c/folder/5820b9c09ea95c00014c71a1

A cosmological N-body simulation designed to provide a quantitative and accessible model of the evolution of the large-scale Universe.
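
The darksky_tour notebooks linked above are the authoritative examples; as a rough, hypothetical first look (the filename is a placeholder and this assumes yt's SDF frontend can read the snapshot):

    # Minimal sketch: open a Dark Sky particle snapshot with yt.
    import yt

    ds = yt.load("ds14_a_1.0000")   # placeholder snapshot name
    print(ds.domain_width)          # simulation box size
    print(ds.field_list)            # available particle fields
    ad = ds.all_data()
    # Total particle mass, assuming the generic particle_mass alias is defined for this frontend.
    print(ad["particle_mass"].sum().in_units("Msun"))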

4. ... 
 

Design Notes

Planning discussion 1 (NDSC6)

Photo of whiteboard from NDSC6

  • On the left is the repository landing page for a dataset (Globus, SEAD, Dataverse) with a button/link to the "Job Submission" UI
  • Job Submission UI is basically the Tool manager or Jupyter tmpnb
  • At the top (faintly) is a registry that resolves a dataset URL to its location and mountable path (a minimal sketch of this mapping follows this list)
    • (There was some confusion whether this was the dataset URL or dataset DOI or other PID, but now it sounds like URL – see example below)
  • On the right are the datasets at their locations (SDSC, NCSA)
  • The user can launch a container (e.g., Jupyter) that mounts the datasets readonly and runs on a docker-enabled host at each site.
  • Todo list on the right:
    • Data access at SDSC (we need a docker-enabled host that can mount the Norman dataset)
    • Auth – how are we auth'ing users?
    • Container orchestration – how are we launching/managing containers at each site
    • Analysis?
    • BW → Condo : Copy the MHD dataset from Blue Waters to storage condo at NCSA
    • Dataset metadata (Kenton)
    • Resolution (registry) (Kyle)
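
A minimal sketch of the registry idea above, treating it as a simple lookup from dataset URL to site and mountable path (the single entry reuses the MHD dataset's Globus URL and Blue Waters path from section 1; a real registry would hold richer metadata):

    # Minimal sketch: dataset URL -> site + read-only mountable path.
    REGISTRY = {
        "https://go-bluewaters.ncsa.illinois.edu/globus-app/transfer?origin_id=8fc2bb2a-9712-11e5-9991-22000b96db58&origin_path=%2F": {
            "site": "NCSA (Blue Waters)",
            "mount_path": "/projects/sciteam/jr6/share/",
        },
    }

    def resolve(dataset_url):
        """Return the site and mountable path for a registered dataset URL, or None."""
        return REGISTRY.get(dataset_url)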

...

  1. Build some kind of quasi-auth scheme (similar to ndslabs) on top of the existing ToolManager
  2. Inherit Girder's auth scheme and solve the problem of sharing these "users" between sites
  3. Create a "guest" user at each site and use that to launch tools from remote sources
    • NOTE: tmpnb only allows one notebook per user (per folder?), so anyone launching remotely would be sharing a notebook
    • this is undesirable, as ideally each request would launch a separate instance
    • lingering question: how do we get you back to the notebook if you lose the link? how do we know which notebook is yours?


Inclinations: SC16 Demo

  • Transfer (if necessary)  each dataset to existing cloud architecture - in progress?
  • Spin up a Docker-enabled host and mount nearby datasets (NFS, direct mount, etc.) - in progress?
  • Federated model
    • Using docker-compose, bring up provided girder-dev environment on each Docker host - pending
    • Develop the "resolver" REST API to
      • Receive site metadata - done => POST to /datasets (see the resolver sketch after this list)
      • Delegate tmpnb requests to remote Girder instances using the existing /notebook API endpoint - done => POST to /resolve/:id
      • Add authentication:
        • We simply need to collect an e-mail address (identity) to run things on their behalf
        • Could possibly import existing users from Girder using their API? probably not, due to security
        • We could callback to Girder when sites push their metadata (assuming this can be done as Girder comes online)
      • Extend UI to list off collections in connected Girder instance - doneish... very primitive, but styling it is trivial
        • Add a "Launch Notebook" button next to each dataset where no notebook is running - doneish... prototype is working, once real metadata is in place this is trivial
        • Add a "Stop Notebook" button next to each dataset where a notebook has been launched - TBD
          • this is a slightly tougher problem, as we now need to call out to every Girder's /notebook endpoint
    • Modify girder-dev to POST site metadata on startup (feder8) - in progress
  • Run a centralized resolver instance on Nebula for the purposes of the demo
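
A minimal sketch of the resolver API from the bullets above, assuming a small Flask service and that each site's Girder exposes the /notebook endpoint; the payload field names here are assumptions, not the actual schema:

    # Minimal resolver sketch (Flask + requests); payload fields are assumptions.
    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    SITES = {}  # dataset id -> site metadata POSTed by each site's Girder stack

    @app.route("/datasets", methods=["POST"])
    def register_dataset():
        # A site pushes metadata describing a dataset it can serve locally.
        meta = request.get_json(force=True)
        SITES[meta["dataset_id"]] = meta
        return jsonify(meta), 201

    @app.route("/resolve/<dataset_id>", methods=["POST"])
    def resolve(dataset_id):
        # Delegate the notebook request to the Girder instance next to the data.
        site = SITES[dataset_id]
        resp = requests.post(site["girder_url"] + "/notebook",
                             json=request.get_json(silent=True) or {})
        return jsonify(resp.json()), resp.status_code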

Using the above we would be able to show:

  • Searching for data on compute-enabled systems (albeit in a list of only 3 datasets registered in the system), possibly linking back to the original data source
  • Launch a Jupyter notebook next to each remote dataset without explicitly navigating to where that data is stored (i.e. the Girder UI)
  • How to bring this same stack up next to your data to make it searchable in our UI (we could even demonstrate this live, if it goes smoothly enough)

Inclinations: As a Long-Term service

  • leverage existing Labs/Kubernetes API for authentication and container orchestration / access across remote sites
    • etcd.go / kube.go can likely take care of talking to the necessary APIs for us, maybe needing some slight modification
    • possibly extend Labs apiserver to include the functionality of delegating jobs to tmpnb and/or ToolManager agents?
    • this leaves an open question: a single geo-distributed Kubernetes cluster, or one Kubernetes cluster per site federated across all sites ("ubernetes")? (A minimal launch sketch follows this list.)
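
As a rough illustration of the orchestration piece (not the Labs apiserver itself), launching a notebook pod that mounts a dataset read-only via the official Kubernetes Python client could look like this; the image, namespace, and host path are assumptions (the path is reused from the MHD entry above):

    # Hypothetical sketch using the official Kubernetes Python client.
    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() inside a cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="demo-notebook"),
        spec=client.V1PodSpec(
            containers=[client.V1Container(
                name="notebook",
                image="jupyter/minimal-notebook",           # assumed image
                volume_mounts=[client.V1VolumeMount(
                    name="dataset", mount_path="/data", read_only=True)],
            )],
            volumes=[client.V1Volume(
                name="dataset",
                host_path=client.V1HostPathVolumeSource(
                    path="/projects/sciteam/jr6/share/"),   # assumed mount source
            )],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)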

Notes from 10/28 meeting

Present: Mike, David, Kenton, Kandace, Kacper, Craig

  • Kacper explained more of the Whole Tale design:
    • There will be a website where the user can enter a DOI; the DOI will resolve to a remote repository (e.g., DataONE). Ingest will only happen at that point (on demand)
    • When the data can't be moved, compute near the data will be used if it is available
    • Need to support composing multiple datasets – e.g., DarkSky + some smaller dataset. In this case, the smaller data will be moved to site with the large dataset.
  • Might look into Agave project for capabilities API (long-term)
  • Specific comments about SC16 demo:
    • folderId requirement in volman can be removed – just hardcode the mountpoint. So can the userId requirement.
    • tmpnb can be used to simply create a temporary notebook – not tied to a user/folder
    • The folderId is useful when the user wants access to a subset/subdirectory
    • tmpnb is nothing new, so alone this isn't much of a demo.
    • DarkSky data is NFS mountable (read-only)
    • Regarding Norman dataset, Girder does support Swift for ingest, but need to test it.
    • Girder supports Oauth, if useful
  • There is now a presentation on November 16th
  • Next steps – for the demo, we will use the "Federated" model above, but long term there's still much to discuss
    • Data transfer (MHD)
    • Write the registry API and UI for proof-of-concept
    • Swift problem: Ingest into Girder directly or find out how best to mount Swift into container
    • Create VM near data at SDSC with Girder stack
    • Example notebooks for 3 datasets
  • Discussion of big-data publishing stack (spitballing)
    • Girder+tmpnb is now an option we can recommend to SCs to make these big datasets available.  Install these services, and you can make an inaccessible dataset accessible, with minimal analysis support.
    • This isn't the only stack – one of many options, but this works for the large physics simulation data.
    • If they install this stack, they could (potentially, with much more thought) be Whole-Tale compatible.
  • Discussion of "Data DNS" service (spitballing)
    • This came up during an earlier whiteboard session. The resolver can be a sort of data DNS service – given an identifier, resolve to one or more locations.
    • This would be different than the RDA PID concept, not an authoritative registry, just a way of saying "I have a copy of this data available here already" for larger datasets
    • Sites could possibly publish capabilities – I have this data and can launch a Docker container (e.g., Jupyter); I have this data in Hadoop/HDFS; I can support MPI jobs, etc.
    • The identifier now is anything that uniquely identifies the dataset (PID, DOI, Handle, URN, URL); a sketch of what such a record could look like follows this list
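
A purely illustrative example of such a "Data DNS" record, with made-up field names (the sites and capabilities echo the ones discussed above):

    # Hypothetical record: one identifier, several locations with capabilities.
    RECORD = {
        "identifier": "doi:10.5281/zenodo.10777",   # e.g., the Dark Sky dataset
        "locations": [
            {"site": "SDSC", "access": "nfs",   "capabilities": ["docker", "jupyter"]},
            {"site": "NCSA", "access": "posix", "capabilities": ["docker", "jupyter", "mpi"]},
        ],
    }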

Storyboard for Demo Presentation

Here is my PPT version of my napkin sketch for the SC demo.  Also context on where the demo product fits in the story.  Comments, please!

nds_sc16_demo.pptx

Presentation v1

Please give me your feedback!  Graphics executed to the edge of my abilities and patience.  See presentation notes for script written so far.

nds_sc16_demo_111216.pptx