Overview
The Container Analysis Environments Workshop was sponsored by NDS and DataExpLab to bring together groups leveraging container technology for research computing and data analysis. Participants represented a wide variety of projects, including Blue Waters, LIGO, LSST/DES, CyVerse, TERRA-REF, SciServer, Whole Tale, yt.Hub, NDS Labs, TACC (Agave, BioContainers), SDSC (JupyterHub), XSEDE Gateways, CyberGIS, CRC, and Brown Dog.
Following a set of presentations, the group was asked to prioritize a list of topics for breakout groups and deep-dive discussions. The following topics were discussed:
- Integration with HPC environments
- Singularity/Shifter
- Launching jobs with data from interactive environments
- Agave API
- Shared storage across containers
- How groups are supporting data access
- Performance/reliability/scalability
- Deep dive on Whole Tale data management architecture
- Security (e.g., HIPAA compliance)
- Archiving/management/preservation of images
- Centralized registry for Docker and Singularity images used in research environments
- Best practices for image preservation
- Allowing users to dynamically compose images
- Interactive analysis environments
  - Figuring out "gotchas": how are people solving problems? What do their environments include (Jupyter vs. RStudio vs. X)?
  - Profiling load and capping access
- Opportunities for interoperability/collaboration between systems
- Container orchestration (scheduling) systems
Several topics, including workflow systems and authentication, could not be discussed in detail due to a lack of expertise.
Major takeaways from the workshop include:
- The architectures of SciServer, CyVerse, Whole Tale (WT), yt.Hub, and Workbench are similar. There are some differences, but many core similarities, which presents a clear opportunity for interoperability.
- These services are also similar to science gateways, particularly as we look at integration with HPC (but there are big differences between Singularity and Docker).
- Jupyter notebooks are a portable research product; everyone is using them.
  - However, RStudio, Shiny, and MATLAB are still in use.
- There is a distinction between the developer and the researcher.
- Several existing components might be reusable, at least as design patterns and possibly as implementations:
  - Whole Tale data management architecture
  - Agave API
Actions:
- Encapsulate IPython notebooks (make the notebook shareable, not the container)
- Integration prototypes:
  - TERRA-REF -> CyVerse
  - SciServer -> yt.Hub
  - Whole Tale -> CyVerse
  - Workbench to HPC via Agave
- Follow up on authentication
- Report at NDS8 (Craig)