Use Cases

iSchool Pilot

Month-long pilot for introductory and advanced courses in data curation. The courses included a final assignment where students worked directly with CKAN in Workbench. Long-running instance tied to lesson plans. Users registered with their own email addresses. Instances needed to be available for grading. Potential integration with SSO or courseware. Workload was tied to assignments, which students did at the last minute: most load happened on a single weekend before the assignment was due. They requested custom documentation and videos for CKAN. Required containerizing CKAN, but it's a stock service for the most part, with nothing special for the pilot.

Phenotype 2017 Tutorial

Created instance phenome2017.ndslabs.org at NCSA with a backup at SDSC due to Nebula instability. This was for a three-hour workshop at a conference in Arizona with a single instructor. Initial registration problems (the project manager registered users on the wrong instance). Set up the instance ~1 week early for the instructor, who was already familiar with the system. Used images from the TERRA-REF project to analyze sample data (stored in a shared directory). Support was handled via email.

PI4 Bootcamp

Two-week data science bootcamp focused on the use and analysis of large and complex data. Required custom RStudio, Jupyter and OpenRefine containers tied to instructors' lesson plans. They had hoped to use it for a Hadoop/MapReduce tutorial, but we didn't get it working in time. Also requested a custom PostGIS container for BETYdb. Required access to the TERRA-REF dataset (exported via NFS). One instructor was already familiar with the system. Set up the instance one week early so instructors could test and evaluate it. Included custom UI, catalog, and pre-created user accounts. Used a Slack team for communication (very effective). Due to poor workload characterization and preparation, encountered resource issues almost immediately and had to provision additional resources. NCSA security blackholed the main instance due to open terminals in OpenRefine containers. Tutorial data was loaded in a shared directory. 30 users, 67 active containers. Instructor opted to use AWS EMR for the MapReduce tutorial. Ran into resource limits both in the system overall and within individual containers. Created instance pi4.ndslabs.org.
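The resource issues above suggest doing a rough capacity estimate before an event. The following is a minimal sketch, not something that was used for the bootcamp: the function name nodes_needed, the per-container CPU and memory figures, the node sizes, and the headroom fraction are all hypothetical. Only the count of 67 active containers comes from the notes above.

```python
import math


def nodes_needed(containers, cpu_per_container, mem_gb_per_container,
                 node_cpu, node_mem_gb, headroom=0.25):
    """Estimate how many worker nodes a given number of containers needs.

    headroom is the fraction of each node held back for system services
    and load spikes (a hypothetical default, not a measured value).
    """
    usable_cpu = node_cpu * (1 - headroom)
    usable_mem = node_mem_gb * (1 - headroom)
    by_cpu = math.ceil(containers * cpu_per_container / usable_cpu)
    by_mem = math.ceil(containers * mem_gb_per_container / usable_mem)
    # Whichever resource runs out first determines the node count.
    return max(by_cpu, by_mem)


# 67 active containers were observed; every other number is an assumption.
estimate = nodes_needed(67, cpu_per_container=0.5, mem_gb_per_container=2,
                        node_cpu=8, node_mem_gb=32)
print(estimate)
```

Sizing from the expected number of containers rather than the number of users matters here, since 30 users ran 67 containers.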

Complete notes.

Einstein Toolkit School

Three-day workshop using a combination of Jupyter Notebooks and a BlueWaters training allocation. Multiple instructors. One runs their own JupyterHub (which was expected to be used for the workshop). Multi-node configuration was declined due to poor I/O performance. Compilation of Einstein Toolkit was too slow. In the end, they used Workbench for part of the tutorial, but due to a system failure reverted to other options (JupyterHub/BlueWaters). One instructor is still interested in using Workbench for ongoing tutorial support. This was mainly a Jupyter workshop, with compilation of Einstein Toolkit. No shared data, but they did want to pre-compile the toolkit. Safari issues with Jupyter (due to our authentication implementation). Required live support, both Slack and in-person. Pre-created accounts (user1-n). Highly customized Jupyter image. No special documentation, since instructors provided tutorial material.

ThinkChicago Hackathon

Three-day event. Required significant planning with organizers, who relied on us to provide technical recommendations. Participants are given open-ended prompts (with unpredictable workloads). We used stock images, but students will likely want specialized tools. It's difficult to anticipate needs. Created custom documentation, UI, and catalog. Provided example code, which is useful for Workbench in general. They asked us to host several large datasets, which pose interesting problems (Gluster I/O is slow, and they can't work with the data in its entirety because it's so big).

