Update on Workbench development as of 9/29/2017.

Sprint 33

Sprint 33 starts next week. Notes are available on the Wiki: 2017-09-27 Sprint Planning

The focus for this sprint includes:

  • NBI Pilot/BBD workshop
  • Einstein Toolkit pilot instance
  • NCSA Industry demo(s) 
    • Workbench integration with Spark
      • NBI use case
      • Genomics use case
    • Workbench integration with HPC
      • TERRA stitching use case
  • Website and documentation updates
  • SC17

Additional items include:

  • QMCDB
  • bioCADDIE update
  • Priority features

NBI Pilot/BBD Workshop

We have a preliminary version of the NBI MongoDB running in Labs Workbench with a sample Jupyter environment provided by Robin Gandhi from the University of Nebraska. Robin was directed to NDS through the Midwest Big Data Hub (MBDH) as a possible place to host the NBI data, a public dataset used by structural engineering and transportation researchers.

Craig will be attending the Bridging Big Data workshop on 10/4 and will present the Labs Workbench as a component in the TERRA-REF data and computing pipeline. 

We will likely pursue formalizing this pilot as the first collaboration between MBDH and NDS.

The NBI pilot is potentially beneficial in several ways:

  • Connects NDS to the Bridge Health group through the Midwest Big Data Hub
  • Represents another example of NDS hosting "active" data (e.g., in a ready-to-analyze framework)
  • Provides immediate access to a resource and an example analysis for the Bridge Health community

Einstein Toolkit pilot instance 

The NSF-funded Einstein Toolkit project submitted a pilot request for us to host an instance of the Labs Workbench service for their ongoing tutorial and workshop support.  The instance is in testing and currently available at:

https://www.einsteintoolkit.nationaldataservice.org/

The purpose of this instance is to support one-time tutorial users (via Jupyter notebook).  The instance may be expanded to support workshops (like the 2017 Einstein Toolkit School) in the future. The Einstein Toolkit will likely be a good demonstration case for SC17.

NCSA Industry Demos

We will be presenting the Labs Workbench platform at the NCSA Industry conference in October.  The main focus is demonstrating Workbench integration with larger analytical platforms, including Spark and HPC (with a priority on Spark).

Ben has installed a small Spark cluster for integration testing and demonstrated the ability to connect to the instance from Workbench using a Zeppelin notebook via the Livy API (NDS-1013). During Sprint 33 we will be finalizing two different use cases:

  • NDS-1029: using code and data from the NCSA Genomics group to demonstrate Lasso regression for marker selection via Spark
  • NDS-1027: using the NBI data to demonstrate a simple predictive model for bridge health
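
For readers unfamiliar with Livy, the connection path mentioned above can be sketched roughly as follows. This is only an illustration of Livy's REST interface (create a session, submit a statement, poll for the result), not the Zeppelin configuration actually used in the integration test. The host name and port in LIVY_URL and the helper names are assumptions made for the example.

```python
import json
import time
import urllib.request

# Assumption: hypothetical endpoint; 8998 is Livy's default port.
LIVY_URL = "http://localhost:8998"


def session_payload(kind="pyspark"):
    """Body for POST /sessions: ask Livy to start an interactive session."""
    return {"kind": kind}


def statement_payload(code):
    """Body for POST /sessions/{id}/statements: code to run in the session."""
    return {"code": code}


def _request(method, path, body=None, base_url=LIVY_URL):
    """Send a JSON request to Livy and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_on_spark(code, base_url=LIVY_URL, poll_seconds=1.0):
    """Start a session, run one statement, and return its output."""
    session = _request("POST", "/sessions", session_payload(), base_url)
    session_path = "/sessions/{}".format(session["id"])

    # Wait until the session is ready to accept statements.
    while True:
        state = _request("GET", session_path, base_url=base_url)["state"]
        if state == "idle":
            break
        if state in ("error", "dead", "killed"):
            raise RuntimeError("Livy session failed: " + state)
        time.sleep(poll_seconds)

    stmt = _request("POST", session_path + "/statements",
                    statement_payload(code), base_url)
    stmt_path = "{}/statements/{}".format(session_path, stmt["id"])

    # Poll until the statement has finished, one way or the other.
    while True:
        result = _request("GET", stmt_path, base_url=base_url)
        if result["state"] in ("available", "error", "cancelled"):
            break
        time.sleep(poll_seconds)

    # Clean up so the small test cluster is not left holding resources.
    _request("DELETE", session_path, base_url=base_url)
    return result["output"]
```

With a reachable Livy server, calling run_on_spark("sc.parallelize(range(100)).sum()") would submit that expression to the cluster. A notebook front end such as Zeppelin hides these calls behind its Livy interpreter.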

We will also go forward with creating a Jupyter notebook to demonstrate the TERRA-REF full-field image stitching process (NDS-1030, NDS-1025). The basic use case is to create a full-field representation (and eventually a mosaic) of crop data from high-resolution sensor data. The approach may be applicable to other domains.

Website and documentation updates

We will be working to refine the description of the NDS Labs service to prioritize the "sandboxing" capability near research data, reducing focus on the "playground" for research data management tools. We will also correct some problems with existing pilot descriptions, specifically MDF (broken links, inaccurate information, etc.).

SC17 planning

The focus of the SC17 demonstration will be Workbench to HPC.  At this point, we're focused on the TERRA-REF image stitching and Einstein Toolkit cases, but are open to other suggestions.  We plan to demonstrate the following:

  1. A new user launches a tutorial or exploratory interactive environment in Workbench.
  2. The user runs a small-scale analysis in that interactive environment.
  3. The user runs the same analysis at scale on an HPC system, via Workbench.

The goal is to demonstrate how Workbench supports "scaling up" the research process from a single interface.

QMCDB 

Kandace received a request for information about the QMC-DB database, apparently hosted by NDS Labs. This was an early pilot project. We've determined that the VMs exist but are shut down in OpenStack, and we have reached out to Ray Plante for information about accessing the systems. The PI is interested in resuming the pilot, if possible. This would be similar to the NBI pilot (above).

bioCADDIE

The bioCADDIE final report was delivered on 9/5. We're waiting for the final invoices to post, but everything must be complete by 9/30.

Features

In addition to the above demonstrations and instances, we are prioritizing the following features:

  • Improved authentication and authorization
    • To support integration with existing Spark and HPC clusters
    • Focus on OIDC/OAuth support (single sign-on) and LDAP support for authorization
  • Commercial cloud support
    • Ability to deploy Workbench on AWS, GCE, and Azure
    • Expanded storage options for both commercial cloud and OpenStack
  • Improved security

