Motivating case: GCMC

The Galaxy Cluster Merger Catalog (GCMC) is a data portal developed by the primary researcher (J. Zuhone) to provide an external interface to data stored in hub.yt.  The site is currently hosted on Nebula with a dedicated VM and uses the hub.yt core REST API.  This required 1) creating an LDAP account at NCSA for the researcher, 2) adding him to a project on Nebula, and 3) configuring a VM to host his service.  The researcher developed the site himself, ostensibly using Python and Sphinx.  The VM is difficult to manage (OS upgrades, Nebula migrations, etc.), since any change requires coordination with the researcher.

It seems that this might be a common case: a researcher produces a dataset and develops a custom "data portal" to enable exploration and reuse. This is related to but separate from the DataDNS service. Other examples include:

 On the one hand we have the data; on the other, a custom website or portal that provides access to it, designed and implemented by the research team.

Workbench for Researchers

Q. Can we evolve the NDS Labs Workbench to be useful for researchers, not just cyberinfrastructure developers and information specialists?

From the yt team in trying to solve the GCMC problem:

"I was unable to come up with a satisfactory solution for enabling users
to easily deploy portals/services that would utilize the core REST API
we've been developing for hub.yt. Aforementioned John's gcmc required 1)
creating LDAP account @ NCSA, 2) adding him to a project on nebula, 3)
configuring VM etc. That's clearly not something we would like to
undergo with each user that expresses interested in creating external
interface to hub.yt.

I think NDS Labs Workbench could significantly ease up that process."

In this case, Labs Workbench would enable researchers to:

  • Explore and share software for creating data portals
  • Compose services using Docker containers
  • Possibly run these services at participating sites, ideally near their datasets

These researchers are programmers and are capable of developing their own custom portals, but ultimately they want to do science, not development, hosting, or IT infrastructure.  The more we can simplify the process, the better.
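To make the kind of portal in question concrete: a sketch of a minimal read-only data portal in stdlib Python, of the sort a researcher might package in a Docker container and run near a dataset. The `/data/gcmc` mount point, the `/index` endpoint, and all names here are hypothetical illustrations, not part of GCMC or the hub.yt API.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mount point where the dataset volume would be attached.
DATA_DIR = "/data/gcmc"

def build_index(data_dir):
    """Walk the dataset directory and return a JSON-serializable
    listing of file paths (relative to the dataset root) and sizes."""
    index = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            index.append({
                "path": os.path.relpath(path, data_dir),
                "bytes": os.path.getsize(path),
            })
    return index

class PortalHandler(BaseHTTPRequestHandler):
    """Single read-only endpoint: GET /index returns the dataset listing."""
    def do_GET(self):
        if self.path == "/index":
            body = json.dumps(build_index(DATA_DIR)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

def serve(port=8080):
    """Run the portal; blocks until interrupted."""
    HTTPServer(("", port), PortalHandler).serve_forever()
```

Something this small is exactly what Labs Workbench could take off the researcher's plate: wrapped in a container, the hosting, OS upgrades, and placement near the data become the platform's problem rather than a per-researcher VM.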

What this would require:

  • Production support:  these will be long-running services. (However, this is not as demanding as hosting a multi-user Dataverse archive; researchers will likely accept some downtime for these kinds of portals.)
  • Support for data:  the goal is to launch services near their datasets, as with GCMC and hub.yt.