Craig
David's observations and some ideas about paths forward
Workbench:
The strongest interest in the workbench was as a platform for training and education, particularly from I-School constituents who want a turn-key system for teaching students in hands-on data-lab environments. This scenario extends easily to professional training, and in fact the workbench has already been used for small-scale, targeted workshops. Implementing and supporting these activities would require the following features, modifications, and additions:
next-level data support - data that is more realistic for real-world settings but still small-scale
better tool catalog support – fast self-service creation and update of curated tools to allow very small, activity-specific mini catalogs that could be easily changed by instructors week-by-week
ability to run multiple independent workbench instances simultaneously
support for sites to deploy and operate their own workbench infrastructures – we can’t be the only provider
a configurable, self-service system for user management, authentication, and access control that educators can adapt to their on-premises systems (campus course logins, etc.)
analytics and usage gathering that feeds back to NDSLabs so we can understand who uses the platform, how, and for what
Implementing these features would advance the workbench's capabilities in real-world, real-data scenarios and provide a useful service to data educators. Given the need and enthusiasm expressed at NDSC, the scope of the work, and the benefits to a targeted community, this could be the centerpiece of a proposal between NDSL and the I-Schools around big data education.
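As a rough illustration of the week-by-week mini catalogs mentioned in the feature list, a catalog could be as small as a JSON document that an instructor edits directly. This is only a sketch: the layout, the week labels, the image names, and the `tools_for` helper are all hypothetical and are not part of the current workbench.

```python
import json

# Hypothetical catalog layout: week label -> list of tool entries.
# Image names are placeholders, not real published images.
CATALOG_JSON = """
{
  "week-01": [{"name": "jupyter", "image": "example/jupyter:latest"}],
  "week-02": [{"name": "jupyter", "image": "example/jupyter:latest"},
              {"name": "rstudio", "image": "example/rstudio:latest"}]
}
"""


def tools_for(catalog, week):
    """Return the tool names an instructor has enabled for one week."""
    return [entry["name"] for entry in catalog.get(week, [])]


catalog = json.loads(CATALOG_JSON)
print(tools_for(catalog, "week-02"))  # ['jupyter', 'rstudio']
```

Keeping each week's catalog as plain data means an instructor could change it without touching the workbench itself, which is the self-service property the list asks for.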
NDS Federation Services:
To support NDS services based on the search/share/publish/re-use model, in which data, data tools, and documents are first-class objects across many sites and are orchestrated by a set of NDS federated services, NDSLabs should work towards providing turn-key Federation as a Service for Data Applications. This is a packaged set of services that can be deployed quickly and easily at NDS sites (on site resources, under site management). The services enable NDS capabilities on-site and automatically interoperate with the larger federation to provide data search and access, compute-near-data, ready-to-use data tools, and ease of use. The features of such a system include:
Site self-service - a site can self-deploy and self-manage NDS services
Loosely coupled – sites and site services can come and go at will without intervention
Extend not replace – NDS services work with existing infrastructure, leveraging its capabilities and making its resources available across the federation, including compute, data, and authentication/access
Self-monitoring – monitoring is automated for the site, and across the federation
Auto-remediation – automatic recovery from resource unavailability, network errors, application crashes, etc. without human intervention
Scalability – sites can easily add more compute or storage as demand requires
Simple data registration – sites can easily publish datasets, making them usable and searchable
Policy-based controls – data and compute access can be restricted using simple policy controls
Data and compute mobility – data can be copied to compute, compute can be deployed near-data, on-demand
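One way to picture the "loosely coupled" property in the list above is a membership table driven by heartbeats: a site joins by announcing itself and leaves by going silent, so nobody has to intervene. The sketch below is only an assumption about how this might look; `SiteRegistry`, its TTL, and the site names are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SiteRegistry:
    """Hypothetical federation membership table keyed by site name."""
    ttl_seconds: float = 30.0
    _last_seen: dict = field(default_factory=dict)

    def heartbeat(self, site, now=None):
        """Record that a site announced itself; joining needs no other step."""
        self._last_seen[site] = time.monotonic() if now is None else now

    def active_sites(self, now=None):
        """Return sites heard from within the TTL; silent sites drop out."""
        now = time.monotonic() if now is None else now
        return sorted(
            site for site, seen in self._last_seen.items()
            if now - seen <= self.ttl_seconds
        )


registry = SiteRegistry(ttl_seconds=30.0)
registry.heartbeat("site-a", now=0.0)
registry.heartbeat("site-b", now=20.0)
print(registry.active_sites(now=25.0))  # ['site-a', 'site-b']
print(registry.active_sites(now=40.0))  # ['site-b'] (site-a went silent)
```

The explicit `now` argument is there only to make the example deterministic; a real service would rely on the clock.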
To support these features, NDSLabs can develop and provide:
Turn-key site infrastructure – a container-based, scalable cluster OS and installation tools for deploying on existing IaaS (OpenStack) or bare systems
Local service adapters - a set of containerized services that adapt and integrate existing site compute and data resources, and provide local control, monitoring, analytics
Federation services – a set of containerized services that integrate the site with the federation and provide search, data transfer, and local compute
Interfaces for 3rd-party NDS applications – enable development and deployment of new services onto the NDS fabric, supporting both the NDS core services and approved 3rd-party services
Build a federated multi-site cluster across SDSC/TACC/NCSA and evaluate it
In cooperation with site admins, using refined deployment tools
It is not clear whether clusters federated at the orchestration layer, multiple independent clusters joined by NDSL federation services, or some mix of the two is better.
Build/test core per-site services with federation integration:
ID/auth/access – Dex or Keycloak?
Data interface service: site-run auto-federating data/tool registration
Determine a re-usable policy architecture for federated services: this needs policy goals for the federation, points of policy application, mechanisms for policy enforcement at those points, and a policy decision engine.
Implement site/federated software-defined data-integration services: interfaces and mechanisms for adapting to local data at the low level (filesystem, NFS, GPFS, …) and the high level (Swift, HFS, …)
Perhaps with data movement (Globus, zfs-send, …) and dataset caching for multi-site copies.
Implement a compute-near-data service – integrated with local dataset registration and auth/access policy
Implement a turn-key data-publishing service for sites: a full stack of re-usable site-side publishing services/tools that can be deployed on locally provisioned infrastructure.
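The policy architecture item above separates a decision engine from the points where policy is enforced. A minimal sketch of that split follows, assuming a first-match rule list with default deny. The `Rule` fields, group names, and resource paths are hypothetical, and this is not a proposal for the actual NDS policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical policy rule: who may do what to which resource."""
    group: str
    action: str
    resource_prefix: str
    allow: bool


def decide(rules, groups, action, resource):
    """Policy decision engine: first matching rule wins, default is deny."""
    for rule in rules:
        if (rule.group in groups
                and rule.action == action
                and resource.startswith(rule.resource_prefix)):
            return rule.allow
    return False


def enforce(rules, groups, action, resource):
    """Policy enforcement point: a service calls this before acting."""
    if not decide(rules, groups, action, resource):
        raise PermissionError(f"{action} on {resource} denied")


# Rules are ordered from most specific to most general.
RULES = [
    Rule("embargo-team", "read", "datasets/embargoed/", True),
    Rule("everyone", "read", "datasets/embargoed/", False),
    Rule("everyone", "read", "datasets/", True),
]

print(decide(RULES, {"everyone"}, "read", "datasets/public/a.csv"))   # True
print(decide(RULES, {"everyone"}, "read", "datasets/embargoed/b.csv"))  # False
```

Because `decide` is a pure function, the same engine could sit behind the data interface, the compute-near-data service, and the publishing service, with each of them calling `enforce` at its own point of policy application.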