Related to NDS-1021

Background

We are currently exploring how to support integration between the Workbench system and Spark clusters. This means being able to authenticate to a Spark cluster and to run and monitor jobs on it remotely. Integrations already exist with the Jupyter and Zeppelin notebook frameworks as well as RStudio. A simple proof of concept would be to demonstrate Zeppelin or Jupyter notebooks (or both) running in Workbench and connecting to a remote Spark cluster.

Zeppelin v Jupyter v RStudio v Cloud9

Zeppelin is an Apache data-driven notebook application service, similar to Jupyter. Zeppelin supports both single and multi-user installations. The two seem to support very similar features, with different strengths and weaknesses. This might be another compelling case for the Workbench system – we don't care whether you use Zeppelin notebooks or Jupyter notebooks.

...

Livy v. no-livy

As noted in NDS-1013, we have the ability to connect remotely to Spark via YARN or Livy. YARN integration seems to assume that you're on the same network as the cluster, with the cluster configuration available. Livy is designed specifically to support remote execution. The Livy REST API server does appear to be required for both Zeppelin and Jupyter when submitting jobs remotely.
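
To make the remote-execution path concrete, here is a minimal sketch of what a client does when talking to Livy directly: create an interactive session, wait for it to become idle, submit one statement, and poll for its output. The endpoints and state names follow the Livy REST API; the host name `livy.example.org` and the helper names `http_json` and `run_statement` are assumptions made for illustration, and authentication (which a real cluster would likely require) is left out entirely.

```python
import json
import time
import urllib.request

# Hypothetical host; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.org:8998"


def http_json(method, url, payload=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_statement(code, base_url=LIVY_URL, kind="pyspark",
                  send=http_json, poll_seconds=1.0):
    """Run one code snippet in a fresh Livy session and return its output.

    `send` is injectable so the flow can be exercised without a cluster.
    """
    # 1. Create an interactive session of the requested kind.
    session = send("POST", f"{base_url}/sessions", {"kind": kind})
    session_url = f"{base_url}/sessions/{session['id']}"
    try:
        # 2. Wait until the session is ready to accept statements.
        while session["state"] != "idle":
            if session["state"] in ("error", "dead", "killed",
                                    "shutting_down", "success"):
                raise RuntimeError(f"session ended in state {session['state']}")
            time.sleep(poll_seconds)
            session = send("GET", session_url)

        # 3. Submit the code and poll until the statement finishes.
        stmt = send("POST", f"{session_url}/statements", {"code": code})
        stmt_url = f"{session_url}/statements/{stmt['id']}"
        while stmt["state"] not in ("available", "error", "cancelled"):
            time.sleep(poll_seconds)
            stmt = send("GET", stmt_url)
        return stmt["output"]
    finally:
        # 4. Always release the session so it does not hold cluster resources.
        send("DELETE", session_url)
```

Calling `run_statement("sc.parallelize(range(100)).sum()")` against a reachable Livy server would return the statement's `output` object. Notebook integrations for Zeppelin and Jupyter wrap this same request cycle, which is why only the Livy endpoint needs to be reachable from Workbench rather than the cluster's internal network.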

...

Jupyter/Spark integration

There are plenty of existing examples demonstrating Jupyter integration with Spark.

...

See also Kevin's Freund case in SC17 Demo

Big picture

We're starting to define a possible bigger picture for the integration of Workbench and Spark clusters:

...