Background
We are currently exploring how to support integration between the Workbench system and Spark clusters, that is, the ability to authenticate to a Spark cluster and to run and monitor jobs on it remotely. Integrations already exist with the Jupyter and Zeppelin notebook frameworks, as well as with RStudio. A simple proof of concept would be to run Zeppelin or Jupyter notebooks (or both) in Workbench, connected to a remote Spark cluster.
Zeppelin v Jupyter v RStudio v Cloud9
Apache Zeppelin is a data-driven notebook application, similar to Jupyter, that supports both single-user and multi-user installations. The two appear to offer very similar features, with different strengths and weaknesses. This might be another compelling case for the Workbench system: we don't care whether you use Zeppelin notebooks or Jupyter notebooks.
...
- https://dwhsys.com/2017/03/25/apache-zeppelin-vs-jupyter-notebook/
- https://www.linkedin.com/pulse/comprehensive-comparison-jupyter-vs-zeppelin-hoc-q-phan-mba-
Livy v no-Livy
As noted in Jira
...
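To make the Livy option more concrete, here is a minimal sketch of what running and monitoring a job through Livy's REST interface could look like from Workbench. It uses only the Python standard library. The host name `livy.example.org` and the helper names (`build_request`, `call`, `run_statement`) are hypothetical and chosen for illustration; authentication is deliberately left out, so this is a starting point under those assumptions rather than a proposed implementation.

```python
import json
import time
import urllib.request

# Hypothetical endpoint (assumption); 8998 is Livy's default port.
LIVY_URL = "http://livy.example.org:8998"

# States after which a session or statement will never produce a result.
FAILED_SESSION_STATES = {"error", "dead", "killed", "shutting_down", "success"}
FAILED_STATEMENT_STATES = {"error", "cancelling", "cancelled"}


def build_request(method, path, payload=None, base_url=LIVY_URL):
    """Build (but do not send) a JSON request against the Livy REST API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def call(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_statement(code, kind="pyspark", poll_seconds=2.0):
    """Start a session, run one statement, monitor it, and return its output."""
    session = call(build_request("POST", "/sessions", {"kind": kind}))
    session_path = f"/sessions/{session['id']}"
    try:
        # Wait until the session is ready to accept statements.
        while True:
            state = call(build_request("GET", session_path))["state"]
            if state == "idle":
                break
            if state in FAILED_SESSION_STATES:
                raise RuntimeError(f"Livy session ended in state {state!r}")
            time.sleep(poll_seconds)

        statement = call(
            build_request("POST", f"{session_path}/statements", {"code": code})
        )
        statement_path = f"{session_path}/statements/{statement['id']}"
        # Poll the statement until Livy reports that its output is available.
        while statement["state"] != "available":
            if statement["state"] in FAILED_STATEMENT_STATES:
                raise RuntimeError(f"Statement ended in state {statement['state']!r}")
            time.sleep(poll_seconds)
            statement = call(build_request("GET", statement_path))
        return statement["output"]
    finally:
        # Always release the cluster resources held by the session.
        call(build_request("DELETE", session_path))
```

With a reachable Livy server, `run_statement("sc.parallelize(range(100)).sum()")` would submit the snippet and return Livy's output record for it. The appeal of this route is that Workbench only needs HTTP access to one service; the Spark driver stays on the cluster side.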
Jupyter/Spark integration
There are plenty of existing examples demonstrating Jupyter integration with Spark.
...
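For comparison, a direct (no-Livy) connection from a Jupyter notebook might look like the sketch below. It assumes `pyspark` is installed in the notebook kernel and that the cluster runs a standalone master; the helper names `remote_spark_options` and `create_session`, the master URL, and the default memory setting are all hypothetical and for illustration only.

```python
def remote_spark_options(master_url, app_name, executor_memory="2g"):
    """Collect the Spark settings a notebook needs to reach a remote cluster."""
    return {
        "spark.master": master_url,
        "spark.app.name": app_name,
        "spark.executor.memory": executor_memory,
    }


def create_session(options):
    """Create (or reuse) a SparkSession configured with the given options."""
    # Imported lazily so only the notebook kernel needs pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a notebook cell this could be used as `spark = create_session(remote_spark_options("spark://spark-master.example.org:7077", "workbench-notebook"))`, followed by something like `spark.range(1000).count()`. Note that in this mode the Spark driver runs inside the notebook process, so the cluster's executors must be able to connect back to the Workbench container over the network, which is one of the practical differences from the Livy route.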
See also Kevin's Freund case in SC17 Demo
Big picture
We're starting to define a possible bigger picture for the integration of Workbench and Spark clusters:
...