Experimental support for Apache Zeppelin
rvitillo at mozilla.com
Mon Mar 6 14:24:06 UTC 2017
We have added Zeppelin, a web-based notebook that enables interactive data analytics, to our toolbox. If you have never heard of it, there is a nice video showcasing some of its capabilities.
Zeppelin is offered as an experimental tool and is deployed in addition to Jupyter on our ATMO clusters. While Jupyter generally serves us well, its integration with Spark is far from perfect at this time:
- opening a second notebook causes the new notebook to freeze;
- Spark jobs can't be cancelled;
- Jupyter can freeze without any apparent error;
- Scala isn't supported in our environment (kernels for it do exist though);
- the Spark progress bar doesn't work reliably.
There are various efforts underway to improve the status quo, like , but it might take a while until they are production-ready. While Zeppelin should solve the above problems, Jupyter is still king in other areas like auto-completion and matplotlib support. Furthermore, neither RTMO nor GitHub can render Zeppelin notebooks yet, so an external service has to be used. One major benefit of using Zeppelin is that the same SparkContext can be shared by different interpreters (SQL, Python, Scala and, in the future, R) within the same notebook, see .
Based on our users' feedback, we will decide at the end of next quarter whether Zeppelin should become a first-class citizen; if so, we will add support for it to RTMO.
To try Zeppelin, launch a Spark cluster from ATMO with the latest EMR release. Then tunnel to Zeppelin with:
ssh -L 8890:localhost:8890 hadoop@YOURCLUSTER
Finally, open localhost:8890 in Firefox. There you will find a telemetry tutorial notebook preloaded. Once you open it, make sure to save the suggested interpreter bindings before trying to execute cells.
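If the page does not load, it can help to check whether the SSH tunnel is actually forwarding the port before suspecting Zeppelin itself. The following is a minimal sketch, not part of the ATMO tooling: the function name tunnel_is_up is made up for illustration, and the only values taken from the instructions above are the host (localhost) and the port (8890).

```python
import socket


def tunnel_is_up(host="localhost", port=8890, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it only checks that something accepts
    connections on the forwarded port, not that it is Zeppelin.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if tunnel_is_up():
        print("Port 8890 is reachable; open localhost:8890 in the browser.")
    else:
        print("Nothing is listening on localhost:8890; is the ssh tunnel running?")
```

A successful check only means that the local end of the tunnel accepts connections; a cluster that is still starting up may accept the connection and still fail to serve the notebook page.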
 Bug 1290148
 Bug 1318706