SWAN Spark: hosted Jupyter notebooks meet Spark jobs at scale
Tuesday 24 April 2018 - 16:00
SWAN Spark - a data analysis platform with hosted Jupyter notebooks and Spark on Hadoop service at CERN
Prasanth Kothuri (CERN)
16:00 - 16:40
Room: 513/1-024
The goal of this presentation is to help users of the Hadoop and Spark service get started with the SWAN-Spark integration and its functionality for running Spark jobs at scale on SWAN. The session is also an opportunity for service managers to gather feedback for future improvements. The integration of SWAN hosted notebooks with the Spark and Hadoop service has recently been deployed into production. SWAN Spark allows you to run your data analysis at scale using the Spark Python API (PySpark) on YARN/Hadoop clusters at CERN IT. This presentation will introduce the main components of the platform and walk you through its functionality and some getting-started examples. See also: https://swan.cern.ch and https://swan.web.cern.ch
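For readers who have not used PySpark before, the sketch below illustrates the kind of job the abstract refers to: building a Spark session against a YARN master and running a small aggregation. It is not taken from the talk. The helper names (spark_conf, run_demo), the application name, and the executor count are hypothetical, and it assumes a plain PySpark installation rather than whatever session setup SWAN itself provides.

```python
def spark_conf(app_name="swan-spark-demo", master="yarn", executors=4):
    """Return Spark settings as a plain dict (all values are illustrative assumptions)."""
    return {
        "spark.app.name": app_name,
        "spark.master": master,
        "spark.executor.instances": str(executors),
    }


def run_demo(conf):
    """Create a session from `conf`, sum the integers 0..999999, and return the total."""
    # Imported lazily so spark_conf() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    try:
        df = spark.range(1_000_000)  # DataFrame with a single column named "id"
        return df.selectExpr("sum(id) AS total").first()["total"]
    finally:
        spark.stop()
```

Calling run_demo(spark_conf(master="local[*]")) runs the same aggregation on the local machine, which is a convenient way to try it without access to a YARN cluster.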
16:40
Q&A
16:40 - 17:00
Room: 513/1-024