The goal of this presentation is to help users of the Hadoop and Spark service get started with the SWAN-Spark integration and its functionality for running Spark jobs at scale from SWAN. The session is also an occasion for service managers to gather feedback for future improvements. The integration of SWAN-hosted notebooks with the Spark and Hadoop service has recently been deployed to production. SWAN-Spark lets you run your data analysis at scale using the Spark Python API (PySpark) on the YARN/Hadoop clusters at CERN IT. This presentation will introduce the main components of the platform and walk you through its functionality and some getting-started examples.
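As a flavour of the getting-started examples, the sketch below shows what a minimal PySpark session in a SWAN notebook could look like. It assumes the notebook has already been attached to one of the Hadoop clusters via the SWAN Spark connector; the application name and the HDFS dataset path are hypothetical placeholders.

    # Minimal PySpark sketch; assumes the SWAN notebook is already
    # connected to a YARN/Hadoop cluster. The HDFS path is a placeholder.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("swan-getting-started").getOrCreate()

    # Read a CSV dataset from HDFS, then run a simple aggregation on the cluster.
    df = spark.read.csv("hdfs:///user/alice/events.csv",
                        header=True, inferSchema=True)
    df.groupBy("run_number").count().show()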
See also: https://swan.cern.ch and https://swan.web.cern.ch