Using Spark for Physics
Wednesday 4 May 2022, 16:30
Investigating Apache Spark for Physics Analysis
Luca Canali (CERN)
16:30 - 17:30
Apache Spark is a very successful open-source tool for data processing. Over the last few years, Spark and the platforms built around it have seen wide adoption in industry. This talk will focus on the use of Spark and its DataFrame API in the context of high-energy physics (HEP). We will go through demos of a few simple analyses implemented in Jupyter notebooks using the Apache Spark APIs. We will also briefly review related work on using Spark DataFrames for large-scale physics data preparation and reduction. Based on these experiences, we will discuss the key features of Spark and its ecosystem that can be useful for physics analysis, and what still needs improvement compared with current state-of-the-art analysis software.