11–13 Mar 2024
CERN
Europe/Zurich timezone

Utilizing RDataFrame for Data Preservation and Open Publishing Data and Analysis Software for HEP

12 Mar 2024, 15:00
15m
503/1-001 - Council Chamber (CERN)

Presentation User Voice: Innovative Applications, Data Science Environments & Open Data FAIR Data Management

Speaker

Pawel Kruczkiewicz (AGH University of Krakow (PL))

Description

CERN produces, analyzes and archives vast amounts of data. Conducting an analysis requires a large amount of software in the form of scripts and code. As time goes by and new approaches supersede old ones, these artifacts can become hard to understand, and setting them up and running them can be challenging. This is a particular concern when publishing the data in an open repository such as CERN OpenData. Furthermore, old code cannot leverage new technological advancements that could improve its performance.
To address this issue, an effort to restore data and analyses from LHC Run 1 has been undertaken. This work describes how the data, analysis and code from LHC Run 1 at the TOTEM experiment were transformed. It uses RDataFrame, a modern data processing tool, to transcribe C++ scripts into a comprehensible Jupyter notebook. As a result, the number of lines of code has been greatly reduced, which improves readability. In addition, the notebook can be run on a novel serverless engine architecture.
The process described in this work shows potential applicability for further data preservation and publication efforts.
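
The move from an explicit event loop to a declarative chain of column definitions and cuts can be illustrated with a toy, pure-Python stand-in. This is only a sketch under stated assumptions: the `LazyFrame` class, the column names `theta_x` and `theta_y`, the cut value and the sample events are all hypothetical, and none of it is the ROOT API or the actual TOTEM analysis code.

```python
# Toy illustration only: LazyFrame is a hypothetical stand-in, not ROOT's RDataFrame.

def count_imperative(events, threshold):
    """Explicit event loop, in the style of a legacy analysis script."""
    n = 0
    for ev in events:
        theta = (ev["theta_x"] ** 2 + ev["theta_y"] ** 2) ** 0.5
        if theta > threshold:
            n += 1
    return n


class LazyFrame:
    """Records column definitions and cuts; nothing runs until count() is called."""

    def __init__(self, events, steps=()):
        self._events = events
        self._steps = tuple(steps)

    def define(self, name, func):
        # Return a new frame with one more derived column recorded.
        return LazyFrame(self._events, self._steps + (("define", name, func),))

    def filter(self, pred):
        # Return a new frame with one more cut recorded.
        return LazyFrame(self._events, self._steps + (("filter", None, pred),))

    def count(self):
        # The single pass over the events happens here.
        n = 0
        for ev in self._events:
            row = dict(ev)  # copy so the input events are not modified
            keep = True
            for kind, name, func in self._steps:
                if kind == "define":
                    row[name] = func(row)
                elif not func(row):
                    keep = False
                    break
            if keep:
                n += 1
        return n


# Made-up events with hypothetical column names.
events = [
    {"theta_x": 3.0, "theta_y": 4.0},
    {"theta_x": 0.1, "theta_y": 0.2},
    {"theta_x": 6.0, "theta_y": 8.0},
]

selected = (
    LazyFrame(events)
    .define("theta", lambda r: (r["theta_x"] ** 2 + r["theta_y"] ** 2) ** 0.5)
    .filter(lambda r: r["theta"] > 1.0)
    .count()
)
print(selected, count_imperative(events, 1.0))
```

Both approaches select the same events, but the declarative version states what is computed and leaves the loop to the framework. RDataFrame offers operations of this kind through its `Define` and `Filter` methods and evaluates results lazily, which is one plausible reason a transcription of loop-based scripts can end up shorter.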

Authors

Pawel Kruczkiewicz (AGH University of Krakow (PL))
Leszek Grzanka (AGH University of Krakow (PL))
Valentina Avati (AGH University of Krakow (PL))
Kamil Krzysztof Burkiewicz (AGH University of Science and Technology (PL))
Maciej Malawski (AGH University of Krakow (PL))
