CS3 2024 - Cloud Storage Synchronization and Sharing

Name: CS3 2024 - Cloud Storage Synchronization and Sharing
Start: 2024-03-11T08:30:00+01:00
End: 2024-03-13T20:05:00+01:00
Location: CERN

Europe/Zurich timezone

Contact

cs3-conf2024-iac@cern.ch

Utilizing RDataFrame for Data Preservation and Open Publishing Data and Analyzes Software for HEP

12 Mar 2024, 15:00 (15 min)

503/1-001 - Council Chamber (CERN)

Presentation
User Voice: Innovative Applications, Data Science Environments & Open Data
FAIR Data Management

Speaker: Pawel Kruczkiewicz (AGH University of Krakow (PL))

CERN produces, analyzes and archives vast amounts of data. Conducting an analysis generates a large body of software in the form of scripts and code. As time goes by and new approaches supersede old ones, these artifacts can become hard to understand, and setting them up and running them can be challenging. This can be a crucial concern when publishing data in an open repository such as CERN Open Data. Furthermore, old code cannot take advantage of newer technologies that could improve its performance.
To address this issue, an effort has been made to restore the data and analysis from LHC Run 1. This work describes the process of transforming the data, analysis, and code from LHC Run 1 at the TOTEM experiment. It uses RDataFrame, a modern data processing tool, to transcribe the original C++ scripts into a comprehensible Jupyter notebook. As a result, the number of lines of code has been greatly reduced, which improves readability. In addition, the notebook can be run on a novel serverless engine architecture.
The process described in this work could be applied to further data preservation and publication efforts.
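The abstract contains no code. The following is only a toy sketch of the declarative style that RDataFrame encourages. It is written in plain Python so it runs without ROOT. `ToyFrame`, `imperative_count`, and the `px`/`py`/`pt` columns are hypothetical. They are not ROOT's API and not the authors' notebook. The sketch contrasts an explicit event loop, typical of older analysis scripts, with a chain of lazily recorded `define`/`filter` steps. That contrast suggests one reason declarative analysis code tends to be shorter.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A toy "event": column name -> value. Hypothetical, for illustration only.
Event = Dict[str, float]
Step = Tuple[str, Optional[str], Callable[[Event], object]]


def imperative_count(events: List[Event], pt_cut: float) -> int:
    """Explicit event loop in the style of a legacy analysis script."""
    n = 0
    for ev in events:
        pt = (ev["px"] ** 2 + ev["py"] ** 2) ** 0.5
        if pt > pt_cut:
            n += 1
    return n


class ToyFrame:
    """Hypothetical, minimal stand-in for a declarative data frame.

    Like RDataFrame, it only records define/filter steps and runs them
    when a result (count or values) is requested. This is NOT ROOT's API.
    """

    def __init__(self, events: List[Event], steps: Optional[List[Step]] = None):
        self._events = events
        self._steps: List[Step] = list(steps or [])

    def define(self, name: str, fn: Callable[[Event], float]) -> "ToyFrame":
        # Returns a new frame; the original frame is left unchanged.
        return ToyFrame(self._events, self._steps + [("define", name, fn)])

    def filter(self, pred: Callable[[Event], bool]) -> "ToyFrame":
        return ToyFrame(self._events, self._steps + [("filter", None, pred)])

    def _run(self) -> Iterator[Event]:
        # Single pass over the input, applying the recorded steps in order.
        for original in self._events:
            ev = dict(original)  # copy so defined columns don't leak into the input
            keep = True
            for kind, name, fn in self._steps:
                if kind == "define":
                    ev[name] = fn(ev)
                elif not fn(ev):
                    keep = False
                    break
            if keep:
                yield ev

    def count(self) -> int:
        return sum(1 for _ in self._run())

    def values(self, column: str) -> List[float]:
        return [ev[column] for ev in self._run()]


events = [{"px": 3.0, "py": 4.0}, {"px": 0.3, "py": 0.4}, {"px": 1.0, "py": 1.0}]
selected = (
    ToyFrame(events)
    .define("pt", lambda e: (e["px"] ** 2 + e["py"] ** 2) ** 0.5)
    .filter(lambda e: e["pt"] > 1.0)
)
print(selected.count())               # 2
print(imperative_count(events, 1.0))  # 2
```

In ROOT itself, a comparable selection would read roughly `ROOT.RDataFrame("events", "data.root").Define("pt", "sqrt(px*px + py*py)").Filter("pt > 1.0").Count()`. Here the tree name `events` and the file name `data.root` are placeholders.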

Authors: Pawel Kruczkiewicz (AGH University of Krakow (PL)), Leszek Grzanka (AGH University of Krakow (PL)), Valentina Avati (AGH University of Krakow (PL)), Kamil Krzysztof Burkiewicz (AGH University of Science and Technology (PL)), Maciej Malawski (AGH University of Krakow (PL))

Presentation materials: Recording; Utilizing_RDataFrame_Kruczkiewicz-1.pdf
