17–24 Jul 2024
Prague
Europe/Prague timezone

Enhancing CMS data analyses using a distributed high throughput platform

20 Jul 2024, 15:04
17m
Club A

Parallel session talk
14. Computing, AI and Data Handling
Computing and Data handling

Speaker

Tommaso Diotalevi (Universita e INFN, Bologna (IT))

Description

A flexible and dynamic environment capable of accessing distributed data and resources efficiently is a key aspect of HEP data analysis, especially in the HL-LHC era. A quasi-interactive declarative solution such as ROOT RDataFrame, with scale-up capabilities via open-source standards such as Dask, can profit from the DataLake model under development at the Italian "HPC, Big Data and Quantum Computing" Center. The starting point is a prototype CMS high-throughput analysis platform, offloaded to a local Tier-2.
This contribution evaluates the scalability, identifies the bottlenecks and explores the interactivity of such a platform in two use cases: a CMS physics analysis with high-rate triggered events, and a study of the CMS muon detector performance in phase-space regions driven by analysis needs, which accesses detector datasets. The metrics used to evaluate the scaling and speed-up performance will be reported and the results discussed, emphasising the differences with respect to legacy analysis workflows.
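
The abstract does not state which scaling and speed-up metrics are used. As a minimal sketch only, the conventional definitions of speed-up, parallel efficiency and event throughput can be written as below. All function names and the timing values in the usage note are hypothetical placeholders chosen for illustration; they are not results from this contribution.

```python
from typing import Dict, List, Tuple


def speedup(t_ref: float, t_n: float) -> float:
    """Speed-up: reference wall time divided by the wall time of the scaled run."""
    return t_ref / t_n


def parallel_efficiency(t_ref: float, t_n: float, n_workers: int, n_ref: int = 1) -> float:
    """Speed-up normalised by the increase in worker count (1.0 means ideal scaling)."""
    return speedup(t_ref, t_n) * n_ref / n_workers


def event_rate(n_events: int, wall_time_s: float) -> float:
    """Throughput in events per second."""
    return n_events / wall_time_s


def scaling_table(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (workers, speed-up, efficiency) rows, using the smallest worker count as reference."""
    n_ref = min(timings)
    t_ref = timings[n_ref]
    return [
        (n, speedup(t_ref, t), parallel_efficiency(t_ref, t, n, n_ref))
        for n, t in sorted(timings.items())
    ]


if __name__ == "__main__":
    # Placeholder wall times in seconds, keyed by worker count (NOT measured values).
    placeholder_timings = {1: 100.0, 2: 50.0, 4: 40.0}
    for workers, s, eff in scaling_table(placeholder_timings):
        print(f"workers={workers} speedup={s:.2f} efficiency={eff:.2f}")
```

With these definitions, an efficiency that falls as workers are added points to a bottleneck, such as data access or scheduling overhead, rather than to a lack of compute resources.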

Authors

Alessandra Fanfani (Universita e INFN, Bologna (IT))
Carlo Battilana (Universita e INFN, Bologna (IT))
Prof. Daniele Bonacorsi (University of Bologna / INFN)
Tommaso Diotalevi (Universita e INFN, Bologna (IT))
