23–28 Oct 2022
Villa Romanazzi Carducci, Bari, Italy
Europe/Rome timezone

High performance analysis with RDataFrame and the python ecosystem: Scaling and Interoperability

27 Oct 2022, 11:00
30m
Area Poster (Floor -1) (Villa Romanazzi)

Poster
Track 2: Data Analysis - Algorithms and Tools
Poster session with coffee break

Speakers

Josh Bendavid (CERN)
Kenneth Long (Massachusetts Inst. of Technology (US))

Description

The unprecedented volume of data and Monte Carlo simulations at the HL-LHC will pose increasing challenges for data analysis, both in computing resource requirements and in "time to insight". Precision measurements with present LHC data already face many of these challenges. We will discuss performance scaling and optimization of RDataFrame for complex physics analyses, including the interoperability with Eigen, Boost Histograms, and the Python ecosystem that makes this possible.

References

https://indico.fnal.gov/event/23628/contributions/237985/attachments/154987/201732/highPerfAnalysis-May11-2022.pdf

Significance

The performance optimizations in this work are critical to enabling more complex analyses while maintaining fast turnaround time. The identification of issues and bottlenecks is driving important, ongoing improvements in ROOT, Numba, and other libraries. Progress and its impact on performance are being tracked and incorporated into benchmarking and implementation.

Experiment context, if any

CMS

Authors

Josh Bendavid (CERN)
Kenneth Long (Massachusetts Inst. of Technology (US))
