23–28 Oct 2022
Villa Romanazzi Carducci, Bari, Italy
Europe/Rome timezone

High performance analysis with RDataFrame and the python ecosystem: Scaling and Interoperability

27 Oct 2022, 16:10
30m
Area Poster (Floor -1) (Villa Romanazzi)

Speakers

Josh Bendavid (CERN), Kenneth Long (Massachusetts Inst. of Technology (US))

Description

The unprecedented volume of data and Monte Carlo simulations at the HL-LHC will pose increasing challenges for data analysis, both in computing resource requirements and in "time to insight". Precision measurements with present LHC data already face many of these challenges. We will discuss performance scaling and optimization of RDataFrame for complex physics analyses, including the interoperability with Eigen, Boost.Histogram, and the Python ecosystem that makes this scaling possible.

Significance

The performance optimizations in this work are critical for enabling more complex analyses while maintaining fast turnaround time. The identification of issues and bottlenecks is driving important, ongoing improvements in ROOT, Numba, and other libraries. Progress and its impact on performance are being tracked and incorporated into benchmarking and implementation.

References

https://indico.fnal.gov/event/23628/contributions/237985/attachments/154987/201732/highPerfAnalysis-May11-2022.pdf

Experiment context, if any

CMS

Primary authors

Josh Bendavid (CERN), Kenneth Long (Massachusetts Inst. of Technology (US))
