This event is part of the EP Software Seminar series.
It serves to disseminate results from software activities in the context of EP and to host high-profile external presentations on relevant technologies.
Topics include software algorithms, hardware-related aspects of code, and software engineering, from any area of data processing applications such as reconstruction, simulation, online software and triggering, or data analysis and modeling.
To submit proposals for future topics, please go to the Nomination page.
EP Software Seminar

Using RDataFrame, ROOT’s declarative analysis tool, in a CMS physics study

by Elisabetta Manca (INFN Sezione di Pisa, Universita' e Scuola Normale Superiore, P), Enrico Guiraud (CERN, University of Oldenburg (DE))

40/S2-A01 - Salle Anderson (CERN)

With the large increase in available data expected in LHC Run 3, HEP scientists must now more than ever be able to write robust, performant analysis software efficiently, taking full advantage of the underlying hardware. Multicore computing resources are commonplace, and current trends in scientific computing include the increasing availability of manycore architectures. The HEP community is not alone in this challenge: the data science industry has developed solutions that we can learn from and adapt to HEP-specific problems.

This is the context in which the ROOT team (and here especially Enrico) developed RDataFrame (RDF), a Swiss Army knife for data manipulation that provides a high-level interface in C++ and Python, as well as transparent optimizations such as multi-thread data parallelism. This new tool supports typical HEP workflows and data formats, and it has been designed to scale flexibly from data exploration on a laptop to the analysis of millions of events on hundreds of CPU cores. As a result, ROOT users can now write simpler code that runs faster. The first part of the seminar will introduce RDF, showcase its most prominent features, outline current developments, and present several real-world use cases.
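To illustrate what "declarative" means here, the following is a toy, pure-Python sketch of the lazy define/filter/count pattern: each call only records a step, and the entries are processed only when a result is requested. The class `ToyFrame` and its methods are hypothetical and are NOT ROOT's implementation. Unlike RDataFrame, which runs all booked results in a single event loop and can parallelize that loop across threads, this toy is single-threaded and reruns its loop for every result. With ROOT installed, the roughly corresponding real calls would be `ROOT.RDataFrame(10).Define("x", "rdfentry_ * 0.5").Filter("x > 2").Count()`, whose value is retrieved with `GetValue()`.

```python
# Toy illustration of a lazy, declarative dataframe pipeline.
# Hypothetical names throughout; this is NOT ROOT's RDataFrame implementation.
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]
Step = Tuple  # ("define", name, fn) or ("filter", fn)


class ToyFrame:
    """Records define/filter steps; nothing is computed until a result is requested."""

    def __init__(self, n_entries: int, steps: Optional[List[Step]] = None):
        self.n_entries = n_entries
        self.steps: List[Step] = list(steps) if steps else []

    def define(self, name: str, fn: Callable[[Row], float]) -> "ToyFrame":
        # Return a new node in the chain; the original frame is left unchanged.
        return ToyFrame(self.n_entries, self.steps + [("define", name, fn)])

    def filter(self, fn: Callable[[Row], bool]) -> "ToyFrame":
        return ToyFrame(self.n_entries, self.steps + [("filter", fn)])

    def _run(self) -> List[Row]:
        # The "event loop": apply every recorded step, in order, to each entry.
        selected: List[Row] = []
        for entry in range(self.n_entries):
            row: Row = {"entry": entry}
            keep = True
            for step in self.steps:
                if step[0] == "define":
                    row[step[1]] = step[2](row)
                elif not step[1](row):
                    keep = False  # entry rejected by a filter
                    break
            if keep:
                selected.append(row)
        return selected

    def count(self) -> int:
        return len(self._run())

    def total(self, column: str) -> float:
        return sum(row[column] for row in self._run())


if __name__ == "__main__":
    df = ToyFrame(10).define("x", lambda r: r["entry"] * 0.5)
    sel = df.filter(lambda r: r["x"] > 2.0)
    print(sel.count())     # 5
    print(sel.total("x"))  # 17.5
```

The design choice worth noting is that `define` and `filter` return new nodes instead of computing anything, so the user describes *what* to compute and the framework decides *how* and *when* to loop over the data.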

Precision measurements are often affected by large systematic uncertainties related to the models used in simulation, and progress can be made by extracting features directly from data. However, analyzing an unprecedented number of events on a sustainable timescale is not possible with standard techniques. The second part of this seminar demonstrates, within the setup of a CMS physics study, how ROOT's RDataFrame can be used to overcome these limitations.

Organized by

Andrea Bocci (EP-CMG), Dirk Düllmann (IT-ST-AD), Peter Hristov (EP-AIP), Axel Naumann (EP-SFT), Niko Neufeld (EP-LBC), and Andreas Salzburger (EP-ADP)

Coffee will be served at 11h00