Jul 9 – 13, 2018
Sofia, Bulgaria
Europe/Sofia timezone

Pandas DataFrames for F.A.S.T. binned analysis at CMS

Jul 10, 2018, 11:30 AM
Hall 9 (National Palace of Culture)

Presentation: Track 6 – Machine learning and physics analysis


Dr Benjamin Krikler (University of Bristol (GB))


Many analyses on CMS are based on the histogram, used throughout the workflow from data validation studies to fits for physics results. Binned data frames are a generalisation of multidimensional histograms, in a tabular representation where histogram bins are denoted by category labels. Pandas is an industry-standard tool, providing a data frame implementation that allows easy access to “big data” scientific libraries, including I/O, visualisation, and machine learning tools.
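As a rough illustration of the binned data frame idea, the sketch below builds one from toy events using plain pandas. It is not the F.A.S.T. toolkit's interface: the column names (`n_jets`, `met`, `weight`), the bin edges, and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Toy events with made-up columns; a real analysis would read these from ntuples.
rng = np.random.default_rng(seed=42)
events = pd.DataFrame({
    "n_jets": rng.integers(0, 6, size=1000),
    "met": rng.exponential(scale=80.0, size=1000),
    "weight": rng.uniform(0.5, 1.5, size=1000),
})
events["weight2"] = events["weight"] ** 2

# Hypothetical bin edges; intervals are closed on the left, open on the right.
met_edges = [0, 50, 100, 200, np.inf]
jet_edges = [0, 1, 2, 4, np.inf]
events["met_bin"] = pd.cut(events["met"], bins=met_edges, right=False)
events["n_jets_bin"] = pd.cut(events["n_jets"], bins=jet_edges, right=False)

# One row per bin: the index levels are the category labels of each dimension,
# the columns hold the bin contents (raw count, sum of weights, sum of weights squared).
binned = events.groupby(["n_jets_bin", "met_bin"], observed=False).agg(
    n=("weight", "count"),
    sumw=("weight", "sum"),
    sumw2=("weight2", "sum"),
)

# Projecting onto one dimension is an ordinary group-by over an index level.
projection = binned.groupby(level="n_jets_bin", observed=False).sum()
```

Here the two-dimensional histogram is simply a table indexed by bin labels, so adding a dimension means adding an index level, and operations such as projection or rebinning reduce to standard pandas group-by calls.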

F.A.S.T. (Faster Analysis Software Taskforce) runs regular hackdays and tutorials in the UK. We present a new toolkit where Pandas DataFrames are used as the basis for binned physics analysis. We demonstrate how this engenders faster, more robust, and more flexible development, using fewer lines of code, as well as improving accessibility for newcomers.

The toolkit is presented in the context of a typical CMS search analysis, where we look for evidence of a new-physics signal in a multidimensional parameter space, but it is not analysis-specific. The code structure is simple and modular, with built-in bookkeeping and documentation. We envisage its adoption by other UK binned analyses, and plan associated tools, helping physicists to focus more on “what” their analysis should do rather than technically “how” it is done.

Primary authors

Dr Benjamin Krikler (University of Bristol (GB))
Olivier Davignon (University of Bristol (GB))
Dr Lukasz Kreczko (University of Bristol (GB))
Jacob Thomas Linacre (Fermi National Accelerator Lab. (US))
Emmanuel Olatunji Olaiya (STFC-Rutherford Appleton Laboratory (GB))
Tai Sakuma (University of Bristol (GB))
