
19–24 Nov 2020
Europe/Zurich timezone

bamboo: easy and efficient analysis with python and RDataFrame

24 Nov 2020, 16:39
10m
Short Talk Software

Speaker

Pieter David (Universite Catholique de Louvain (UCL) (BE))

Description

The bamboo analysis framework [1] lets users write simple, declarative analysis code (it effectively implements a domain-specific language embedded in Python) and runs it efficiently using RDataFrame (RDF). Viewed differently, it provides a set of tools to efficiently generate large RDF computation graphs from a minimal amount of user code in Python: for example, a simple way to specify selections and outputs, with a set of histograms filled automatically for different systematic variations of some input variables.
bamboo is currently being used for several analyses on the full CMS Run 2 dataset. It thus provides an example of an approach that closely resembles an analysis description language while remaining compatible with the practical needs of modern HEP data analysis: different types of corrections, machine learning inference, user-provided extensions, combining many input samples, scaling out to a batch cluster, etc.
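
To make the declarative idea concrete, here is a minimal pure-Python sketch of the pattern described above: selections and histograms are declared once, and a single event loop then fills every histogram for every systematic variation. This is not bamboo's API and it does not use RDataFrame; the names `Selection`, `Hist1D`, `run`, the event fields `jet_pt` and `n_muons`, and the toy events are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, float]


@dataclass
class Selection:
    """A named cut that may refine a parent selection (toy class, not bamboo's)."""
    name: str
    cut: Callable[[Event], bool]
    parent: Optional["Selection"] = None

    def refine(self, name: str, cut: Callable[[Event], bool]) -> "Selection":
        # A refined selection requires its parent's cuts as well as its own.
        return Selection(name, cut, parent=self)

    def passes(self, event: Event) -> bool:
        parent_ok = self.parent is None or self.parent.passes(event)
        return parent_ok and self.cut(event)


@dataclass
class Hist1D:
    """Declaration of a 1D histogram: what to plot, for which selection."""
    name: str
    variable: Callable[[Event], float]
    selection: Selection
    nbins: int
    lo: float
    hi: float


def run(
    events: List[Event],
    hists: List[Hist1D],
    variations: Dict[str, Callable[[Event], Event]],
) -> Dict[Tuple[str, str], List[int]]:
    """Fill every declared histogram once per systematic variation in one event loop."""
    results = {(h.name, v): [0] * h.nbins for h in hists for v in variations}
    for event in events:
        for vname, vary in variations.items():
            varied = vary(event)  # inputs with the variation applied
            for h in hists:
                if not h.selection.passes(varied):
                    continue
                x = h.variable(varied)
                if h.lo <= x < h.hi:  # out-of-range values are dropped
                    ibin = int((x - h.lo) / (h.hi - h.lo) * h.nbins)
                    results[(h.name, vname)][ibin] += 1
    return results


# Toy inputs (made up for this sketch).
events = [
    {"jet_pt": 25.0, "n_muons": 2},
    {"jet_pt": 48.0, "n_muons": 2},
    {"jet_pt": 80.0, "n_muons": 1},
    {"jet_pt": 29.0, "n_muons": 2},
]

# Declarative part: selections, one histogram, and the variations to apply.
two_muons = Selection("twoMuons", lambda e: e["n_muons"] >= 2)
hard_jet = two_muons.refine("hardJet", lambda e: e["jet_pt"] > 30.0)
hists = [Hist1D("jet_pt", lambda e: e["jet_pt"], hard_jet, nbins=2, lo=0.0, hi=100.0)]
variations = {
    "nominal": lambda e: e,
    "jes_up": lambda e: {**e, "jet_pt": e["jet_pt"] * 1.1},
    "jes_down": lambda e: {**e, "jet_pt": e["jet_pt"] * 0.9},
}

results = run(events, hists, variations)
for key, counts in results.items():
    print(key, counts)
```

The user code only states what to select and what to plot; the loop over variations is generated from the declarations. In this toy version the loop is plain Python, whereas the framework described here builds an RDF computation graph instead.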

[1] https://cp3.irmp.ucl.ac.be/~pdavid/bamboo/

Primary author

Pieter David (Universite Catholique de Louvain (UCL) (BE))
