19–25 Oct 2024
Europe/Zurich timezone

Orchestrated columnar-based analysis with columnflow

WED 27
23 Oct 2024, 15:18
57m
Exhibition Hall

Exhibition Hall

Poster Track 4 - Distributed Computing Poster session

Speaker

Mathis Frahm (Hamburg University (DE))

Description

To study and search for increasingly rare physics processes at the LHC, a staggering amount of data needs to be analyzed with progressively complex methods. Analyses involving tens of billions of recorded and simulated events, multiple machine learning algorithms for different purposes, and an amount of 100 or more systematic variations are no longer uncommon. These conditions impose a complex data flow on an analysis workflow and render its steering and bookkeeping a serious challenge. For this purpose, a toolkit for columnar HEP analysis, called columnflow, has been developed. It is written in Python, experiment agnostic in its core, and supports any flat file format, such as ROOT-based trees or Parquet files. Leveraging on the vast Python ecosystem, vectorization and convenient physics objects representation can be achieved through NumPy, awkward arrays and other libraries. Based upon the Luigi Analysis Workflow (law) package, columnflow provides full analysis automation over arbitrary, distributed computing resources. Despite the end-to-end nature, this approach allows for persistent, intermediate outputs for purposes of debugging, caching, and exchange with collaborators. Job submission to various batch systems, such as HTCondor, Slurm, or CMS-CRAB, is natively supported. Remote files can be seamlessly accessed via various protocols using either the Grid File Access Library (GFAL2) or the fsspec file system interface. In addition, a sandboxing mechanism can encapsulate the execution of parts of a workflow into dedicated environments, supporting subshells, virtual environments, and containers. This contribution introduces the key components of columnflow and highlights the benefits of a fully automated workflow for complex and large-scale HEP analyses.

Authors

Mr Bogdan Wiederspan (Hamburg University (DE)) Daniel Savoiu (Hamburg University (DE)) Johannes Haller (Hamburg University (DE)) Marcel Rieger (Hamburg University (DE)) Mathis Frahm (Hamburg University (DE)) Matthias Schroeder (Hamburg University (DE)) Nathan Provoust (Hamburg University (DE)) Peter Schleper (Hamburg University (DE)) Philip Keicher (Hamburg University (DE))

Presentation materials