ACAT 2024

Name: ACAT 2024
Start: 2024-03-11T08:00:00-04:00
End: 2024-03-15T14:30:00-04:00
Location: Charles B. Wang Center, Stony Brook University

11–15 Mar 2024

US/Eastern timezone

Contact

acat-loc2024@cern.ch

columnflow: Fully automated analysis through flow of columns over arbitrary, distributed resources

14 Mar 2024, 16:10

30m

Charles B. Wang Center, Stony Brook University

100 Circle Rd, Stony Brook, NY 11794

Poster

Track 2: Data Analysis - Algorithms and Tools

Poster session with coffee break

Speaker: Bogdan Wiederspan (Hamburg University (DE))

To study and search for increasingly rare physics processes at the LHC, a staggering amount of data needs to be analyzed with progressively complex methods. Analyses involving tens of billions of recorded and simulated events, multiple machine learning algorithms for different purposes, and 100 or more systematic variations are no longer uncommon. These conditions impose a complex data flow on an analysis workflow and make its steering and bookkeeping a serious challenge.
For this purpose, a toolkit for columnar HEP analysis, called columnflow, has been developed. It is written in Python, experiment agnostic in its core, and supports any flat file format, such as ROOT-based trees or Parquet files. Leveraging the vast Python ecosystem, it achieves vectorization and a convenient representation of physics objects through NumPy, awkward arrays, and other libraries. Based upon the Luigi Analysis Workflow (law) package, columnflow provides full analysis automation over arbitrary, distributed computing resources. Despite its end-to-end nature, this approach allows for persistent, intermediate outputs for purposes of debugging, caching, and exchange with collaborators. Job submission to various batch systems, such as HTCondor, Slurm, or CMS-CRAB, is natively supported. Remote files can be seamlessly accessed via various protocols using either the Grid File Access Library (GFAL2) or the fsspec file system interface. In addition, a sandboxing mechanism can encapsulate the execution of parts of a workflow into dedicated environments, supporting subshells, virtual environments, and containers.
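As a rough illustration of the columnar, vectorized style mentioned above, the following sketch applies a selection to whole columns at once with NumPy instead of looping over events. It does not use the columnflow API; the function name `select_events`, the column names, the thresholds, and the toy values are all assumptions made for this example.

```python
import numpy as np


def select_events(jet_pt, jet_eta, pt_min=30.0, eta_max=2.4):
    """Return a boolean mask with one entry per event.

    Hypothetical selection: the thresholds are illustrative only.
    """
    # Both comparisons act on entire columns, so no Python loop is needed.
    return (jet_pt > pt_min) & (np.abs(jet_eta) < eta_max)


# Toy columns (assumed values): leading-jet pt in GeV and eta for five events.
jet_pt = np.array([45.0, 12.5, 80.2, 31.0, 29.9])
jet_eta = np.array([0.3, -1.2, 2.9, -2.1, 0.0])

mask = select_events(jet_pt, jet_eta)
selected_pt = jet_pt[mask]  # keep only the events passing the selection
```

The same mask can be reused to filter any other column of the same length, which is the main convenience of working with columns rather than per-event objects.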
This contribution introduces the key components of columnflow and highlights the benefits of a fully automated workflow for complex and large-scale HEP analyses, showcasing an implementation of the Analysis Grand Challenge.
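The idea of persistent, intermediate outputs described in the abstract can be sketched with the standard library alone. This is not law or columnflow code: the `run` helper, the `requires` attribute, and the two toy steps are hypothetical names chosen for illustration. Each step writes its result to a file, and a step whose output file already exists is not executed again.

```python
import json
import tempfile
from pathlib import Path


def run(task, workdir, log):
    """Run a task and its dependencies, reusing outputs that already exist."""
    out = Path(workdir) / f"{task.__name__}.json"
    if out.exists():
        # Cached: load the persisted result instead of recomputing it.
        return json.loads(out.read_text())
    # Resolve upstream tasks first, then run this one on their results.
    inputs = [run(dep, workdir, log) for dep in getattr(task, "requires", ())]
    result = task(*inputs)
    out.write_text(json.dumps(result))
    log.append(task.__name__)  # record which tasks were actually executed
    return result


def select_events():
    # Toy stand-in for an event selection step (assumed values).
    return [45.0, 31.0]


def make_histogram(selected):
    # Toy stand-in for a downstream step that consumes the selection.
    return {"n_events": len(selected)}


make_histogram.requires = (select_events,)

workdir = tempfile.mkdtemp()
log = []
first = run(make_histogram, workdir, log)   # executes both steps
second = run(make_histogram, workdir, log)  # served from the stored output
```

After the second call, `log` still contains only the two executions from the first call, because the stored output satisfies the request. The stored files are also what one would inspect when debugging or hand to a collaborator.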

Experiment context, if any: CMS

Authors: Bogdan Wiederspan (Hamburg University (DE)), Marcel Rieger (Hamburg University (DE))

Presentation materials: acat2024_rieger_columnflow.pdf
