July 29, 2019 to August 2, 2019
Northeastern University
US/Eastern timezone

COFFEA - Columnar Object Framework For Effective Analysis

Jul 30, 2019, 2:00 PM
Shillman 425 (Northeastern University)

Oral Presentation: Computing, Analysis Tools, & Data Handling


Nick Smith (Fermi National Accelerator Lab. (US))


The COFFEA framework provides a new approach to HEP analysis, via columnar operations, that improves the time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language and commodity big data technologies such as Apache Spark and NoSQL databases. To achieve this suite of improvements across many use cases, COFFEA takes a factorized approach, separating the analysis implementation from the data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages, which are wrapped to yield user code whose purpose is quickly intuited. Various data delivery schemes are wrapped into a common front end that accepts user inputs and code and returns user-defined outputs. We will present published results from analysis of CMS data using the COFFEA framework, along with a discussion of metrics and the user experience of arriving at those results with columnar analysis.
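
As a rough illustration of the factorized, columnar idea described in the abstract, the sketch below uses plain NumPy only. It is not the COFFEA API: the function names (select_and_histogram, run_serial, run_threaded), the column names (pt, eta), the cut values, and the bin edges are all hypothetical and chosen only for this example. The analysis function works on whole columns at once instead of looping over events, and it does not know how the data chunks are delivered, so the same code can run under either delivery scheme.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Hypothetical histogram bin edges (illustrative values only).
BINS = np.array([0.0, 30.0, 60.0, 120.0])


def select_and_histogram(chunk):
    """Analysis code: acts on whole columns, with no per-event loop."""
    pt, eta = chunk["pt"], chunk["eta"]
    # Boolean mask computed for every event in the chunk at once.
    mask = (pt > 30.0) & (np.abs(eta) < 2.4)
    counts, _ = np.histogram(pt[mask], bins=BINS)
    return counts


def run_serial(chunks, analysis):
    """Delivery scheme 1: process chunks one after another."""
    return sum(analysis(chunk) for chunk in chunks)


def run_threaded(chunks, analysis, workers=2):
    """Delivery scheme 2: process chunks in a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(analysis, chunks))


# Two small made-up chunks of columnar data (not real CMS data).
chunks = [
    {"pt": np.array([10.0, 35.0, 50.0, 80.0]),
     "eta": np.array([0.1, -1.0, 3.0, 2.0])},
    {"pt": np.array([45.0, 100.0, 20.0]),
     "eta": np.array([0.5, -2.0, 0.0])},
]

serial_result = run_serial(chunks, select_and_histogram)
threaded_result = run_threaded(chunks, select_and_histogram)
```

Both runners return the same merged histogram because the per-chunk outputs are simply added together; swapping the delivery scheme requires no change to the analysis function.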

Primary authors

Lindsey Gray (Fermi National Accelerator Lab. (US))
Matteo Cremonesi (Fermi National Accelerator Lab. (US))
Nick Smith (Fermi National Accelerator Lab. (US))
Allison Reinsvold Hall (Fermilab)
Bo Jayatilaka (Fermi National Accelerator Lab. (US))
Jim Pivarski (Princeton University)
