Jul 17 – 21, 2023
Princeton University
US/Eastern timezone

Columnar Data Analysis

Jul 20, 2023, 1:30 PM
1h 30m
407 Jadwin Hall (Princeton University)

407 Jadwin Hall

Princeton University

Princeton Center For Theoretical Science (PCTS)


Ioana Ifrim (Princeton University (US)) Jim Pivarski (Princeton University)


Data analysis languages, such as Numpy, MATLAB, R, IDL, and ADL, are typically interactive with an array-at-a-time interface. Instead of performing an entire analysis in a single loop, each step in the calculation is a separate pass, letting the user inspect distributions each step of the way.

Unfortunately, these languages are limited to primitive data types: mostly numbers and booleans. Variable-length and nested data structures, such as different numbers of particles per event, don't fit this model. Fortunately, the model can be extended.

This tutorial will introduce awkward-array, the concepts of columnar data structures, and how to use them in data analysis, such as computing combinatorics (quantities depending on combinations of particles) without any for loops.

Presentation materials