Oct 27 – 30, 2025
CERN
Europe/Zurich timezone

Wrangling Massive Task Graphs with Dynamic Hierarchical Composition

Oct 30, 2025, 3:20 PM
30m
222/R-001 (CERN)

"Standard talk" Plenary Session Thursday

Speaker

Benjamin Tovar Lopez (University of Notre Dame)

Description

Data analysis in High Energy Physics is constrained by the scalability of systems that rely on a single, static workflow graph. This representation is rigid, incurs high overhead on workflows that involve large data, and can be slow to construct (as with Dask). To overcome these limits, we introduce Dynamic Data Reduction (DDR), built on a pattern common in event processing: an analysis function is applied to chunks of events, and the partial results are combined with a commutative and associative reduction operation. Because partial results can then be combined in any order and grouping, decisions about data chunking and result accumulation can be decoupled from the global workflow definition.
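
As an illustration of why a commutative and associative reduction frees the chunking decision, the following minimal sketch (plain Python, not DDR or coffea code) processes the same events with two different chunk sizes and two different accumulation orders and checks that the results agree. The names `process_chunk`, `accumulate`, and `run` are hypothetical, and the histogram stands in for a real analysis output.

```python
from collections import Counter
from functools import reduce
import random


def process_chunk(events):
    """Hypothetical analysis function: histogram event values into bins of width 10."""
    return Counter(e // 10 for e in events)


def accumulate(a, b):
    """Commutative and associative reduction: add two histograms bin by bin."""
    return a + b


def chunked(seq, size):
    """Split a sequence into consecutive chunks of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def run(events, chunk_size, seed):
    """Process every chunk, then reduce the partial results in a shuffled order."""
    partials = [process_chunk(c) for c in chunked(events, chunk_size)]
    random.Random(seed).shuffle(partials)  # accumulation order is arbitrary
    return reduce(accumulate, partials, Counter())


events = list(range(1000))  # stand-in for real event data
result_small_chunks = run(events, chunk_size=7, seed=1)
result_large_chunks = run(events, chunk_size=250, seed=2)

# Chunk size and accumulation order do not change the final histogram.
assert result_small_chunks == result_large_chunks
```

Since the final result does not depend on how the input was split or in what order partial results arrive, the chunk size can be chosen late, for example per node, without touching the overall workflow description.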

DDR implements this through a hierarchical and dynamic composition of tasks that separates cluster-level and node-level concerns. For coffea applications, this means we invert the generation of events: there is one event factory per chunk at the execution nodes, rather than one factory for the whole workflow. The scheduler manages distribution to the cluster using an abstract workflow representation, while tasks for computation on the node are generated on demand and in parallel before execution. Parallelization settings and resource decisions are thus deferred until execution time, making execution adaptive to the resources available.
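
The two-level split can be sketched roughly as follows. This is an assumption-laden toy, not DDR's actual interface: `analyze`, `node_task`, and `abstract_tasks` are hypothetical names, a thread pool stands in for whatever executor runs on a node, and event ranges stand in for files or datasets. The cluster-level description carries no chunk sizes; each node-level task picks its own chunking when it runs.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def analyze(start, stop):
    """Hypothetical per-chunk analysis: count even event ids in [start, stop)."""
    return sum(1 for e in range(start, stop) if e % 2 == 0)


def node_task(start, stop, cores=None):
    """Node-level task: decide the chunking at execution time from local resources."""
    cores = cores or os.cpu_count() or 1
    # One chunk per available core (ceiling division), decided only now.
    chunk = max(1, (stop - start + cores - 1) // cores)
    bounds = [(s, min(s + chunk, stop)) for s in range(start, stop, chunk)]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        partials = list(pool.map(lambda b: analyze(*b), bounds))
    # Local reduction; addition is commutative and associative.
    return sum(partials)


# Cluster-level view: an abstract list of work units with no chunk sizes.
abstract_tasks = [(0, 40_000), (40_000, 100_000)]

# A real scheduler would dispatch these to different nodes; here they run in turn.
total = sum(node_task(start, stop) for start, stop in abstract_tasks)
```

In this sketch the same `abstract_tasks` list yields the same `total` on a 2-core or a 64-core node; only the number of node-level chunks changes.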

We use Cortado, a skimming coffea application, for empirical validation. This workflow, involving 14 terabytes of data and 12 billion events, proved intractable for static graph methods, often failing after ∼20 hours of graph generation. DDR, however, reliably completed the entire analysis in only ∼5.5 hours.

Authors

Benjamin Tovar Lopez (University of Notre Dame)
Jin Zhou
Barry Sly-Delgado (University of Notre Dame)
Kevin Patrick Lannon (University of Notre Dame (US))
Douglas Thain
