19–25 Oct 2024
Europe/Zurich timezone

Near-Data Computing Model for Accelerating LHC Data Filtering

23 Oct 2024, 14:06
18m
Room 2.A (Seminar Room)

Talk | Track 7 - Computing Infrastructure | Parallel (Track 7)

Speaker

Aashay Arora (Univ. of California San Diego (US))

Description

The data reduction stage is a major bottleneck in processing data from the Large Hadron Collider (LHC) at CERN, which generates hundreds of petabytes annually for fundamental particle physics research. In this stage, scientists must reduce petabytes of data to only gigabytes of relevant information for analysis. The filtering process is limited by slow network speeds when fetching data from globally dispersed storage facilities, so thousands of CPU hours are wasted while jobs wait for data to arrive.

We demonstrate a near-data computing model that optimizes data access and enhances performance by filtering LHC data close to its storage, before it is transmitted over the slow network. The model is designed to require minimal changes to the existing data layout and to integrate seamlessly with the underlying storage infrastructure, ensuring compatibility and ease of adoption for current systems.

We achieve this by deploying Data Processing Units (DPUs) within the storage cluster. Our model leverages the DPUs' high-bandwidth connections to perform fast data retrieval and filtering near storage, significantly improving overall data processing speeds and freeing up compute node CPUs for other tasks. Additionally, it streamlines the workflow by removing coding complexities and making programming accessible for end users. Using real physics data and a realistic data reduction workflow, we demonstrate that our model significantly outperforms current methods.
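
The idea of filtering near storage can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event fields (`muon_pt`, `n_jets`, `raw_payload`), the selection cut, and the helper names are all hypothetical, and JSON size merely stands in for bytes sent over the network. The sketch only shows that applying the same selection and column projection before transfer yields the same result while moving less data.

```python
import json

# Hypothetical event records; field names and values are illustrative only.
EVENTS = [
    {"event_id": 1, "muon_pt": 42.0, "n_jets": 3, "raw_payload": "x" * 200},
    {"event_id": 2, "muon_pt": 12.5, "n_jets": 1, "raw_payload": "x" * 200},
    {"event_id": 3, "muon_pt": 55.1, "n_jets": 2, "raw_payload": "x" * 200},
    {"event_id": 4, "muon_pt": 8.3, "n_jets": 0, "raw_payload": "x" * 200},
]


def wire_size(records):
    """Approximate the bytes sent over the network as the JSON-encoded size."""
    return len(json.dumps(records).encode("utf-8"))


def reduce_events(records, keep_columns, predicate):
    """Keep only events passing the predicate, and only the requested columns."""
    return [{c: r[c] for c in keep_columns} for r in records if predicate(r)]


def select(record):
    """Hypothetical selection cut: keep events with a high-pT muon."""
    return record["muon_pt"] > 30.0


columns = ["event_id", "muon_pt"]

# Conventional flow: ship every event to the compute node, then filter there.
baseline_bytes = wire_size(EVENTS)
baseline_result = reduce_events(EVENTS, columns, select)

# Near-data flow: filter next to storage, then ship only the reduced output.
near_result = reduce_events(EVENTS, columns, select)
near_bytes = wire_size(near_result)

print(f"bytes shipped, filter on compute node: {baseline_bytes}")
print(f"bytes shipped, filter near storage:    {near_bytes}")
```

In this toy setup both flows select the same events; only the point at which the reduction runs, and therefore the volume crossing the network, differs.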

Primary authors

Aashay Arora (Univ. of California San Diego (US))
Diego Davila Foyo (Univ. of California San Diego (US))
Frank Wuerthwein (Univ. of California San Diego (US))
Jonathan Guiang (Univ. of California San Diego (US))
Narangerelt Batsoyol (University of California San Diego)
Philip Chang (University of Florida (US))
Steven Swanson (University of California San Diego)
