Description
The Next Generation Trigger (NGT) project at CERN aims to extract more physics information from High Luminosity LHC (HL-LHC) data. To achieve this, LHC experiments are increasingly adopting GPUs and other accelerators, which run both procedural code and AI/ML inference.
As a result, modules in the event reconstruction frameworks that were formerly CPU-only now interleave their computations with asynchronous calls to, and synchronization with, accelerators. CPU-based parallelism in all the experiments' frameworks relies on Threading Building Blocks (TBB) task primitives. Our project assesses strategies for using these heterogeneous compute resources efficiently. We investigate current techniques for managing synchronicity, including the TBB task suspension mechanism, C++20 coroutines, C++26 sender/receiver (std::execution), Boost.Fiber, and ad-hoc conventions. In addition, coroutines allow asynchronous tools from disjoint libraries to be composed, and we are actively investigating such composition methodologies. These strategies are tested on small-scale prototypes before being integrated into experimental frameworks such as CMSSW, Gaudi, and Athena.
The goal is to maximize event processing throughput for a given hardware configuration. Beyond throughput, internal timings are analysed with profiling tools; for this purpose, code instrumentation has been added for both NVIDIA's NVTX and AMD's ROCTX. This allows us to extract timings from production code and assess the relative impact of asynchronous calls. Event processing typically involves a mix of short and long kernels, and in synthetic tests the scheduling of the short kernels significantly affects performance. We also evaluate the overall impact on the full experimental frameworks.
This presentation will report on these prototyping efforts, instrumentation developments, and the resulting profiling and performance measurements.