Description
The CMS experiment relies on a complex software ecosystem for detector simulation, event reconstruction, and physics analysis. As data rates and detector complexity continue to rise, scaling this software efficiently across distributed resources has become essential.

We present the extension of the CMS Software (CMSSW) into a fully distributed application, enabling a single logical workflow to span multiple processes running on one or more machines. This approach leverages MPI to efficiently exploit shared memory and high-speed interconnects, such as InfiniBand and RoCE, with minimal changes to the existing CMSSW code. Data movement to and from GPU memory uses RDMA, enabling these transfers to bypass the host CPU entirely when supported by the underlying hardware.

The distributed execution model is implemented through a small number of lightweight CMSSW modules responsible for establishing the MPI communication, transferring event data, and keeping track of the application's logical state. The latter is critical for supporting the High Level Trigger (HLT) use case, where efficient real-time processing depends on the early rejection of events that do not pass the selection criteria.

We will demonstrate the application of this distributed model to the 2025 CMS HLT configuration, evaluate its scalability and performance across a range of network interconnects, GPUs, and MPI implementations, and discuss its implications for future heterogeneous and distributed computing in CMS.
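The communication pattern described above (one process streams event data to another, and a filter decision flows back so that rejected events stop early) can be illustrated independently of the actual implementation. The real system uses MPI point-to-point calls inside C++ CMSSW modules; the toy sketch below is a hypothetical, language-agnostic analogue that stands in `multiprocessing` pipes for the MPI transport, with an invented `pt > 20` cut playing the role of an HLT selection. None of the names here come from CMSSW.

```python
import multiprocessing as mp

def toy_filter_worker(conn):
    """Hypothetical worker process: receives "events", applies a toy
    selection, and returns an accept/reject decision to the sender,
    mirroring the early-rejection requirement of the HLT use case."""
    while True:
        event = conn.recv()
        if event is None:                 # end-of-run sentinel
            break
        accepted = event["pt"] > 20.0     # invented selection criterion
        conn.send((event["id"], accepted))
    conn.close()

def run_toy_trigger(events):
    """Stream events to the worker and collect its decisions.
    The send/recv pair is the analogue of MPI point-to-point
    transfers of event data and filter results."""
    parent_conn, child_conn = mp.Pipe()
    worker = mp.Process(target=toy_filter_worker, args=(child_conn,))
    worker.start()
    decisions = {}
    for event in events:
        parent_conn.send(event)           # analogous to sending event data
        eid, ok = parent_conn.recv()      # analogous to receiving the decision
        decisions[eid] = ok               # a rejected event would go no further
    parent_conn.send(None)                # signal end of run
    worker.join()
    return decisions

if __name__ == "__main__":
    events = [{"id": 1, "pt": 35.0}, {"id": 2, "pt": 5.0}]
    print(run_toy_trigger(events))
```

In the real application the decision travels between MPI ranks over shared memory or the network fabric, and event payloads in GPU memory can move via RDMA without staging through the host; the pipe here only models the logical round trip.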