43rd International Conference on High Energy Physics

Name: 43rd International Conference on High Energy Physics
Start: 2026-07-30T08:00:00-03:00
End: 2026-08-05T19:00:00-03:00
Location: Natal, Brazil

30 July 2026 to 5 August 2026

America/Sao_Paulo timezone

Contact

ichep2026@cbpf.br

Operational challenges of the Event Processing Nodes GPU farm at ALICE Experiment

Not scheduled

20m

Natal, Brazil

Via Costeira Sen. Dinarte Medeiros Mariz, 6664-6704 - Ponta Negra, Natal - RN, 59090-002

Talk Software and Computing

Collaboration ALICE

The ALICE Event Processing Node (EPN) farm, a high-density GPU HPC system, is the backbone of real-time data reconstruction during the LHC Run 3 period (2022 to 2026) and is the largest computer farm at CERN in terms of compute capacity. Comprising 350 nodes and 2800 GPUs, with a peak performance of 48 PFLOP/s, the EPN infrastructure has been operated throughout Run 3 by a dedicated team of two to three people at a time.
This contribution presents the experience gained during detector operations throughout Run 3 and the architectural choices that enabled a high-reliability, low-maintenance operational model with 24/7 support. It gives an overview of the provisioning, configuration, and observability frameworks that govern this specialized GPU-accelerated HPC facility. Management spans the physical layer, including infrastructure, through the software stack and the experiment-specific software. Key features of this architecture are the integration with central detector-control systems and the logical separation of synchronous and asynchronous processing modes. The contribution concludes with a retrospective on several years of continuous operation, offering a blueprint for how small teams can maintain mission-critical scientific infrastructure through robust automation and sustainable practices.
