28th Conference on Computing in High Energy and Nuclear Physics (CHEP 2026)

Start: 2026-05-25T08:00:00+07:00
End: 2026-05-29T14:00:00+07:00
Location: Chulalongkorn University

Asia/Bangkok timezone

Exabyte-Scale Automation, Alarms and Monitoring at CERN

28 May 2026, 13:45

18m

MHMK 302

Oral Presentation, Track 7 - Computing infrastructure and sustainability

Octavian-Mihai Matei (CERN)

Over the past 70 years, CERN’s pioneering work in particle physics and more than a decade of operations at the Large Hadron Collider (LHC) have driven a dramatic transformation in data storage. With each new experimental run, the scale and complexity of data handling continue to grow. As we approach the next Long Shutdown (LS3) and the High-Luminosity LHC (HL-LHC) era, storage infrastructure demands are expected to rise exponentially, bringing both significant challenges and opportunities.

Today at CERN, we operate over 800 storage nodes across eight independent EOS instances, which together form the backbone of data storage for experiments, services and users. Managing this infrastructure at the exabyte scale requires robust monitoring, smart alerting systems and a deep understanding of system performance and operational behavior.

In this talk, we will take a behind-the-scenes look at the daily operations of CERN’s storage systems, exploring what it takes to keep EOS running reliably under extreme conditions. We will highlight the evolution of our operational tools and practices, and how we are preparing for future requirements in scalability, performance and reliability. Key topics will include improvements in observability, automation, fault detection and incident response, all of which are essential to supporting EOS as it scales to meet the demands of HL-LHC data workflows.

Abhishek Lekshmanan (CERN), Andreas Joachim Peters (CERN), Cedric Caffy (CERN), David Smith (CERN), Elvin Alin Sindrilaru (CERN), Gianmaria Del Monte (CERN), Guilherme Amadio (CERN), Luca Mascetti (CERN), Dr Maria Arsuaga Rios (CERN), Octavian-Mihai Matei (CERN)

EOS_ALARMS_APOLLON_AND_HERMES.pdf
