15–19 Sept 2025
CERN
Europe/Zurich timezone

Design of a Real-Time Anomaly Detection System for LHCb Operational Logs

16 Sept 2025, 09:40
5m
40/S2-A01 - Salle Anderson (CERN)

3. AI for metadata analysis

Speaker

Benedict Kamoni Njoki (University of Nairobi (KE))

Description

The operational control layer in the Experiment Control System (ECS) of the LHCb experiment is built on WinCC Open Architecture (OA), which generates large volumes of logs. Currently, operators and shifters examine these logs manually to identify system errors. This process is time-consuming, tedious, and requires expert knowledge.

Patterns in the logs are not easily discernible, making it difficult to identify events that lead to system failures. Machine learning can uncover such patterns, flag potential errors before they occur, and streamline error identification and communication to shifters.

We therefore propose a real-time anomaly detection system for LHCb operational logs that will support:

  • Error identification and prediction
  • Root-cause tracing
  • Alarms and notifications to shifters

This system will reduce operator workload, improve reliability, and enable proactive responses to potential failures.
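
As an illustration of the kind of pattern flagging described above, the following is a minimal sketch, not the project's method. It assumes log lines can be reduced to coarse templates by masking numbers and hex identifiers, and it raises an alarm when a template has rarely been seen after a warm-up period. The log line format, the masking rules, the class name `RareTemplateDetector`, and all thresholds are hypothetical; real WinCC OA logs would need their own parsing.

```python
import re
from collections import Counter, deque

# Hypothetical masking rules: replace variable fields so that lines
# differing only in numbers or hex ids map to the same template.
# Hex is masked first so that "0x1F" is not split by the number rule.
_MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Reduce a raw log line to a coarse template string."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return " ".join(line.split())


class RareTemplateDetector:
    """Flag lines whose template has rarely been seen so far (illustrative only)."""

    def __init__(self, warmup: int = 5, min_count: int = 2, context: int = 3):
        self.counts = Counter()               # template -> occurrences so far
        self.seen = 0                         # total lines observed
        self.warmup = warmup                  # lines to observe before alarming
        self.min_count = min_count            # templates seen fewer times are "rare"
        self.recent = deque(maxlen=context)   # preceding lines kept as context

    def observe(self, line: str):
        """Process one line; return an alarm dict if it looks unusual, else None."""
        template = to_template(line)
        self.counts[template] += 1
        self.seen += 1
        alarm = None
        if self.seen > self.warmup and self.counts[template] < self.min_count:
            # The preceding lines are attached as a crude aid for tracing
            # what happened just before the unusual event.
            alarm = {
                "line": line,
                "template": template,
                "context": list(self.recent),
            }
        self.recent.append(line)
        return alarm
```

In this sketch a shifter notification would be triggered whenever `observe` returns a non-`None` value; the attached context lines are only a stand-in for proper root-cause tracing.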

CERN group/Experiment

LHCb

Working area

Area 7: Experimental Technologies

Project goals

Intermediate goals:

  • Implement log data preprocessing (PCA, clustering, exploration), labeling, and event mapping to shift DB logs
  • Develop supervised and unsupervised ML models for anomaly detection
  • Build supporting software such as a monitoring UI, a log streaming API, and MLOps hooks

Final goals:

  • Integrate the system into the LHCb framework for testing and validation
  • Provide a real-time anomaly detection system that supports error identification, prediction, and root-cause tracing, with alarms and notifications for shifters

Timeline

  • Months 0–12: data preprocessing and ML model development
  • Months 12–24: backend and frontend software development
  • Months 24–36: system integration, testing, and validation

Available person power

0.5 FTE (Doctoral Student)

Additional person power request

No additional person power

Is this an already ongoing activity?

Yes

Indicative hardware resource needs

The project will require access to LHCb Online computing resources, including CPU nodes for log streaming and preprocessing, as well as GPU resources for ML model training and testing, with the scale depending on model complexity.
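
The preprocessing and unsupervised detection goals listed under the project goals could start from something as simple as the sketch below. It groups events into fixed time windows, counts each event type per window, and scores windows by how far a count deviates from the mean. This deliberately swaps in a plain z-score for the PCA, clustering, and ML models named in the goals; the `(timestamp, event_type)` input format and the function names are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def window_counts(events, window_s):
    """Group (timestamp_seconds, event_type) pairs into fixed-size windows.

    Returns a dict mapping window index -> Counter of event types.
    """
    windows = {}
    for ts, kind in events:
        idx = int(ts // window_s)
        windows.setdefault(idx, Counter())[kind] += 1
    return windows


def anomaly_scores(windows, kind):
    """Return a z-score per window for the count of one event type.

    Windows with no events between the first and last index count as zero.
    """
    if not windows:
        return {}
    idxs = list(range(min(windows), max(windows) + 1))
    counts = [windows.get(i, Counter())[kind] for i in idxs]
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # All windows identical: nothing stands out.
        return {i: 0.0 for i in idxs}
    return {i: (c - mu) / sigma for i, c in zip(idxs, counts)}


if __name__ == "__main__":
    # Synthetic example data, not real LHCb logs.
    events = [(0, "ERROR"), (10, "ERROR"), (20, "ERROR"), (40, "ERROR")]
    events += [(30 + k, "ERROR") for k in range(6)]  # burst in window 3
    scores = anomaly_scores(window_counts(events, window_s=10), "ERROR")
    print(max(scores, key=scores.get))  # index of the most unusual window
```

Per-window count vectors of this kind are also a natural input for the PCA or clustering step, which is why the windowing is kept separate from the scoring.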
