15–19 Sept 2025
CERN
Europe/Zurich timezone

AI Assistant for ATLAS operations and beyond

16 Sept 2025, 16:30
5m
40/S2-A01 - Salle Anderson (CERN)

6. Large Language Models-based assistants

Speaker

Carlos Solans Sanchez (CERN)

Description

The operation and maintenance of experiments like ATLAS require expertise across many domains, particularly during interventions or unexpected events. While much of this knowledge is documented in CERN’s Engineering Data Management Service (EDMS), the documentation is fragmented, with limited metadata and diverse formats that hinder quick access. To address this, the Expert System tool (https://doi.org/10.1051/epjconf/201921405035) was developed to centralize and simplify access to the experiment’s knowledge base. It provides intuitive navigation, highlights interdependencies between subsystems, and simulates the interactions of approximately 13,000 objects through 89,000 relationships stored in the ATLAS TDAQ object-oriented configuration database, referred to as OKS (https://doi.org/10.1109/23.710971).
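Modelling subsystems as objects linked by relationships amounts to a directed graph, so the question "what else is affected if this object fails?" becomes a graph traversal. The sketch below assumes a hypothetical adjacency-list export of dependencies (the object names are invented placeholders, not real OKS classes) and uses a plain breadth-first search. It illustrates the idea only and does not reproduce the Expert System's actual algorithms.

```python
from collections import deque

# Hypothetical dependency graph: each key feeds the objects listed as its value.
# Object names are illustrative placeholders, not real OKS object identifiers.
DEPENDENCIES: dict[str, list[str]] = {
    "CoolingPlant": ["CoolingLoopA", "CoolingLoopB"],
    "CoolingLoopA": ["PixelCrate1"],
    "CoolingLoopB": ["PixelCrate2"],
    "PowerSupply7": ["PixelCrate1"],
    "PixelCrate1": [],
    "PixelCrate2": [],
}


def affected_objects(graph: dict[str, list[str]], failed: str) -> list[str]:
    """Return every object downstream of `failed`, in breadth-first order."""
    seen = {failed}
    order: list[str] = []
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # visit each object once, even with shared parents
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Which (hypothetical) objects lose service if the cooling plant stops?
    print(affected_objects(DEPENDENCIES, "CoolingPlant"))
```

Breadth-first order lists directly dependent objects before indirect ones, which is one plausible way to rank the impact of a failure for an operator.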

Any triggered Detector Safety System (DSS) alarm requires operators to identify its cause, point of failure, and criticality, which depends on factors such as the affected subsystem and the experiment’s operational mode. Alarm recovery is resource-intensive and often involves multiple stakeholders, with the Shift Leader in Matters of Safety (SLIMOS) acting as first responder. Their main task is to determine whether an alarm stems from an error or from an intervention, as this dictates the follow-up. For example, a cooling plant shutdown may signal either a system fault or scheduled maintenance. While true errors demand expert recovery procedures, intervention-related alarms are easier to resolve but still consume resources and reduce overall alertness. To address this, an Alarm Helper tool was introduced in 2023 for LS2, after an alarm analysis showed that most alarms were caused by interventions.
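The first triage question, error or intervention, can be illustrated with a simple rule: flag an alarm as likely intervention-related when it falls inside a declared intervention window on the same subsystem. This is a hypothetical sketch; the `Intervention` and `Alarm` records, their fields, and the matching rule are assumptions for illustration and do not describe how the Alarm Helper actually works.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Intervention:
    """Hypothetical record of a scheduled intervention on one subsystem."""
    subsystem: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record: which subsystem raised it and when."""
    subsystem: str
    time: datetime


def classify(alarm: Alarm, interventions: list[Intervention]) -> str:
    """Return 'intervention' if the alarm overlaps a matching window, else 'possible error'."""
    for iv in interventions:
        if iv.subsystem == alarm.subsystem and iv.start <= alarm.time <= iv.end:
            return "intervention"
    # No matching window: treat as a potential fault needing expert recovery.
    return "possible error"


if __name__ == "__main__":
    schedule = [
        Intervention("cooling", datetime(2025, 9, 16, 8, 0), datetime(2025, 9, 16, 12, 0)),
    ]
    print(classify(Alarm("cooling", datetime(2025, 9, 16, 9, 30)), schedule))  # intervention
    print(classify(Alarm("cooling", datetime(2025, 9, 16, 14, 0)), schedule))  # possible error
```

A realistic tool would also need to weigh the operational mode and correlations between alarms, which a single time-window check ignores.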

The HL-LHC upgrades will bring significant enhancements to the ATLAS detector and its infrastructure, particularly the new Inner Tracker and the CO2 cooling system. These will change established concepts and require a considerable number of interventions that might slow down the restart of operations. Since the LEP era, efforts to improve detector operations have combined automation with the accumulated expertise of operators. A natural evolution of this approach is the development of language-based AI assistants that focus on usability and explainability. Rather than replacing human decision-making, such tools would harness operator knowledge while lowering the barrier to accessing complex information.

This project aims to become part of a CERN-wide effort to transform how operators interact with the detector, ensuring that expertise is more widely accessible, that decision-making remains human-driven, and that operational efficiency and safety are enhanced as the HL-LHC era begins. The objective is to use an open-source Large Language Model (LLM) as the generative transformer, to evaluate different inference engines for the infrastructure, including commercial solutions and those developed at CERN, and to use Retrieval-Augmented Generation (RAG) to feed the documentation into the LLM. The assistant would build on the Expert System’s structured descriptions and graph-based algorithms, enrich them with the time trends and correlations captured by the Alarm Helper, and add the current status of the detector from Detector Safety System (DSS) and Detector Control System (DCS) information. Implemented first in ATLAS, it could provide intuitive, natural-language explanations of alarms and subsystem behaviour, setting a new standard in detector operations. We aim to use the ATLAS Expert System as a real-life deployment target; however, care will be taken during the development of the assistant to abstract the input data sources and model interconnects, so that the technology can be generalized to other use cases at CERN.
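In RAG, the passages most relevant to a question are retrieved from the documentation and placed in the LLM prompt as context. A minimal sketch of the retrieval and prompt-assembly steps follows, assuming a toy bag-of-words cosine similarity in place of a learned embedding model. The document snippets are invented placeholders, and the helper names `retrieve` and `build_prompt` are hypothetical; the choice of embedding model, vector store, and inference engine is exactly what the project intends to evaluate.

```python
import math
import re
from collections import Counter

# Invented placeholder snippets standing in for EDMS / Expert System documentation.
DOCS = [
    "The cooling plant supplies chilled water to the detector cooling loops.",
    "A DSS alarm on a rack triggers an automatic power cut of the crate.",
    "The inner detector evaporative cooling uses a compressor plant.",
]


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = vectorize(question)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """Assemble the context-augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Answer using only this documentation:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Why did the cooling plant alarm fire?", DOCS))
```

Constraining the model to the retrieved context is what lets the answer be traced back to documentation, which supports the explainability goal stated above.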

CERN group/Experiment

CERN ATLAS Team

Working area: Area 6, Large Language Models-based assistants
Project goals: Enhance operational efficiency and safety. Increase usability and explainability. Lower the barrier to accessing complex information. Transform how operators interact with the detector. Keep decision-making human-driven. Build on the Expert System’s structured descriptions. Use DSS alarms and DCS status data points. Generalize these assistants to future detectors and experiments.
Timeline: This project has an initial phase of 3 years, but is intended to lead into a longer-term initiative across CERN.
Year 1: Identification of related knowledge domains. Discussion of inference engines.
Year 2: Evaluation of different LLMs and inference engines. Engineering of interfaces.
Year 3: Prototype for use during commissioning of the detector. Collection of feedback from real case scenarios.
Potential continuation:
Year 4: Integration of real-time detector information. Engineering of the project for operations.
Year 5: Release of the project for operations. Integration into control room procedures. Documentation.
Available person power: 0.25 FTE
Additional person power request: 36 GRAP months, 36 ORIG months, 36 TECH months
Is this an already ongoing activity? Yes
Indicative hardware resource needs: Appropriate hardware for model training.
