25–29 May 2026
Chulalongkorn University
Asia/Bangkok timezone

Towards Autonomous Computing Operations with AI Assistance for Belle II Experiment.

26 May 2026, 17:45
18m
Chulalongkorn University

Oral Presentation Track 4 - Distributed computing

Speaker

Mr Dhiraj Kalita (KEK (High Energy Accelerator Research Organization))

Description

The Belle II experiment at KEK, Japan, manages a data volume exceeding 30 petabytes, with datasets distributed and processed worldwide using DIRAC and Rucio. With a globally distributed computing infrastructure and an expected order-of-magnitude increase in data volume, we face operational challenges affecting both computing experts and end users. End users frequently struggle with recurring issues (e.g. problems with job submission or locating relevant documentation), which generates load on the experts who provide support.
This contribution reports on ongoing research and development of an intelligent, automated assistance system. The proposed system is designed to optimize experiment workflows, diagnose common failures, and provide continuous 24/7 monitoring to reduce service downtime and accelerate incident response. Our work combines recent open-source Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to incorporate experiment-specific documentation, such as software guides, troubleshooting resources, and FAQs, for authoritative, context-aware assistance. In parallel, we explore AI agents for automated analysis of grid job logs, failure classification, and root-cause suggestion.
This research proposes a local LLM infrastructure that enhances privacy, security, and sustainability by keeping sensitive data internal. The self-contained deployment allows for task-specific fine-tuning, integration with Model Context Protocol (MCP) tools, and long-term cost control. The contribution details the prototype architecture, a preliminary evaluation, and a roadmap to improve Belle II experiment operations and user experience.
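
As an illustration only, the retrieval step of the RAG approach described above can be sketched as follows. This is a minimal sketch, not the prototype presented in this contribution: the documentation snippets, the names `DOCS`, `retrieve`, and `build_prompt`, and the bag-of-words cosine similarity are all assumptions chosen to keep the example self-contained. A real deployment would more likely use learned embeddings and pass the assembled prompt to a locally hosted LLM.

```python
import math
import re
from collections import Counter

# Hypothetical documentation snippets (not taken from Belle II documentation).
DOCS = {
    "job-submission": "Submit a grid job after initialising a valid proxy.",
    "job-failure": "A failed job usually points to a missing input file.",
    "documentation": "Software guides and FAQs are listed on the documentation portal.",
}


def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Return the names of the k snippets most similar to the query."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda name: cosine(q, vectorize(docs[name])), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Assemble the retrieved snippets and the user question into one prompt."""
    context = "\n".join(f"[{name}] {docs[name]}" for name in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("my job failed with a missing input file", DOCS, k=1))
```

The design point is that the model sees only the retrieved, experiment-specific text alongside the question, which is how RAG grounds answers in documentation without retraining the model.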

Authors

Cedric Serfon (Brookhaven National Laboratory (US))
Mr Dhiraj Kalita (KEK (High Energy Accelerator Research Organization))
I Ueda (KEK IPNS)
Michel Hernandez Villanueva (Brookhaven National Laboratory (US))

Co-authors

Mr Paul Gebeline (University of Mississippi | Ole Miss)
Quinn Campagna