We present MITRA, a Retrieval-Augmented Generation (RAG) system developed to assist particle physicists in navigating collaboration documentation. The system extracts relevant information from analysis and shift documentation and answers user queries with direct citations to the original sources. While the current implementation uses CMS analysis documentation, the underlying workflow is designed to be experiment-agnostic. The pipeline, including document ingestion, embedding generation, and retrieval, is modular, allowing adaptation to the documentation of other collaborations.
To meet the privacy requirements of high-energy physics (HEP) experiments, MITRA operates entirely on local collaboration infrastructure using self-hosted language models. This ensures that internal documents, such as unpublished results, remain on approved servers and are not exposed to external APIs. Self-hosting also reduces cost, allowing the system to scale with the size of HEP collaborations' user bases and over time. We describe the motivation for building MITRA and the performance of the retrieval pipeline, and outline ongoing work toward more complex, multi-step reasoning workflows.
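As an illustration of the modular structure described above, the following is a minimal sketch of the three stages (ingestion, embedding generation, retrieval) in which each stage is a separate function that can be replaced independently. It is not MITRA's implementation: the names `Chunk`, `toy_embed`, `ingest`, `build_index`, and `retrieve` are hypothetical, and the hashed bag-of-words embedder is only a stand-in for a self-hosted embedding model.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]


@dataclass(frozen=True)
class Chunk:
    source: str  # identifier used to cite the original document
    text: str


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Stand-in embedder (assumption): hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucket so results do not depend on Python's hash seed.
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(documents: Dict[str, str]) -> List[Chunk]:
    """Split each document into paragraph chunks, keeping the source name."""
    return [
        Chunk(source, para.strip())
        for source, body in documents.items()
        for para in body.split("\n\n")
        if para.strip()
    ]


def build_index(chunks: List[Chunk], embed: Embedder) -> List[Tuple[Chunk, Vector]]:
    """Embed every chunk once; the embedder is a swappable component."""
    return [(chunk, embed(chunk.text)) for chunk in chunks]


def retrieve(
    query: str,
    index: List[Tuple[Chunk, Vector]],
    embed: Embedder,
    k: int = 2,
) -> List[Tuple[float, Chunk]]:
    """Return the k chunks with the highest cosine similarity to the query."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because every retrieved `Chunk` carries its `source`, an answer built from the results can cite the originating document. Adapting the sketch to another collaboration would, under these assumptions, mean changing only the input to `ingest` or the `Embedder` passed to `build_index` and `retrieve`.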
https://arxiv.org/abs/2603.09800