Speaker
Description
The LHCbFinder project proposes an advanced semantic search and natural language knowledge retrieval system to transform knowledge discovery within the LHCb experiment. It tackles three problems: fragmented knowledge, undocumented institutional knowledge, and steep learning curves for newcomers. By integrating LHCb knowledge and documentation sources, ranging from published papers to internal notes, LHCbFinder uses Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) to deliver context-aware natural language search and interaction. This makes information easier to access, lowers the entry barrier for newcomers, and helps preserve institutional knowledge and make it available to more users. We present the technical architecture, which incorporates vector embeddings and neural encoders for semantic matching, and demonstrate it through functional search examples using published LHCb papers. We highlight our summer 2025 development plan to expand coverage through specialized scraping pipelines for additional knowledge sources. We then discuss the work needed to integrate LLMs and provide an interactive way of obtaining knowledge. Finally, we address implementation challenges, including computational resource optimization and the management of embargoed content.
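To make the idea of embedding-based semantic matching concrete, here is a minimal sketch of the retrieval step: documents and the query are mapped to vectors, and documents are ranked by cosine similarity to the query. It is not the LHCbFinder implementation. The `embed` function is a toy bag-of-words stand-in for a neural encoder, and the function names and document strings are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a neural encoder: a sparse bag-of-words count vector.

    Assumption: a real system would call a trained encoder here instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=2):
    """Return the top_k (score, document) pairs, highest similarity first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical document snippets, not real LHCb titles.
documents = [
    "measurement of cp violation in b meson decays",
    "trigger performance during run 2 data taking",
    "alignment of the vertex detector",
]

results = search("cp violation in b decays", documents)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

In a RAG setup, the top-ranked passages returned by a step like `search` would be passed to an LLM as context for answering the user's question. Swapping the toy `embed` for a neural encoder changes the vectors but not the ranking logic.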
| Would you like to be considered for an oral presentation? | Yes |
|---|---|