19–23 May 2025
CERN
Europe/Zurich timezone

LHCbFinder - Advancing Knowledge Discovery at LHCb with Semantic Search and LLMs

Not scheduled
20m
61/1-201 - Pas perdus - Not a meeting room - (CERN)

Poster 4 LLMs and foundation models Poster Session

Speaker

Mohamed Elashri (University of Cincinnati)

Description

The LHCbFinder project proposes an advanced semantic search and natural-language knowledge retrieval system to transform knowledge discovery within the LHCb experiment. It tackles fragmented knowledge, undocumented institutional knowledge, and steep learning curves for newcomers. By integrating LHCb knowledge sources and documentation, ranging from published papers to internal notes, LHCbFinder uses Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) to deliver context-aware natural-language search and interaction. This makes information easier to access, lowers the entry barrier, and helps preserve institutional knowledge and make it available to more users. We present the technical architecture, which incorporates vector embeddings and neural encoders for semantic matching, and demonstrate it through functional search examples using published LHCb papers. We highlight our summer 2025 development plan to expand coverage through specialized scraping pipelines for additional knowledge sources, and then discuss the work needed to integrate with LLMs to provide an interactive way of obtaining knowledge. Finally, we address implementation challenges, including computational resource optimization and the management of embargoed content.
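
As a rough illustration of the embedding-based semantic matching mentioned in the abstract, the sketch below ranks documents by cosine similarity to a query. It is not the LHCbFinder implementation: the `embed` function is a toy bag-of-words stand-in for a neural encoder, and the document titles, function names, and `top_k` parameter are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a neural encoder: a sparse bag-of-words count vector.

    Assumption: a real system would call a trained embedding model here.
    """
    tokens = [t.strip(".,:;()").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=2):
    """Return the top_k (score, document) pairs, highest similarity first."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical corpus; these titles are invented for illustration only.
documents = [
    "Measurement of CP violation in charm decays",
    "Trigger configuration notes for the upgraded detector",
    "Search for rare beauty decays to muon pairs",
]

results = search("rare decays to muon pairs", documents, top_k=1)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

In a RAG setup, the top-ranked passages would typically be passed to an LLM as context for answering the user's question; that step is omitted here.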

Would you like to be considered for an oral presentation? Yes

Author

Mohamed Elashri (University of Cincinnati)

Co-authors

Conor Henderson (University of Cincinnati (US))
Michael David Sokoloff (University of Cincinnati (US))
