11–15 Mar 2024
Charles B. Wang Center, Stony Brook University
US/Eastern timezone

Retrieval Augmented Generation for Particle Physics: A Case Study with the Snowmass White Papers and Reports

13 Mar 2024, 16:15
30m
Charles B. Wang Center, Stony Brook University

100 Circle Rd, Stony Brook, NY 11794
Poster
Track 1: Computing Technology for Physics Research
Poster session with coffee break

Speaker

Gordon Watts (University of Washington (US))

Description

Particle physics faces many challenges and opportunities in the coming decades, as reflected by the Snowmass Community Planning Process, which produced about 650 reports on various topics. These reports are a valuable source of information, but they are also difficult to access and query. In this work, we explore the use of Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) to answer questions based on the Snowmass corpus. RAG is a technique that combines LLMs with document retrieval, allowing the model to select relevant passages from the corpus and generate answers. We describe how we indexed the Snowmass reports for RAG, how we compared different LLMs for this task, and how we evaluated the quality and usefulness of the answers. We discuss the potential applications and limitations of this approach for particle physics and beyond.
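The retrieve-then-generate pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how the reports were indexed, so a simple bag-of-words cosine similarity stands in for whatever retrieval method was actually used, the passages in `PASSAGES` are hypothetical placeholders rather than text from the Snowmass reports, and the call to an LLM is left out entirely (the sketch stops at building the prompt that would be sent to one).

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages; NOT taken from the Snowmass reports.
PASSAGES = [
    "Future colliders could measure Higgs boson couplings with high precision.",
    "Neutrino experiments aim to determine the mass ordering and CP violation.",
    "Computing for particle physics needs sustainable software and training.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Assemble the retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("What will neutrino experiments determine?", PASSAGES, k=1))
```

In a real system the word-count vectors would typically be replaced by learned embeddings and the prompt would be passed to a language model; both steps are omitted here because the abstract does not specify them.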

Significance

LLMs are new, and we are still figuring out how to apply them in our field in ways that leverage their power. Search and reasoning over local document collections is one such approach.

This is new work, and hasn't been presented before.

Experiment context, if any

None, though both of us are doing this with IRIS-HEP in mind, which isn't actually an experiment...

Primary authors

Benjamin Galewsky (Univ. Illinois at Urbana Champaign (US))
Gordon Watts (University of Washington (US))