PHYSTAT

PHYSTAT informal review: Signal extraction via Gaussian Processes

by Abhijith Gandrakota (Fermi National Accelerator Lab. (US)), Marcin Jurek (SMU)

Europe/Zurich
Description

This is a PHYSTAT Informal Review event*. Today, Abhijith Gandrakota (physicist) and Marcin Jurek (statistician) will review the topic "Signal extraction via Gaussian Processes".

Agenda:

  • 3.30 pm Opening
  • 3.30 pm Physicist's Presentation (20'+10')
  • 4 pm Statistician's Presentation (25'+10')
  • 4.35 pm General Discussion and Closing (25')

 

*PHYSTAT informal reviews: In this virtual format, a tandem consisting of a physicist and a statistician reviews a statistical method introduced by one of the parties, or a general critical analysis topic, from the physicist's and the statistician's perspectives. The virtual events comprise two complementary presentations of 20+10 min. each, followed by ~30 minutes of general discussion.

 

Abstract:

Extracting deviations from smooth backgrounds is fundamental to discoveries beyond the standard model and to measurements of rare standard model processes at the Large Hadron Collider. Traditional approaches that model the background with ad hoc functional forms face increasing challenges with the high statistics expected from Run 3 and the High-Luminosity LHC: as statistical uncertainties shrink, a misspecified functional form can bias the modeling of the underlying distributions. Gaussian process (GP) regression has emerged as a promising methodology, offering robust background estimation with natural uncertainty quantification and providing a more rigorous statistical framework for density modeling and hypothesis testing.
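As a rough illustration of the GP regression idea described above, the following minimal Python sketch fits a smooth background to a toy binned spectrum. Everything in it is an assumption made for illustration and is not taken from the presentations: the exponentially falling toy spectrum, the squared-exponential kernel, the hand-picked hyperparameters `amplitude` and `length_scale`, and the helper names `rbf_kernel` and `gp_posterior`. It also uses the Gaussian sqrt(N) approximation for the bin uncertainties rather than a Poisson likelihood.

```python
import numpy as np


def rbf_kernel(x1, x2, amplitude, length_scale):
    """Squared-exponential kernel k(x, x') = A^2 exp(-(x - x')^2 / (2 l^2))."""
    diff = x1[:, None] - x2[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, noise_var, x_test, amplitude, length_scale):
    """Posterior mean and standard deviation of a zero-mean GP with
    independent Gaussian noise of variance noise_var on each training point."""
    k_tt = rbf_kernel(x_train, x_train, amplitude, length_scale) + np.diag(noise_var)
    k_st = rbf_kernel(x_test, x_train, amplitude, length_scale)
    k_ss_diag = np.full(x_test.shape, amplitude**2)  # k(x, x) for the RBF kernel

    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha

    v = np.linalg.solve(chol, k_st.T)
    var = k_ss_diag - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Toy falling spectrum (hypothetical numbers, not real data).
rng = np.random.default_rng(seed=1)
mass = np.linspace(100.0, 200.0, 41)                 # bin centres
expected = 5000.0 * np.exp(-(mass - 100.0) / 30.0)   # true background shape
counts = rng.poisson(expected).astype(float)         # observed bin contents

# Gaussian approximation to the Poisson bin variance (assumption of this sketch).
noise_var = np.maximum(counts, 1.0)

# Hyperparameters fixed by hand here; a real analysis would fit or marginalize them.
mean, std = gp_posterior(mass, counts, noise_var, mass,
                         amplitude=5000.0, length_scale=30.0)

for m, c, mu, s in zip(mass[::10], counts[::10], mean[::10], std[::10]):
    print(f"mass={m:6.1f}  observed={c:7.0f}  GP background={mu:8.1f} +/- {s:5.1f}")
```

The posterior standard deviation gives a per-bin uncertainty band on the background estimate. Because the GP shares information between neighbouring bins, this band is never wider than the sqrt(N) error assumed for a single bin.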

In the first part, Abhijith Gandrakota will review current GP-based approaches for LHC analyses, highlighting their advantages over parametric methods in terms of flexibility and uncertainty quantification. He will discuss significant limitations that remain in existing implementations, notably that their efficacy has been demonstrated primarily for localized resonances, with limited validation for broad or non-localized signal topologies. More critically, he will address how existing implementations do not fully respect the Poissonian nature of LHC data, relying instead on Gaussian approximations that break down in low-statistics regions, and how many approaches employ computational approximations that compromise the theoretical rigor of the Bayesian framework. He will outline the developments essential for widespread adoption, including proper Poisson likelihood treatment within the GP framework, standardized kernel selection protocols, and integration with the established frequentist inference procedures used by experimental collaborations.
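The breakdown of the Gaussian approximation at low statistics can be made concrete with a small numerical comparison. The sketch below is an illustration only and is not part of either presentation. It compares the exact Poisson probability of an upward fluctuation with the sqrt(N) Gaussian estimate for a nominal "3 sigma" excess in a single bin. The function names and the chosen bin contents (4 and 10000 expected events) are hypothetical.

```python
import math


def poisson_tail(n_obs, mean):
    """Exact P(N >= n_obs) for N ~ Poisson(mean); terms are evaluated in
    log space so that large means do not overflow."""
    below = sum(
        math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
        for k in range(n_obs)
    )
    return 1.0 - below


def gaussian_tail(n_obs, mean):
    """P(X >= n_obs) for X ~ Normal(mean, sqrt(mean)), i.e. the sqrt(N) approximation."""
    z = (n_obs - mean) / math.sqrt(mean)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(mean, n_sigma=3.0):
    """Ratio of exact Poisson to Gaussian tail probability for an excess of
    n_sigma * sqrt(mean) events above the expectation."""
    n_obs = round(mean + n_sigma * math.sqrt(mean))
    return poisson_tail(n_obs, mean) / gaussian_tail(n_obs, mean)


low_stat_ratio = tail_ratio(4.0)        # 10 observed on 4 expected
high_stat_ratio = tail_ratio(10000.0)   # 10300 observed on 10000 expected

print(f"Poisson/Gaussian tail ratio, 4 expected events:     {low_stat_ratio:.2f}")
print(f"Poisson/Gaussian tail ratio, 10000 expected events: {high_stat_ratio:.2f}")
```

For the low-count bin the exact tail probability is several times larger than the Gaussian estimate, so a Gaussian treatment would overstate the significance of the excess. For the high-count bin the two agree closely.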

In the second part, Marcin Jurek will discuss the broader statistical perspective on using Gaussian processes for modeling the density of expected distributions in particle physics. He will examine how differences between expected and observed distributions in specific regions of support are used to test hypotheses about particle generation models. The presentation will explore the opportunities offered by the GP paradigm as a unified framework for density modeling and hypothesis testing, while critically examining the challenges that must be addressed to realize its full potential in the context of modern collider physics analyses.

Organised by

S. Algeri, O. Behnke, L. Brenner, L. Lyons, N. Wardle

Zoom Meeting ID
68793225561
Host
Olaf Behnke
Alternative host
Nicholas Wardle
Passcode
07630691