8–12 Sept 2025
Hamburg, Germany
Europe/Berlin timezone

Running Experience with $\texttt{Optuna}$ for the Extraction of a HEP Signal by $\texttt{XGBoost}$

Not scheduled
30m

Poster Track 2: Data Analysis - Algorithms and Tools Poster session with coffee break

Speaker

Umit Sozbilir (Universita e INFN, Bari (IT))

Description

Hyperparameter optimization plays a crucial role in achieving high performance and robustness for machine learning models, such as those used in complex classification tasks in High Energy Physics (HEP).
In this study, we investigate the use of $\texttt{Optuna}$, a relatively recent and scalable optimization framework, in a realistic signal-versus-background classification scenario in which $\texttt{XGBoost}$ is applied to CMS Open Data.

The chosen classification task consists of extracting the signal associated with the decay mode $\mathrm{B}_s \rightarrow \mathrm{J}/\psi(\mu^+\mu^-)~\phi(K^+K^-)$ by means of a gradient-boosted decision tree ($\texttt{XGBoost}$) trained on a Monte Carlo simulated signal sample and on a background sample taken from data in the invariant-mass sidebands of the $\phi(\to K^+K^-)$ spectrum. The optimization of $\texttt{XGBoost}$ is guided by $\texttt{Optuna}$ with the aim of maximizing the area under the ROC curve (AUC) while applying an overfitting control mechanism, whereas the Punzi figure of merit is used to optimize the signal extraction performed with $\texttt{XGBoost}$.
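
As an illustration of the Punzi figure of merit mentioned above, the following minimal sketch scans a cut on a classifier score and keeps the cut that maximizes $\varepsilon_S / (a/2 + \sqrt{B})$, where $\varepsilon_S$ is the signal efficiency, $B$ the background yield passing the cut, and $a$ the target significance in standard deviations. The function names, the toy scores, the threshold grid and the choice $a = 3$ are assumptions made for this example; they are not taken from the analysis, and the way the figure of merit enters the actual workflow may differ.

```python
import math


def punzi_fom(signal_efficiency, n_background, n_sigma=3.0):
    """Punzi figure of merit: eps_S / (a/2 + sqrt(B))."""
    return signal_efficiency / (n_sigma / 2.0 + math.sqrt(n_background))


def best_threshold(signal_scores, background_scores, thresholds, n_sigma=3.0):
    """Scan cuts on the classifier score; return the (cut, fom) pair with the largest FoM."""
    best = None
    for cut in thresholds:
        # Fraction of simulated signal candidates passing the cut.
        efficiency = sum(s > cut for s in signal_scores) / len(signal_scores)
        # Background candidates passing the cut (raw count; a real analysis
        # would scale this to the expected yield in the signal region).
        n_bkg = sum(b > cut for b in background_scores)
        fom = punzi_fom(efficiency, n_bkg, n_sigma)
        if best is None or fom > best[1]:
            best = (cut, fom)
    return best


# Toy classifier outputs, invented for illustration only.
signal_scores = [0.95, 0.90, 0.85, 0.70, 0.40]
background_scores = [0.80, 0.60, 0.30, 0.20, 0.10, 0.05]
thresholds = [0.1, 0.3, 0.5, 0.75, 0.9]

cut, fom = best_threshold(signal_scores, background_scores, thresholds)
print(f"best cut = {cut}, Punzi FoM = {fom:.4f}")
```

Because the figure of merit depends on the signal efficiency rather than on the absolute signal yield, it can be evaluated without assuming a signal cross section.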

This work demonstrates that $\texttt{Optuna}$ is a suitable tool for exploring the hyperparameter space efficiently and effectively in commonly used HEP workflows, while providing valuable diagnostics on the automated model optimization.

Significance

This contribution discusses the use of Optuna, a modern hyperparameter optimization framework, within a realistic HEP classification workflow (using CMS Open Data). Although commonly adopted in other data science domains, Optuna is not yet widely used in HEP. We demonstrate that its integration with XGBoost performs efficiently and effectively when applied to a typical signal-versus-background task aimed at extracting a physics signal, and we highlight its flexibility and diagnostic capabilities.

Authors

Adriano Di Florio (CC-IN2P3); Alexis Pompili (Universita e INFN, Bari (IT)); Umit Sozbilir (Universita e INFN, Bari (IT)); Vincenzo Mastrapasqua (Universita e INFN, Bari (IT))
