Speaker
Description
Modern machine learning (ML) algorithms are sensitive to the specification of non-trainable parameters called hyperparameters (e.g., learning rate or weight decay). Without guiding principles, hyperparameter optimization is the computationally expensive process of sweeping over various model sizes and, at each size, re-training the model over a grid of hyperparameter settings. However, recent progress from the ML theory community has produced a prescription for scaling hyperparameters with respect to model size such that (1) the optimal hyperparameters identified for small models of a fixed architecture remain optimal for their larger counterparts (hyperparameter transfer) and (2) larger models perform better than their smaller counterparts (limiting behavior). When satisfied, these desiderata yield large computational savings and stable performance, useful, for example, when computing neural scaling laws. In this talk, we will present a recipe for achieving hyperparameter transfer and limiting behavior in graph transformers, transformer variants combining simple message passing with sparse attention computed over the edges of each input graph. Though relatively new, graph transformers have been shown to outperform simple GNNs and transformers on a variety of benchmark tasks, and they are particularly relevant to scientific datasets, where edges may encode known physical interactions and measurements. We will demonstrate the promise of these principled graph transformers on benchmark datasets and encourage discussion about how these results may be extended to tackle more challenging scenarios in particle physics.
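The scaling prescription referenced above is in the spirit of the Maximal Update Parameterization (μP) line of work; the following is a minimal sketch, not the speakers' actual recipe, of the width-dependent rules it implies for hidden layers (Adam learning rate scaled down by the width multiplier, initialization variance proportional to 1/fan_in). The helper name and the specific keys are hypothetical, introduced only for illustration.

```python
import math

def mup_scaled_hparams(base_lr, base_width, target_width):
    """Hypothetical helper: muP-style hyperparameter scaling sketch.

    Under muP-style scaling, a learning rate tuned at base_width is
    reused at target_width by dividing the hidden-layer (Adam) learning
    rate by the width multiplier, while hidden weights are initialized
    with standard deviation ~ 1/sqrt(fan_in).
    """
    width_mult = target_width / base_width
    return {
        "hidden_lr": base_lr / width_mult,                 # LR shrinks as width grows
        "hidden_init_std": 1.0 / math.sqrt(target_width),  # variance ~ 1/fan_in
        "output_mult": 1.0 / width_mult,                   # output logits scaled down
    }

# Example: tune hyperparameters at width 256, then transfer to width 4096.
small = mup_scaled_hparams(1e-3, base_width=256, target_width=256)
large = mup_scaled_hparams(1e-3, base_width=256, target_width=4096)
```

The point of the sketch is that `large` is obtained from the same tuned `base_lr` as `small`, so only the cheap small-width sweep is ever run.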
Significance
These results are novel and represent the first time principles for hyperparameter transfer and limiting behavior have been applied to graph transformers. They will have a significant impact on the particle physics community because (1) these powerful model scalings have not yet been widely adopted by scientists and (2) particle physicists place particular emphasis on graph-structured data due to the sparse and irregular nature of collider data.
| Experiment context, if any |
|---|
| N/A |