ACAT 2024

Name: ACAT 2024
Start: 2024-03-11T08:00:00-04:00
End: 2024-03-15T14:30:00-04:00
Location: Charles B. Wang Center, Stony Brook University

US/Eastern timezone

Contact

acat-loc2024@cern.ch

Scalable GNN Training for Track Finding

13 Mar 2024, 16:15

30m

Charles B. Wang Center, Stony Brook University

100 Circle Rd, Stony Brook, NY 11794

Poster

Track 2: Data Analysis - Algorithms and Tools

Poster session with coffee break

Alina Lazar (Youngstown State University)

Graph Neural Networks (GNNs) have demonstrated strong performance on the particle track-finding problem in High-Energy Physics (HEP). Traditional algorithms in this domain become computationally expensive as the number of particles increases. This poster addresses the challenges of training GNN models on large, rapidly evolving datasets, a common scenario given advances in data generation and collection and the growth in storage capacity. The computational and GPU memory requirements are significant roadblocks to training GNNs efficiently on large graph structures. One effective strategy for reducing training time is distributed data parallelism across multiple GPUs, in which each device computes gradients on its own share of the data and the gradients are then averaged across all devices used for training.
This poster will report the speed-up in GNN training time obtained with distributed data parallelism for different numbers of GPUs and computing nodes. Running GNN training with distributed data parallelism leads to a decrease in accuracy. We are investigating how the number of devices relates to the degradation in model accuracy, along with strategies to mitigate it. Preliminary results on the TrackML dataset will be reported. GPU nodes of Perlmutter at NERSC will be used to run the experiments.
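The gradient averaging behind distributed data parallelism can be illustrated with a small simulation. The sketch below is not the poster's training code: it uses a toy linear model in plain Python instead of a GNN, simulates the devices as list shards, and every function name in it (`mse_gradient`, `shard`, `all_reduce_mean`, `ddp_step`) is hypothetical. It only shows that, with equally sized shards, averaging the per-device gradients gives the same update as a single device processing the whole batch.

```python
# Minimal simulation of data-parallel gradient averaging.
# Assumptions: toy model y = w * x + b, no GPUs, no deep learning framework;
# all names are illustrative and not taken from the poster.

def mse_gradient(w, b, xs, ys):
    """Gradient of the mean squared error with respect to w and b."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def shard(data, num_workers):
    """Split data into equal contiguous shards, one per simulated device."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def all_reduce_mean(grads):
    """Average per-worker gradients, standing in for an all-reduce step."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))


def ddp_step(w, b, xs, ys, num_workers, lr):
    """One synchronous step: local gradients, averaging, identical update."""
    local = [
        mse_gradient(w, b, x_shard, y_shard)
        for x_shard, y_shard in zip(shard(xs, num_workers), shard(ys, num_workers))
    ]
    grad_w, grad_b = all_reduce_mean(local)
    return w - lr * grad_w, b - lr * grad_b


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.0 * x + 1.0 for x in xs]  # toy targets drawn from y = 2x + 1

single = ddp_step(0.0, 0.0, xs, ys, num_workers=1, lr=0.01)
multi = ddp_step(0.0, 0.0, xs, ys, num_workers=4, lr=0.01)
print(single, multi)
```

The two updates agree here because the global batch is held fixed and split evenly. If each device instead keeps its own batch size as devices are added, the effective global batch grows with the device count; the sketch does not model that case and makes no attempt to reproduce the accuracy degradation the poster investigates.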

References

Ju, X., Murnane, D., Calafiura, P., Choma, N., Conlon, S., Farrell, S., ... & Lazar, A. (2021). Performance of a geometric deep learning pipeline for HL-LHC particle tracking. The European Physical Journal C, 81, 1-14.
Lazar, A., Ju, X., Murnane, D., Calafiura, P., Farrell, S., Xu, Y., ... & Lucas, A. (2023, February). Accelerating the Inference of the Exa.TrkX Pipeline. In Journal of Physics: Conference Series (Vol. 2438, No. 1, p. 012008). IOP Publishing.

Significance

As HPC platforms with multiple GPUs become more widely available, distributed deep learning training becomes an essential tool for exploring and experimenting with cutting-edge deep learning architectures and methodologies. By handling larger datasets and more complex models, researchers and HEP scientists can push the boundaries of AI capabilities to improve the physics performance of track-finding experiments.

Experiment context, if any

We report the results of training GNN models on the TrackML dataset. Although this dataset is based on a simulation of a generic HL-LHC experiment tracker, the results could be extended to the design and evaluation of particle tracking algorithms for any of the experiments.

Ivan Laduska (Youngstown State University)
Brenden Reeves (Youngstown State University)
Caroline Manjerovic (Youngstown State University)
Alina Lazar (Youngstown State University)
Minh-Tuan Pham (University of Wisconsin Madison (US))
Jay Chan (Lawrence Berkeley National Lab. (US))
Daniel Thomas Murnane (Lawrence Berkeley National Lab. (US))
Xiangyang Ju (Lawrence Berkeley National Lab. (US))
Paolo Calafiura (Lawrence Berkeley National Lab. (US))

ACAT24Poster_last.pdf

JPCS_2024_Training_GNN_with_DDP__ACAT_2024.pdf
