Nov 4 – 8, 2019
Adelaide Convention Centre
Australia/Adelaide timezone

MPI-based tools for large-scale training and optimization at HPC sites

Nov 5, 2019, 11:00 AM
Riverbank R4 (Adelaide Convention Centre)

Riverbank R4

Adelaide Convention Centre

Oral | Track 9 – Exascale Science


Vladimir Loncar (University of Belgrade (RS))


MPI-learn and MPI-opt are libraries for large-scale training and hyper-parameter optimization of deep neural networks. Both libraries are built on the Message Passing Interface (MPI) and perform these tasks on GPU clusters through different kinds of parallelism. Their main characteristic is flexibility: thanks to multi-backend support, users have complete freedom in building their own models. In addition, the libraries support several cluster architectures, allowing deployment on multiple platforms. This generality could make them the basis for a train-and-optimize service for the HEP community. We present scalability results from two typical HEP use cases: jet identification from raw data and shower generation with a GAN model. Results on GPU clusters were obtained at the ORNL Titan supercomputer and other HPC facilities, as well as on commercial cloud resources and OpenStack. A comprehensive comparison of scaling performance across platforms will be presented, together with a detailed description of the libraries and their functionality.
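The core pattern behind MPI-based data-parallel training is that each worker computes gradients on its local data shard, the gradients are averaged across workers with an allreduce, and every worker applies the identical update. The sketch below is illustrative only (it is not the MPI-learn API): a real implementation would call MPI_Allreduce, e.g. via mpi4py, whereas here the communicator is simulated in-process so the example runs standalone.

```python
# Illustrative sketch of synchronous data-parallel SGD (NOT the MPI-learn API).
# The "allreduce" over workers is simulated in-process; a real cluster
# version would use MPI_Allreduce (e.g. comm.Allreduce in mpi4py).

def allreduce_average(local_grads):
    """Average one gradient vector per worker (stand-in for MPI_Allreduce)."""
    n_workers = len(local_grads)
    dim = len(local_grads[0])
    return [sum(g[i] for g in local_grads) / n_workers for i in range(dim)]

def sgd_step(weights, local_grads, lr=0.1):
    """One synchronous step: every worker ends up with the same new weights."""
    avg = allreduce_average(local_grads)
    new_w = [w - lr * g for w, g in zip(weights, avg)]
    # Replicate the updated weights across all workers, as an allreduce would.
    return [list(new_w) for _ in local_grads]

# Example: 4 workers, a 2-parameter model, different local gradients.
weights = [0.0, 0.0]
local_grads = [[1.0, 2.0], [3.0, 2.0], [1.0, 0.0], [3.0, 0.0]]
per_worker = sgd_step(weights, local_grads)
# Average gradient is [2.0, 1.0]; all workers now hold weights [-0.2, -0.1].
```

Because every worker applies the same averaged gradient, the model replicas stay bit-identical after each step, which is what makes this synchronous scheme scale cleanly across GPU nodes.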

Consider for promotion Yes

Primary authors

Vladimir Loncar (University of Belgrade (RS)), Jean-Roch Vlimant (California Institute of Technology (US)), Dr Sofia Vallecorsa (CERN), Gul Rukh Khattak (University of Peshawar (PK)), Maurizio Pierini (CERN), Thong Nguyen (California Institute of Technology (US)), Federico Carminati (CERN)

Presentation materials