Description
Due to the large computing resources spent on the detailed (full) simulation of particle transport in HEP experiments, many efforts have been undertaken to parametrise the detector response. In particular, particle showers developing in the calorimeters are typically the most time-consuming component of the simulation, hence their parametrisation is the primary focus.
Fast shower simulation has been explored by many researchers, with several machine learning (ML) models proposed for different shower datasets, including those published in the context of the Calo Challenge. Most of those models are developed and validated against the published shower datasets, without deployment in the experiments' frameworks. Under such conditions the speed-up of those models with respect to the full simulation cannot be reliably estimated, especially if a large batch size is used at ML inference.
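To illustrate why the inference batch size matters, the following Python sketch (not part of this study; the stand-in MLP generator, its dimensions and the chosen batch sizes are placeholder assumptions) times a toy shower generator at several batch sizes and reports the resulting time per shower:

    # Sketch: per-shower inference time as a function of batch size.
    # The generator is a placeholder MLP, not a trained fast-simulation model.
    import time
    import torch

    latent_dim, n_cells = 10, 6480          # assumed latent size and number of voxels
    generator = torch.nn.Sequential(        # stand-in for a trained generative model
        torch.nn.Linear(latent_dim + 1, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, n_cells),
    )
    generator.eval()

    for batch_size in (1, 10, 100, 1000):
        z = torch.randn(batch_size, latent_dim)     # latent noise
        e = torch.rand(batch_size, 1)               # conditioning energy
        with torch.no_grad():
            start = time.perf_counter()
            _ = generator(torch.cat([z, e], dim=1)) # generate a batch of showers
            elapsed = time.perf_counter() - start
        print(f"batch={batch_size:5d}  time/shower = {1e3 * elapsed / batch_size:.3f} ms")

Large batches amortise the per-call overhead, so quoting a speed-up measured at a batch size that is unrealistic for a given experiment can be misleading.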
This study presents the basic aspects that create an overhead in fast shower simulation and that should be taken into account for realistic performance estimates. It is based on the Geant4 example Par04, which was used to produce datasets 2 and 3 of the Calo Challenge. The placement of the energy deposits originating from single showers in the calorimeter is discussed, giving results for several methods and detailing how it may differ between HEP experiments. A second important factor is then presented: realistic ML inference batch sizes. A study of benchmark physics events was performed to determine the average number of showers in proton-proton as well as electron-positron collisions at future accelerators. These two factors are important (although not the only ones) in any estimation of the ultimate speed-up ML models can achieve once deployed in the experiments' frameworks.
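As an illustration of the deposit-placement step mentioned above, the sketch below (a simplified assumption, not the Par04 implementation) transforms deposits defined in a local cylindrical frame around the shower axis into global detector coordinates; each resulting position would then be scored in the calorimeter together with its deposit energy:

    # Sketch: place one shower's deposits, given in local cylindrical
    # coordinates (t = depth along the shower axis, r, phi), into global
    # detector coordinates using the particle's entry point and direction.
    import numpy as np

    def place_deposits(t, r, phi, entry_point, direction):
        """Return global (x, y, z) positions for each deposit of one shower."""
        axis = direction / np.linalg.norm(direction)
        # build two unit vectors orthogonal to the shower axis
        tmp = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        u = np.cross(axis, tmp); u /= np.linalg.norm(u)
        v = np.cross(axis, u)
        # local cylindrical -> global Cartesian
        return (entry_point[None, :]
                + np.outer(t, axis)
                + np.outer(r * np.cos(phi), u)
                + np.outer(r * np.sin(phi), v))

    # toy usage: one 100-deposit shower entering the calorimeter along +z
    n = 100
    pos = place_deposits(np.linspace(0, 200, n),
                         10 * np.random.rand(n),
                         2 * np.pi * np.random.rand(n),
                         np.array([0.0, 0.0, 1500.0]),
                         np.array([0.0, 0.0, 1.0]))
    print(pos.shape)  # (100, 3)

How these positions are subsequently mapped onto readout cells (and how much that mapping costs) depends on the geometry and scoring machinery of each experiment, which is part of the overhead discussed in this study.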