Soon Yung Jun (Fermi National Accelerator Lab. (US))
Performance evaluation and analysis of large-scale computing applications are essential for optimizing the use of resources. Because detector simulation is one of the most compute-intensive tasks and Geant4 is the simulation toolkit most widely used in contemporary high energy physics (HEP) experiments, it is important to monitor Geant4 throughout its development cycle for changes in computing performance and to identify problems and opportunities for code improvements. All development and public releases are profiled with a set of applications that use different input event samples, physics lists, and detector configurations. Results from multiple benchmarking runs are compared to previous public and development reference releases to monitor CPU and memory usage. Observed changes are evaluated and correlated with code modifications. Besides the full summary of call stack data and memory footprint, a detailed call graph analysis is available to Geant4 developers for further analysis. The set of software tools used in the performance evaluation procedure, in both sequential and multi-threaded modes, includes FAST, IgProf and Open|SpeedShop. The scalability of the CPU time and memory utilization of multi-threaded applications is evaluated by measuring event throughput and memory usage as a function of the number of threads for selected event samples. We will describe the procedure of Geant4 computing performance profiling and benchmarking and present recent results.
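
The two quantities named above, the change relative to a reference release and the scaling of throughput and memory with thread count, can be illustrated with a minimal Python sketch. This is not the profiling procedure itself. The `Run` record, the function names, and the sample numbers are hypothetical placeholders chosen for illustration, not Geant4 measurements or the output format of any of the tools listed.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run (hypothetical record, not a tool output format)."""
    threads: int         # number of worker threads
    events: int          # events processed in the run
    wall_seconds: float  # wall-clock time of the event loop
    rss_mb: float        # peak resident memory of the process, in MB


def throughput(run):
    """Event throughput in events per second."""
    return run.events / run.wall_seconds


def relative_change(candidate, reference):
    """Fractional change of a metric with respect to a reference release."""
    return (candidate - reference) / reference


def scaling_table(runs):
    """Speedup, efficiency and memory growth relative to the smallest run."""
    runs = sorted(runs, key=lambda r: r.threads)
    base = runs[0]
    rows = []
    for r in runs:
        speedup = throughput(r) / throughput(base)
        added_threads = r.threads - base.threads
        rows.append({
            "threads": r.threads,
            "throughput": throughput(r),
            "speedup": speedup,
            # 1.0 means perfectly linear scaling with thread count
            "efficiency": speedup / (r.threads / base.threads),
            # average memory added per additional thread
            "mb_per_extra_thread": (
                (r.rss_mb - base.rss_mb) / added_threads
                if added_threads > 0 else 0.0
            ),
        })
    return rows


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real measurements.
    sample = [
        Run(threads=1, events=100, wall_seconds=100.0, rss_mb=500.0),
        Run(threads=4, events=400, wall_seconds=110.0, rss_mb=620.0),
    ]
    for row in scaling_table(sample):
        print(row)
    # CPU time per event: candidate release vs. reference release
    print(relative_change(candidate=1.05, reference=1.00))
```

Normalizing to the run with the fewest threads keeps the efficiency figure meaningful even when a single-threaded run is not available for a given sample.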