The Beam Longitudinal Dynamics (BLonD) code has been developed at CERN since 2014. Since then, BLonD has helped to identify and overcome existing machine limitations, optimize critical operational parameters, and explore the design space of upcoming upgrades and future projects. BLonD simulations are computationally demanding, because they must capture the most complex beam longitudinal dynamics phenomena while ensuring the finest prediction accuracy.
In the scope of my PhD thesis, I worked on optimizing the runtime performance of the BLonD suite. Together with the BLonD Developers team, we came a long way, from a serial, single-threaded Python code to a hybrid, distributed code that scales to hundreds of cores and incorporates advanced HPC concepts such as dynamic load balancing, approximate calculations, and even heterogeneous hardware. In this presentation, I will summarize our most important achievements and most successful optimization strategies, together with the key lessons learned. BLonD will serve as the running example throughout, but most of the techniques and tips that I will present are not BLonD-specific and can be applied to essentially any HPC workload with minor modifications and tuning.