Description
In high-performance computing, we want algorithms that operate on large arrays to run as fast as possible. However, their performance depends not only on the algorithm itself but also on the memory layout of these arrays. The most natural memory layout is Array-of-Structures (AoS), which performs well for strided access patterns and for large classes. Structure-of-Arrays (SoA), on the other hand, allows for efficient vectorization when elements are accessed sequentially.
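To make the two layouts concrete, here is a minimal C++ sketch. The `Particle` record, the container names, and the helper functions are hypothetical and chosen only for illustration; they are not taken from the libraries presented in this talk.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record type, used only for illustration.
struct Particle {
    float x, y, z;
    float mass;
};

// Array-of-Structures: one contiguous array of whole records.
using ParticlesAoS = std::vector<Particle>;

// Structure-of-Arrays: one contiguous array per data member.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Copies an AoS container into the equivalent SoA container.
inline ParticlesSoA to_soa(const ParticlesAoS& aos) {
    ParticlesSoA soa;
    for (const Particle& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
        soa.mass.push_back(p.mass);
    }
    return soa;
}

// AoS: consecutive x values are sizeof(Particle) bytes apart in memory.
inline float sum_x(const ParticlesAoS& p) {
    float s = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i) s += p[i].x;
    return s;
}

// SoA: consecutive x values are adjacent in memory.
inline float sum_x(const ParticlesSoA& p) {
    float s = 0.0f;
    for (std::size_t i = 0; i < p.x.size(); ++i) s += p.x[i];
    return s;
}
```

Both `sum_x` overloads compute the same result, but the loop body has to be written differently for each layout (`p[i].x` versus `p.x[i]`). This is the kind of change that spreads through surrounding code when a layout is switched by hand.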
Switching between memory layouts usually requires significant changes to the surrounding code. Therefore, as part of CERN’s “Next Generation Triggers” project, we have implemented two lightweight C++ libraries that decouple the memory layout from the algorithms operating on the data. The first approach uses C++17 template metaprogramming, while the second uses “reflection”, a new C++26 metaprogramming feature.
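As a rough idea of how C++17 templates can hide the layout from an algorithm, consider the sketch below. It is not the interface of either library described here: the skeleton pattern and all names (`PointSkeleton`, `PointsAoS`, `PointsSoA`, `shift_x`) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "skeleton": the member list is written once, and the
// template parameter F decides what each member becomes.
template <template <class> class F>
struct PointSkeleton {
    F<float> x;
    F<float> y;
    F<int> id;
};

template <class T> using Value = T;                // one element
template <class T> using Column = std::vector<T>;  // one array per member
template <class T> using Ref = T&;                 // proxy into storage

// AoS storage: a vector of whole elements.
class PointsAoS {
public:
    explicit PointsAoS(std::size_t n) : data_(n) {}
    std::size_t size() const { return data_.size(); }
    PointSkeleton<Ref> operator[](std::size_t i) {
        assert(i < size());
        return {data_[i].x, data_[i].y, data_[i].id};
    }
private:
    std::vector<PointSkeleton<Value>> data_;
};

// SoA storage: one vector per member.
class PointsSoA {
public:
    explicit PointsSoA(std::size_t n)
        : cols_{Column<float>(n), Column<float>(n), Column<int>(n)} {}
    std::size_t size() const { return cols_.x.size(); }
    PointSkeleton<Ref> operator[](std::size_t i) {
        assert(i < size());
        return {cols_.x[i], cols_.y[i], cols_.id[i]};
    }
private:
    PointSkeleton<Column> cols_;
};

// The algorithm is written once and works with either layout.
template <class Points>
void shift_x(Points& pts, float dx) {
    for (std::size_t i = 0; i < pts.size(); ++i) pts[i].x += dx;
}
```

In this sketch, `operator[]` returns a small struct of references, so `pts[i].x` reads the same for both containers and the layout becomes a one-line type change at the call site. Whether such a proxy is fully optimized away depends on the compiler and has to be measured.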
The goal of both approaches is to incur zero runtime overhead. To verify this, we present benchmarks comparing them to hard-coded memory layouts. Moreover, we use the second approach to show the impact of memory layouts on the performance of the ALICE O2 TPC track reconstruction on CPU and GPU.