Portable and efficient vectorization is a significant challenge in large
software projects such as Geant, ROOT, and experiment frameworks.
Nevertheless, exploiting the parallelism exposed by vectorization is
required by the future evolution of particle physics, which will be
characterized by a drastic increase in the amount of data produced.
To bridge the widening gap between data processing and analysis needs
and available computing resources, the particle physics scientific
software stack needs to be upgraded to fully exploit SIMD. While
libraries exist that wrap SIMD intrinsics in a convenient way, they
do not always support every available architecture, or they perform
well on only a subset of them. This situation needs to improve.
VecCore provides a solution. It features a simple API to express
SIMD-enabled algorithms that can be dispatched to one or more backends,
such as CUDA or widely adopted SIMD libraries like Vc and
UME::SIMD. In this talk we discuss the programming model associated with
VecCore, the most relevant details of its implementation, and some use
cases in HEP software packages such as ROOT and GeantV. The outlook for
possible use in experiments' software is also highlighted.
Performance figures from benchmarks on NVIDIA GPUs and on Intel Xeon and
Xeon Phi processors are discussed; they demonstrate nearly optimal gains
from SIMD parallelism.
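
The idea of writing an algorithm once and dispatching it to several
backends can be illustrated with a minimal C++ sketch. This is not
VecCore's actual API: the names ScalarBackend, Pack4, Pack4Backend,
Float_v, Size, Broadcast, Load, Store, and Saxpy are all hypothetical
and chosen only for illustration. The 4-wide "pack" type is a plain
array standing in for a real SIMD library type such as those provided
by Vc or UME::SIMD.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar backend: each "vector" holds a single float.
struct ScalarBackend {
    using Float_v = float;
    static constexpr std::size_t Size = 1;
    static Float_v Broadcast(float s) { return s; }
    static Float_v Load(const float* p) { return *p; }
    static void Store(Float_v v, float* p) { *p = v; }
};

// Hypothetical 4-lane pack, a stand-in for a real SIMD vector type.
struct Pack4 {
    std::array<float, 4> lane;
};

// Lane-wise arithmetic, so kernels can use ordinary operators.
inline Pack4 operator*(const Pack4& a, const Pack4& b) {
    Pack4 r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}
inline Pack4 operator+(const Pack4& a, const Pack4& b) {
    Pack4 r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Hypothetical 4-wide backend exposing the same interface as ScalarBackend.
struct Pack4Backend {
    using Float_v = Pack4;
    static constexpr std::size_t Size = 4;
    static Float_v Broadcast(float s) { return Pack4{{s, s, s, s}}; }
    static Float_v Load(const float* p) {
        return Pack4{{p[0], p[1], p[2], p[3]}};
    }
    static void Store(const Float_v& v, float* p) {
        for (std::size_t i = 0; i < 4; ++i) p[i] = v.lane[i];
    }
};

// Kernel written once against the backend interface: y = a * x + y.
template <class Backend>
void Saxpy(float a, const float* x, float* y, std::size_t n) {
    using V = typename Backend::Float_v;
    const V va = Backend::Broadcast(a);
    std::size_t i = 0;
    // Main loop: process Backend::Size elements per iteration.
    for (; i + Backend::Size <= n; i += Backend::Size) {
        const V vx = Backend::Load(x + i);
        const V vy = Backend::Load(y + i);
        Backend::Store(va * vx + vy, y + i);
    }
    // Scalar tail for the elements that do not fill a whole vector.
    for (; i < n; ++i) y[i] = a * x[i] + y[i];
}
```

Under these assumptions, Saxpy<ScalarBackend>(a, x, y, n) and
Saxpy<Pack4Backend>(a, x, y, n) run the same kernel source with different
vector widths. A production backend would replace Pack4 with a type that
maps onto hardware SIMD registers.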