The 2020 upgrade of the LHCb detector will vastly increase the rate of collisions that the Online system needs to process in software in order to filter events in real time. Thirty million collisions per second will pass through a selection chain in which each step is executed only if the previous step has accepted the event.
The Kalman filter is a component of the event reconstruction that, due to its timing characteristics and early execution in the selection chain, consumes 40% of the total reconstruction time in the current trigger software. This makes it a time-critical component as the LHCb trigger evolves into a full software trigger in the Upgrade.
The Cross Kalman algorithm allows execution and performance testing across a variety of architectures, including multi-core and many-core platforms, and has been successfully integrated and validated in the LHCb codebase. Since its inception, new hardware architectures have become available, exposing features that require fine-grained tuning in order to fully utilize their resources.
In this paper, we present performance benchmarks and explore the Intel Skylake and latest-generation Intel Xeon Phi architectures in depth. We determine the performance gain over previous architectures and show that the efficiency of our implementation is close to the maximum attainable given the mathematical formulation of our problem.