6–9 Mar 2017
LAL-Orsay
Europe/Zurich timezone

Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

8 Mar 2017, 17:15
30m
LAL-Orsay

1: Parallel and discrete pattern recognition

Speaker

Matthieu Lefebvre (Princeton University (US))

Description

For over a decade now, physical and energy constraints have limited clock-speed improvements in commodity microprocessors. Instead, chipmakers have been pushed toward producing lower-power, multi-core processors and accelerators such as GPGPUs, ARM processors and the Intel MIC. Broad-based efforts from manufacturers and developers have been devoted to making these processors accessible enough for general-purpose computation. However, extracting performance from a larger number of cores, as well as from specialized vector or SIMD units, requires special care in algorithm design and code optimization.
One of the most computationally challenging problems in high-energy particle experiments is finding and fitting charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track-finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detectors has shown that they are robust and provide high physics performance, which is why they are currently in use at the LHC, both in the trigger and offline.
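For context, a Kalman-filter-based track fit alternates a prediction step, which propagates the track state to the next detector layer, with an update step, which incorporates the hit measured there. In generic notation (symbols chosen here for illustration, not taken from the paper):

    \begin{align*}
      x_{k|k-1} &= F_k\, x_{k-1|k-1}, &
      P_{k|k-1} &= F_k\, P_{k-1|k-1}\, F_k^{\mathsf T} + Q_k, \\
      K_k &= P_{k|k-1}\, H_k^{\mathsf T} \bigl(H_k\, P_{k|k-1}\, H_k^{\mathsf T} + R_k\bigr)^{-1}, \\
      x_{k|k} &= x_{k|k-1} + K_k \bigl(m_k - H_k\, x_{k|k-1}\bigr), &
      P_{k|k} &= \bigl(I - K_k H_k\bigr)\, P_{k|k-1},
    \end{align*}

where x is the track-state vector, P its covariance, F the propagation through the magnetic field and detector material, Q the process noise (e.g. multiple scattering), H the projection onto the measured coordinates, R the hit resolution, and m_k the hit on layer k. Track building applies the same update to several candidate hits per layer and retains the best-matching combinations.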
Previously we reported on the significant parallel speedups obtained by adapting Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi processors. We continue to make progress toward understanding these processors while progressively introducing more realistic physics. These architectures, in particular Xeon Phi, provided a good foundation for porting the algorithms to NVIDIA GPUs, for which parallelization and vectorization are of utmost importance. The challenge lies mostly in feeding these devices with enough data to keep them busy. We also discuss strategies for minimizing code duplication while still keeping the algorithms as close to the hardware as possible.
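To give a flavor of the data-layout considerations involved, the following is a minimal, hypothetical C++ sketch (not the authors' actual code; the name TrackBatch and the 1-D toy state are invented here). Track states are stored in structure-of-arrays form so that the same Kalman operation applied to many tracks maps onto SIMD lanes on Xeon/Xeon Phi, or onto one GPU thread per track:

    // Toy illustration: a structure-of-arrays batch of simplified 1-D track
    // states, updated with a scalar Kalman step so that the loop over tracks
    // vectorizes. Real track states are 5- or 6-dimensional with full
    // covariance matrices; the layout idea is the same.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct TrackBatch {                 // hypothetical name
      std::vector<float> x;             // state estimate, one entry per track
      std::vector<float> P;             // state variance, one entry per track
      explicit TrackBatch(std::size_t n) : x(n, 0.f), P(n, 1.f) {}
    };

    // One Kalman update applied to every track in the batch. Because each
    // track's data sits at the same index in contiguous arrays, the compiler
    // can emit SIMD instructions for the loop body (or a GPU port can assign
    // one thread per track).
    void kalman_update(TrackBatch& b, const std::vector<float>& meas, float R) {
      const std::size_t n = b.x.size();
      #pragma omp simd                  // hint; ignored if OpenMP is not enabled
      for (std::size_t i = 0; i < n; ++i) {
        const float K = b.P[i] / (b.P[i] + R);   // Kalman gain
        b.x[i] += K * (meas[i] - b.x[i]);        // state update
        b.P[i] *= (1.f - K);                     // covariance update
      }
    }

    int main() {
      const std::size_t n = 8;
      TrackBatch batch(n);
      std::vector<float> hits(n, 2.0f);          // fake measurements
      kalman_update(batch, hits, /*R=*/0.5f);
      std::printf("x[0]=%.3f P[0]=%.3f\n", batch.x[0], batch.P[0]);
      return 0;
    }

Keeping the per-track arithmetic in tight, branch-free loops of this kind is also one common way to share most of the algorithmic code between CPU and GPU back-ends, which relates to the code-duplication question raised above.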

Primary authors

Dr Giuseppe Cerati (Fermilab (US))
Dr Peter Elmer (Princeton University (US))
Dr Slava Krutelyov (Univ. of California San Diego (US))
Dr Steven R Lantz (Cornell University (US))
Matthieu Lefebvre (Princeton University (US))
Mr Kevin McDermott (Cornell University (US))
Dr Daniel Sherman Riley (Cornell University (US))
Dr Matevz Tadel (Univ. of California San Diego (US))
Prof. Peter Wittich (Cornell University (US))
Prof. Frank Wurthwein (Univ. of California San Diego (US))
Prof. Avi Yagil (Univ. of California San Diego (US))
