CHEP 2016 Conference, San Francisco, October 8-14, 2016

Name: CHEP 2016 Conference, San Francisco, October 8-14, 2016
Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis

America/Los_Angeles timezone

Kalman filter tracking on parallel architectures

12 Oct 2016, 11:15

15m

GG C1 (San Francisco Marriott Marquis)

Oral, Track 2: Offline Computing

Speaker: Daniel Sherman Riley (Cornell University (US))

Limits on power dissipation have pushed CPUs to grow in parallel processing capabilities rather than clock rate, leading to the rise of "manycore" or GPU-like processors. In order to achieve the best performance, applications must be able to take full advantage of vector units across multiple cores, or some analogous arrangement on an accelerator card. Such parallel performance is becoming a critical requirement for methods to reconstruct the tracks of charged particles at the Large Hadron Collider and, in the future, at the High Luminosity LHC. This is because the steady increase in luminosity is causing an exponential growth in the overall event reconstruction time, and tracking is by far the most demanding task for both online and offline processing. Many past and present collider experiments adopted Kalman filter-based algorithms for tracking because of their robustness and their excellent physics performance, especially for solid state detectors where material interactions play a significant role. We report on the progress of our studies towards a Kalman filter track reconstruction algorithm with optimal performance on manycore architectures. The combinatorial structure of these algorithms is not immediately compatible with an efficient SIMD (or SIMT) implementation; the challenge for us is to recast the existing software so it can readily generate hundreds of shared-memory threads that exploit the underlying instruction set of modern processors. We show how the data and associated tasks can be organized in a way that is conducive to both multithreading and vectorization. We demonstrate very good performance on Intel Xeon and Xeon Phi architectures, as well as promising first results on NVIDIA GPUs. We discuss the current limitations and the plan to achieve full scalability and efficiency in collision data processing.
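To illustrate what organizing track data for vectorization can look like, here is a minimal sketch, not the authors' implementation. It assumes a heavily simplified one-dimensional Kalman update (real track fits use multi-parameter states with full covariance matrices), and the names `TrackBatch` and `kalman_update` are hypothetical. The state of many track candidates is stored in a structure-of-arrays layout, so the inner loop applies the same arithmetic to consecutive elements, which a compiler can map onto SIMD lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays batch: element i of each vector
// belongs to track candidate i.
struct TrackBatch {
    std::vector<float> x;  // state estimate (a single parameter, for simplicity)
    std::vector<float> P;  // variance of the state estimate
};

// Simplified scalar Kalman update applied to every track in the batch.
// z[i] is the measurement assigned to track i and R[i] is its variance.
inline void kalman_update(TrackBatch& t,
                          const std::vector<float>& z,
                          const std::vector<float>& R) {
    const std::size_t n = t.x.size();
    assert(t.P.size() == n && z.size() == n && R.size() == n);

    // No branches and no dependence between iterations: each track is
    // updated independently, so the loop is a candidate for auto-vectorization.
    for (std::size_t i = 0; i < n; ++i) {
        const float K = t.P[i] / (t.P[i] + R[i]);  // Kalman gain
        t.x[i] += K * (z[i] - t.x[i]);             // pull state toward the measurement
        t.P[i] *= (1.0f - K);                      // shrink the variance
    }
}
```

In a sketch like this, batches of tracks could additionally be distributed across threads, giving the two levels of parallelism (multithreading and vectorization) that the abstract refers to.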

Primary Keyword (Mandatory)	Reconstruction
Secondary Keyword (Optional)	Parallelization
Tertiary Keyword (Optional)	Algorithms

Authors: Avi Yagil (Univ. of California San Diego (US)); Daniel Sherman Riley (Cornell University (US)); Frank Wuerthwein (Univ. of California San Diego (US)); Giuseppe Cerati (Univ. of California San Diego (US)); Kevin McDermott (Cornell University (US)); Matevz Tadel (Univ. of California San Diego (US)); Matthieu Lefebvre (Princeton University (US)); Peter Elmer (Princeton University (US)); Peter Wittich (Cornell University (US)); Slava Krutelyov (Univ. of California San Diego (US)); Steven R Lantz (Cornell University (US))

Presentation materials: Oral-115-v8.pdf
