Ludovico Bianchi (Forschungszentrum Jülich)
The PANDA experiment is a next-generation particle detector planned for operation at the FAIR facility, currently under construction in Darmstadt, Germany. PANDA will detect events generated by colliding an antiproton beam with a fixed proton target, allowing studies in hadron spectroscopy, hypernuclei production, open charm, and nucleon structure. The nature of hadronic collisions means that signal and background events will look very similar. This makes the conventional approach, in which a hardware trigger signal generated by a subset of the detectors starts the data acquisition, infeasible. Instead, data coming from the detector are acquired continuously, and all online selection is performed in real time. A rejection factor of about 1000 is needed to reduce the data rate for offline storage, making the data acquisition system computationally very challenging. Adoption of Graphics Processing Units (GPUs) in many computing applications is increasing, due to their cost-effectiveness, performance, and accessible and versatile development using high-level programming paradigms such as CUDA or OpenCL. Applications of GPUs within high-energy physics (HEP) include Monte Carlo production, analysis, and low- and high-level triggers. Online track reconstruction of charged particles plays an essential part in the event reconstruction and selection process. Our activity within the PANDA collaboration is centered on the development and implementation of particle tracking algorithms on GPUs, and on studying the possibility of performing online tracking using a multi-GPU architecture. Three algorithms are currently under development, using information from the PANDA tracking system: a Hough Transform; a Riemann Track Finder; and a Triplet Finder algorithm, a novel approach finely tuned for the PANDA Straw Tube Tracker (STT) detector. The algorithms are implemented on the GPU in the CUDA C language, utilizing low-level optimizations and non-trivial data packaging in order to exploit the capabilities of GPUs as fully as possible.
This talk will present implementation details of these algorithms, together with first performance results. It will also cover solutions for data transfer to and from GPUs based on message queues, aimed at a deeper integration of the algorithms with the FairRoot and PandaRoot frameworks for both online and offline applications.
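
As an illustration of the first of the three algorithms named above, the following is a minimal sketch of the textbook Hough Transform for straight lines, written in plain Python on the CPU. It is not the CUDA C implementation developed for PANDA, and it ignores track curvature in the magnetic field. The function name `hough_peak`, the binning parameters, and the sample hits are all assumptions made for this example. Each hit votes for every line r = x cos(theta) + y sin(theta) that could pass through it, and the most-voted accumulator cell is taken as the track candidate.

```python
import math

def hough_peak(hits, n_theta=180, n_r=200, r_max=10.0):
    """Return (theta, r, votes) for the most-voted straight line.

    Lines are parametrised as r = x*cos(theta) + y*sin(theta), with
    theta in [0, pi) and r in [-r_max, r_max). All names and default
    values here are illustrative assumptions.
    """
    bin_width = 2.0 * r_max / n_r
    # Accumulator: one row per theta bin, one column per r bin.
    acc = [[0] * n_r for _ in range(n_theta)]
    for x, y in hits:
        # Each hit votes once per theta bin, independently of other hits.
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            r = x * math.cos(theta) + y * math.sin(theta)
            j = math.floor((r + r_max) / bin_width)
            if 0 <= j < n_r:
                acc[i][j] += 1
    # The accumulator maximum marks the line shared by the most hits.
    best = (0, 0, 0)  # (votes, theta index, r index)
    for i in range(n_theta):
        for j in range(n_r):
            if acc[i][j] > best[0]:
                best = (acc[i][j], i, j)
    votes, i, j = best
    theta = math.pi * i / n_theta
    r = -r_max + (j + 0.5) * bin_width  # centre of the winning r bin
    return theta, r, votes

# Five hits on the horizontal line y = 2.02, plus two unrelated noise hits.
hits = [(-4.0, 2.02), (-2.0, 2.02), (0.0, 2.02), (2.0, 2.02), (4.0, 2.02),
        (1.0, -3.0), (-2.5, 0.7)]
theta, r, votes = hough_peak(hits)
print(f"theta = {math.degrees(theta):.1f} deg, r = {r:.2f}, votes = {votes}")
```

Because the votes cast by one hit do not depend on any other hit, the outer loop is the kind of work that could plausibly be distributed over GPU threads, with the accumulator updated atomically. How the PANDA implementation actually organizes this is not specified here.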