The GAP Project - GPU for Realtime Applications in High Level Trigger and Medical Imaging

Not scheduled
15m
Beurs van Berlage

Poster Technology transfer: 5b) Health and healthcare

Speaker

Massimiliano Fiorini (Universita di Ferrara (IT))

Description

The GAP project aims to deploy Graphics Processing Units (GPUs) in real-time applications, ranging from online event selection (trigger) in high energy physics (HEP) experiments to medical image reconstruction. The final goal of the project is to demonstrate that GPUs can have a positive impact in sectors that differ in rate, bandwidth, and computational intensity. The relevant aspects under study are the analysis of the system latency, the optimisation of the computational algorithms, and the integration with the data acquisition system. As benchmark applications we consider the trigger algorithms of two HEP experiments, NA62 and ATLAS, which differ in event complexity and processing latency requirements. In particular we discuss how specific algorithms can be parallelized and thus benefit from implementation on the GPU architecture, in terms of increased execution speed and a more favourable dependence on the complexity of the analyzed events. Such improvements are particularly relevant for the foreseen LHC luminosity upgrade, where highly selective algorithms will be crucial to maintain a sustainable trigger rate in the presence of many pp interactions per bunch crossing. We give details on how these devices are integrated in typical trigger systems and benchmark their performance. GPUs can also provide a feasible way to accelerate the reconstruction of medical images. We discuss the implementation of new computationally intensive algorithms that boost the performance of Nuclear Magnetic Resonance and Computed Tomography. The deployment of GPUs can significantly reduce the processing time, making these techniques suitable for use in real-time diagnostics.

Summary

In this contribution we report on the activity of the GAP project, which investigates the deployment of Graphics Processing Units (GPUs) in different contexts of real-time scientific applications. The areas of interest span a range of data processing rates, bandwidths and computational intensities of the executed algorithms. We focus in particular on applications of GPUs in asynchronous systems, such as the software trigger systems of particle physics experiments, and in the reconstruction of nuclear magnetic resonance images. All these applications can benefit from implementation on the massively parallel architecture of GPUs, each optimizing different aspects.

As a first application we discuss how specific trigger algorithms can be naturally parallelized and thus benefit from implementation on the GPU architecture, in terms of execution speed and of the dependence on the complexity of the analyzed events. The two benchmark application environments under investigation are the NA62 and ATLAS experiments at CERN.
The NA62 experiment aims to measure ultra-rare kaon decays, recording data from the SPS high-intensity hadron beam. A selective trigger, based on sequential hardware and software layers, is essential to identify in real time interesting events produced at the level of 10^-10. GPUs can be exploited to build trigger primitives of offline-reconstruction quality, which allow the definition of highly pure and efficient selection criteria. Although the NA62 collaboration is considering the application of GPUs in both the hardware and the software trigger, in this contribution we focus on the latter, which is devoted to reducing the data collection rate from 1 MHz to ~10 kHz. We discuss the benefits achievable from the GPU implementation of the ring reconstruction algorithms for the NA62 RICH detector and of the algorithms for the tracking spectrometer. In both cases innovative algorithms have been designed specifically to benefit from the massive parallelism of the GPU architecture.
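To illustrate why ring reconstruction suits a massively parallel device, the following is a minimal sketch of a single-ring fit. It uses the algebraic (Kasa) least-squares circle fit, a textbook technique chosen here only for illustration; it is not claimed to be the algorithm designed for the NA62 RICH. The function names `fit_ring` and `fit_rings` and the representation of an event as a list of (x, y) hit positions are assumptions. Each event is fitted independently of all others, which is the property a GPU implementation would exploit (for example, one thread block per event).

```python
import math


def fit_ring(hits):
    """Algebraic (Kasa) least-squares circle fit to a list of (x, y) hits.

    Illustrative only: returns (xc, yc, r) and needs at least three
    non-collinear hits.
    """
    n = len(hits)
    if n < 3:
        raise ValueError("need at least three hits")

    # Centre the hits on their mean to improve numerical conditioning.
    mx = sum(x for x, _ in hits) / n
    my = sum(y for _, y in hits) / n

    suu = svv = suv = suz = svz = sz = 0.0
    for x, y in hits:
        u, v = x - mx, y - my
        z = u * u + v * v
        suu += u * u
        svv += v * v
        suv += u * v
        suz += u * z
        svz += v * z
        sz += z

    # Normal equations for the centre (uc, vc) in centred coordinates:
    #   suu*uc + suv*vc = suz / 2
    #   suv*uc + svv*vc = svz / 2
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("hits are collinear or degenerate")
    uc = 0.5 * (suz * svv - svz * suv) / det
    vc = 0.5 * (svz * suu - suz * suv) / det

    r = math.sqrt(uc * uc + vc * vc + sz / n)
    return uc + mx, vc + my, r


def fit_rings(events):
    """Fit one ring per event; every iteration is independent of the others."""
    return [fit_ring(hits) for hits in events]
```

The fit reduces to a handful of sums over the hits followed by a 2x2 linear solve, so the cost per event is fixed and small; in a parallel implementation the sums would map onto a reduction within each block.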

The ATLAS experiment records data from the LHC pp collisions through a hybrid multi-stage trigger. The first level is synchronous and based on custom electronics, while the subsequent level is asynchronous and based on software algorithms run on a commodity PC farm. The benchmark activity we are carrying out involves the software trigger algorithms used for muon reconstruction in the detector. These repeatedly execute the same algorithms, which reconstruct and match segments of particle trajectories in the detector, and can therefore benefit from massively parallel execution on GPUs.
We will discuss in detail the implementation of such algorithms on a GPU-based system. We will characterize the performance of this new implementation and benchmark it against the performance of the present ATLAS muon algorithm. The GPU is integrated within the current data acquisition system through a client-server structure [1] that can manage different tasks and their execution on a given device, such as the GPU. This element is flexible, able to deal with different computing devices, and adds almost no overhead to the total latency of the algorithm execution. With the help of this structure it is possible to isolate the muon trigger algorithm itself and optimize it for execution on the GPU. This implies porting it to CUDA and optimizing the different tasks that can be naturally parallelized. In this way the dependence of the execution time on the complexity of the processed events will be reduced. A similar approach has been investigated in the past for the deployment of GPUs in different ATLAS trigger algorithms, with promising results [2]. The foreseen evolution of the ATLAS trigger system, which will merge the higher-level trigger layers into a single software processing stage, can take even more advantage of the use of GPUs. More complex algorithms, with offline-like resolution, can be implemented on a thousand-core device with significant speedup factors. The timing comparison between the serial and the parallel implementations of the trigger algorithm is done on the data collected in the past year, and also on simulated data that reproduce the foreseen data-taking conditions with the LHC luminosity upgrade, with an increased number of multiple interactions in the collisions.
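The offloading pattern described above, in which a client hands a named task to a server that runs it on a chosen device and returns the result, can be sketched as follows. This is a hypothetical stand-in written with the Python standard library; the class `OffloadServer` and its methods are invented for illustration and do not reproduce the interface of the tool cited in [1]. A plain Python callable plays the role of the device-side kernel.

```python
import queue
import threading


class OffloadServer:
    """Hypothetical task-offload server; illustrative, not a real trigger API."""

    def __init__(self):
        self._handlers = {}             # task name -> callable run on the "device"
        self._requests = queue.Queue()  # pending (name, payload, reply) requests
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def register(self, name, handler):
        """Make a task available under the given name."""
        self._handlers[name] = handler

    def submit(self, name, payload):
        """Client side: enqueue a request, return a queue that receives the reply."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((name, payload, reply))
        return reply

    def shutdown(self):
        """Stop the worker after it has drained earlier requests."""
        self._requests.put(None)
        self._worker.join()

    def _serve(self):
        while True:
            item = self._requests.get()
            if item is None:
                break
            name, payload, reply = item
            try:
                reply.put(("ok", self._handlers[name](payload)))
            except Exception as exc:  # report failures instead of killing the server
                reply.put(("error", repr(exc)))


if __name__ == "__main__":
    server = OffloadServer()
    # Stand-in for a muon segment-matching kernel: here it just counts segments.
    server.register("count_segments", lambda event: len(event))
    print(server.submit("count_segments", ["seg_a", "seg_b"]).get(timeout=5))
    server.shutdown()
```

Keeping the algorithm behind a named task is what lets it be isolated and swapped between a serial and a parallel implementation without changes on the client side.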

A similar improvement can be obtained by exploiting GPUs in medical imaging. Diagnostic techniques such as Nuclear Magnetic Resonance (NMR) make it possible to visualize images of a body part through information on the diffusion of water molecules. The most advanced processing techniques are based on the calculation of ~1M non-linear functions, which are naturally parallelizable and computationally demanding. In this project we are focusing on the kurtosis diffusion method K [3], which currently takes ~20 hours to precisely reconstruct a brain image. These algorithms, currently implemented in Matlab, can be converted to a parallel version for GPUs thanks to available compatibility libraries. Performance measurements will be presented for the parallel implementation of the image reconstruction algorithms and of the Monte Carlo simulation techniques.
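As an illustration of the per-voxel independence that makes this workload parallelizable, the sketch below estimates the diffusion coefficient D and the kurtosis K of one voxel from the diffusional kurtosis signal model, ln S(b) = ln S0 - b D + (1/6) b^2 D^2 K. It is a deliberately simplified closed-form estimate from three b-values (0, b1, b2), not the non-linear fitting procedure used in the project. The function names, the choice of three b-values and the units (b in s/mm^2, D in mm^2/s) are assumptions made for the example.

```python
import math


def kurtosis_from_three_b(s0, s1, s2, b1, b2):
    """Closed-form (D, K) estimate for one voxel from signals at b = 0, b1, b2.

    Model: ln S(b) = ln S0 - b*D + (b*D)**2 * K / 6
    Illustrative simplification; a full analysis would fit many b-values.
    """
    if min(s0, s1, s2) <= 0 or b1 <= 0 or b2 <= 0 or b1 == b2:
        raise ValueError("signals and b-values must be positive, with b1 != b2")

    # Dividing ln(S/S0) by b gives a straight line in b:
    #   r(b) = -D + b * (D^2 K / 6)
    r1 = math.log(s1 / s0) / b1
    r2 = math.log(s2 / s0) / b2

    slope = (r2 - r1) / (b2 - b1)  # D^2 K / 6
    d = -(r1 - b1 * slope)         # D
    if d <= 0:
        raise ValueError("non-physical diffusion coefficient")
    k = 6.0 * slope / (d * d)
    return d, k


def fit_volume(voxels, b1, b2):
    """Apply the estimate to every voxel; each voxel is independent."""
    return [kurtosis_from_three_b(s0, s1, s2, b1, b2) for s0, s1, s2 in voxels]
```

Because no voxel depends on its neighbours, the loop in `fit_volume` is the part that would be distributed across GPU threads, one voxel per thread.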

[1] The client-server structure is obtained using APE, an ATLAS tool developed independently of this project.

[2] D. Emeliyanov, J. Howard, J. Phys.: Conf. Ser. 396 012018, 2012.

[3] J.H. Jensen, J.A. Helpern, NMR Biomed. 23 (7) 698-710, 2010.

Authors

Andrea Messina (CERN)
Gianluca Lamanna (Sezione di Pisa (IT))
Dr Giovanni Di Domenico (Universita' di Ferrara)
Jacopo Pinzino (Sezione di Pisa (IT))
Marco Corvo (Universita e INFN (IT))
Marco Rescigno (Universita e INFN, Roma I (IT))
Massimiliano Fiorini (Universita di Ferrara (IT))
Matteo Bauce (Universita e INFN, Roma I (IT))
Silvia Capuani (CNR)
Stefano Giagu (Universita e INFN, Roma I (IT))
Marco Palombo (Sapienza Universita di Roma)
