14-18 October 2013
Amsterdam, Beurs van Berlage
Europe/Amsterdam timezone

High Energy Electromagnetic Particle Transportation on the GPU

14 Oct 2013, 13:53
22m
Effectenbeurszaal (Amsterdam, Beurs van Berlage)

Oral presentation to parallel session: Software Engineering, Parallelism & Multi-Core

Speaker

Philippe Canal (Fermi National Accelerator Lab. (US))

Description

We will present massively parallel transportation of high energy electromagnetic particles through a finely segmented detector on a Graphics Processing Unit (GPU). Simulating events of energetic particle decay in a general-purpose high energy physics (HEP) detector requires intensive computing resources. This is due to the complexity of both the geometry and the physics processes applied to the particles copiously produced by primary collisions and secondary interactions. The recent advent of many-core and accelerated processor architectures provides a variety of concurrent programming models. These models are applicable not only to high-performance parallel computing but also to conventional compute-intensive applications such as HEP detector simulation. The transportation prototype has four components. The first is a transportation process in a non-uniform magnetic field. The second is geometry navigation with a set of solid shapes and materials. The third is electromagnetic physics processes for electrons and photons. The fourth is an interface to a framework that dispatches bundles of tracks in a highly vectorized manner, optimizing for spatial locality and throughput. Core algorithms and methods are taken from the Geant4 toolkit and are modified and optimized for the GPU. Programs written in C/C++ are designed to be compatible with CUDA and OpenCL, and generic enough to accommodate future programming models and hardware architectures. Using multiple streams, asynchronous kernel executions are overlapped with concurrent data transfers of streams of tracks to balance arithmetic intensity and memory bandwidth. Issues with floating-point accuracy, random number generation, data structures, kernel divergence and register spills are also considered. A performance evaluation of the speedup relative to the corresponding sequential execution on the CPU will also be presented.
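The abstract mentions two ideas that are easy to illustrate on the CPU. One is dispatching bundles of tracks laid out for spatial locality. The other is transporting charged tracks through a (possibly non-uniform) magnetic field. The following is a minimal C++ sketch under stated assumptions, not code from the prototype or from Geant4. It assumes a structure-of-arrays track bundle. It also assumes a classical fourth-order Runge-Kutta integration of the Lorentz-force equation in path length. Geant4 provides several Runge-Kutta steppers, and which one the prototype uses is not stated here. The names `TrackBundle`, `FieldFn`, `rk4Step` and `propagateBundle` are hypothetical. Units are assumed to be metres, tesla and GeV/c, so the bending factor is 0.299792458 q/p per metre.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical illustration only: not code from the prototype or from Geant4.
using Vec3 = std::array<double, 3>;
// Assumed field map B(x): position [m] -> field [T]; may be non-uniform.
using FieldFn = std::function<Vec3(const Vec3&)>;

// Structure-of-arrays bundle: each component is contiguous in memory, so
// neighbouring tracks (or neighbouring GPU threads) read neighbouring addresses.
struct TrackBundle {
  std::vector<double> x, y, z;     // position [m]
  std::vector<double> ux, uy, uz;  // unit direction
  std::vector<double> p;           // momentum [GeV/c]
  std::vector<double> q;           // charge [units of e]
  std::size_t size() const { return x.size(); }
  void add(Vec3 pos, Vec3 dir, double mom, double charge) {
    x.push_back(pos[0]); y.push_back(pos[1]); z.push_back(pos[2]);
    ux.push_back(dir[0]); uy.push_back(dir[1]); uz.push_back(dir[2]);
    p.push_back(mom); q.push_back(charge);
  }
};

static Vec3 cross(const Vec3& a, const Vec3& b) {
  return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0]};
}

// State (x, u); derivatives w.r.t. path length s:
//   dx/ds = u,  du/ds = k * (u x B(x)),  k = 0.299792458 * q / p
using State = std::array<double, 6>;

static State deriv(const State& s, double k, const FieldFn& field) {
  Vec3 pos{s[0], s[1], s[2]}, u{s[3], s[4], s[5]};
  Vec3 f = cross(u, field(pos));
  return {u[0], u[1], u[2], k * f[0], k * f[1], k * f[2]};
}

// One classical fourth-order Runge-Kutta step of length h [m].
static State rk4Step(const State& s, double h, double k, const FieldFn& field) {
  auto axpy = [](const State& a, double c, const State& b) {
    State r;
    for (int i = 0; i < 6; ++i) r[i] = a[i] + c * b[i];
    return r;
  };
  State k1 = deriv(s, k, field);
  State k2 = deriv(axpy(s, h / 2, k1), k, field);
  State k3 = deriv(axpy(s, h / 2, k2), k, field);
  State k4 = deriv(axpy(s, h, k3), k, field);
  State r;
  for (int i = 0; i < 6; ++i)
    r[i] = s[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
  // Renormalise the direction so |u| = 1 despite truncation error.
  double n = std::sqrt(r[3] * r[3] + r[4] * r[4] + r[5] * r[5]);
  for (int i = 3; i < 6; ++i) r[i] /= n;
  return r;
}

// Advance every track in the bundle by nSteps steps of length h [m].
// Each outer iteration touches only track i (one GPU thread per track).
void propagateBundle(TrackBundle& b, const FieldFn& field, double h, int nSteps) {
  for (std::size_t i = 0; i < b.size(); ++i) {
    assert(b.p[i] > 0.0);  // momentum magnitude must be positive
    State s{b.x[i], b.y[i], b.z[i], b.ux[i], b.uy[i], b.uz[i]};
    const double k = 0.299792458 * b.q[i] / b.p[i];
    for (int n = 0; n < nSteps; ++n) s = rk4Step(s, h, k, field);
    b.x[i] = s[0]; b.y[i] = s[1]; b.z[i] = s[2];
    b.ux[i] = s[3]; b.uy[i] = s[4]; b.uz[i] = s[5];
  }
}
```

Each iteration of the outer loop reads and writes only track `i`. The loop therefore maps naturally onto one GPU thread per track. With the structure-of-arrays layout, neighbouring threads then load neighbouring addresses. In a uniform 1 T field, a 1 GeV/c track of unit charge should stay on a circle of radius 1/0.299792458 ≈ 3.34 m, which gives a simple correctness check.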

Primary author

Soon Yung Jun (Fermi National Accelerator Lab. (US))

Co-authors

Jim Kowalkowski (Fermilab), John Apostolakis (CERN), Dr Marc Paterno (Fermilab), Philippe Canal (Fermi National Accelerator Lab. (US)), Victor Daniel Elvira (Fermi National Accelerator Lab. (US))
