14-18 October 2013
Amsterdam, Beurs van Berlage
Europe/Amsterdam timezone

Synergia-CUDA: GPU Accelerated Accelerator Modeling Package (video conference)

14 Oct 2013, 14:16
22m
Effectenbeurszaal (Amsterdam, Beurs van Berlage)

Oral presentation to parallel session Software Engineering, Parallelism & Multi-Core

Speaker

Qiming Lu (Fermi National Accelerator Laboratory)

Description

Synergia is a parallel, three-dimensional space-charge particle-in-cell code that is widely used by the accelerator modeling community. We present our work on porting the pure MPI-based code to a hybrid of CPU and GPU computing kernels. The hybrid code uses the CUDA platform within the same framework as the pure MPI solution. We have implemented a lock-free collaborative charge-deposition algorithm for the GPU, as well as other optimizations, including local communication avoidance for GPUs, a customized FFT, and fine-tuned memory access patterns. On a small GPU cluster (up to 4 Tesla C1070 GPUs), our benchmarks exhibit both superior peak performance and better scaling compared to a CPU cluster with 16 nodes and 128 cores. We have further compared the code's performance on different GPU architectures, including C1070 Tesla, M2070 Fermi, and K20 Kepler. We show performance increases of 10 to 20% from optimizations targeting each specific hardware architecture.
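
For readers unfamiliar with the charge-deposition step mentioned in the abstract, the following is a minimal, generic sketch in Python of linear (cloud-in-cell) deposition on a periodic 1D grid. It is not Synergia's algorithm and it is not GPU code; the function name `deposit_charge_1d`, its parameters, and the 1D periodic grid are assumptions chosen purely for illustration.

```python
import math


def deposit_charge_1d(positions, charge, n_cells, cell_size):
    """Deposit equal point charges onto a periodic 1D grid.

    Hypothetical helper for illustration only: uses linear
    (cloud-in-cell) weights, so each particle contributes to
    the two grid points that bracket its position.
    """
    rho = [0.0] * n_cells
    for x in positions:
        s = x / cell_size        # position measured in cells
        left = math.floor(s)     # grid point to the left of the particle
        frac = s - left          # fractional distance past that point
        # Scatter-add: split the charge between the two neighbours.
        rho[left % n_cells] += charge * (1.0 - frac)
        rho[(left + 1) % n_cells] += charge * frac
    return rho


# Three unit charges on a 4-cell grid with unit spacing.
rho = deposit_charge_1d([0.25, 1.5, 3.75], charge=1.0, n_cells=4, cell_size=1.0)
print(rho)  # [1.5, 0.75, 0.5, 0.25]
```

Each particle updates two neighbouring grid points, so particles in the same cell write to the same memory locations. When many GPU threads perform these updates concurrently, the writes have to be coordinated, which is the problem a lock-free deposition scheme is meant to address.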

Primary author

Qiming Lu (Fermi National Accelerator Laboratory)

Co-author

James Amundson (Fermi National Accelerator Laboratory)
