29 November 2021 to 3 December 2021
Virtual and IBS Science Culture Center, Daejeon, South Korea
Asia/Seoul timezone

Demonstration of FPGA Acceleration of Monte Carlo Simulation

Contribution ID: 731
30 Nov 2021, 19:00
20m
S221-A (Virtual and IBS Science Culture Center)

55 EXPO-ro Yuseong-gu Daejeon, South Korea email: library@ibs.re.kr +82 42 878 8299
Oral, Track 1: Computing Technology for Physics Research

Speaker

Marco Barbone (Imperial College London)

Description

We present results from a stand-alone simulation of electron single Coulomb scattering implemented entirely on an FPGA architecture and compared with an identical simulation on a standard CPU. FPGA architectures offer unprecedented speed-up capability for Monte Carlo simulations, albeit with the caveats of lengthy development cycles and limited resources, particularly on-chip memory and DSP blocks. As a proof of principle of acceleration on an FPGA, we chose a single scattering process of electrons in water at an energy of 6 MeV.

The initial code base was implemented in C++ and optimised for CPU processing. To measure the potential performance gains of FPGAs compared to modern multi-core CPUs, we computed 100 million histories of a 6 MeV electron interacting in water. The FPGA bit-stream was implemented using MaxCompiler 2021.1 and Vivado 2019.2. MaxCompiler is a High-Level Synthesis (HLS) language that facilitates implementation across CPUs and FPGAs; it greatly reduces development time but does not achieve the same performance as manually optimised VHDL. We did not perform any hardware-specific optimisation. We also limited the clock frequency to 200 MHz, which is easily achievable by any HLS implementation on a modern FPGA. The FPGA implementation used the same arithmetic precision as the CPU implementation.

The system configuration comprises an AMD Ryzen 5900X 12-core CPU running at 3.7 GHz and boosting up to 4.8 GHz, together with a Xilinx Alveo U200 Data Center accelerator card. The Alveo U200 incorporates a VU9P FPGA device with a capacity of 1,182,240 LUTs, 2,364,480 FFs, 6,840 DSPs, 4,320 BRAMs and 960 URAMs.

The results show that the FPGA implementation is over 110 times faster than an optimised parallel implementation running on 12 cores and over 270 times faster than a sequential single-core implementation. At today's market prices, this corresponds to a cost-equivalent speed-up of more than 10. The results on both architectures were statistically equivalent. The successful implementation and the measured acceleration are very encouraging for future use of more generic Monte Carlo simulation on FPGAs for High Energy Physics applications.
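As an illustration of why this kind of workload maps well onto parallel hardware, the following is a minimal Python sketch of a history loop for single elastic scattering. It is not the authors' code or physics model: it assumes a screened Rutherford angular distribution, a hypothetical screening parameter of 0.01, and hypothetical function names. Each history draws one random number and is independent of all others, which is the property that pipelined or parallel hardware can exploit. The analytic mean gives a reference for a simple statistical consistency check between implementations.

```python
import math
import random


def sample_mu(screening, xi):
    """Sample mu = (1 - cos(theta)) / 2 in [0, 1] by inverting the CDF of
    p(mu) proportional to 1 / (mu + A)^2 (screened Rutherford, assumed model)."""
    return screening * xi / (1.0 + screening - xi)


def analytic_mean_mu(screening):
    """Closed-form mean of mu for the same distribution, used as a reference."""
    a = screening
    return a * (1.0 + a) * math.log(1.0 + 1.0 / a) - a


def run_histories(n_histories, screening, seed=12345):
    """Run independent histories and return the sample mean of mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n_histories):
        total += sample_mu(screening, rng.random())
    return total / n_histories


if __name__ == "__main__":
    A = 0.01  # hypothetical screening parameter, not taken from the abstract
    mc = run_histories(200_000, A)
    print(f"Monte Carlo mean mu: {mc:.5f}")
    print(f"Analytic mean mu:    {analytic_mean_mu(A):.5f}")
```

Two implementations of the same sampler, for example one on a CPU and one on an accelerator, could be compared against the same reference value within their statistical uncertainties.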

Significance

Monte Carlo simulation of electrons scattering in water on an FPGA, together with a direct comparison of the same codebase on a conventional CPU, has not been demonstrated before. In addition, a significant speed-up has been measured, equating to more than a factor of 10 versus the CPU in a like-for-like cost comparison.
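The cost comparison can be made concrete with a short sketch. The abstract does not define its cost metric or quote prices, so the formula below (raw speed-up scaled by the ratio of hardware costs) and the example cost ratio are assumptions for illustration only.

```python
def cost_equivalent_speedup(raw_speedup, cpu_cost, accelerator_cost):
    """Speed-up per unit of hardware cost (assumed definition): the raw
    speed-up scaled by how much cheaper the CPU is than the accelerator."""
    if cpu_cost <= 0 or accelerator_cost <= 0:
        raise ValueError("costs must be positive")
    return raw_speedup * cpu_cost / accelerator_cost


if __name__ == "__main__":
    # Hypothetical prices in arbitrary units: an accelerator costing
    # 11 times the CPU, combined with the reported raw speed-up of 110.
    print(cost_equivalent_speedup(110.0, cpu_cost=1.0, accelerator_cost=11.0))
```

Under this assumed definition, a raw speed-up of 110 stays above a cost-equivalent factor of 10 as long as the accelerator costs less than 11 times the CPU.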

Speaker time zone: Compatible with Europe

Primary authors

Marco Barbone (Imperial College London); Dingyu Chen (Imperial College London); Alexander Howard (Imperial College London); Mihaly Novak (CERN); Wayne Luk (Imperial College London)
