10–14 Jul 2023
University of Washington
US/Pacific timezone

Accelerating CNNs on FPGAs for Particle Energy Reconstruction

10 Jul 2023, 19:00
2h
Oak Hall Denny Room

Speaker

Alexander Joseph Schuy (University of Washington (US))

Description

Given recent advances in machine learning techniques, the Large Hadron Collider (LHC) at CERN is incorporating deep learning (DL) models, such as DeepCalo, to improve data analysis in particle physics experiments. However, inference must keep pace with the data generation rate, and the experiments change over time, so the data processing must offer both short latency and the flexibility to deploy different DL models quickly. The LHC plans to use FPGAs (Field Programmable Gate Arrays) to provide timely data analysis through the highly parallel, dataflow-based processing and short latency that customized logic enables. The high-level synthesis tool hls4ml has also been adopted to simplify the design and synthesis of a fully on-chip dataflow architecture that avoids long-latency DRAM accesses. However, the current hls4ml framework has limited support for very large CNN models because of suboptimal data streaming schemes and inefficient processing architectures. The dataflow architecture also requires appropriate data quantization to make efficient use of the limited resources on the FPGA. In this paper, we present the first automated design and optimization workflow based on hls4ml for implementing DeepCalo models on FPGAs. The current DeepCalo framework is extended and integrated with QKeras layers to perform quantization-aware training, which minimizes resource consumption while retaining good model quality. We comprehensively explore the key design factors and summarize our observations as design guidelines for future applications. With the proposed workflow, we show that the design on a Xilinx Alveo U50 FPGA outperforms implementations on Ryzen-5600H CPUs and Tesla V100 GPUs by up to 14.1x and 7.9x, respectively, and meets the latency requirement of the HLT (High Level Trigger) in the particle experiment.
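
To make the quantization step more concrete, here is a minimal Python sketch of fixed-point "fake quantization", the operation that quantization-aware training applies to weights in the forward pass (during training, gradients usually pass through the rounding unchanged via a straight-through estimator, which is not shown). It assumes a signed format with W total bits and I integer bits including the sign, similar in spirit to the ap_fixed<W,I> types that hls4ml-generated HLS code uses; round-to-nearest with saturation is an illustrative choice, and the helpers `quantize_fixed`, `quantize_all`, and `weight_storage_bits` are hypothetical, not QKeras or hls4ml APIs.

```python
from typing import List


def quantize_fixed(x: float, total_bits: int = 16, int_bits: int = 6) -> float:
    """Snap x onto a signed fixed-point grid (hypothetical helper).

    Assumption: int_bits counts the sign bit, as in ap_fixed<W, I>.
    The 16/6 defaults are illustrative only.
    """
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits                 # value of one LSB
    min_val = -(2.0 ** (int_bits - 1))       # most negative representable value
    max_val = 2.0 ** (int_bits - 1) - step   # most positive representable value
    q = round(x / step) * step               # nearest grid point (Python rounds ties to even)
    return min(max(q, min_val), max_val)     # saturate rather than wrap around


def quantize_all(values: List[float], total_bits: int, int_bits: int) -> List[float]:
    """Fake-quantize a list of weights, as QAT would in the forward pass."""
    return [quantize_fixed(v, total_bits, int_bits) for v in values]


def weight_storage_bits(n_weights: int, total_bits: int) -> int:
    """On-chip bits needed to hold n_weights parameters at a given width."""
    return n_weights * total_bits


if __name__ == "__main__":
    weights = [0.3, -1.7, 5.2, -9.0]
    # With 8 total bits and 3 integer bits the representable range is [-4, 3.96875].
    print(quantize_all(weights, total_bits=8, int_bits=3))
    n = 3 * 3 * 32 * 32  # hypothetical 3x3 conv layer, 32 -> 32 channels
    print(weight_storage_bits(n, 32), weight_storage_bits(n, 8))
```

For that hypothetical layer (9,216 weights), storage drops from 294,912 bits at 32-bit floating point to 73,728 bits at 8 bits, a 4x reduction, at the cost of the rounding and clipping error that quantization-aware training learns to tolerate.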

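The data-streaming concern can be illustrated with a line buffer, a common FPGA technique for computing a convolution on a pixel stream: pixels arrive one at a time in raster order, and only the previous K-1 image rows plus the current partial row are kept, so buffer size scales with image width rather than image area. The sketch below is a generic single-channel, stride-1, no-padding illustration (cross-correlation, as in CNN layers); `stream_conv2d` and `direct_conv2d` are hypothetical names, and this is not claimed to be the streaming scheme used in this work or inside hls4ml.

```python
from collections import deque
from typing import Deque, Iterable, List

Matrix = List[List[float]]


def stream_conv2d(pixels: Iterable[float], width: int, kernel: Matrix) -> List[float]:
    """Valid, stride-1 2D convolution over a raster-order pixel stream."""
    k = len(kernel)
    prev_rows: Deque[List[float]] = deque(maxlen=k - 1)  # line buffer: last K-1 rows
    current: List[float] = []                             # row currently arriving
    out: List[float] = []
    for p in pixels:
        current.append(p)
        col = len(current) - 1
        # A complete KxK window ends at this pixel once K-1 earlier rows are
        # buffered and at least K pixels of the current row have arrived.
        if len(prev_rows) == k - 1 and col >= k - 1:
            window_rows = list(prev_rows) + [current]
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * window_rows[i][col - k + 1 + j]
            out.append(acc)
        if len(current) == width:
            prev_rows.append(current)  # deque drops the oldest row automatically
            current = []
    return out


def direct_conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Reference: the same convolution computed with the whole image in memory."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        [sum(kernel[i][j] * image[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(w - k + 1)]
        for r in range(h - k + 1)
    ]


if __name__ == "__main__":
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    ones = [[1.0] * 3 for _ in range(3)]
    flat = [p for row in image for p in row]
    print(stream_conv2d(flat, width=4, kernel=ones))
    print(direct_conv2d(image, ones))
```

On a 4x4 image holding 0 to 15 with a 3x3 all-ones kernel, the streaming version emits [45.0, 54.0, 81.0, 90.0], matching the direct reference, while never buffering more than two complete rows.
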
Authors

Alexander Joseph Schuy (University of Washington (US)), Bo-Cheng Lai, ChiJui Chen, Dylan Ranklin (University of Pennsylvania), Ling Chi Yang (National Yang Ming Chiao Tung University), Philip Coleman Harris (Massachusetts Inst. of Technology (US)), Scott Hauck, Shih-Chieh Hsu (University of Washington Seattle (US)), Yan Lun Huang (National Yang Ming Chiao Tung University), Ziang Yin (University of Washington (US))
