Description
Given recent advances in machine learning, the Large Hadron Collider (LHC) at CERN is incorporating deep learning (DL) models, such as DeepCalo, to enhance the quality of data analysis in particle experiments. However, the need for in-time inference to keep up with data generation rates, together with the dynamics of the experiments, requires data processing with short latency and the flexibility to quickly deploy different DL models. The LHC plans to use FPGAs (Field Programmable Gate Arrays) to provide timely data analysis through highly parallel dataflow-based processing and the short latency enabled by customized logic. A high-level synthesis tool, hls4ml, is also adopted to facilitate the design and synthesis of a fully on-chip dataflow architecture that avoids long-latency DRAM accesses. However, the current hls4ml framework has limited support for very large CNN models due to suboptimal data streaming schemes and inefficient processing architectures. The dataflow architecture also requires proper data quantization to efficiently utilize the limited resources on the FPGA.

In this paper, we present the first automated design and optimization workflow based on hls4ml for implementing DeepCalo models on FPGAs. The DeepCalo framework is extended and integrated with QKeras layers to perform quantization-aware training, minimizing resource consumption while retaining good model quality. A comprehensive exploration of the key design factors is performed, and the observations are summarized as design guidelines for future applications. With the proposed workflow, we show that a design on a Xilinx Alveo U50 FPGA significantly outperforms implementations on a Ryzen 5600H CPU and a Tesla V100 GPU, by up to 14.1x and 7.9x respectively, while meeting the latency requirement of the HLT (High-Level Trigger) within the particle experiment.
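As a rough illustration of the workflow described above, the sketch below builds a small quantization-aware model with QKeras layers and converts it to a streaming hls4ml design. This is not the authors' actual code: the toy topology, the input shape, the 8-bit quantizer settings, and the target part string are all assumptions made for the example; only the standard QKeras and hls4ml APIs (quantized_bits, config_from_keras_model, convert_from_keras_model) are taken as given.

```python
# Minimal sketch, assuming a toy DeepCalo-style CNN; shapes and bit widths
# are illustrative, not the paper's actual configuration.
import hls4ml
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Flatten
from qkeras import QConv2D, QDense, QActivation, quantized_bits, quantized_relu

# Quantization-aware model: weights, biases, and activations are constrained
# to fixed-point precision during training (8 bits here, an assumption).
model = Sequential([
    Input(shape=(56, 11, 4)),                     # hypothetical calorimeter image
    QConv2D(16, (3, 3), padding='same',
            kernel_quantizer=quantized_bits(8, 0, alpha=1),
            bias_quantizer=quantized_bits(8, 0)),
    QActivation(quantized_relu(8)),
    Flatten(),
    QDense(1,
           kernel_quantizer=quantized_bits(8, 0, alpha=1),
           bias_quantizer=quantized_bits(8, 0)),
])
model.compile(optimizer='adam', loss='mse')
# model.fit(x_train, y_train, ...)  # quantization-aware training step

# Convert to an FPGA design. io_type='io_stream' streams activations between
# layer engines fully on chip, avoiding long-latency DRAM accesses.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',            # dataflow streaming architecture
    part='xcu50-fsvh2104-2-e',      # Alveo U50 device, the paper's target
)
hls_model.compile()                 # C-simulation build for functional checks
# hls_model.build(synth=True)       # run HLS synthesis to get latency/resources
```

The io_type='io_stream' setting is what produces the fully on-chip dataflow architecture the description refers to, with activations passed between layers through FIFOs rather than staged in external memory.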