Description
Neural networks with latency requirements on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed fully unrolled on FPGAs. A bottleneck for the deployment of such networks is area utilization, which is directly related to the number of Multiply-Accumulate (MAC) operations in matrix-vector multiplications.
In this work, we present the Multiply-Accumulate Tree (MAC tree), an algorithm that optimizes the area usage of fully parallel vector dot products on-chip by exploiting self-similar patterns in the network's weights.
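As a rough illustration of the underlying idea (not the talk's actual algorithm), the toy Python sketch below counts the adders needed by a fully unrolled constant-matrix-vector product and shows how sharing one repeated two-term weight pattern across output rows saves adders. The weight matrix, function names, and greedy pattern search are all invented for illustration.

# Illustrative sketch only: count adders for a fully unrolled integer
# matrix-vector product, then factor out a weight pattern that recurs
# across rows so its partial sum is computed once and shared.
from itertools import combinations

def naive_adder_count(W):
    """Adders for fully unrolled dot products: nonzeros minus one per row."""
    return sum(max(sum(1 for w in row if w != 0) - 1, 0) for row in W)

def shared_pair_savings(W):
    """Greedily share the most frequent two-term (index, weight) pattern.

    Each extra occurrence of a repeated pattern reuses an already-computed
    partial sum, saving one adder per reuse.
    """
    counts = {}
    for row in W:
        nz = [(j, w) for j, w in enumerate(row) if w != 0]
        for a, b in combinations(nz, 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    best = max(counts.values(), default=0)
    return max(best - 1, 0)  # the first occurrence still costs its adder

# Toy 4x4 quantized weights with a repeated (x0 + 2*x2) subexpression.
W = [
    [1, 0, 2, 0],
    [1, 0, 2, 1],
    [0, 3, 0, 1],
    [1, 0, 2, -1],
]
print("naive adders:", naive_adder_count(W))       # 6
print("adders saved by sharing:", shared_pair_savings(W))  # 2

A real implementation would of course search for patterns recursively and weigh sharing against routing and pipelining costs; this sketch only shows why self-similar weight patterns translate into saved area.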
We implement the algorithm with hls4ml, a free and open-source library for running real-time neural network inference on FPGAs, and compare resource usage and latency with the original hls4ml implementation across several networks. The results show that the proposed MAC tree can reduce LUT utilization by up to 50% on realistic quantized neural networks, while also reducing latency several-fold. Furthermore, the proposed MAC tree provides an accurate estimate of post-place-and-route resource utilization (error within ~10%) and a reasonably good latency estimate, both of which can be used during the design phase to optimize the networks.
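For orientation, here is a hedged sketch of a baseline comparison workflow using hls4ml's public Python API. The model file, FPGA part number, and output directory are placeholders, and no MAC-tree option is selected here, since the talk's variant is not a released hls4ml configuration switch.

# Sketch of a baseline hls4ml flow whose synthesis report supplies the
# LUT and latency numbers a MAC-tree variant would be compared against.
import hls4ml
from tensorflow import keras

model = keras.models.load_model('quantized_model.h5')  # placeholder model

# Per-layer config; 'Latency' strategy fully unrolls the dense layers.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model']['Strategy'] = 'Latency'

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='baseline_prj',          # placeholder project directory
    part='xcvu13p-flga2577-2-e',        # example FPGA part
)
hls_model.compile()
# Synthesis produces the resource and latency estimates for comparison.
hls_model.build(csim=False, synth=True)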