Description
In software-hardware co-design, balancing performance against hardware constraints is critical, especially when deploying FPGAs for high-energy physics (HEP) applications with hls4ml. Limited resources and stringent latency requirements exacerbate this challenge. Existing frameworks such as AutoQKeras use Bayesian optimization to trade off model size/energy against accuracy, but they are time-consuming, rely on early-stage training that can lead to inaccurate evaluations of candidate configurations, and often require significant trial and error. In addition, proxy metrics such as model size and energy often do not reflect actual hardware usage.
In this work, we present a gradient-based Neural Architecture Search (NAS) framework tailored for hardware-aware optimization within the hls4ml workflow. Our approach incorporates practical hardware resource metrics into the search process and dynamically adapts to different HLS designs, tool versions, and FPGA devices. Unlike AutoQKeras, our design is fully trained during the search process, requiring only minimal fine-tuning afterwards. This framework allows users to efficiently explore trade-offs between model performance and hardware usage for their specific tasks in a single shot. Key contributions include: (1) a user-friendly interface for easy customization of the search space; (2) deep integration with hls4ml, allowing users to define and experiment with their own HLS synthesis configurations for FPGA; and (3) flexibility, allowing users to define custom hardware metrics for optimization, such as combinations of multiple FPGA resources.
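To illustrate contribution (3), a user-defined hardware metric could combine several FPGA resource estimates into a single scalar objective that the search then weighs against model loss. The following is a minimal sketch of that idea; the function name, the dictionary-based interface, and the weight values are hypothetical illustrations, not the framework's actual API.

```python
# Hypothetical sketch: fold per-resource utilization estimates
# (fractions of the target FPGA's capacity) into one scalar cost
# that a hardware-aware NAS objective could penalize alongside
# the task loss. All names here are illustrative assumptions.
def hardware_cost(utilization, weights=None):
    """utilization: dict mapping resource name -> estimated fraction used (0..1).
    weights: optional dict mapping resource name -> importance (default 1.0)."""
    weights = weights or {}
    return sum(weights.get(r, 1.0) * u for r, u in utilization.items())

# Example: weight DSP usage more heavily because DSPs are
# scarce on the assumed target device.
estimates = {"LUT": 0.30, "FF": 0.25, "BRAM": 0.16, "DSP": 0.24}
cost = hardware_cost(estimates, weights={"DSP": 2.0})
```

Because each resource's weight is user-controlled, the same mechanism covers single-resource objectives (e.g., DSP-only) and arbitrary weighted combinations across LUTs, FFs, BRAMs, and DSPs.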
We demonstrate the effectiveness of our approach on a 1.8M-parameter convolutional neural network for an energy reconstruction task in calorimeters. Compared to the baseline model, the searched model achieves a 48.01% reduction in parameter count, along with reductions of 29.73% in LUT usage, 31.62% in FFs, 16.06% in BRAM, and 23.92% in DSPs, at the cost of only a 0.84% degradation in MAE. The entire search took approximately 2 GPU hours, demonstrating the potential of our framework to accelerate FPGA deployment in resource-constrained environments. Furthermore, the method extends beyond HEP, enabling more efficient and scalable FPGA deployments in fields such as edge computing and autonomous systems.
Focus areas: HEP