Speaker
Description
In software-hardware co-design, balancing performance against hardware constraints is critical, especially when deploying FPGAs for real-time scientific applications with hls4ml. Limited resources and stringent latency requirements exacerbate this challenge. Existing frameworks such as AutoQKeras use Bayesian optimization to balance model size, energy, and accuracy, but they are time-consuming, rely on early-stage training, and often yield inaccurate configuration evaluations, requiring significant trial and error. Moreover, the proxy metrics they optimize often fail to reflect actual hardware usage.
In this work, we present ConNAS4ML, a gradient-based, constraint-aware Neural Architecture Search (NAS) framework for hardware-aware optimization within the hls4ml workflow. Our approach incorporates practical hardware resource metrics into the search process and dynamically adapts to different HLS designs, tool versions, and FPGA devices. Unlike AutoQKeras, ConNAS4ML trains and searches simultaneously, requiring only minimal fine-tuning afterward. Users can either explore trade-offs between model performance and hardware usage or apply user-defined hardware constraints to ensure that selected architectures stay within resource limits while maximizing performance. Key contributions include: (1) a user-friendly interface for customizing the search space, hardware metrics, and constraints; (2) deep integration with hls4ml, allowing users to define and experiment with their own HLS synthesis configurations for FPGA; and (3) efficient hardware-aware optimization that explores architectures under hardware constraints in a single-shot manner, avoiding time-consuming trial and error.
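To make the gradient-based, constraint-aware search concrete, the sketch below shows one common way such a search can be made differentiable: relax a discrete choice (here, the filter count of one layer) into a softmax over architecture logits, form an expected resource cost, and add a hinge penalty when that cost exceeds a user-defined budget. All numbers (candidate widths, LUT costs, budget, penalty weight) are illustrative assumptions, not values from ConNAS4ML, and the exact formulation in the framework may differ.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over architecture logits."""
    e = np.exp(a - a.max())
    return e / e.sum()

# Hypothetical candidate filter counts for one conv layer and their
# estimated LUT costs (illustrative numbers only).
widths = np.array([8, 16, 32, 64])
lut_cost = np.array([1200.0, 2300.0, 4500.0, 8800.0])

alpha = np.zeros(4)   # architecture logits, learned jointly with weights
budget = 5000.0       # user-defined LUT constraint (assumed value)
lam = 1e-3            # penalty strength (assumed value)

p = softmax(alpha)
expected_cost = p @ lut_cost                      # differentiable resource estimate
penalty = lam * max(0.0, expected_cost - budget)  # hinge penalty on the budget

# Gradient of the expected cost w.r.t. alpha via the softmax Jacobian;
# this term is what steers the search toward cheaper widths.
grad = p * (lut_cost - expected_cost)
```

With uniform logits the expected cost is the mean of the candidate costs (4200 LUTs here), which sits under the assumed 5000-LUT budget, so the penalty is inactive; as training sharpens the logits toward an expensive width, the penalty gradient pushes back.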
Preliminary results show our approach's effectiveness on two tasks. The first task optimized the filter counts of a 1.8M-parameter CNN for energy reconstruction in calorimeters, achieving a 48.01% parameter reduction along with reductions in LUT (29.73%), FF (31.62%), BRAM (16.06%), and DSP (23.92%) usage, with only a 0.84% increase in MAE after fine-tuning. The second task applied a precision search under various constraints to Jet Tagging classification. Even without fine-tuning, all models stayed within their constraints, with accuracy differences of less than 0.37% from the baseline. Both searches were efficient: the architecture search took 2 GPU hours and the precision search 0.26 GPU hours on a single GPU. This framework can greatly accelerate FPGA deployment in resource-constrained environments, benefiting fields beyond HEP such as edge computing and autonomous systems.