1–5 Sept 2025
ETH Zurich
Europe/Zurich timezone

End-to-End Neural Network Compression and Deployment for Hardware Acceleration Using PQuant and hls4ml

2 Sept 2025, 13:20
20m
ETH Zurich

HIT E 51, Siemens Auditorium, ETH Zurich, Hönggerberg campus, 8093 Zurich, Switzerland
Standard Talk | Contributed talks

Speaker

Roope Oskari Niemi

Description

As the demand for efficient machine learning on resource-limited devices grows, model compression techniques such as pruning and quantization have become increasingly vital. Yet these methods are typically developed in isolation, and while some libraries attempt to offer unified interfaces for compression, they often lack support for deployment tools such as hls4ml. To bridge this gap, we developed PQuant, a Python library designed to streamline the training and compression of machine learning models. PQuant offers a unified interface for applying a range of pruning and quantization techniques, catering to users with minimal background in compression while still providing detailed configuration options for advanced use. Notably, it features built-in compatibility with hls4ml, enabling seamless deployment of compressed models on FPGA-based accelerators. This makes PQuant a versatile resource both for researchers exploring compression strategies and for developers targeting efficient implementations on edge devices or custom hardware platforms. We will present the PQuant library and the performance of several compression algorithms implemented with it, and demonstrate the conversion flow of a neural network model from an uncompressed state to optimized firmware for an FPGA.
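As background for the techniques named above, the following is a minimal NumPy sketch of two staples that a compression library like PQuant unifies: magnitude-based unstructured pruning and signed fixed-point quantization (the number format hls4ml maps onto FPGA `ap_fixed` arithmetic). The function names and signatures here are illustrative only and are not PQuant's actual API.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Illustrative unstructured pruning: zero out the smallest-magnitude
    fraction `sparsity` of the weights (not PQuant's actual API)."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_fixed_point(weights, total_bits=8, int_bits=1):
    """Illustrative post-training quantization to a signed fixed-point grid,
    the kind of representation used for FPGA arithmetic."""
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    # Representable range of a signed fixed-point number (sign bit included
    # in int_bits): [-2^(int_bits-1), 2^(int_bits-1) - 2^-frac_bits].
    lo = -(2.0 ** (int_bits - 1))
    hi = 2.0 ** (int_bits - 1) - 1.0 / scale
    return np.clip(np.round(weights * scale) / scale, lo, hi)

w = np.array([0.1, -0.02, 0.5, -0.7, 0.03, 0.9])
pruned = magnitude_prune(w, sparsity=0.5)      # half the weights become zero
quantized = quantize_fixed_point(pruned)       # rounded to an 8-bit grid
```

Pruning produces the sparsity that hardware backends can exploit by skipping multiplications, while quantization fixes the bit width of the remaining weights; in practice both are typically applied (or learned) during training rather than post hoc, which is the workflow the talk describes.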

Co-authors

Chang Sun (California Institute of Technology (US))
Anastasiia Petrovych (CERN)
Enrico Lupi (CERN, INFN Padova (IT))
Dimitrios Danopoulos (CERN)
Arghya Ranjan Das (Purdue University (US))
Sebastian Dittmeier (Ruprecht-Karls-Universitaet Heidelberg (DE))
Michael Kagan (SLAC National Accelerator Laboratory (US))
Miaoyuan Liu (Purdue University (US))
Vladimir Loncar (CERN)