19–23 May 2025
CERN
Europe/Zurich timezone

PQuant: A Tool for End-to-End Hardware-Aware Model Compression

23 May 2025, 11:30
20m
222/R-001 (CERN)

Contributed talk: 5 Fast ML — Application of ML to DAQ/Trigger/Real Time Analysis/Edge Computing (Contributed Talks)

Speaker

Roope Oskari Niemi

Description

Machine learning model compression methods such as pruning and quantization are critical for enabling efficient inference on resource-constrained hardware. These methods are typically developed independently, and while some libraries attempt to unify them under a common interface, they lack integration with hardware deployment frameworks such as hls4ml. To bridge this gap, we present PQuant, a Python library that streamlines the training and compression of machine learning models. PQuant offers an interface for applying diverse pruning and quantization methods, making it accessible to users without deep expertise in compression while still supporting advanced configuration. Notably, integration with hls4ml is ongoing, which will enable deployment of compressed models to FPGA-based accelerators. This will make PQuant a practical tool both for researchers exploring compression strategies and for engineers aiming for efficient inference on edge devices and custom hardware platforms.
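To make the two compression primitives the abstract refers to concrete, the sketch below shows magnitude pruning and symmetric uniform quantization on a flat weight list. This is a minimal illustration of the general techniques, not PQuant's actual API; the function names and signatures here are hypothetical.

```python
# Illustrative sketch of two common compression primitives:
# magnitude pruning and symmetric uniform quantization.
# Not PQuant's interface -- function names are hypothetical.

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def uniform_quantize(weights, bits):
    """Snap weights to a symmetric uniform grid with 2**(bits-1)-1 positive levels."""
    max_abs = max(abs(w) for w in weights) or 1.0
    levels = 2 ** (bits - 1) - 1   # e.g. 127 for 8-bit symmetric quantization
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]

# Pruning and quantization are often composed: prune first, then quantize.
w = [0.91, -0.04, 0.33, -0.72, 0.02, 0.58]
pruned = magnitude_prune(w, sparsity=0.5)      # half the weights become 0.0
quantized = uniform_quantize(pruned, bits=4)   # remaining weights on a 4-bit grid
```

In a real training-time compression flow (as PQuant targets), such transforms are applied to layer tensors during or after training rather than to plain lists, and the resulting sparse, low-precision weights are what hardware backends like hls4ml can exploit.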

Would you like to be considered for an oral presentation? Yes

Co-authors

Chang Sun (California Institute of Technology (US)), Anastasiia Petrovych (CERN), Dr Enrico Lupi (CERN, INFN Padova (IT)), Dimitrios Danopoulos (CERN), Sebastian Dittmeier (Ruprecht-Karls-Universitaet Heidelberg (DE)), Michael Kagan (SLAC National Accelerator Laboratory (US)), Vladimir Loncar (CERN)
