25–29 May 2026
Chulalongkorn University
Asia/Bangkok timezone

PQuantML: A Tool for End-to-End Hardware-aware Model Compression

28 May 2026, 14:57
18m
Chulalongkorn University

Oral Presentation Track 2 - Online and real-time computing

Speaker

Roope Oskari Niemi

Description

We present PQuantML, an open-source library for end-to-end hardware-aware model compression that enables the training and deployment of compact, high-performance neural networks on resource-constrained hardware in physics and beyond. PQuantML abstracts away the low-level details of compression: users compress models with a simple configuration file and a single API call. It supports both pruning and quantization, with layer-wise customization of compression parameters such as the number of quantization bits used for data, weights, or biases, the granularity of quantization, the pruning method, and whether pruning is disabled for a particular layer. A global switch enables or disables pruning or quantization, so users can experiment with either technique individually or with both jointly. We demonstrate PQuantML on tasks such as jet substructure classification (JSC) at the LHC using the hls4ml jet-tagging dataset, achieving substantial reductions in parameter count and bit width while maintaining accuracy.
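To make the configuration-driven workflow concrete, the sketch below shows what a layer-wise compression configuration with global switches could look like. This is a hypothetical illustration only: the key names (`enable_pruning`, `weight_bits`, `pruning_method`, etc.) and the helper function are assumptions for the sake of example, not PQuantML's actual schema or API.

```python
# Hypothetical sketch of a PQuantML-style configuration.
# All key names here are illustrative assumptions; the real
# PQuantML configuration schema may differ.
config = {
    # Global switches: experiment with pruning and quantization
    # jointly or individually.
    "enable_pruning": True,
    "enable_quantization": True,
    # Layer-wise overrides for compression parameters.
    "layers": {
        "dense_1": {
            "weight_bits": 6,       # quantization bits for weights
            "bias_bits": 8,         # quantization bits for biases
            "data_bits": 8,         # quantization bits for data
            "granularity": "per_channel",
            "pruning_method": "magnitude",
        },
        "output": {
            "weight_bits": 8,
            "prune": False,         # pruning disabled for this layer
        },
    },
}

def effective_pruning(cfg: dict, layer: str) -> bool:
    """Return True if pruning applies to `layer`, combining the
    global switch with any per-layer override (default: prune)."""
    if not cfg["enable_pruning"]:
        return False
    return cfg["layers"].get(layer, {}).get("prune", True)
```

With such a structure, the global switch takes precedence, and per-layer settings refine behavior only when the corresponding technique is enabled, e.g. `effective_pruning(config, "output")` evaluates to `False` because that layer opts out of pruning.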

Author

Co-authors

Anastasiia Petrovych (CERN)
Arghya Ranjan Das (Purdue University (US))
Enrico Lupi (CERN, University of Padova)
Chang Sun (California Institute of Technology (US))
Dimitrios Danopoulos (CERN)
Marlon Helbing
Miaoyuan Liu (Purdue University (US))
Michael Kagan (SLAC National Accelerator Laboratory (US))
Vladimir Loncar (University of Belgrade (RS))
Maurizio Pierini (CERN)

Presentation materials

There are no materials yet.