Description
We present PQuantML, an open-source library for end-to-end hardware-aware model compression that enables the training and deployment of compact, high-performance neural networks on resource-constrained hardware in physics and beyond. PQuantML abstracts away the low-level details of compression: users compress a model with a simple configuration file and a single API call. It supports pruning and quantization with layer-wise customization of compression parameters, including the number of quantization bits for data, weights, and biases; the quantization granularity; the choice of pruning method; and whether pruning is disabled for a particular layer. Global switches turn pruning and quantization on or off, so users can experiment with the two techniques jointly or individually. We demonstrate PQuantML on tasks such as jet substructure classification (JSC) at the LHC using the hls4ml jet-tagging dataset, achieving substantial reductions in parameter count and bit width while maintaining accuracy.
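To make the configuration-driven workflow concrete, the sketch below shows what such a compression configuration might look like. This is an illustrative mock-up, not PQuantML's actual schema: all keys, method names, and layer identifiers here are hypothetical, chosen only to reflect the options described above (global pruning/quantization switches, per-layer bit widths and granularity, per-layer pruning method or opt-out).

```
# Hypothetical compression config (illustrative only; not the real PQuantML schema)
pruning:
  enabled: true            # global switch for pruning
quantization:
  enabled: true            # global switch for quantization

defaults:
  weight_bits: 6           # quantization bits for weights
  bias_bits: 8             # quantization bits for biases
  data_bits: 8             # quantization bits for data/activations
  granularity: per_channel # quantization granularity
  pruning_method: magnitude

layers:
  dense_1:
    weight_bits: 4         # tighter quantization for this layer
  output:
    pruning_method: none   # pruning disabled for this layer
```

With a config like this, the abstract suggests compression would then be applied through a single API call, keeping the low-level pruning and quantization details out of the user's training code.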