Description
Machine learning model compression methods such as pruning and quantization are critical for enabling efficient inference on resource-constrained hardware. These methods are typically developed independently, and while some libraries attempt to unify them under a common interface, they lack integration with hardware deployment frameworks such as hls4ml. To bridge this gap, we present PQuant, a Python library that streamlines the training and compression of machine learning models. PQuant offers a single interface for applying diverse pruning and quantization methods, making compression accessible to users without deep expertise while still supporting advanced configuration. Integration with hls4ml is ongoing and will enable deployment of compressed models to FPGA-based accelerators, making PQuant a practical tool both for researchers exploring compression strategies and for engineers targeting efficient inference on edge devices and custom hardware platforms.
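For readers new to these techniques, the sketch below shows a generic prune-then-quantize workflow using PyTorch's built-in utilities. It is purely illustrative of the kind of pipeline PQuant unifies; it does not use or reproduce PQuant's actual API.

```python
# Illustrative only: generic magnitude pruning followed by dynamic
# quantization, using standard PyTorch utilities (not PQuant's API).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

# Pruning: zero out the 50% smallest-magnitude weights of each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# Quantization: store Linear weights as int8 for cheaper inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)
```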
| Would you like to be considered for an oral presentation? | Yes |
| --- | --- |