Description
In this work, we present the Scalable QUantization-Aware Real-time Keras (S-QUARK), an advanced quantization-aware training (QAT) framework for efficient FPGA inference, built on top of Keras v3 and supporting the TensorFlow, JAX, and PyTorch backends.
The framework inherits all the features of the High Granularity Quantization (HGQ) library and extends it to support fixed-point numbers with different overflow modes and different parametrizations of the fixed-point quantizers, as well as bit-accurate softmax and multi-head attention layers. A bit-exact minifloat quantizer with differentiable mantissa and exponent bit widths, as well as a differentiable exponent bias, is also supported.
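To illustrate what different overflow modes mean for a fixed-point quantizer, the following minimal sketch quantizes values onto a signed fixed-point grid with either saturating or wrap-around behaviour. The function name and arguments are assumptions chosen for exposition only, not the S-QUARK API.

    import numpy as np

    def fixed_point_quantize(x, bits, fractional_bits, overflow="SAT"):
        # Illustrative only: quantize x onto a signed fixed-point grid with
        # `bits` total bits (sign included) and `fractional_bits` fractional bits.
        scale = 2.0 ** fractional_bits
        q = np.round(x * scale)
        lo = -(2 ** (bits - 1))
        hi = 2 ** (bits - 1) - 1
        if overflow == "SAT":
            # Saturate: clip values outside the representable range.
            q = np.clip(q, lo, hi)
        elif overflow == "WRAP":
            # Wrap around, mimicking hardware integer overflow.
            q = (q - lo) % (hi - lo + 1) + lo
        return q / scale

    x = np.array([-2.7, -0.3, 0.4, 3.9])
    print(fixed_point_quantize(x, bits=4, fractional_bits=2, overflow="SAT"))
    print(fixed_point_quantize(x, bits=4, fractional_bits=2, overflow="WRAP"))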
On the TensorFlow and JAX backends, all layers provided by the framework support JIT compilation, which can significantly speed up training when it is not I/O-bound. The speedup over the HGQ framework ranges from 1.5x to more than 3x, with a 10% to 100% training-time overhead compared to a native TensorFlow or JAX implementation with Keras, depending on the model, dataset, and hardware used.
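For reference, JIT compilation of the training step in Keras v3 is requested through the standard compile API, as in the sketch below; the model here is a plain placeholder rather than an S-QUARK layer stack, and the backend choice is an example.

    import os
    os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow"; must be set before importing keras

    import numpy as np
    import keras

    model = keras.Sequential([
        keras.Input(shape=(16,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(10),
    ])

    # jit_compile=True asks the backend (XLA for TensorFlow/JAX) to trace and
    # compile the train step, which is where the reported speedup comes from.
    model.compile(optimizer="adam", loss="mse", jit_compile=True)

    x = np.random.rand(256, 16).astype("float32")
    y = np.random.rand(256, 10).astype("float32")
    model.fit(x, y, epochs=1, batch_size=64, verbose=0)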
The library is available under the LGPLv3 license at https://github.com/calad0i/s-quark.