Description
The rapid scaling of deep learning models, particularly Large Language Models (LLMs), whose parameter counts have increased nearly tenfold per year since 2018, has intensified the need for more efficient, power-aware deployment strategies. Quantization is a widely adopted technique for reducing the computational and memory footprint of neural networks by lowering the numerical precision of weights and activations.
This work investigates a floating-point quantization approach that adaptively reduces the bitwidths of weights and activations while preserving model accuracy. The proposed quantization-oriented methodology analyzes the distribution of tensor values to guide the design of custom floating-point formats. Using quantization-aware training (QAT), experiments on Recurrent Neural Networks show an average 3.5× reduction in bit usage with only a 0.5% drop in top-1 accuracy.
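The abstract does not spell out how the value distribution drives the format choice. As a rough, hypothetical sketch of the idea (not the method presented in the talk), the snippet below derives exponent/mantissa bitwidths and an exponent bias from the spread of a tensor's magnitudes and then emulates rounding into the resulting minifloat format; the percentile heuristic, function names, and the 8-bit budget are all assumptions.

```python
import numpy as np

def choose_minifloat_format(x, total_bits=8):
    """Hypothetical heuristic: pick exponent/mantissa widths and an exponent
    bias so the format's dynamic range covers most observed magnitudes."""
    mags = np.abs(x[x != 0])
    lo, hi = np.percentile(mags, [1.0, 99.9])        # ignore extreme outliers
    e_lo = int(np.floor(np.log2(lo)))
    e_hi = int(np.ceil(np.log2(hi)))
    exp_bits = max(1, int(np.ceil(np.log2(e_hi - e_lo + 1))))
    man_bits = max(1, total_bits - 1 - exp_bits)     # 1 bit reserved for the sign
    bias = (2 ** exp_bits - 1) - e_hi                # align top exponent code with e_hi
    return exp_bits, man_bits, bias

def quantize_minifloat(x, exp_bits, man_bits, bias):
    """Emulate round-to-nearest into a (sign, exp_bits, man_bits) format with a
    custom exponent bias; no subnormals or inf/NaN codes, saturating at the max."""
    e_max = (2 ** exp_bits - 1) - bias
    e_min = 1 - bias                                 # lowest normal exponent
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max
    xc = np.clip(x, -max_val, max_val)
    e = np.floor(np.log2(np.maximum(np.abs(xc), 2.0 ** e_min)))
    step = 2.0 ** (e - man_bits)                     # spacing of representable values
    return np.round(xc / step) * step

# Toy usage on a stand-in weight tensor.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=10_000)
eb, mb, b = choose_minifloat_format(w, total_bits=8)
wq = quantize_minifloat(w, eb, mb, b)
print(f"e{eb}m{mb}, bias={b}, max abs error = {np.max(np.abs(w - wq)):.5f}")
```

In a QAT setting, a rounding function of this kind would sit in the forward pass (typically with a straight-through estimator for the gradient) so the network can adapt to the reduced precision during training.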
Building on this work, a follow-up contribution extended the AMD/Xilinx deployment flow by adding support for arbitrary floating-point formats to the quantized neural network exchange format QONNX, complementing the existing support in the QAT library Brevitas and completing the quantization path toward hardware acceleration with the AMD FPGA neural-network framework FINN.
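The abstract does not describe how such formats are encoded in QONNX. Purely as an illustration (not the QONNX node definition nor the Brevitas or FINN APIs), the sketch below shows the kind of per-tensor metadata an exchange format has to carry so that a format chosen during QAT can be reproduced exactly by a downstream FPGA compiler; all names are hypothetical.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class FloatQuantSpec:
    """Hypothetical per-tensor record of an arbitrary floating-point format
    (illustrative only, not the QONNX specification)."""
    exponent_bitwidth: int   # bits of exponent
    mantissa_bitwidth: int   # bits of explicit mantissa (the sign bit is separate)
    exponent_bias: int       # custom bias so the range can track the tensor
    saturating: bool = True  # clip to the largest value instead of producing inf

    @property
    def max_value(self) -> float:
        """Largest representable magnitude, assuming no inf/NaN codes."""
        e_max = (2 ** self.exponent_bitwidth - 1) - self.exponent_bias
        return (2.0 - 2.0 ** -self.mantissa_bitwidth) * 2.0 ** e_max

# Metadata that would accompany a quantized weight tensor in the model graph.
spec = FloatQuantSpec(exponent_bitwidth=4, mantissa_bitwidth=3, exponent_bias=17)
print(json.dumps(asdict(spec), indent=2), f"max |x| = {spec.max_value:.3f}")
```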