25–28 Sept 2023
Imperial College London
Europe/London timezone

Efficient Quantization of Deep Learning Models for Hardware Acceleration

27 Sept 2023, 13:45
15m
Blackett Laboratory, Lecture Theatre 1 (Imperial College London)

Standard Talk · Contributed Talks

Speakers

Cheng Zhang (Imperial College London), Jianyi Cheng (University of Cambridge)

Description

Today’s deep learning models consume considerable computation and memory resources, leading to significant energy consumption. To address these computation and memory challenges, quantization is often used to store and compute data in as few bits as possible. However, choosing an efficient quantization for a given ML model is challenging, because it affects both computation accuracy and hardware efficiency. In this work, we propose a fully automated toolflow, named Machine-learning Accelerator System Explorer (MASE), for exploring efficient quantization arithmetic and hardware mapping. MASE takes a deep learning model and represents it as a graph capturing both the software model and the hardware accelerator architecture. This enables both coarse-grained and fine-grained optimization in software and hardware. MASE implements a collection of arithmetic types and supports quantization search across mixed arithmetics and precisions. We evaluate our approach on OPT, an open-source version of the GPT model, and show that it achieves 19$\times$ the arithmetic density and 5$\times$ the memory density of the float32 baseline, surpassing prior-art 8-bit quantization by 2.5$\times$ in arithmetic density and 1.2$\times$ in memory density.
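
For illustration, the following is a minimal sketch in PyTorch of the kind of fixed-point quantization arithmetic such a search explores. It is not MASE's actual API; the function name and the width/frac_width parameters are hypothetical stand-ins for the per-layer bit budgets a mixed-precision search would choose.

import torch

def fixed_point_quantize(x: torch.Tensor, width: int = 8, frac_width: int = 4) -> torch.Tensor:
    # Simulate signed fixed-point quantization: scale x onto a grid with
    # 2**frac_width steps per unit, round, clamp to the signed `width`-bit
    # integer range, then rescale for a floating-point accuracy evaluation.
    scale = 2.0 ** frac_width
    qmin = -(2 ** (width - 1))
    qmax = 2 ** (width - 1) - 1
    q = torch.clamp(torch.round(x * scale), qmin, qmax)
    return q / scale

# Example: quantize a weight tensor and inspect the worst-case rounding error.
w = torch.randn(4, 4)
print((w - fixed_point_quantize(w, width=8, frac_width=5)).abs().max())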

Authors

Cheng Zhang (Imperial College London), Jianyi Cheng (University of Cambridge), George Constantinides (Imperial College London), Yiren Zhao

Presentation materials