November 30, 2020 to December 3, 2020
Large and compressed Convolutional Neural Networks on FPGAs with hls4ml

Nov 30, 2020
Vladimir Loncar (CERN)


We present ultra-low-latency deep neural networks with large convolutional layers on FPGAs using the hls4ml library. Taking benchmark models trained on public datasets, we discuss several options for reducing the model size and, consequently, the FPGA resource consumption: pruning, quantization to fixed precision, and extreme quantization down to binary or ternary precision. We demonstrate how inference latencies of O(10) microseconds can be obtained while maintaining high accuracy.
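
As a rough illustration of the three compression options named in the abstract, the following pure-Python sketch applies magnitude pruning, fixed-point quantization, and ternary quantization to a small list of weights. It is not taken from hls4ml or from the talk: the function names, the sparsity level, the bit widths, and the ternary threshold are all hypothetical choices made for this example. The fixed-point helper takes a total width and an integer width that includes the sign bit, and it assumes round-to-nearest with saturation, which may differ from the rounding and overflow modes used in a given FPGA design.

```python
import math


def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_fixed(x, total_bits, int_bits):
    """Quantize x to signed fixed point (assumed: round-to-nearest, saturating).

    `int_bits` counts the sign bit, so the fractional part has
    total_bits - int_bits bits.
    """
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    lowest = -(2 ** (total_bits - 1))
    highest = 2 ** (total_bits - 1) - 1
    code = math.floor(x * scale + 0.5)          # round to nearest integer code
    code = max(lowest, min(highest, code))      # saturate instead of wrapping
    return code / scale


def ternarize(x, threshold):
    """Map x to -1, 0, or +1 using a symmetric threshold (hypothetical value)."""
    if x > threshold:
        return 1.0
    if x < -threshold:
        return -1.0
    return 0.0


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.3, -0.7, 0.02, 0.45]
    print(prune_by_magnitude(weights, sparsity=0.5))
    print([quantize_fixed(w, total_bits=8, int_bits=2) for w in weights])
    print([ternarize(w, threshold=0.5) for w in weights])
```

Each step trades numerical detail for hardware cost: pruned weights need no multiplier, narrower fixed-point values need fewer logic resources per operation, and ternary weights reduce a multiplication to a sign selection or a skip.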

