AI and deep learning are widely used and have shown great promise in recent scientific research. Deep neural network (DNN) models have proven highly effective in big-data analytics applications for scientific experiments. However, traditional CPU-based sequential computing can no longer meet the requirements of applications that are compute-intensive and demand low latency and high throughput. Heterogeneous computing (HGC), in which CPUs are integrated with accelerators such as GPUs and FPGAs, offers unique capabilities to accelerate DNNs. Collaborating researchers at SHREC at the University of Florida, NERSC at Lawrence Berkeley National Lab, CERN openlab, Dell EMC, and Intel are studying the application of HGC to scientific problems using DNN models. Our current work focuses on the use of FPGAs to accelerate the inference stage of the HGC workflow, using case studies of three state-of-the-art DNN models: HEP-CNN and CosmoGAN, developed by NERSC, and 3DGAN, developed by CERN openlab.
Based on the Intel Deep Learning Acceleration (DLA) suite, we developed custom FPGA primitives and optimized the existing architecture to maximize inference performance. Using the Intel Distribution of OpenVINO toolkit, we are able to accelerate the case-study models on an Intel Programmable Acceleration Card (PAC) equipped with an Arria 10 GX FPGA. At the ISC19 IXPUG Workshop, we presented our HGC framework and initial results for both HEP-CNN and CosmoGAN using the native implementation of OpenVINO. With the help of the custom FPGA primitives in the DLA, we improved the inference results for HEP-CNN and predicted the optimal inference performance for CosmoGAN. We achieved speedups of 3x to 6x for a single Arria 10 GX FPGA over a single core (single thread) of a server-class Intel Skylake CPU.
For the IXPUG Annual Conference 2019, we will present our recent customization of the DLA architecture to implement 3D-convolution FPGA primitives for the 3DGAN model. We will also demonstrate further improvements in inference performance for HEP-CNN and CosmoGAN with the new DLA implementation. The details of our DLA customization, along with comparative results against the Skylake CPU, will be presented in this work.