Sep 24 – 27, 2019
CERN
Europe/Zurich timezone

Performance and Scalability Analysis of CNN-based Deep Learning Inference in the Intel Distribution of OpenVINO Toolkit

Sep 24, 2019, 11:15 AM
15m
80/1-001 - Globe of Science and Innovation - 1st Floor (CERN)

Speaker

Dr Iosif Meyerov (Lobachevsky State University)

Description

Deep learning is widely used in many problem areas, including computer vision, natural language processing, bioinformatics, and biomedicine. Training a neural network involves searching for the optimal weights of the model. It is a computationally intensive procedure, usually performed offline a limited number of times on servers equipped with powerful graphics cards. Inference, in contrast, is the forward propagation of input data through a trained network. This procedure is repeated many times and should run as fast as possible on the available computational devices (CPUs, embedded devices). Many deep models are convolutional, so improving the performance of convolutional neural networks (CNNs) on Intel CPUs is a practically important task. The Intel Distribution of OpenVINO toolkit includes components that support the development of real-time visual applications. For efficient CNN inference on Intel platforms (Intel CPUs, Intel Processor Graphics, Intel FPGAs, Intel VPUs), the OpenVINO developers provide the Deep Learning Deployment Toolkit (DLDT). It contains tools for platform-independent optimization of network topologies as well as low-level inference optimizations.

In this talk we analyze the performance and scalability of several toolkits that provide high-performance CNN-based deep learning inference on Intel platforms. We consider two typical data science problems: image classification (model: ResNet-50, dataset: ImageNet) and object detection (model: SSD300, dataset: PASCAL VOC 2012). First, we prepare a set of trained models for the following toolkits: Intel Distribution of OpenVINO toolkit, Intel Caffe, Caffe, and TensorFlow. Then, we select a sufficiently large set of images from each dataset so that the performance measurements are reliable. For each toolkit, built with the optimizing Intel compiler, we experimentally determine the most appropriate parameters (the batch size and the number of CPU cores used). Finally, we carry out computational experiments on the Intel Endeavor supercomputer using high-end Skylake and Cascade Lake CPUs.
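The parameter search described above can be illustrated with a minimal Python sketch. It is not the benchmark code used in the talk: `measure_throughput`, `best_batch_size`, and the `infer` callable are hypothetical names introduced here, and `infer` stands in for whatever inference call a given toolkit exposes. The sketch only shows the general idea of timing a fixed image set at several batch sizes and keeping the one with the highest throughput.

```python
import time
from typing import Callable, Dict, Iterable, Sequence, Tuple


def measure_throughput(infer: Callable[[Sequence], object],
                       images: Sequence,
                       batch_size: int) -> float:
    """Return images processed per second for one batch size.

    `infer` is a hypothetical stand-in for a toolkit-specific
    inference call that accepts one batch of images.
    """
    start = time.perf_counter()
    processed = 0
    for i in range(0, len(images), batch_size):
        batch = images[i:i + batch_size]
        infer(batch)
        processed += len(batch)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very fast dummy workload.
    return processed / elapsed if elapsed > 0 else float("inf")


def best_batch_size(infer: Callable[[Sequence], object],
                    images: Sequence,
                    candidates: Iterable[int]) -> Tuple[int, Dict[int, float]]:
    """Time every candidate batch size and return the fastest one."""
    results = {b: measure_throughput(infer, images, b) for b in candidates}
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Dummy workload: summing numbers stands in for real inference.
    dummy_images = list(range(1024))
    best, results = best_batch_size(lambda batch: sum(batch),
                                    dummy_images, [1, 8, 32, 128])
    print("best batch size:", best)
```

A real measurement would also include warm-up iterations and repeated runs, and would vary the number of CPU cores in the same way as the batch size.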

The main contributions of this talk are as follows:
1. Comparison of the performance of the Intel Distribution of OpenVINO toolkit with that of similar software for CNN-based deep learning inference on Intel platforms.
2. Analysis of the scaling efficiency of the OpenVINO toolkit on dozens of CPU cores in throughput mode.
3. Exploration of the performance gains from Intel AVX-512 VNNI on Intel Cascade Lake CPUs.
4. Analysis of the utilization of modern CPUs in CNN-based deep learning inference using the Roofline model in Intel Advisor.
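
Two of the quantities named in this list have standard textbook definitions, sketched below in Python. The function names and all numbers are illustrative assumptions, not results or code from the talk. Scaling efficiency is taken here as measured throughput on n cores divided by n times the single-core throughput, and the Roofline bound is the minimum of peak compute and memory bandwidth multiplied by arithmetic intensity.

```python
def scaling_efficiency(throughput_1: float,
                       throughput_n: float,
                       n_cores: int) -> float:
    """Fraction of ideal linear speedup achieved on n_cores.

    1.0 means perfect scaling; lower values mean lost efficiency.
    """
    return throughput_n / (n_cores * throughput_1)


def roofline_attainable(peak_gflops: float,
                        bandwidth_gb_s: float,
                        arithmetic_intensity: float) -> float:
    """Roofline bound in GFLOP/s.

    arithmetic_intensity is in FLOP per byte of memory traffic.
    """
    return min(peak_gflops, bandwidth_gb_s * arithmetic_intensity)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the talk.
    print(scaling_efficiency(throughput_1=100.0,
                             throughput_n=1600.0,
                             n_cores=20))
    print(roofline_attainable(peak_gflops=3000.0,
                              bandwidth_gb_s=100.0,
                              arithmetic_intensity=10.0))
```

A kernel whose measured performance sits well below this bound is leaving hardware capability unused. Whether it is limited by memory bandwidth or by compute depends on which of the two terms is smaller at its arithmetic intensity.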
