PulseDL: A reconfigurable deep learning array processor dedicated to pulse characterization for high energy physics detectors

18 Dec 2019, 10:00
20m
Sun: B1F-Meeting rooms#4-6; Mon-Wed: B2F-RAN (International Conference Center Hiroshima)

Peace Memorial Park, Hiroshima-shi
ORAL ASICs Session11

Speaker

Pengcheng Ai (Central China Normal University)

Description

CR-RC$^n$ shaping circuits and analog-to-digital converters (ADCs) are widely used to process front-end detector pulses in high energy physics. Recovering the pulse information from the ADC sampling points can be formulated as a regression problem. Traditional methods (least-squares fitting, Kalman filtering, etc.) are statistically optimal under a linear model with Gaussian noise, but non-ideal characteristics of the shaped pulse and detector-dependent drift and fluctuations pose challenges to these methods. In contrast, neural networks offer clear advantages because of their universal approximation property and their insensitivity to system bias and non-Gaussian noise; according to a recent paper, they give a $20\%$ increase in accuracy compared to curve fitting.
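To make the regression framing concrete, the following is a minimal sketch of the curve-fitting baseline, not the method used in this work. It assumes an idealized CR-RC$^n$ response proportional to $(t/\tau)^n e^{-t/\tau}$ with a known shaping time and start time, so that only the amplitude is fitted by linear least squares. The function names, sampling period, noise level and parameter values are all hypothetical.

```python
import math
import random

def cr_rcn_pulse(t, amplitude, tau, n, t0=0.0):
    """Idealized CR-RC^n response, normalized so the peak equals `amplitude`."""
    x = (t - t0) / tau
    if x <= 0:
        return 0.0
    # (x/n)^n * exp(n - x) reaches its maximum value 1 at x = n
    return amplitude * (x / n) ** n * math.exp(n - x)

def fit_amplitude(samples, times, tau, n, t0=0.0):
    """Linear least-squares amplitude estimate for a known pulse shape."""
    shape = [cr_rcn_pulse(t, 1.0, tau, n, t0) for t in times]
    num = sum(s * y for s, y in zip(shape, samples))
    den = sum(s * s for s in shape)
    return num / den

# Hypothetical ADC sampling: 32 points, 0.5 time units apart
times = [i * 0.5 for i in range(32)]
true_amp = 100.0
clean = [cr_rcn_pulse(t, true_amp, tau=2.0, n=4) for t in times]

# Additive Gaussian noise (assumed); the fit is optimal only in this ideal case
rng = random.Random(0)
noisy = [y + rng.gauss(0.0, 1.0) for y in clean]

print(fit_amplitude(clean, times, tau=2.0, n=4))
print(fit_amplitude(noisy, times, tau=2.0, n=4))
```

When the real pulse deviates from the assumed shape, or the baseline drifts, this estimator becomes biased, which is the situation the abstract describes as challenging for traditional methods.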
In this work, we implemented a multi-functional neural computing chip for pulse shaping in high energy physics. We adopted a structure combining a RISC CPU with customized processing elements (PEs) to balance power and performance. A $4 \times 4$ PE array was proposed to perform neural computation concurrently, and each PE performed multiply-accumulate operations with minimal area and latency. A structure combining spatial and temporal adder trees for post-PE operations reduced the off-chip effort. Buffers were inserted in the pipeline, and a global controller distributed the configuration signals. The entire chip was interfaced with the ICB bus as a standalone IP core.
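The interplay of per-PE multiply-accumulate and adder-tree reduction can be illustrated with a small behavioral model. This is a sketch under assumptions: the abstract does not specify the dataflow, so the mapping below (each PE row serves one output, each PE accumulates over successive input chunks, and a pairwise tree then sums across the row) and the names `adder_tree` and `pe_array_matvec` are hypothetical.

```python
ROWS, COLS = 4, 4  # PE array size stated in the abstract

def adder_tree(values):
    """Pairwise (log-depth) reduction, as a spatial adder tree would perform."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0)  # pad odd-length levels with a zero operand
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]

def pe_array_matvec(weights, x):
    """Compute weights @ x in tiles of ROWS outputs and COLS inputs per step."""
    n_out, n_in = len(weights), len(x)
    y = [0] * n_out
    for r0 in range(0, n_out, ROWS):
        rows = range(r0, min(r0 + ROWS, n_out))
        # acc[r][c] models the accumulator register inside PE (r, c)
        acc = [[0] * COLS for _ in range(ROWS)]
        for c0 in range(0, n_in, COLS):  # one time step per input chunk
            for r in rows:
                for c in range(c0, min(c0 + COLS, n_in)):
                    acc[r - r0][c - c0] += weights[r][c] * x[c]  # MAC
        for r in rows:
            y[r] = adder_tree(acc[r - r0])  # reduce across the PE row
    return y

# Hypothetical 5x6 layer: larger than the array in both dimensions
W = [[(r + 1) * (c - 2) for c in range(6)] for r in range(5)]
x = [3, -1, 4, 1, -5, 9]
print(pe_array_matvec(W, x))
```

Keeping partial sums inside the array until the final reduction is one way such a design can avoid moving intermediate results off chip.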
Based on the chip, we co-designed the network architecture to make the best use of the logic functions. The network consisted of a ten-layer denoising autoencoder followed by a three-layer fully-connected network. Convolution, transposed convolution and fully-connected operations were mapped onto the hardware with on-chip reconfiguration. This network architecture could effectively suppress non-ideal characteristics of the input time series and improve the precision of the extracted information.
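Convolution and transposed convolution both reduce to multiply-accumulate operations, which is why they can share the same hardware. The sketch below shows the two operations in one dimension for a single channel; it is illustrative only, and the kernel, stride and lengths are hypothetical rather than taken from the network in this work.

```python
def conv1d(x, kernel, stride=1):
    """Strided 1-D convolution (cross-correlation form, no padding)."""
    n_out = (len(x) - len(kernel)) // stride + 1
    return [
        sum(kernel[j] * x[i * stride + j] for j in range(len(kernel)))
        for i in range(n_out)
    ]

def conv_transpose1d(y, kernel, stride=1):
    """Transposed convolution: scatter each input sample through the kernel."""
    out = [0] * ((len(y) - 1) * stride + len(kernel))
    for i, yi in enumerate(y):
        for j, kj in enumerate(kernel):
            out[i * stride + j] += yi * kj  # multiply-accumulate
    return out

# An encoder layer shortens the series; a decoder layer restores the length
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
k = [1, 0, -1]
encoded = conv1d(x, k, stride=2)
decoded = conv_transpose1d(encoded, k, stride=2)
print(len(x), len(encoded), len(decoded))
```

With matching stride and kernel, the transposed convolution is the adjoint of the convolution, so it returns a series of the original length, as needed in the decoder half of an autoencoder.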
Finally, we designed the chip layout following the standard digital ASIC flow. Automatic placement and routing were performed in the GSMCR013 130 nm process, giving an area of $4.9\,\mathrm{mm} \times 4.9\,\mathrm{mm}$, a working frequency of at least 25 MHz and a core voltage of 1.2 V. Based on post-layout simulations, the power efficiency of the chip was estimated to be about 7 GOPS/W.

Submission declaration: Original and unpublished

Primary authors

Pengcheng Ai (Central China Normal University)
Dong Wang (Central China Normal University CCNU (CN))
Prof. Guangming Huang (Central China Normal University)
Mr Fan Shen (Central China Normal University)
Mr Ni Fang (Central China Normal University)
Mr Deli Xu (Central China Normal University)
Mr Hui Wang (Central China Normal University)
Junling Chen (Central China Normal University CCNU (CN))