# A 3D Reconstruction Algorithm for the Pre-research of STCF MDC L1 Trigger

Yidi Hao<sup>1,2</sup>, Changqing Feng<sup>1,2</sup>, Wenhao Dong<sup>1,2</sup>, Zixuan Zhou<sup>1,2</sup>, Zhujun Fang<sup>1,2</sup>, Hang Zhou<sup>1,2</sup>, Shubin Liu<sup>1,2</sup>

## Introduction



## Preprocessing

$$\Delta z_0|_{res} = \frac{2\sigma_z}{\sqrt{(N+1)(N+2)}}\sqrt{\left(N+\frac{1}{2}\right) + \frac{3Nr_0}{L_0} + \frac{3Nr_0^2}{L_0^2}}$$
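To show how each term of the resolution expression contributes, here is a minimal Python sketch that evaluates it exactly as printed. The poster does not define the symbols at this point. Reading $\sigma_z$ as the single-measurement z resolution, $N$ as an integer count, and $r_0$, $L_0$ as lengths in a common unit is our assumption. The function name `z0_resolution` is hypothetical.

```python
import math


def z0_resolution(sigma_z: float, n: int, r0: float, l0: float) -> float:
    """Evaluate Delta z_0|_res exactly as written in the formula.

    Assumed meaning of the arguments (not defined in the source):
    sigma_z - z resolution of a single measurement,
    n       - the integer N appearing in the formula,
    r0, l0  - lengths in the same unit (l0 must be positive).
    """
    if l0 <= 0:
        raise ValueError("l0 must be positive")
    prefactor = 2.0 * sigma_z / math.sqrt((n + 1) * (n + 2))
    ratio = r0 / l0
    # (N + 1/2) + 3N r0/L0 + 3N r0^2/L0^2
    bracket = (n + 0.5) + 3.0 * n * ratio + 3.0 * n * ratio ** 2
    return prefactor * math.sqrt(bracket)
```

As a sanity check, for $N = 0$ or $N = 1$ with $r_0 = 0$ the expression reduces to $\sigma_z$. For non-negative $r_0$, the terms in $r_0/L_0$ can only increase the result.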

- Stereo track segments (TSs) are selected near the azimuthal angles calculated by the 2D track reconstruction
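The bullet above states only that stereo TSs are searched near the azimuth given by the 2D stage. The following is a minimal sketch of one way to do this. It assumes that each stereo layer receives a predicted azimuth from the 2D track and that the TS closest to it inside a fixed window is kept. The window width, the dictionary layout, and the "closest TS" rule are all hypothetical; they are not the authors' selection logic.

```python
import math
from typing import Dict, List, Optional

TWO_PI = 2.0 * math.pi


def delta_phi(a: float, b: float) -> float:
    """Signed azimuthal difference a - b, wrapped into [-pi, pi)."""
    return (a - b + math.pi) % TWO_PI - math.pi


def select_stereo_ts(
    predicted_phi: Dict[int, float],
    ts_phi: Dict[int, List[float]],
    half_window: float,
) -> Dict[int, Optional[float]]:
    """Keep, per stereo layer, the TS closest to the 2D prediction.

    predicted_phi: azimuth expected from the 2D track, per stereo layer.
    ts_phi:        azimuths of the TSs found in each stereo layer.
    half_window:   hypothetical acceptance half-width (radians).
    Returns the selected TS azimuth per layer, or None if no TS is inside.
    """
    selected: Dict[int, Optional[float]] = {}
    for layer, phi_pred in predicted_phi.items():
        best: Optional[float] = None
        best_dist = half_window
        for phi in ts_phi.get(layer, []):
            dist = abs(delta_phi(phi, phi_pred))  # handles the 0 / 2*pi wrap
            if dist <= best_dist:
                best, best_dist = phi, dist
        selected[layer] = best
    return selected
```

Wrapping the difference into $[-\pi, \pi)$ matters at the $0/2\pi$ boundary. Without the wrap, a TS at $-3.1$ rad would never match a prediction at $3.1$ rad, even though the two are only about $0.08$ rad apart.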



## MLP Training

### Basic Structure

MLP with 24 inputs and 1 output (the z-vertex)
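As a reference for the 24-input, 1-output network, here is a minimal pure-Python forward pass. The layer widths 24-32-32-16-8-1 correspond to structure A; this layout reproduces the 2.53k unpruned parameter count in the pruning table. Using ReLU on the hidden layers, a linear output neuron, and random placeholder weights is an assumption for illustration only. The parameters are not trained, and the contents of the 24 inputs are not modelled.

```python
import random
from typing import List, Sequence, Tuple

# Structure A: 24 inputs, hidden layers 32-32-16-8, 1 output (z-vertex).
LAYER_SIZES = [24, 32, 32, 16, 8, 1]

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_mlp(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Random placeholder parameters standing in for trained ones."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def forward(layers: List[Layer], x: Sequence[float]) -> float:
    """Dense layers, ReLU on hidden layers, linear output neuron."""
    h = list(x)
    for idx, (w, b) in enumerate(layers):
        h = [sum(wij * hj for wij, hj in zip(row, h)) + bj for row, bj in zip(w, b)]
        if idx < len(layers) - 1:  # no activation on the output layer
            h = [max(0.0, v) for v in h]
    return h[0]
```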



Training loss of z-vertex with different MLP structures

### Activation Function

ReLU & LeakyReLU

- Highly suitable for FPGA implementation when the parameter $a$ is chosen appropriately (ReLU corresponds to $a = 0$)

$$f(y) = \begin{cases} y & y \ge 0\\ ay & y < 0 \end{cases}$$
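The poster does not state which values of $a$ count as "appropriate". One common reading, assumed here, is a power of two. With $a = 2^{-k}$, the product $a\,y$ becomes an arithmetic right shift in fixed-point logic and needs no multiplier. The following is a minimal sketch with a hypothetical choice of $a = 2^{-3}$:

```python
def leaky_relu(y: float, a: float = 0.125) -> float:
    """Floating-point LeakyReLU: y for y >= 0, a*y otherwise (a=0 gives ReLU)."""
    return y if y >= 0 else a * y


def leaky_relu_fixed(y: int, shift: int = 3) -> int:
    """Integer (fixed-point) LeakyReLU with a = 2**-shift.

    y is the raw integer code of a fixed-point word. For negative inputs the
    multiplication by a is replaced by an arithmetic right shift. Python's >>
    on negative ints rounds toward minus infinity, as a hardware arithmetic
    shift does.
    """
    return y if y >= 0 else y >> shift
```

Because the shift truncates toward minus infinity, small negative values are rounded down rather than to the nearest value. For example, $-7 \cdot 2^{-3} = -0.875$ becomes $-1$.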

Z-vertex resolution

- An MLP with identical structure but different parameters is used for each $p_t$ interval
- Training: 60k tracks; test: 100k tracks
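The per-interval scheme can be pictured as a lookup from $p_t$ to one parameter set. The sketch below is minimal and rests on several assumptions. The interval edges are hypothetical, since the poster does not list them. We assume $p_t$ is available from the 2D track before the MLP runs. Each "model" is just a callable that shares the same structure.

```python
import bisect
from typing import Callable, Sequence

# Hypothetical p_t bin edges (GeV/c); the actual intervals are not listed.
PT_EDGES = [0.2, 0.4, 0.7, 1.0]

# One callable per interval: same network structure, different parameters.
Model = Callable[[Sequence[float]], float]


def pt_interval(pt: float, edges: Sequence[float] = PT_EDGES) -> int:
    """Return the index of the p_t interval containing pt.

    Values below the first edge are clamped to the first interval and
    values at or above the last edge to the last one.
    """
    i = bisect.bisect_right(edges, pt) - 1
    return min(max(i, 0), len(edges) - 2)


def predict_z(pt: float, inputs: Sequence[float], models: Sequence[Model]) -> float:
    """Dispatch the inputs to the MLP trained for this track's p_t."""
    return models[pt_interval(pt)](inputs)
```

In hardware, each interval needs its own set of parameters, so resource usage presumably grows with the number of intervals.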


### QKeras

Quantization-aware training:

- Train neural networks with *fixed-point* numbers
- Implement in the FPGA without further precision loss

| Bit width\*     | 8_1  | 12_4 | 16_6 | 20_8 |
|-----------------|------|------|------|------|
| $\Delta z$ (cm) | 2.93 | 2.84 | 2.53 | 2.51 |

\*W\_I denotes `ap_fixed<W,I>`, a W-bit fixed-point number with I integer bits (including one sign bit).
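To make the W\_I notation concrete, here is a minimal sketch that maps a real number onto the grid of `ap_fixed<W,I>`. It uses $W - I$ fractional bits and a range of $[-2^{I-1},\ 2^{I-1} - 2^{-(W-I)}]$. We chose truncation toward minus infinity, which is the `AP_TRN` default of `ap_fixed`. Overflow either saturates or wraps, selected by a flag. The function name and the flag are hypothetical, and this is not the QKeras quantizer itself.

```python
import math


def to_ap_fixed(x: float, w: int, i: int, saturate: bool = True) -> float:
    """Quantize x to the value grid of ap_fixed<W, I>.

    W total bits, I integer bits including the sign bit, so the step is
    2**-(W - I). Rounding truncates toward minus infinity (like AP_TRN).
    Overflow saturates if saturate=True (like AP_SAT), otherwise it wraps
    around in two's complement (like the AP_WRAP default).
    """
    frac_bits = w - i
    step = 2.0 ** -frac_bits
    q = math.floor(x / step)               # raw integer code
    lo, hi = -(2 ** (w - 1)), 2 ** (w - 1) - 1
    if saturate:
        q = min(max(q, lo), hi)
    else:
        q = (q - lo) % (2 ** w) + lo       # two's-complement wrap-around
    return q * step
```

For example, 16\_6 leaves 10 fractional bits, giving a step of about 0.001 and a range of about ±32. By contrast, 8\_1 covers only $[-1, 1)$ in steps of $2^{-7}$.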

### Pruning

Set less salient parameters to zero:

- Significantly reduces the model's size with minor accuracy loss
- *Large-sparse* models outperform *small-dense* ones
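The poster does not name its saliency criterion. The following is a minimal sketch that assumes the common magnitude criterion, which treats the weights with the smallest $|w|$ as least salient. It is applied once to a flat list of weights. In practice, pruning is usually applied per layer and ramped up gradually during training; this sketch omits both.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero the fraction `sparsity` of weights with the smallest |w|.

    The number of zeroed entries is round(sparsity * len(weights)); ties at
    the threshold are resolved by position (stable sort).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_zero = round(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda k: abs(weights[k]))
    pruned = list(weights)
    for k in order[:n_zero]:
        pruned[k] = 0.0
    return pruned
```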







- The z-vertex resolution ranges from 1.1 cm to 2.5 cm across the different $p_t$ intervals

| Structure      | Sparsity | NNZ params | $\Delta z$ (cm) |
|----------------|----------|------------|-----------------|
| $\mathbf{A}^1$ | 0        | 2.53k      | 2.43            |
| $\mathbf{A}^1$ | 0.2      | 2.04k      | 2.46            |
| $\mathbf{A}^1$ | 0.4      | 1.52k      | 2.56            |
| $\mathbf{A}^1$ | 0.6      | 1.06k      | 2.90            |
| $\mathbf{A}^1$ | 0.8      | 0.58k      | 5.05            |
| $\mathbf{B}^2$ | 0.4      | 2.11k      | 2.31            |

<sup>1</sup> 24-32-32-16-8-1. <sup>2</sup> 24-48-32-16-8-1.
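The NNZ column can be cross-checked from the layer widths of structures A (24-32-32-16-8-1) and B (24-48-32-16-8-1). This short sketch assumes that only weights are pruned and all biases are kept. The poster does not state this; it is our inference.

```python
from typing import Sequence, Tuple

STRUCT_A = [24, 32, 32, 16, 8, 1]
STRUCT_B = [24, 48, 32, 16, 8, 1]


def count_weights_biases(sizes: Sequence[int]) -> Tuple[int, int]:
    """Number of weights and biases in a fully connected MLP."""
    pairs = list(zip(sizes[:-1], sizes[1:]))
    weights = sum(n_in * n_out for n_in, n_out in pairs)
    biases = sum(n_out for _, n_out in pairs)
    return weights, biases


def nnz_after_pruning(sizes: Sequence[int], sparsity: float) -> float:
    """Non-zero parameters if only the weights are pruned (assumption)."""
    weights, biases = count_weights_biases(sizes)
    return weights * (1.0 - sparsity) + biases
```

Under this assumption, structure A has 2,440 weights and 89 biases (2.53k in total). The estimates for A at sparsities 0.2, 0.6 and 0.8 are about 2.04k, 1.06k and 0.58k, and for B at 0.4 about 2.11k, all matching the table. The A row at sparsity 0.4 does not fit exactly (1.52k listed vs. about 1.55k estimated), so the actual pruning recipe may differ.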

## Implementation

### hls4ml

hls4ml uses high-level synthesis (HLS) to translate neural networks trained in Python into hardware descriptions (HDL).

- Short development time
- Automatically optimizes the logic design based on the pruned structure and the target clock frequency

### FPGA Resource Utilization

FPGA: XCKU060; clock: 400 MHz

| Structure      | Sparsity | BRAM | DSP       | FF          | LUT         |
|----------------|----------|------|-----------|-------------|-------------|
| $\mathbf{A}^1$ | 0        | 0    | 610 (22%) | 80769 (12%) | 87014 (26%) |
| $\mathbf{A}^1$ | 0.4      | 0    | 369 (13%) | 58293 (8%)  | 52650 (15%) |
| $\mathbf{B}^2$ | 0.4      | 0    | 503 (18%) | 77026 (11%) | 72034 (21%) |

<sup>1</sup> 24-32-32-16-8-1. <sup>2</sup> 24-48-32-16-8-1.
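The percentages in the resource table can be checked against the device capacity. This short sketch assumes XCKU060 totals of 2,760 DSP slices, 663,360 flip-flops, and 331,680 LUTs. These figures come from the Kintex UltraScale product table, not from this poster. It also assumes the percentages are truncated rather than rounded.

```python
# Device totals for the Kintex UltraScale XCKU060 (assumed, from the
# vendor product table; check against the datasheet).
KU060_TOTALS = {"DSP": 2760, "FF": 663360, "LUT": 331680}

# Resource usage copied from the table above.
USED = {
    "A, sparsity 0": {"DSP": 610, "FF": 80769, "LUT": 87014},
    "A, sparsity 0.4": {"DSP": 369, "FF": 58293, "LUT": 52650},
    "B, sparsity 0.4": {"DSP": 503, "FF": 77026, "LUT": 72034},
}


def utilization_percent(used: int, total: int) -> int:
    """Utilization as an integer percentage, truncated (not rounded)."""
    return (100 * used) // total


for name, res in USED.items():
    pct = {k: utilization_percent(v, KU060_TOTALS[k]) for k, v in res.items()}
    print(name, pct)
```

With these totals, all nine percentages in the table are reproduced. Rounding instead of truncating would give 9% rather than 8% for the FF usage of pruned structure A.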



### Simulation

Dead time: 4 clks; latency: 37 clks (unpruned structure A) and 30 clks (pruned structure B).
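At the 400 MHz clock used for the XCKU060, one cycle lasts 2.5 ns, so the cycle counts convert directly into times. Here is a small sketch of that arithmetic, assuming the simulated cycles refer to the same 400 MHz clock:

```python
CLOCK_MHZ = 400.0  # FPGA clock quoted for the XCKU060 implementation


def clk_to_ns(n_clk: int, f_mhz: float = CLOCK_MHZ) -> float:
    """Convert a number of clock cycles to nanoseconds."""
    return n_clk * 1000.0 / f_mhz


for label, n in [("dead time", 4), ("latency, A unpruned", 37), ("latency, B pruned", 30)]:
    print(f"{label}: {n} clk = {clk_to_ns(n):.1f} ns")
```

This gives 10 ns of dead time and latencies of 92.5 ns (A) and 75 ns (B). Both are consistent with the sub-100 ns latency stated in the conclusion.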

## Conclusion

- The z-vertex resolution is better than 3 cm in all $p_t$ intervals, and the algorithm is expected to reject beam background with a $\pm 3\sigma$ window (about 10 cm).
- With QKeras and pruning, the fixed-point MLP achieves comparable resolution with fewer parameters, reducing FPGA resource consumption.
- The MLP IP core generated with hls4ml and HLS meets our requirements, with a latency below 100 ns.
- The algorithm still needs further optimization to implement all required networks in a single FPGA.



<sup>1</sup>State Key Laboratory of Particle Detection and Electronics, USTC, Hefei 230026, China <sup>2</sup>Department of Modern Physics, USTC, Hefei 230026, China E-mail: ydhao@mail.ustc.edu.cn