

# Design and implementation of Neural Network based conditions for the CMS Level-1 Global Trigger upgrade for the HL-LHC

<u>Gabriele Bortolato</u><sup>1,2</sup>, Maria Cepeda<sup>3</sup>, Jaana Heikkilä<sup>4</sup>, Benjamin Huber<sup>1,5</sup>, Elias Leutgeb<sup>1,5</sup>, Dinyar Rabady<sup>1</sup>, Hannes Sakulin<sup>1</sup> on behalf of the CMS Collaboration

<sup>1</sup>CERN, <sup>2</sup>Università degli Studi di Padova, <sup>3</sup>CIEMAT, <sup>4</sup>Universität Zürich, <sup>5</sup>Technische Universität Wien



## Overview

At the CMS experiment, a two-level trigger system decides which collision events are stored for later analysis. The CMS Level-1 Trigger is being entirely redesigned for Phase-2 operation, to maintain or even improve the physics performance under the new high-luminosity conditions. Besides cut-based triggers, the Global Trigger will also apply novel machine-learning-based conditions to the trigger objects identified by the upstream systems. These triggers exploit the full event topology, which lets them select events that were previously inaccessible.



## Model evaluation

### Neural Network development workflow

#### Step 1: Model definition

Model definition and training with commonly used frameworks, such as TensorFlow.



#### Step 2: Optimizations

Quantization of weights and biases, connection pruning and knowledge distillation.
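Two of the Step 2 operations act on individual weights, so they can be sketched in plain Python. This is a minimal illustration, not the hls4ml or TensorFlow implementation, and it rests on two assumptions:

- Pruning is applied once, by weight magnitude. In practice it is usually applied gradually during training.
- The hypothetical helper `quantize` rounds to nearest and saturates. The HLS `ap_fixed<W,I>` type defaults to truncation and wrap-around instead.

The format follows the <total,integer> notation used in the table, with the sign bit counted in the integer part.

```python
import math

def prune_by_magnitude(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude `fraction` set to zero."""
    n_prune = int(len(weights) * fraction)
    # Indices sorted from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize(x, total_bits, int_bits):
    """Map x onto a signed fixed-point grid <total_bits, int_bits>.

    The integer part includes the sign bit. Rounds to nearest and saturates
    at the representable range (an assumption; ap_fixed defaults differ).
    """
    step = 2.0 ** -(total_bits - int_bits)
    lo = -(2.0 ** (int_bits - 1))
    hi = 2.0 ** (int_bits - 1) - step
    q = math.floor(x / step + 0.5) * step
    return min(max(q, lo), hi)

weights = [0.1, -0.5, 0.05, 0.8]
sparse = prune_by_magnitude(weights, 0.5)    # 50% pruning, as in the table
fixed = [quantize(w, 8, 1) for w in sparse]  # <8,1> precision
print(sparse)  # [0.0, -0.5, 0.0, 0.8]
print(fixed)   # [0.0, -0.5, 0.0, 0.796875]
```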



#### Step 3: FPGA porting

Python model translation to HLS and finally to FPGA language [1]: from high-level (Python) to hardware-level (VHDL/Verilog) language, and then to the FPGA fabric.
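After translation, each layer becomes integer arithmetic on fixed-point numbers, which the FPGA evaluates in DSPs and LUTs. The sketch below mimics that arithmetic for one dense layer in plain Python. It illustrates fixed-point arithmetic only and is not the code that hls4ml generates. The function names and the fractional-bit arguments are hypothetical.

```python
def dense_fixed_point(x_int, w_int, b_int, frac_x, frac_w, frac_b):
    """Integer-only dense layer y = W x + b on fixed-point inputs.

    Each list holds integers whose real value is int / 2**frac_bits.
    Products of x and w carry frac_x + frac_w fractional bits, so the bias
    is shifted left to that scale first (assumes frac_b <= frac_x + frac_w).
    The result carries frac_x + frac_w fractional bits.
    """
    frac_acc = frac_x + frac_w
    out = []
    for row, b in zip(w_int, b_int):
        acc = b << (frac_acc - frac_b)  # align bias with the accumulator
        for xi, wi in zip(x_int, row):
            acc += xi * wi              # one multiply-accumulate
        out.append(acc)
    return out

def relu(values):
    return [max(v, 0) for v in values]

# x = [1.5, -0.25] with 2 fractional bits -> [6, -1]
# W = [[0.5, 1.0], [-0.75, 0.25]] with 2 fractional bits -> [[2, 4], [-3, 1]]
# b = [0.5, 0.0] with 1 fractional bit -> [1, 0]
acc = dense_fixed_point([6, -1], [[2, 4], [-3, 1]], [1, 0], 2, 2, 1)
print(acc)                        # [16, -19]
print([a / 2 ** 4 for a in acc])  # [1.0, -1.1875]
print(relu(acc))                  # [16, 0]
```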

## Anomaly detection vs. signature-based models

Two different flavours of neural networks are considered: deep binary classifiers and deep auto-encoders. The former is trained to recognize a specific signal signature. The latter aims to characterize the background as completely as possible and flags anything that does not resemble it as anomalous.
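The two decision rules can be contrasted in a few lines. The text does not specify either score, so the sketch assumes that:

- the auto-encoder anomaly score is the mean squared reconstruction error;
- the binary classifier ends in a sigmoid.

The reconstruction values in the usage lines are made up, and the decoder itself is not modelled.

```python
import math

def anomaly_score(x, x_reco):
    """Mean squared error between the auto-encoder input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_reco)) / len(x)

def classifier_score(logit):
    """Sigmoid output of a binary classifier, read as a signal probability."""
    return 1.0 / (1.0 + math.exp(-logit))

def fires(score, threshold):
    """Both model types trigger when their score exceeds a tuned threshold."""
    return score > threshold

x = [1.0, 2.0, 0.0, 1.0]
x_reco = [1.0, 1.0, 0.0, 3.0]    # hypothetical reconstruction
print(anomaly_score(x, x_reco))  # 1.25
print(classifier_score(0.0))     # 0.5
```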



Table: the efficiency of the floating-point binary classifier (Baseline) at a given rate is taken as the reference (100%). The efficiencies of the quantized binary classifiers and of the auto-encoder are expressed relative to it.

| Model               | Framework  | Pruning | Quantization<sup>1</sup> | LUT [k] | FF [k] | DSP | Latency [ns] | Eff/Eff<sub>binary baseline</sub> HH | Eff/Eff<sub>binary baseline</sub> $t\bar{t}$ | Eff/Eff<sub>binary baseline</sub> VBF |
|---------------------|------------|---------|--------------------------|---------|--------|-----|--------------|--------------------------------------|----------------------------------------------|---------------------------------------|
| Baseline AE         | TensorFlow | 0%      | FP32                     | -       | -      | -   | -            | 70.6%                                | 60.7%                                        | 36.7%                                 |
| hls4ml AE           | hls4ml     | 50%     | <8,1/2><sup>2</sup>      | 42      | 15     | 301 | 70.8         | 70.5%                                | 60.7%                                        | 36.7%                                 |
| Baseline HH         | TensorFlow | 0%      | FP32                     | -       | -      | -   | -            | 100.0%                               | -                                            | -                                     |
| hls4ml HH           | hls4ml     | 50%     | <6/8,1/4><sup>2</sup>    | 4.6     | 2.3    | 19  | 33.3         | 98.3%                                | -                                            | -                                     |
| Baseline $t\bar{t}$ | TensorFlow | 0%      | FP32                     | -       | -      | -   | -            | -                                    | 100.0%                                       | -                                     |
| hls4ml $t\bar{t}$   | hls4ml     | 50%     | <6/8,1/4><sup>2</sup>    | 5.4     | 2.4    | 20  | 33.3         | -                                    | 98.9%                                        | -                                     |
| Baseline VBF        | TensorFlow | 0%      | FP32                     | -       | -      | -   | -            | -                                    | -                                            | 100.0%                                |
| hls4ml VBF          | hls4ml     | 50%     | <6/8,1/4><sup>2</sup>    | 7.7     | 3.4    | 45  | 33.3         | -                                    | -                                            | 95.0%                                 |
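The table compares efficiencies at a fixed rate. The sketch below shows that procedure under one assumption: the trigger rate is proportional to the fraction of minimum-bias events accepted. Each model's threshold is set on its own background scores. The auto-encoder efficiency is then divided by the reference binary-classifier efficiency. All scores are toy numbers, not results from the table.

```python
def threshold_at_acceptance(bkg_scores, acceptance):
    """Threshold that passes `acceptance` of the background (events pass if score > threshold).

    Returns the score of the n_keep-th highest background event, so exactly
    n_keep events lie above it when scores are distinct.
    """
    ranked = sorted(bkg_scores, reverse=True)
    n_keep = int(len(ranked) * acceptance)
    return ranked[n_keep]

def efficiency(sig_scores, threshold):
    return sum(s > threshold for s in sig_scores) / len(sig_scores)

acceptance = 0.2  # same background acceptance, i.e. same rate, for both models

# Reference: binary classifier (toy scores)
ref_bkg = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ref_thr = threshold_at_acceptance(ref_bkg, acceptance)
ref_eff = efficiency([0.95, 0.85, 0.5, 0.3], ref_thr)

# Auto-encoder: scores live on a different scale, so it gets its own threshold
ae_bkg = list(range(1, 11))
ae_thr = threshold_at_acceptance(ae_bkg, acceptance)
ae_eff = efficiency([9.5, 7.0, 3.0, 2.0], ae_thr)

print(ref_thr, ae_thr)   # 0.8 8
print(ae_eff / ref_eff)  # 0.5
```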

As a proof of principle, four different samples were considered:

- Minimum bias (as background)
- $HH \rightarrow 2b2\tau$
- VBF $\rightarrow \tau\tau$
- $t\bar{t}$ decay

#### Binary classifier approach

| L1T Objects       | Subsystem | Variables    |
|-------------------|-----------|--------------|
| First 6 jets      | CL2       | $p_T$ $\eta$ |
| First 4 electrons | CL2       | $p_T$ $\eta$ |
| First 4 muons     | GMT       | $p_T$ $\eta$ |
| First 2 taus      | CL2       | $p_T$ $\eta$ |
| Missing energy    | CL2       | $E_T^{miss}$ |

**Supervised training**: background and signal labels are known from the start.
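Every event carries a label, so the binary classifiers can be trained with an ordinary supervised loss. The poster does not name the loss. Binary cross-entropy is the usual choice for this setup and is sketched below as an assumption.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean BCE; label 1 = signal, 0 = minimum-bias background."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

labels = [1, 0]
print(binary_cross_entropy(labels, [0.5, 0.5]))  # log(2), about 0.693
print(binary_cross_entropy(labels, [0.9, 0.1]))  # smaller: confident and correct
```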

#### Auto-encoder approach

| L1T Objects       | Subsystem | Variables                   |
|-------------------|-----------|-----------------------------|
| First 6 jets      | CL2       | $p_T$ $\eta$ $\phi$         |
| First 4 electrons | CL2       | $p_T$ $\eta$ $\phi$         |
| First 4 muons     | GMT       | $p_T$ $\eta$ $\phi$         |
| First 2 taus      | CL2       | $p_T$ $\eta$ $\phi$         |
| Missing energy    | CL2       | $E_T^{miss}$ $\phi$         |
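Both input tables describe a fixed-size input vector built from a variable number of trigger objects. The sketch below flattens an event accordingly. Several choices are hypothetical and made for illustration only:

- the collection names and the dictionary layout;
- that collections are already ordered by decreasing $p_T$;
- that missing objects are zero-padded.

With these specifications the binary classifier receives 33 inputs and the auto-encoder 50.

```python
# Hypothetical specifications: (collection, number of objects kept, variables)
BC_SPEC = [
    ("jets", 6, ("pt", "eta")),
    ("electrons", 4, ("pt", "eta")),
    ("muons", 4, ("pt", "eta")),
    ("taus", 2, ("pt", "eta")),
    ("met", 1, ("et",)),
]
AE_SPEC = [
    ("jets", 6, ("pt", "eta", "phi")),
    ("electrons", 4, ("pt", "eta", "phi")),
    ("muons", 4, ("pt", "eta", "phi")),
    ("taus", 2, ("pt", "eta", "phi")),
    ("met", 1, ("et", "phi")),
]

def build_inputs(event, spec):
    """Flatten an event into a fixed-length vector following `spec`.

    `event` maps a collection name to a list of objects (dicts of variables),
    assumed already ordered by decreasing pT. Missing objects are zero-padded.
    """
    vec = []
    for collection, n_max, variables in spec:
        objects = event.get(collection, [])[:n_max]
        for i in range(n_max):
            if i < len(objects):
                vec.extend(objects[i][v] for v in variables)
            else:
                vec.extend(0.0 for _ in variables)
    return vec

event = {
    "jets": [{"pt": 50.0, "eta": 1.2, "phi": 0.3}],
    "met": [{"et": 20.0, "phi": -1.0}],
}
print(len(build_inputs(event, BC_SPEC)), len(build_inputs(event, AE_SPEC)))  # 33 50
print(build_inputs(event, BC_SPEC)[:4])  # [50.0, 1.2, 0.0, 0.0]
```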



**Unsupervised training** + **knowledge distillation**: the teacher is trained on background only, while the student is trained on both background and random samples.
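This sketch assumes one common form of knowledge distillation for anomaly detection. The student does not reconstruct its input. It learns to predict the teacher's anomaly score directly, which would avoid implementing a decoder in firmware. The sketch shows only how the training targets and the loss could be formed. `teacher` is a hypothetical stand-in for the trained teacher network.

```python
def distillation_targets(samples, teacher):
    """Label every training sample (background or random) with the teacher's score."""
    return [teacher(x) for x in samples]

def distillation_loss(student_preds, targets):
    """Mean squared error between student predictions and teacher scores."""
    return sum((s - t) ** 2 for s, t in zip(student_preds, targets)) / len(targets)

def teacher(x):
    """Hypothetical stand-in for the trained teacher: mean of squared inputs."""
    return sum(v * v for v in x) / len(x)

samples = [[1.0, 0.0], [0.5, 0.5]]  # e.g. one background, one random sample
targets = distillation_targets(samples, teacher)
print(targets)                                   # [0.5, 0.25]
print(distillation_loss([0.25, 0.25], targets))  # 0.03125
```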

Multiple optimizations are applied during and after training: quantization, connection pruning, knowledge distillation (auto-encoder only) and input selection. Each signal signature requires its own trained binary classifier. The auto-encoder, by contrast, is trained on the minimum-bias sample only and is therefore independent of any signal model.

<sup>1</sup>In terms of <total,integer> bit width; <sup>2</sup>weights and biases have two different quantizations.

### Hardware implementation

The neural network block is deployed on a Serenity [2] board equipped with a Virtex UltraScale+ (VU9P) FPGA.

The neural-network-based algorithms have been integrated into the Global Trigger (GT) pre-production firmware [3], which is based on the EMP framework [4].

| Site type | Synthesis  | Implementation |
|-----------|------------|----------------|
| CLB LUTs  | 218k (22%) | 320k (27%)     |
| CLB Regs  | 509k (22%) | 452k (19%)     |
| BRAM      | 475 (22%)  | 723 (33%)      |
| DSPs      | 150 (2%)   | 1290 (19%)     |

The GT firmware demultiplexes the data received from the EMP data-region buffers and distributes the data collections to all SLRs (super logic regions). For testing purposes, one anomaly detection trigger and the three binary classifier models are instantiated once in each SLR, together with their input interfaces.
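The implementation percentages in the resource table can be reproduced from the capacity of the VU9P device. The sketch uses the rounded counts from the table and the VU9P capacities listed in the AMD/Xilinx Virtex UltraScale+ product table:

- 1,182,240 LUTs;
- 2,364,480 flip-flops;
- 2,160 36 Kb block RAMs;
- 6,840 DSP slices.

```python
# VU9P capacities (AMD/Xilinx Virtex UltraScale+ product table)
VU9P = {"CLB LUTs": 1_182_240, "CLB Regs": 2_364_480, "BRAM": 2_160, "DSPs": 6_840}

# Post-implementation usage from the table (values given in k taken as thousands)
IMPL = {"CLB LUTs": 320_000, "CLB Regs": 452_000, "BRAM": 723, "DSPs": 1_290}

def utilization_percent(used, capacity):
    """Percentage of each resource type used, rounded to an integer."""
    return {k: round(100 * used[k] / capacity[k]) for k in used}

print(utilization_percent(IMPL, VU9P))
# {'CLB LUTs': 27, 'CLB Regs': 19, 'BRAM': 33, 'DSPs': 19}
```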

### Custom interface to the Phase-2 Global Trigger framework

Serial data from the upstream systems are streamed at 480 MHz in collections of 12 objects. These data must be deserialized, re-scaled and re-mapped before being fed into the NN module, which receives one wide bit-vector every 25 ns. The NN block runs at 240 MHz, a good compromise between register usage and latency.
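The clock frequencies and the 25 ns interval fix the cycle budget, which a short worked check makes concrete. At 480 MHz a 25 ns interval spans 12 clock cycles; at 240 MHz it spans 6. The latencies in the model table also correspond to whole cycles at 240 MHz: 33.3 ns is 8 cycles and 70.8 ns is 17 cycles.

```python
from fractions import Fraction

BX_NS = 25  # interval between two NN input vectors, in ns

def cycles_per_bx(f_mhz):
    """Clock cycles in one 25 ns interval (MHz x ns = 1e-3 cycles)."""
    return Fraction(f_mhz * BX_NS, 1000)

def latency_in_cycles(latency_ns, f_mhz):
    """Nearest whole number of clock cycles for a latency in ns."""
    return round(latency_ns * f_mhz / 1000)

print(cycles_per_bx(480), cycles_per_bx(240))                      # 12 6
print(latency_in_cycles(33.3, 240), latency_in_cycles(70.8, 240))  # 8 17
```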



The input interface module is written entirely in VHDL and is model-specific (e.g. bit width, number of inputs and re-scaling parameters).
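The per-object work of the VHDL input interface can be modelled in Python. Each physical quantity is converted to an integer with a given LSB and bit width, and the fields are then concatenated into one wide bit-vector. The LSB values, bit widths and field order below are hypothetical placeholders. Truncation toward zero with saturation is also an assumed rounding behaviour. The real values are model-specific.

```python
def to_fixed(value, lsb, bits, signed=False):
    """Re-scale a physical value to an integer code with the given LSB.

    Truncates toward zero and saturates to the range of the bit width.
    """
    code = int(value / lsb)
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(code, hi))

def pack(fields):
    """Concatenate (code, bits) fields into one integer; the first field lands in the LSBs.

    Negative codes are stored in two's complement within their field width.
    Returns the packed word and its total width.
    """
    word, offset = 0, 0
    for code, bits in fields:
        word |= (code & ((1 << bits) - 1)) << offset
        offset += bits
    return word, offset

# Hypothetical encoding: pT with a 0.25 GeV LSB in 12 bits, eta with a 0.125 LSB in 8 signed bits
pt_code = to_fixed(50.3, 0.25, 12)                # 201
eta_code = to_fixed(-1.2, 0.125, 8, signed=True)  # -9
print(pack([(pt_code, 12), (eta_code, 8)]))       # (1011913, 20)
```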

Figure: firmware block diagram showing the EMP link buffers, EMP TTC & DMA, GT demultiplexers and distribution, Neural Network interface, anomaly detection and binary classifiers.

### References

- [1] Javier Duarte et al., "Fast inference of deep neural networks in FPGAs for particle physics", DOI: 10.1088/1748-0221/13/07/P07027
- [2] Andrew Rose et al., "Serenity: An ATCA prototyping platform for CMS Phase-2", DOI: 10.22323/1.343.0115
- [3] Hannes Sakulin et al., "Architecture and prototype of the CMS Global Level-1 Trigger for Phase-2", DOI: 10.1088/1748-0221/18/01/C01034
- [4] EMP Framework, https://serenity.web.cern.ch/serenity/emp-fwk/



# Contacts

gabriele.bortolato@cern.ch

cms-l1t-p2gt@cern.ch