# The Level 1 Scouting system of the CMS experiment

Rocco Ardino<sup>ab</sup>, Christian Deldicque<sup>a</sup>, Marc Dobson<sup>a</sup>, Sabrina Giorgetti<sup>ab</sup>, Gaia Grosso<sup>ab</sup>, Thomas James<sup>a</sup>, Emilio Meschi<sup>a</sup>, Matteo Migliorini<sup>c</sup>, Leyla Naz Candoğan<sup>d</sup>, Giovanni Petrucciani<sup>a</sup>, Dinyar Rabady<sup>a</sup>, Attila Racz<sup>a</sup>, Hannes Sakulin<sup>a</sup>, Petr Zejdl<sup>a</sup> for the CMS collaboration

- <sup>a</sup> CERN, Esplanade des Particules 1, Meyrin, 1211, Switzerland
- <sup>b</sup> Università di Padova, Via VIII Febbraio, 2, Padova, 35122, Italy
- <sup>c</sup> Istituto Nazionale di Fisica Nucleare Sezione di Padova INFN, Via Francesco Marzolo, 8, 35131, Padova PD, Italy
- <sup>d</sup> Bilkent University, Üniversiteler, 06800 Çankaya/Ankara, Türkiye

E-mail: tom.james@cern.ch

Abstract. A novel data collection system, known as Level-1 (L1) Scouting, is being introduced as part of the L1 trigger of the CMS experiment at the CERN Large Hadron Collider. The L1 trigger of CMS, implemented in FPGA-based hardware, selects events at 100 kHz for full read-out, within a short 3 microsecond latency window. The L1 Scouting system collects and stores the reconstructed particle primitives and intermediate information of the L1 trigger processing chain at the full 40 MHz bunch crossing rate. This system will provide vast amounts of data for detector diagnostics, luminosity measurements, and the study of otherwise inaccessible signatures, either too common to fit in the L1 accept budget, or with requirements orthogonal to the standard physics triggers. Demonstrator systems consisting of PCIe-based FPGA stream-processing boards and associated host PCs have been deployed at CMS to capture data from a multitude of L1 trigger sub-systems. In addition, a neural-network-based re-calibration and fake identification engine has been developed using the Micron Deep Learning Accelerator FPGA framework. An overview of the new system and the first results from 2022 data taking are shown. Plans and development progress towards the continued expansion of the L1 Scouting system for the High-Luminosity LHC are also presented.

## 1. Introduction and motivation

The CMS experiment [1] at the LHC [2] relies on a two-stage trigger system to select the most interesting collision events for read-out and analysis. The first of these stages, the Level-1 (L1) trigger [3], receives only a sub-set of the detector data at each bunch crossing (BX), and must determine whether to trigger the detector for full read-out within around 3 $\mu s$.

The L1 data scouting (L1DS) system is a novel data collection scheme that acquires intermediate data from the L1 trigger at the full bunch crossing rate of 40 MHz. This allows for analysis of event types that are too frequent to be part of the nominal L1 menu. The scouting system is independent of the L1 trigger, in that it does not feed back to the trigger decision. The scouting data could enable either a semi-real-time analysis using the L1 trigger objects, or the storage of a tiny event record ( $\sim$ Kb per event, less than 10% of the standard DAQ data in total). Additionally, diagnostic and monitoring capabilities such as BX-to-BX correlations and independent per-bunch luminosity measurements will be made available.

Within the L1 system, the global muon trigger (GMT) selects the eight best muon candidates from the barrel muon track finder (BMTF), overlap muon track finder (OMTF), and endcap muon track finder (EMTF) for sending to the global trigger (GT). L1DS was first demonstrated in 2018 [4] with the capture of the GMT muon candidates, and has since been scaled up with new FPGA-based boards and inputs.

## 2. The L1 Scouting demonstrator

The L1DS demonstrator consists of a series of FPGA-based boards that receive trigger data over optical data links, and are connected to a host PC over a PCIe interface. As of the beginning of 2023 running, the system receives data from the GMT, the calorimeter trigger, the GT, and the BMTF. The inputs to the L1DS demonstrator are given in Table 1.

**Table 1.** Inputs to the L1DS demonstrator. Muon trigger *super primitives* are track segments made from combining hits in both the drift tube and resistive plate chamber sections of the detector\*.

| Input system        | N 10Gb/s links  | Objects                                                                                      |
|---------------------|-----------------|----------------------------------------------------------------------------------------------|
| GMT                 | 8 + duplicate 8 | Up to 8 GMT final muons, & 8 BMTF intermediate muon candidates                               |
| Calorimeter trigger | 7 + 1 spare     | $e/\gamma$ , tau candidates, jets and energy sums including $E_{\mathrm{T}}^{\mathrm{miss}}$ |
| BMTF                | 24              | BMTF input super primitives*                                                                 |
| GT                  | 18              | Algorithm bits                                                                               |

Currently the L1DS demonstrator uses three different varieties of boards: the Xilinx KCU1500, Xilinx VCU128, and Micron SB852. The hardware features of these boards are described in Table 2. Data from the GMT and from the calorimeter trigger each arrive over eight 10 Gb/s optical links at one of a pair of Xilinx KCU1500 boards, which use QSFP interfaces and host a Xilinx KU115 FPGA for connectivity and data preprocessing. The boards also decode the trigger link protocol, align the links with respect to each other, and perform firmware zero suppression (ZS). For the GMT, this first stage of ZS, which discards data from any bunch crossing in which no muons have been found, reduces the data rate by a factor of about 10 during proton-proton collisions. The data are buffered in FIFOs before being sent to the host PC via PCIe Gen3 using a DMA engine. On the host PC, more fine-grained zero suppression is performed in software, reducing the data rate further. The data are written to a RAM disk before being sent to the network over 10 Gb Ethernet. They are then received on another server, where the BZIP2 algorithm compresses them by another factor of two, and are transferred to the Lustre global file system.

A duplicate set of GMT muons is sent to the SB852 board, which is used to prototype on-the-fly muon histogramming for luminosity measurement, and neural network approaches to the re-calibration and classification of L1 trigger objects (see Section 3). The BMTF super primitives and GT algorithm bits are sent over 24 and 18 links respectively to the VCU128 boards, which are capable of both DMA readout and a unidirectional TCP/IP implementation over 100 Gb/s QSFP that utilises the High Bandwidth Memory (HBM) of the VU37P chip to send data directly to a standard commercial switch, NIC or PC.
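The first stage of zero suppression described above can be illustrated in software. The following is a minimal sketch, not the firmware logic: the `BxRecord` type, its fields, and the `zero_suppress` function are hypothetical names introduced here for illustration, and the sketch assumes that each bunch crossing arrives as a record carrying a (possibly empty) list of muon candidates.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class BxRecord:
    """Hypothetical container for one bunch crossing (not the real link format)."""
    orbit: int
    bx: int
    muons: List[dict] = field(default_factory=list)


def zero_suppress(records: Iterable[BxRecord]) -> Iterator[BxRecord]:
    """Yield only the bunch crossings that contain at least one muon."""
    for rec in records:
        if rec.muons:
            yield rec


# Toy stream: ten bunch crossings, only one of which contains a muon.
stream = [BxRecord(orbit=1, bx=i) for i in range(10)]
stream[3].muons.append({"pt": 5.0, "eta": 0.4, "phi": 1.2})

kept = list(zero_suppress(stream))
reduction = len(stream) / len(kept)  # ratio of input to surviving records
```

In this toy stream the reduction factor is set by the chosen occupancy; the factor of about 10 quoted in the text is a property of the real GMT data during proton-proton collisions.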

**Table 2.** FPGA-based boards used in the L1DS demonstrator. A mezzanine can extend the VCU128 I/O with an additional six QSFP 100Gb/s\*.

| Board         | FPGA  | Optical I/O                       | HBM          | DRAM                   |
|---------------|-------|-----------------------------------|--------------|------------------------|
| Xilinx KCU1500 | KU115 | 2 QSFP 40 Gb/s                    | -            | -                      |
| Micron SB852  | VU9P  | 2 QSFP 100 Gb/s                   | -            | Up to 256 GB DDR4      |
| Xilinx VCU128 | VU37P | $4(+6^*)$ QSFP 100 Gb/s           | 8 GB on-chip | -                      |

Figure 1 is generated from a tiny sub-set of the collision data collected by the L1DS in 2022. It depicts the $p_{\rm T}$ distribution of L1 muons from each track finder, and is an example of what can be collected by the L1DS over just a few seconds of running.
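The caption of the $p_{\rm T}$ distribution figure quotes 100k orbits as about 9 seconds of data taking. The short calculation below reproduces that number. It is a sketch that assumes the standard LHC values of 3564 bunch slots per orbit and a nominal 25 ns bunch spacing, neither of which is stated in this paper; the constant and function names are chosen here for illustration.

```python
BX_PER_ORBIT = 3564    # assumption: LHC bunch slots per orbit
BX_SPACING_S = 25e-9   # assumption: nominal bunch spacing in seconds


def orbits_to_seconds(n_orbits: int) -> float:
    """Convert a number of LHC orbits into wall-clock seconds."""
    return n_orbits * BX_PER_ORBIT * BX_SPACING_S


n_orbits = 100_000
colliding_bunches = 2448  # from the figure caption

duration_s = orbits_to_seconds(n_orbits)
colliding_crossings = n_orbits * colliding_bunches
```

Under these assumptions the sample spans roughly 8.9 s and covers about 245 million colliding bunch crossings.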

## 3. Machine learning inference for L1 Scouting

L1 trigger objects are calibrated to give a certain efficiency at an energy/momentum threshold. For this reason, they are not suitable for direct physics analysis. Neural networks have been developed and trained to re-calibrate the parameters of these objects, such that they may be used in a semi-online analysis. One such neural network uses the L1 muon parameters as inputs ( $\eta$, $\phi$, $p_{\rm T}$, charge, quality), and outputs a best estimate of the true $\eta$, $\phi$, and $p_{\rm T}$ of the muon. The network is trained on L1 muons that are matched to offline reconstructed muon tracks, with the offline muon parameters used as the target for the network. The L1-reco matching requires $\Delta R < 0.1$, measured at the second muon station. The network architecture is shown in Table 3. In practice the network outputs $\Delta \eta$, $\Delta \phi$, and $\Delta p_{\rm T}$: a set of corrections to apply to the L1 muons to improve their precision. The NN is trained with ZeroBias data (data taken without any trigger selection bias) collected by CMS during 2022, which contain both offline reconstructed and L1 muon objects. For training, only L1 muons with $p_{\rm T} < 45~{\rm GeV}$ were selected, because events with muons above this threshold would likely be triggered by the standard L1 trigger, and so are of less interest for L1 scouting.
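The matching criterion can be made concrete with the usual definition $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$. The sketch below is illustrative only and is not the matching code used for the training sample: it assumes that both sets of coordinates have already been extrapolated to the second muon station, it uses a simple greedy nearest-neighbour strategy in which each offline muon is used at most once, and the function names and dictionary keys are hypothetical.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular distance sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def match_l1_to_offline(l1_muons, offline_muons, max_dr=0.1):
    """Greedy matching (assumption): nearest unused offline muon within max_dr."""
    matches = []
    used = set()
    for i, l1 in enumerate(l1_muons):
        best_j, best_dr = None, max_dr
        for j, off in enumerate(offline_muons):
            if j in used:
                continue
            dr = delta_r(l1["eta"], l1["phi"], off["eta"], off["phi"])
            if dr < best_dr:
                best_j, best_dr = j, dr
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j, best_dr))
    return matches


# Toy example: the first pair straddles the phi = +-pi boundary.
l1 = [{"eta": 0.50, "phi": 3.10}, {"eta": -1.20, "phi": 0.00}]
offline = [{"eta": 0.52, "phi": -3.12}, {"eta": 1.20, "phi": 0.00}]
matches = match_l1_to_offline(l1, offline)
```

The wrap-around in `delta_phi` matters for muons near $\phi = \pm\pi$, where a naive subtraction would give a difference close to $2\pi$ and reject a genuine match.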

**Figure 1.** The $p_{\rm T}$ distribution of L1 muons, from the BMTF (red), OMTF (black), and EMTF (pink). Data captured by the L1DS demonstrator from 100k orbits (about 9 seconds of data taking) in 2022 with 2448 colliding bunches at 6.8 TeV beam energy. The coarse binning for the OMTF values is a result of the quantisation in the FPGA logic for estimating the $p_{\rm T}$. [5]

**Table 3.** The neural network architecture for the muon recalibration and classification models. All models use batch normalisation between layers and the ReLU activation function in each layer. The final layer of the classifier uses the sigmoid function instead.

| Network        | Platform | Inputs                                                                     | Outputs                                                 | N nodes            |
|----------------|----------|----------------------------------------------------------------------------|---------------------------------------------------------|--------------------|
| Recalibration  | MDLA     | $\phi, \eta, p_{\mathrm{T}}, \text{ sign, quality}$                        | $\Delta \phi$ , $\Delta \eta$ , $\Delta p_{\mathrm{T}}$ | 128, 128, 128, 128 |
| Recalibration  | HLS4ML   | $\phi, \eta, p_{\mathrm{T}}, \text{ sign}$                                 | $\Delta \phi$ , $\Delta \eta$ , $\Delta p_{\mathrm{T}}$ | 32, 32, 32         |
| Classification | MDLA     | $\{\phi, \eta, p_{\mathrm{T}}, \text{ sign, quality}\}^{\mu_{1}, \mu_{2}}$ | prediction                                              | 28, 12, 20, 1      |

While the L1 scouting system is not constrained by the strict latency requirement of the L1 trigger pipeline, it must still handle a large throughput of around 2M muons per second in 2023 LHC conditions. To accomplish this, we have implemented the neural network in a PCIe-based FPGA board that is also responsible for receiving the data over optical fibres from the L1 system. With the neural network described, the FPGA is capable of around 2.7M inferences per second. Although the use of Verilog or VHDL can be more efficient in terms of on-chip resource utilisation, deep learning (DL) models can be implemented using alternative and simpler methods. The Micron Deep Learning Accelerator (MDLA) [6] contains a software compiler that converts the neural networks into hardware instructions for an FPGA processor. After training in TensorFlow, the models are converted to Open Neural Network Exchange (ONNX) format and the MDLA API is used to execute the models on hardware. The results of this inference are shown in Figure 2.

In addition to the MDLA approach, a similar neural network for muon re-calibration has been implemented for the VU37P FPGA using HLS4ML [11], a Python API and command line tool that translates trained neural networks into synthesizable FPGA firmware. This implementation, a slightly smaller network with three fully-connected layers of 32 nodes each, is able to process four muons per bunch crossing. Pruning of the 50% weakest connections and Q6.12 fixed-point precision were applied in order to reduce FPGA resource utilisation.
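The two resource-saving steps mentioned above can be sketched in plain Python. This is an illustration under stated assumptions and does not call HLS4ML: "weakest" is taken to mean smallest absolute weight (magnitude pruning), and Q6.12 is read as a signed fixed-point format with 6 integer bits (including the sign bit) and 12 fractional bits, which is only one of the conventions in use. The function names are hypothetical.

```python
def prune_smallest(weights, fraction=0.5):
    """Zero the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantise_fixed(x, int_bits=6, frac_bits=12):
    """Round to signed fixed point with saturation.

    Assumption: int_bits includes the sign bit, so the representable
    range is [-2**(int_bits - 1), 2**(int_bits - 1) - 2**-frac_bits].
    """
    scale = 1 << frac_bits
    lo = -(1 << (int_bits - 1)) * scale
    hi = (1 << (int_bits - 1)) * scale - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale


weights = [0.8, -0.05, 0.3, 0.01, -1.5, 0.002]
pruned = prune_smallest(weights, fraction=0.5)
quantised = [quantise_fixed(w) for w in pruned]
```

With 12 fractional bits the rounding error on an in-range value is at most $2^{-13}$, and pruned weights stay exactly zero after quantisation, so the corresponding multiplications can be dropped from the firmware.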

**Figure 2.** The distribution of differences between the MDLA prediction (or GMT) values, and the offline reconstructed muon tracks, for matched muons. These plots were produced with a subset of the ZeroBias data that was not used for training, with the same selection cuts. A significant improvement in track parameter resolution is observed in the MDLA result (blue) when compared to the GMT output (red). The convention $\Delta X = X_{\rm L1} - X_{\rm offline\ reco}$ is used throughout. Plots generated with 1.4M muons from ZeroBias data taken in 2022. [5]

A neural network was also designed to detect and reject misidentified muon pairs from the GMT. For training, a pair is classified as true if and only if both L1 muons match a unique offline reconstructed global muon. The same selections are used as for the recalibration network; however, pairs of L1 muons with identical $\eta$, $\phi$, and $p_{\rm T}$ are removed. The architecture of this model is also given in Table 3. The precision-recall and ROC curves of the classifier are shown in Figure 3. The area under the curve (AUC) score of the network for the barrel pairs is lower than the others because the barrel muons are already of higher purity than those in the other regions, so the network cannot give as much of an improvement.
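For readers less familiar with the classifier metrics quoted here, the sketch below computes a ROC AUC and a precision-recall point from scratch. It is not the evaluation code behind the published curves: the labels and scores are toy values invented for illustration, and the AUC is obtained from the rank-statistic definition (the probability that a true pair scores higher than a fake pair, with ties counted as one half).

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score_true > score_fake) + 0.5 * P(tie)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when pairs with score >= threshold are accepted."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: 1 = true muon pair, 0 = fake pair; scores are classifier outputs.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)
precision, recall = precision_recall(labels, scores, threshold=0.5)
```

Sweeping `threshold` from 0 to 1 and recording each `(recall, precision)` pair traces out a precision-recall curve of the kind shown in the left panel of the classifier figure.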

## 4. L1 scouting for CMS at the High-Luminosity LHC

To coincide with the upgrade of the LHC to the High-Luminosity LHC [7], the CMS detector will be extensively upgraded during LHC Long Shutdown 3 (planned to begin in 2025). The newly upgraded CMS detector (CMS Phase 2) will have an entirely new L1 trigger system [8], capable of a 750 kHz L1 trigger rate (increased from 100 kHz), and with a latency window of up to 12 $\mu s$. The new trigger will also have access to an unprecedented level of information, such as tracks from an all-new silicon tracker. The L1DS of CMS Phase 2 will exploit the potential of these new objects to perform close-to-offline levels of analysis at the full BX rate [9].

L1DS will utilise the DAQ800 board [10], a custom ATCA blade designed for the CMS data acquisition system that provides at least 800 Gb/s of total throughput. The DAQ800 will host two Xilinx VU35P FPGAs, 12x4 Samtec Firefly optical link inputs of up to 25 Gb/s per link, and ten QSFP outputs capable of up to 100 Gb/s each. A small run of similar DAQ400 prototype boards has already been produced.

**Figure 3.** The precision-recall (left) and ROC (right) curves of the fake muon pair classifier for all muon pairs in the dataset (All), and pairs where both muons originate from the Barrel, Endcap or Overlap track finders respectively. Plots made with 130k muon pairs from 2022 ZeroBias data. The AUC scores are shown in the legend. [5]

**Figure 4.** Diagram of connectivity of the proposed CMS Phase 2 L1 trigger and L1 scouting system. Black lines represent the dataflow within the L1 trigger, and blue lines (dashed or solid) represent the links to the proposed L1 scouting system [8].

L1 scouting at CMS Phase 2 is designed with a stageable architecture in mind, allowing the system to be scaled up as and when required. The baseline proposal, depicted in Figure 4 as the scouting decision system (sDS) and the scouting global system (sGS), will require seven DAQ800 boards to accept the incoming links and data throughput from the L1 correlator, global trigger, global calorimeter trigger, global track trigger, and global muon trigger. Potential extensions could involve capturing data from the local/regional muon and calorimeter triggers (sLS), the L1 tracks (sTS), and the L1 calorimeter trigger primitives (sPS).
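The DAQ800 figures quoted in this section can be combined into a rough bandwidth budget. The arithmetic below is a sketch only: it assumes that "12x4" means 12 Firefly modules with 4 links each, it uses the maximum line rates rather than usable payload rates, and it takes the "at least 800 Gb/s" per-board throughput as the figure that scales with the seven-board baseline. The variable names are chosen here for illustration.

```python
N_FIREFLY_MODULES = 12
LINKS_PER_MODULE = 4          # assumption: "12x4" = 12 modules x 4 links
MAX_INPUT_LINK_GBPS = 25      # up to 25 Gb/s per input link
N_QSFP_OUTPUTS = 10
MAX_QSFP_GBPS = 100           # up to 100 Gb/s per QSFP output
BOARD_THROUGHPUT_GBPS = 800   # "at least 800 Gb/s" per DAQ800
N_BASELINE_BOARDS = 7         # sDS + sGS baseline

input_links = N_FIREFLY_MODULES * LINKS_PER_MODULE
max_input_gbps = input_links * MAX_INPUT_LINK_GBPS
max_output_gbps = N_QSFP_OUTPUTS * MAX_QSFP_GBPS
baseline_throughput_tbps = N_BASELINE_BOARDS * BOARD_THROUGHPUT_GBPS / 1000
```

Under these assumptions each board offers 48 input links with a peak line rate of 1200 Gb/s in and 1000 Gb/s out, and the seven-board baseline corresponds to at least 5.6 Tb/s of aggregate throughput.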

## 5. Summary

The demonstrator of the L1DS system for CMS is in operation, and is capable of taking data from multiple L1 trigger sources. Applying ML inference with the help of the Micron DLA framework and/or HLS4ML allows for the re-calibration of parameters and fake pair rejection. The full L1 scouting system is in development for the HL-LHC era and, as a result of the upgraded L1 trigger, will have a massively expanded physics potential, such as the ability to detect heavy stable charged particles over multiple BX, long-lived leptonic decays, and any channels where the available cuts give a low efficiency at the L1 rate budget.

## Acknowledgments

We thank Micron Technology Inc. for financial contribution and technical support to the project.

## References

- [1] CMS Collaboration 2008 The CMS experiment at the CERN LHC JINST 3 DOI 10.1088/1748-0221/3/08/S08004
- [2] Evans L. and Bryant P. 2008 LHC machine JINST 3 DOI 10.1088/1748-0221/3/08/S08001
- [3] Sirunyan A.M. et al 2020 Performance of the CMS Level-1 trigger in proton-proton collisions at  $\sqrt{s} = 13$  TeV JINST 15 P10017
- [4] Badaro G. et al 2020 40 MHz level-1 trigger scouting for CMS EPJ Web Conf. 245 DOI 10.1051/epjconf/202024501032 URL https://cds.cern.ch/record/2754090
- [5] CMS Collaboration 2022 40 MHz scouting with deep learning in CMS CMS detector performance summary CMS-DP-2022-066 URL https://cds.cern.ch/record/2843741?ln=en
- [6] Micron Technology Inc. 2023 Micron deep learning accelerator software development kit URL https://github.com/FWDNXT/SDK
- [7] Aberle O. et al 2020 High-luminosity large hadron collider (HL-LHC): technical design report CERN Yellow Reports: Monographs DOI 10.23731/CYRM-2020-0010 URL https://cds.cern.ch/record/2749422
- [8] CMS Collaboration 2020 The phase-2 upgrade of the CMS level-1 trigger Technical Design Report. CMS CERN-LHCC-2020-004 CMS-TDR-021 URL https://cds.cern.ch/record/2714892
- [9] Ardino R. et al 2023 A 40 MHz level-1 trigger scouting system for the CMS phase-2 upgrade NIM-A 1047 DOI 10.1016/j.nima.2022.167805
- [10] Badaro G. et al 2021 The phase-2 upgrade of the CMS Data Acquisition EPJ Web Conf. 251 DOI 10.1051/epjconf/202125104023
- [11] Duarte J. et al 2018 Fast inference of deep neural networks in FPGAs for particle physics JINST 13 DOI 10.1088/1748-0221/13/07/P07027