1–6 Oct 2023
Geremeas, Sardinia, Italy

Design and implementation of Neural Network based conditions for the CMS Level-1 Global Trigger upgrade for the HL-LHC

5 Oct 2023, 17:40
1h 20m
Poster · Trigger and Timing Distribution · Thursday posters session

Speaker

Gabriele Bortolato (Universita e INFN, Padova (IT))

Description

The High-Luminosity LHC upgrade will bring a new trigger system that exploits detailed information from the sub-detectors at the bunch crossing rate, enabling the Global Trigger (GT) to use high-precision trigger objects. Novel machine-learning-based algorithms will also be included in the trigger system to achieve higher selection efficiency and to detect unexpected signals. This study focuses on optimizing the implementation of these novel algorithms: optimizations ranging from software-based to more hardware-oriented ones are studied in detail. Finally, the study analyses how the applied optimizations affect the models' physics performance and resource footprint.

Summary (500 words)

The new trigger system for the High-Luminosity LHC upgrade will exploit detailed information from the calorimeter, muon and tracker trigger paths at the bunch crossing rate. The final stage of the trigger, the Global Trigger (GT), will receive high-precision trigger objects from the upstream trigger channels and will evaluate a menu of more than 1000 cut-based and neural-network-based trigger algorithms in order to determine the Level-1 trigger accept decision. Traditionally, the algorithms used to build the so-called trigger menu have employed simple selections on one or more physics objects, for instance cuts on a specific reconstructed particle property or on a combination of such properties. The Phase-2 CMS GT aims to go beyond this approach and include neural-network-based conditions alongside the cut-based algorithms already in use today in Run 3, in order to reach higher selection efficiency and to select unexpected signals.

Implementing these neural-network-based conditions in the GT algorithm chain requires meeting stringent requirements, particularly in terms of latency and resources. The upgrade targets a total latency of 1 µs (40 bunch crossings) for the entire GT, so model optimizations are essential to meet the target latency. Neural networks are typically resource-intensive, and extensive optimization is therefore required during and after training to integrate them alongside the large number of cut-based algorithms.

Two different flavours of neural networks are considered: deep binary classifiers and deep auto-encoders. A deep binary classifier is trained, in a supervised way, to distinguish a specific signal signature from the unwanted background. In contrast, a deep auto-encoder aims to characterize the background as completely as possible and to flag anything that does not resemble it as anomalous; for this reason it is also known as an anomaly detection trigger, and it is trained in an unsupervised way.

The hls4ml tool has been employed to convert TensorFlow/Keras models into a hardware description language such as VHDL or Verilog. To reduce the models' resource footprint and latency, multiple optimizations have been applied. Some of them, such as variable and synapse pruning, hyper-parameter quantization and precision tuning, can be performed without completely redesigning the model. Others require a new model to be designed and trained from scratch; in this work a technique known as knowledge distillation was used.

A prototype firmware has been implemented and evaluated. It contains various models that differ in the optimization techniques applied and in the target signal signature. The firmware encompasses the neural networks, the entire GT infrastructure (including the I/O logic, de-multiplexers and object distribution) and the interfaces necessary for the communication between the neural networks and the GT framework. GT objects are streamed at a frequency of 480 MHz, whereas the neural network accepts a single vector of objects every 25 ns (40 MHz). After multiple tests the models have been tuned to run at 240 MHz with an initiation interval (II) of 1. Firmware resource usage and performance are studied to extract the best compromise between latency and logic footprint.
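
To make the auto-encoder approach concrete, the following minimal Keras sketch trains a small dense auto-encoder on background-like events and uses the per-event reconstruction error as the anomaly score. The architecture, input dimension and placeholder data are illustrative assumptions, not the actual GT model.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_FEATURES = 57  # assumed size of the flattened trigger-object input vector

def build_autoencoder(n_features=N_FEATURES, latent_dim=8):
    """Small dense auto-encoder; the real GT model architecture differs."""
    inputs = layers.Input(shape=(n_features,))
    x = layers.Dense(32, activation='relu')(inputs)
    latent = layers.Dense(latent_dim, activation='relu')(x)
    x = layers.Dense(32, activation='relu')(latent)
    outputs = layers.Dense(n_features)(x)
    return models.Model(inputs, outputs)

autoencoder = build_autoencoder()
autoencoder.compile(optimizer='adam', loss='mse')

# Unsupervised training: the targets are the inputs themselves,
# using background events only.
x_bkg = np.random.rand(10000, N_FEATURES).astype('float32')  # placeholder data
autoencoder.fit(x_bkg, x_bkg, epochs=10, batch_size=256, verbose=0)

def anomaly_score(model, x):
    """Per-event reconstruction error; large values are flagged as anomalous."""
    reco = model.predict(x, verbose=0)
    return np.mean((x - reco) ** 2, axis=1)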
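The pruning and quantization optimizations mentioned above are commonly realized, in hls4ml-based workflows, with QKeras for quantization-aware training and the TensorFlow Model Optimization toolkit for magnitude-based synapse pruning. The sketch below assumes these tools; the bit widths, sparsity target and layer sizes are illustrative, not the values used in this work.

import tensorflow_model_optimization as tfmot
from tensorflow.keras import layers, models
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

N_FEATURES = 57  # assumed input size

# -- Quantization: QKeras layers train with few-bit weights/activations --
# (bit widths here are illustrative)
inputs = layers.Input(shape=(N_FEATURES,))
x = QDense(32, kernel_quantizer=quantized_bits(8, 0, alpha=1),
           bias_quantizer=quantized_bits(8, 0))(inputs)
x = QActivation(quantized_relu(8))(x)
outputs = layers.Dense(1, activation='sigmoid')(x)
q_model = models.Model(inputs, outputs)

# -- Synapse pruning: gradually zero out small-magnitude weights --
dense_model = models.Sequential([
    layers.Input(shape=(N_FEATURES,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=4000)
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    dense_model, pruning_schedule=schedule)
pruned.compile(optimizer='adam', loss='binary_crossentropy')
# Train with the pruning callback, then strip the wrappers before conversion:
# pruned.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
# final_model = tfmot.sparsity.keras.strip_pruning(pruned)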
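Knowledge distillation, used here for the models that had to be redesigned from scratch, trains a small "student" network to reproduce the outputs of a larger, already-trained "teacher". A minimal sketch of one common formulation follows; the architectures, temperature and loss weighting are assumptions for illustration, not the configuration used in this work.

import tensorflow as tf
from tensorflow.keras import layers, models

N_FEATURES = 57  # assumed input size

# Teacher: a larger, already-trained network (architecture illustrative).
teacher = models.Sequential([
    layers.Input(shape=(N_FEATURES,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(1),  # logits
])
teacher.trainable = False

# Student: the small, FPGA-friendly network that will be deployed.
student = models.Sequential([
    layers.Input(shape=(N_FEATURES,)),
    layers.Dense(16, activation='relu'),
    layers.Dense(1),  # logits
])

optimizer = tf.keras.optimizers.Adam()

def distillation_loss(y_true, student_logits, teacher_logits,
                      temperature=4.0, alpha=0.5):
    """Blend the hard-label loss with a soft-label term from the teacher."""
    hard = tf.keras.losses.binary_crossentropy(
        y_true, tf.sigmoid(student_logits))
    soft = tf.keras.losses.binary_crossentropy(
        tf.sigmoid(teacher_logits / temperature),
        tf.sigmoid(student_logits / temperature))
    return alpha * hard + (1.0 - alpha) * soft

def train_step(x, y):
    teacher_logits = teacher(x, training=False)
    with tf.GradientTape() as tape:
        student_logits = student(x, training=True)
        loss = tf.reduce_mean(
            distillation_loss(y, student_logits, teacher_logits))
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss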
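The hls4ml conversion step could then look roughly as follows. The stand-in model, FPGA part number, project directory and example precision override are placeholders; the clock period of about 4.17 ns corresponds to the 240 MHz operating frequency quoted above.

import hls4ml
from tensorflow.keras import layers, models

# Stand-in model; in practice this is the pruned/quantized network from above.
model = models.Sequential([
    layers.Input(shape=(57,)),  # assumed input size
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

# Derive a layer-by-layer HLS configuration from the Keras model.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
# Per-layer fixed-point precision can be tuned here, e.g. (layer name illustrative):
# config['LayerName']['dense']['Precision'] = 'ap_fixed<10,4>'

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='gt_nn_prj',        # placeholder project directory
    part='xcvu13p-flga2577-2-e',   # placeholder FPGA part number
    clock_period=4.17,             # ns: ~240 MHz, as quoted in the summary
    io_type='io_parallel',         # one full input vector per invocation
)
hls_model.compile()                # build the C simulation library for validation
# hls_model.build(csim=False)      # run HLS synthesis to get latency/resource reports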

Author

Gabriele Bortolato (Universita e INFN, Padova (IT))

Co-authors

Artur Lobanov (Hamburg University (DE))
Benjamin Huber (Technische Universitaet Wien (AT))
Dinyar Rabady (CERN)
Elias Leutgeb (Technische Universitaet Wien (AT))
Hannes Sakulin (CERN)
Jaana Heikkilae (University of Zurich (CH))
Maria Cepeda (CIEMAT)
