Imperial College London



# Track and Vertex Finding for the CMS Level-1 Trigger

**Christopher Brown** 

on behalf of the CMS Collaboration

31st May 2022



### Current Era CMS

- LHC 40 MHz bunch crossing rate, need to select events based on physics potential, can't store everything
- Two-stage trigger
  - Level-1: hardware-based trigger, quick partial event reconstruction, 100 kHz output, < 4 µs latency. Only muon and calorimeter data
  - High Level Trigger: full event reconstruction on a CPU farm using full-granularity data from all parts of the detector, 1 kHz output



### High Luminosity LHC

[Figure: high pile up at HL-LHC]

- HL-LHC -> expected to deliver 3000 fb<sup>-1</sup>
- Good for rare physics searches and precision measurements of SM
- Will see an increased number of simultaneous proton-proton interactions per bunch crossing (pile-up, PU)
- **High PU** (up to 200) is bad for current era triggering
- With the current trigger, the Level-1 rate at HL-LHC would have to be 4 MHz to maintain current physics sensitivity; a new trigger is needed for HL-LHC, utilising tracker tracks for the first time






### CMS Phase-2 Upgrade

- Extensive upgrade program to all parts of the detector; new all-FPGA L1 trigger running at **750 kHz** with latency increased to **12.5 µs** -> more complex algorithms possible
- All-new tracker with larger $\eta$ coverage (up to 3.8) from the inner tracker
- **Tracker tracks** at the L1 trigger for the first time -> full 40 MHz readout for $|\eta| < 2.4$ with the outer tracker
- Track finding and L1 trigger implemented on Xilinx UltraScale+ FPGAs; latency and resource usage of every algorithm are critical
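As a rough illustration of what these numbers imply, the sketch below converts trigger latency into the number of bunch crossings whose data must be held while the decision is made, and computes the rate reduction the L1 trigger has to achieve. The 25 ns spacing follows from the 40 MHz crossing rate; the function names are illustrative only.

```python
BX_SPACING_NS = 25  # one bunch crossing every 25 ns at 40 MHz


def latency_in_crossings(latency_ns):
    """Number of bunch crossings that elapse during the trigger latency."""
    return latency_ns // BX_SPACING_NS


def rejection_factor(input_rate_khz, output_rate_khz):
    """Factor by which the trigger must reduce the event rate."""
    return input_rate_khz / output_rate_khz


current_depth = latency_in_crossings(4_000)    # current L1: 4 us
phase2_depth = latency_in_crossings(12_500)    # Phase-2 L1: 12.5 us
current_rejection = rejection_factor(40_000, 100)
phase2_rejection = rejection_factor(40_000, 750)
```

The longer Phase-2 latency corresponds to 500 crossings in flight instead of 160, which is the budget that makes more complex algorithms possible.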





- **Tracker Inputs**
- **Track Finder**
  - Tracklet Road Search
  - Kalman Filter
  - Track Quality
- **Global Track Trigger**
  - Baseline Approach
  - Improved Baseline
  - End-to-end NN approach
  - Firmware Implementation
- **Demonstration**

### **Track Finder Inputs**




- p<sub>T</sub> modules -> 2 closely spaced detector layers
  - **Tunable** on-detector  $p_T$  cut
  - **10x-20x** reduction in data
  - Online track finding possible
- More than 15k stubs with $p_T > 2$ GeV per bunch crossing, at a bunch crossing rate of 40 MHz
- **~ 200 tracks**  $p_T > 2$  GeV per crossing to reconstruct in **4 µs**
- Exploit parallelism and regional division of outer tracker, multiple copies of track finding algorithm on 162 boards
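The on-detector $p_T$ cut can be pictured with a small geometric sketch. It assumes a uniform 3.8 T solenoid field, a track originating on the beam line, and a barrel module perpendicular to the radial direction; the module radius and sensor gap used in the example are hypothetical values, not taken from the slides. A track with curvature radius $R$ crosses a module at radius $r$ at an angle $\alpha$ with $\sin\alpha = r / (2R)$, so low-$p_T$ tracks produce a larger offset between the two sensor hits.

```python
import math

B_FIELD_T = 3.8  # assumed solenoid field in tesla


def bend_offset_mm(pt_gev, radius_mm, sensor_gap_mm, b_field_t=B_FIELD_T):
    """Offset between the hits in the two sensors of a pT module.

    Uses R [m] = pT [GeV] / (0.3 * B [T]) and sin(alpha) = r / (2R).
    """
    curvature_radius_mm = 1000.0 * pt_gev / (0.3 * b_field_t)
    sin_alpha = radius_mm / (2.0 * curvature_radius_mm)
    if sin_alpha >= 1.0:
        raise ValueError("track does not reach this radius")
    return sensor_gap_mm * math.tan(math.asin(sin_alpha))


def passes_stub_cut(measured_offset_mm, radius_mm, sensor_gap_mm, pt_cut_gev=2.0):
    """Accept a hit pair as a stub if its bend is consistent with pT above the cut."""
    max_offset = bend_offset_mm(pt_cut_gev, radius_mm, sensor_gap_mm)
    return abs(measured_offset_mm) <= max_offset


# Hypothetical module at r = 600 mm with a 1.8 mm sensor gap
high_pt = bend_offset_mm(10.0, 600.0, 1.8)
low_pt = bend_offset_mm(1.0, 600.0, 1.8)
```

Changing `pt_cut_gev` changes the accepted window, which is one way to read "tunable" in the list above.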








#### Hybrid Track Finding Algorithm

- **Tracklet Road Search**: form track candidates
- **Track Fitting**: combinatorial Kalman Filter
- **Track Quality**: calculate $\chi^2$ from KF residuals or use a BDT

### **Tracklet Road Search**

- Find pairs of stubs in adjacent layers to form tracklet seeds
- Create a track candidate from each tracklet seed and project it to the other layers
- Find stubs along the projection and add them to the track candidate
- Huge combinatorics -> 15k stubs, can't consider all of them
- Split every tracker region into further slices
- Only some stub pairs in inner and outer slices are compatible, which reduces the number of candidates
- 8 different combinations of layers are used to form tracklet seeds -> good $\eta$ efficiency with latency and resource usage within budget
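The seed, project and match steps can be sketched in the $r$-$\phi$ plane. This is a toy under stated assumptions, not the firmware algorithm: the track is taken to come from the beam line, the trajectory is linearised as $\phi(r) = \phi_0 - c\,r$, $\phi$ wrap-around is ignored, and the matching window is a hypothetical value.

```python
def make_tracklet(stub_a, stub_b):
    """Build a seed (phi0, c) from two stubs given as (r, phi)."""
    (r_a, phi_a), (r_b, phi_b) = stub_a, stub_b
    c = (phi_a - phi_b) / (r_b - r_a)
    phi0 = phi_a + c * r_a
    return phi0, c


def project(tracklet, r):
    """Predicted phi of the seed at radius r."""
    phi0, c = tracklet
    return phi0 - c * r


def road_search(seed_a, seed_b, other_layers, window=0.002):
    """Add the closest stub within the window from each other layer."""
    tracklet = make_tracklet(seed_a, seed_b)
    candidate = [seed_a, seed_b]
    for layer_stubs in other_layers:
        best = None
        for r, phi in layer_stubs:
            residual = abs(phi - project(tracklet, r))
            if residual < window and (best is None or residual < best[0]):
                best = (residual, (r, phi))
        if best is not None:
            candidate.append(best[1])
    return tracklet, candidate


layers = [[(52.0, 0.474), (52.0, 0.30)], [(69.0, 0.9)]]
tracklet, candidate = road_search((25.0, 0.4875), (37.0, 0.4815), layers)
```

Only the stub close to the projection in the third layer is attached; the fourth layer contributes nothing because its only stub lies far outside the window.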

#### Track Fit - Kalman Filter

- Start with the track candidate from the tracklet stage and iteratively add its associated stubs, updating the track parameters and fit at each step
- Kalman Filter written for FPGA
- Completes within **1** µs
- Final step packages tracks into a 96-bit track word and routes them in $\eta$ to the rest of the trigger
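The iterative update can be illustrated with a two-parameter Kalman filter. This sketch is a simplification, not the combinatorial filter used in firmware: it fits only a straight line $z(r) = z_0 + t\,r$ in the $r$-$z$ plane, in floating point, with a hypothetical measurement variance and a wide prior. It also accumulates a $\chi^2$ from the filter residuals.

```python
def kf_update(state, cov, r, z_meas, var):
    """One Kalman update of (z0, t) with a measurement of z at radius r."""
    z0, t = state
    (p00, p01), (p10, p11) = cov
    residual = z_meas - (z0 + t * r)
    # P H^T for the measurement matrix H = [1, r]
    ph0 = p00 + p01 * r
    ph1 = p10 + p11 * r
    s = ph0 + r * ph1 + var          # innovation variance
    k0, k1 = ph0 / s, ph1 / s        # Kalman gain
    new_state = (z0 + k0 * residual, t + k1 * residual)
    new_cov = ((p00 - k0 * ph0, p01 - k0 * ph1),
               (p10 - k1 * ph0, p11 - k1 * ph1))
    return new_state, new_cov, residual * residual / s


def fit_track(stubs, var=0.01, prior_var=1e4):
    """Add stubs (r, z) one at a time; return the fitted state and chi2."""
    state = (0.0, 0.0)
    cov = ((prior_var, 0.0), (0.0, prior_var))
    chi2 = 0.0
    for r, z in stubs:
        state, cov, increment = kf_update(state, cov, r, z, var)
        chi2 += increment
    return state, chi2


good = [(r, 2.0 + 1.5 * r) for r in (25.0, 37.0, 52.0, 69.0)]
bad = good[:3] + [(69.0, 2.0 + 1.5 * 69.0 + 5.0)]
state, chi2_good = fit_track(good)
_, chi2_bad = fit_track(bad)
```

A stub that does not belong to the track inflates the $\chi^2$, which is the quantity later used as a handle on fake tracks.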



### **Track Quality**

- A non-genuine or 'fake' track is one not matched to a Monte Carlo generated track, based on detector hit matching
- Fakes represent a **significant fraction** of produced tracks at high $p_T$
- Issue for downstream algorithms
- Extra **$\chi^2$ cuts** performed downstream give a handle on fake tracks






#### Track Quality - Boosted Decision Trees

- Trained **BDT** on track features: ($\phi$, $\eta$, $z_0$, $\chi^2_{bend}$, # stubs, # missing interior layers, $\chi^2_{r\phi}$, $\chi^2_{rz}$)

[Figure: average unmatched tracks per event]

- Lightweight BDT, depth of 3 with 60 iterations
- **Outperforms** additional strict  $\chi^2$  cuts used in downstream trigger
- Implemented in firmware, completes inference within **33 ns**, small fraction ( < 1%) of total FPGA resource usage











### **Baseline Vertex Finding Chain**

- **Track Finding**: produces *O*(100) tracks per event with $p_T > 2$ GeV at PU 200
- **Track Quality**: simple cuts based on $\chi^2$ parameters from track finding
- **Vertex Finding**: histogram all tracks in $z_0$ weighted by $p_T$, find the 3 consecutive bins with the highest $p_T$
- **Track to Vertex Association**: fixed **window in $z_0$**, or multiple windows based on track $\eta$
- **Downstream Algorithms**: track $E_T^{miss}$, PF/PUPPI etc.
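The vertex finding and association steps of this chain can be sketched as follows. The histogram range, bin count and window half-width are hypothetical values chosen for illustration, and returning the centre of the best 3-bin window is an assumption about how the vertex position is read out.

```python
def find_vertex(tracks, z_min=-15.0, z_max=15.0, n_bins=256, window=3):
    """tracks: list of (z0_cm, pt_gev). Returns z of the best window centre."""
    bin_width = (z_max - z_min) / n_bins
    hist = [0.0] * n_bins
    for z0, pt in tracks:
        if z_min <= z0 < z_max:
            hist[int((z0 - z_min) / bin_width)] += pt
    # Start index of the `window` consecutive bins with the largest summed pT
    best_start = max(range(n_bins - window + 1),
                     key=lambda i: sum(hist[i:i + window]))
    return z_min + (best_start + window / 2.0) * bin_width


def associate(tracks, z_vertex, half_window_cm=0.5):
    """Fixed-window association: True for tracks close to the vertex in z0."""
    return [abs(z0 - z_vertex) <= half_window_cm for z0, _ in tracks]


tracks = [(3.0, 20.0), (3.05, 15.0), (2.95, 10.0), (-5.0, 30.0)]
z_vertex = find_vertex(tracks)
flags = associate(tracks, z_vertex)
```

In this toy event the three tracks near $z_0 = 3$ cm outweigh the single higher-$p_T$ track at $-5$ cm, which is then rejected by the window.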







[Figure: $z_0^{PV}$ residual [cm]; CMS Phase-2 Simulation Preliminary, 14 TeV, 200 PU]






#### **End to End Neural Network**

- DNN on multiple track features ($\eta$, BDT, $p_T$)
- Weighted Histogram
- Multilayered CNN
- **Peak Finder**
- DNN with $z_0$ distance, track features and latent features



### End to End Neural Networks for Vertex Finding

- Network trained with a 2 part loss function -> event level PV regression, track level PV track classification
- End-to-end -> track to vertex association is optimised and influences the vertex regression
- **1000** parameter network, all parts trained in 1 cycle
- Robust to changes in track finding
- Additional vertex quality
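A two-part loss of the kind described above could look like the sketch below. The specific choices are assumptions, since the slides do not state them: an absolute-error term for the event-level vertex regression, binary cross-entropy for the track-level classification, and a hypothetical relative weight `assoc_weight`.

```python
import math


def two_part_loss(z_pred, z_true, assoc_prob, assoc_true, assoc_weight=1.0):
    """Event-level regression term plus track-level classification term."""
    regression = abs(z_pred - z_true)  # assumed L1 form
    eps = 1e-7  # keeps the logarithms finite
    bce = 0.0
    for p, y in zip(assoc_prob, assoc_true):
        p = min(max(p, eps), 1.0 - eps)
        bce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    bce /= len(assoc_prob)
    return regression + assoc_weight * bce


perfect = two_part_loss(1.0, 1.0, [1.0, 0.0], [1, 0])
worse = two_part_loss(1.3, 1.0, [0.6, 0.4], [1, 0])
```

Because both terms are minimised together, gradients from the association term also shape the layers that feed the vertex regression.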



#### **Performance - Vertex Regression**

**QNN** compressed networks, see later...



- Better identification of pileup vertices removing high $p_T$ clusters
- Similar performance with compressed networks

c.brown19@imperial.ac.uk

#### Performance - Track to Vertex Association



- Improvement in $E_T^{miss}$ calculation, **reduction in tails** of residual
- Returns likelihood of track belonging to vertex -> flexible threshold for downstream algorithms vs single window based baseline approach


#### **Firmware - Network Compression**






#### Implementation

- Insert networks within existing baseline firmware
- Top-level entities control the input and output signals of the networks
- Targeted 1/3 of a Xilinx VU9P running at 360 MHz
- 108 ns total algorithm latency (2x the baseline approach, still within the latency required to pass results downstream)



[Figure: floor plan of VU9P chip]







#### **Demonstration**

- Testing algorithms on physical hardware & testing communication between L1 subsystems
- Individually tested parts of Track Finder chain and Baseline Vertexing approach
- Ran board to board tests of Track Finder and Vertexing, can measure latency between subsystems
- High speed fibre optics up to 28 Gb/s



[Figures: Track Finder board and Vertex board; Track Finder FPGA floorplan and Vertex FPGA floorplan]





#### Future plans

- **Track Finder**: expand small scale tests to full track finding chain, displaced track finding at L1
- **Global Track Trigger**: end-to-end approach in board to board tests, vertex quality and large scale physics studies
- **Demonstration**: expand integration tests to larger parts of L1 trigger with multi-board tests



- **Tracker Inputs**: $p_T$ modules making online track finding possible
- **Track Finder**: hybrid algorithm performing online track finding within 4 µs
- **Global Track Trigger**: new end-to-end neural network approach to vertex finding and association outperforming previous approaches, running on an FPGA. More info -> <u>CMS-CR-2022-018</u>
- **Demonstration**: first tests of Track Finder and L1 trigger subsystems with board to board communications

# Backup

#### CMS Phase-2 Upgrade

- Brand new tracker -> radiation tolerant, 200 m<sup>2</sup> of silicon, coverage up to $\eta = 3.8$
- Outer tracker for L1 trigger up to $\eta = 2.4$
- Muon systems: increased $\eta$ coverage and electronics
- Barrel calorimeter: new electronics and lower ECAL temperature
- All new HGCAL end cap calorimetry, 4D (space-time) shower measurement
  - High granularity readout, 1 cm<sup>2</sup>
  - **Precision timing** < 50 ps




#### CMS Phase-2 Upgrade - Trigger

- ATCA based cards for different trigger subsystems
- Xilinx UltraScale+ FPGAs used throughout, > 200 FPGAs
- Optical link speeds up to **28 Gb/s**
- Dedicated scouting system at 40 MHz
- Full event reconstruction at L1 using particle flow algorithms; all sub-detector information used to reconstruct jets, missing $E_T$ and leptons
- Vertex used in **Pile Up Per Particle Identification** (PUPPI) to filter particles most likely to come from the primary vertex







### Track Finder System

- 9 regions in $\phi$ (nonants)
- Stubs streamed at 40 MHz to Data Trigger and Control (DTC) boards
- DTCs route stubs to Track Finder (TF) boards
- **18 TF boards per nonant**, processing different events
- Nonant processing occurs in **parallel**, no communication between TF boards
- Streamed to downstream trigger in **18 streams**, +/- $\eta$ in 9 nonants
- All implemented on FPGAs
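The arithmetic behind this layout can be written out as a short sketch. The board and nonant counts come from the list above and the 25 ns spacing from the 40 MHz crossing rate; the round-robin assignment of events to boards is an assumption made for illustration.

```python
N_NONANTS = 9
TF_BOARDS_PER_NONANT = 18
BX_SPACING_NS = 25  # 40 MHz bunch crossing rate

# Total number of track finder boards in the system
total_boards = N_NONANTS * TF_BOARDS_PER_NONANT

# Each board in a nonant sees a new event only every 18 crossings
time_between_events_ns = TF_BOARDS_PER_NONANT * BX_SPACING_NS

# Two eta halves per nonant give the output streams
output_streams = 2 * N_NONANTS


def board_for_event(bx_number):
    """Assumed round-robin: which board in a nonant handles this crossing."""
    return bx_number % TF_BOARDS_PER_NONANT
```

With boards handling different events, each one has 450 ns before its next event arrives rather than 25 ns, and the total of 162 boards matches the figure quoted for the track finder.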



#### **Track Finding Firmware Implementation**

- Each tracklet step implemented in HLS
  - Sub chain tested in HW
  - Barrel only chain synthesised, being optimised
- KF and final trigger output written in VHDL
  - Both barrel only and full config tested in HW
- **Top level VHDL** controls overall dataflow and multiple instances of various modules
- Each module individually synthesized meeting timing and matching emulators



#### **BDT For Track Quality**



- Trained on TTbar PU200 sample, 170K events
- Using the <u>Conifer</u> package -> generates HLS code
- **Tunable fixed point precision** `<10,5>` used
- Targeted VU9P at 240 MHz, Initiation Interval = 1 cycle

| Model | Python AUC | HLS AUC | Latency<br>(cycles) | LUT % | FF %  | DSP % |
|-------|------------|---------|---------------------|-------|-------|-------|
| BDT   | 0.986      | 0.981   | 3                   | 0.140 | 0.027 | 0.0   |
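To see what the `<10,5>` precision means for a BDT threshold or score, the sketch below emulates a signed fixed-point type in Python. It assumes the HLS `ap_fixed<W,I>` convention (10 bits in total, 5 of them integer bits including the sign, so 5 fractional bits) with truncation and wrap-around on overflow; the slides only quote the pair of numbers, so this reading is an assumption.

```python
import math


def to_ap_fixed(value, width=10, int_bits=5):
    """Quantise a float to a signed fixed-point value with `width` total bits."""
    frac_bits = width - int_bits
    scale = 1 << frac_bits
    raw = math.floor(value * scale)   # truncate towards minus infinity
    raw &= (1 << width) - 1           # keep only `width` bits (wrap-around)
    if raw >= 1 << (width - 1):       # reinterpret as two's complement
        raw -= 1 << width
    return raw / scale


step = to_ap_fixed(1.0 / 32)          # smallest representable increment
largest = to_ap_fixed(15.96875)       # top of the representable range
quantised = to_ap_fixed(0.3)          # a value that is not representable
```

Under these assumptions the representable range is $[-16, 16)$ in steps of $1/32$, which is consistent with the small drop between the Python and HLS AUC in the table.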

### Vertex Finding Concept



[Figure: $z_0^{PV}$ residual [cm]; CMS Phase-2 Simulation Preliminary, 14 TeV, 200 PU]

### Learning Track Weights

- Network learns ideal track weighting into histogram
- Histogram is part of the network training cycle, filled with:

$$h_i = \sum_j^{\text{tracks}} \delta(j \in \text{bin } i) \times w(p_{\mathrm{T},j}, \eta_j, \chi_j^2, \ldots)$$

- Differentiated to give:

$$\frac{\partial h_i}{\partial \vec{w}} = \sum_{j}^{\text{tracks}} \delta(j \in \text{bin } i) \qquad \frac{\partial h_i}{\partial \vec{z}_0} = 0$$

- Passed through convolutional network and differentiable ArgMax to give peak
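A plain-Python sketch of the forward pass is given below. The histogram follows the formula for $h_i$: each track adds its learned weight to the bin selected by its $z_0$, so gradients reach the weights but not the bin assignment. The peak is taken with a softmax-weighted mean of the bin indices, which is one common way to make ArgMax differentiable and is an assumption here, as are the histogram range, bin count and temperature. The convolutional stage is omitted.

```python
import math


def weighted_histogram(z0s, weights, z_min=-15.0, z_max=15.0, n_bins=256):
    """h_i = sum of w_j over tracks j whose z0 falls in bin i."""
    bin_width = (z_max - z_min) / n_bins
    hist = [0.0] * n_bins
    for z0, w in zip(z0s, weights):
        if z_min <= z0 < z_max:
            hist[int((z0 - z_min) / bin_width)] += w
    return hist


def soft_argmax(hist, temperature=1.0):
    """Softmax-weighted mean bin index, a smooth stand-in for argmax."""
    peak = max(hist)  # subtracted for numerical stability
    exps = [math.exp((h - peak) / temperature) for h in hist]
    total = sum(exps)
    return sum(i * e for i, e in enumerate(exps)) / total


hist = weighted_histogram([0.0, 0.01, -7.0], [0.7, 0.3, 0.2])
toy = [0.0] * 20
toy[10] = 50.0
peak_bin = soft_argmax(toy)
```

A lower temperature makes the result approach the hard ArgMax, while a higher one spreads the estimate over neighbouring bins.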

