



#### A real-time demonstrator for track reconstruction in the CMS L1 Track-Trigger system based on custom Associative Memories and high-performance FPGAs

Guido Magazzù INFN – Sezione di Pisa

# Track Finding & L1 Trigger (1)

- Luminosity = 5×10<sup>34</sup> cm<sup>-2</sup> s<sup>-1</sup> => ~ 10<sup>4</sup> Stubs/BX in Tracker detectors
- <u>L1 Trigger Rate ≤ 1 MHz</u>
- L1 Trigger Latency ≤ 12.5 µs



Proposed solution (Track-Trigger)

O(10) data reduction (trigger data) with stub filtering in Pt modules

- Low-latency (≤ 4 µs) track reconstruction with data from the Silicon Trackers
- Detection of high-Pt tracks (Pt ≥ 3 GeV => ~ 3% of the tracks) in a real-time track-finding processor based on Associative Memory (AM) ASICs and state-of-the-art FPGAs
- Readout of Front-End data associated to high-Pt tracks only

# Track Finding & L1 Trigger (2)

- Sensor "strips" are grouped in Super-Strips (e.g. 8 strips in one Super-Strip)
- Pattern banks (one for each port/layer) are pre-loaded into the AM ASICs during the configuration phase; in run mode, all patterns are compared simultaneously with the input data (Super-Strips)
- Each pattern is a configuration of the Super-Strips in the different layers associated to a high-Pt track (i.e. "straight" or "slightly tilted" track)
- When a pattern and the input data match, the pattern address is stored in a data buffer
- When all the Super-Strips associated with the event have been processed, the AM ASICs transmit back the addresses of the matched patterns
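The matching steps above can be modelled in software as a minimal sketch. It assumes 8 strips per Super-Strip (the example on this slide), represents each pattern as one Super-Strip ID per layer, and requires every layer to match; the real AM ASICs compare all patterns in parallel in hardware and may also accept partial matches. The names `superstrip` and `am_match` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

STRIPS_PER_SUPERSTRIP = 8  # example grouping from the slide

Pattern = Tuple[int, ...]  # one Super-Strip ID per layer/port


def superstrip(strip: int, strips_per_ss: int = STRIPS_PER_SUPERSTRIP) -> int:
    """Coarse-grain a strip index into its Super-Strip ID."""
    return strip // strips_per_ss


def am_match(bank: Sequence[Pattern], hits: Sequence[Tuple[int, int]],
             n_layers: int) -> List[int]:
    """Return the addresses of the patterns fully matched by one event.

    hits: (layer, strip) pairs streamed in before End_Of_Event.
    """
    # Each layer/port accumulates the Super-Strips seen in this event.
    seen: List[Set[int]] = [set() for _ in range(n_layers)]
    for layer, strip in hits:
        seen[layer].add(superstrip(strip))
    # The ASIC checks every pattern at once; software simply loops.
    return [addr for addr, pattern in enumerate(bank)
            if all(ss in seen[layer] for layer, ss in enumerate(pattern))]
```

For example, with a three-layer bank `[(1, 1, 1), (1, 2, 3), (0, 0, 5)]`, the hits `(0, 9), (1, 12), (2, 15), (1, 20), (2, 26)` fire the first two patterns, so the addresses `[0, 1]` would be read back.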





### Track-Trigger Architecture

- CMS Silicon Tracker segmented into 48 regions (trigger towers) in $\eta$-$\phi$ (pseudo-rapidity and azimuthal angle): 6 in $\eta$ × 8 in $\phi$
- About 100 "Stubs" per bunch crossing received in each layer (6 or 7 layers depending on $\eta$) of each trigger tower
- ATCA boards (Pulsar IIb) collecting data from trigger towers
- Pattern Recognition Mezzanine (PRM) boards (2 PRMs in each Pulsar IIb) performing track finding
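A minimal sketch of how a stub could be assigned to one of the 6 × 8 trigger towers. It assumes uniform, non-overlapping $\phi$ sectors and takes the $\eta$ bin as already computed, since the slide gives neither the $\eta$ boundaries nor any tower overlap; `phi_sector` and `tower_index` are hypothetical names.

```python
import math

N_ETA, N_PHI = 6, 8  # trigger-tower grid from the slide (48 towers)


def phi_sector(phi: float) -> int:
    """Map an azimuthal angle (radians) to one of N_PHI equal sectors (assumed uniform)."""
    width = 2 * math.pi / N_PHI
    # Wrap into [0, 2*pi); min() guards against rounding up to 2*pi.
    return min(int((phi % (2 * math.pi)) // width), N_PHI - 1)


def tower_index(eta_bin: int, phi_bin: int) -> int:
    """Flatten (eta_bin, phi_bin) into a single tower ID in [0, 48)."""
    if not (0 <= eta_bin < N_ETA and 0 <= phi_bin < N_PHI):
        raise ValueError("bin out of range")
    return eta_bin * N_PHI + phi_bin
```

With this row-major numbering, the last $\eta$ bin and last $\phi$ sector give tower 47, and a slightly negative $\phi$ wraps into sector 7.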



#### Pulsar IIb Board



- General-purpose ATCA board for DAQ and Trigger (FNAL)
- Xilinx Virtex-7 FPGA
  - 80 GTH transceivers
    - 40 To/From the Rear Transition Module (To/From Optical Transceivers)
    - 28 To/From the ATCA Backplane (To
    - 12 To/From 4 HPC FMC Connectors (To/From PRMs)

#### Pattern Recognition Mezzanine (PRM)



- AM06 ASICs (12x, 4 JTAG chains)
- Kintex UltraScale FPGA
- Flash RAM (1x)
- FMC Connectors (2x)



- Pattern Memory (RLDRAM)
- Data Fan-Out Network
- Power Distribution Network
- Clock Distribution Network

### Real-Time Demonstrator (1)



- Two real-time demonstrators have been developed on commercial development boards, with FPGAs emulating the Pulsar board
  - Xilinx KCU105 with Kintex UltraScale FPGA for the PRM05
  - HTG-V5-PCIE with Virtex-6 FPGA for the PRM06
- These demonstrators have been used to test and characterize the PRM boards and to validate the firmware implementing the data flow management to/from the AM ASICs and the track fitting algorithms
- We then moved to the Pulsar IIb / PRM06 final demonstrator

#### Real-Time Demonstrator (2)



#### Pulsar IIb – PRM06 Test Bench (INFN Pisa & CERN)

#### Pulsar IIb Firmware



- Application-specific firmware developed by us
- Ethernet interface to/from the ATCA Backplane
- IPbus protocol between the Ethernet interface (embedded PHY/MAC) and the Instruction and Data Memories
- The Data Flow Manager runs the test program pre-loaded in the Instruction Memory and manages the data transfers from the Hit Memories to the PRM and from the PRM to the Track Memories
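To illustrate how a host could fill the Instruction and Data Memories over the IP-Bus (IPbus) link named above, here is a sketch that builds one write packet. It assumes IPbus protocol version 2.0 over UDP with big-endian 32-bit words; the talk states neither the version nor the address map, so `INSTR_MEM_BASE` and the helper names are hypothetical.

```python
import struct
from typing import List

IPBUS_VERSION = 2      # assumption: IPbus 2.0
TYPE_WRITE = 0x1       # incrementing write transaction
INFO_REQUEST = 0xF     # info code of a request
CONTROL_PACKET = 0x0   # packet type: control
INSTR_MEM_BASE = 0x1000  # hypothetical base address of the Instruction Memory


def packet_header(packet_id: int) -> int:
    """Version | reserved | packet ID | byte-order qualifier (0xF) | packet type."""
    return ((IPBUS_VERSION << 28) | ((packet_id & 0xFFFF) << 8)
            | (0xF << 4) | CONTROL_PACKET)


def write_transaction(tid: int, base_addr: int, words: List[int]) -> List[int]:
    """Transaction header, base address, then the payload words."""
    if not 0 < len(words) <= 255:
        raise ValueError("an IPbus transaction carries 1..255 words")
    header = ((IPBUS_VERSION << 28) | ((tid & 0xFFF) << 16)
              | (len(words) << 8) | (TYPE_WRITE << 4) | INFO_REQUEST)
    return [header, base_addr & 0xFFFFFFFF] + [w & 0xFFFFFFFF for w in words]


def build_packet(packet_id: int, tid: int, base_addr: int, words: List[int]) -> bytes:
    """Serialize one control packet holding a single write transaction."""
    body = [packet_header(packet_id)] + write_transaction(tid, base_addr, words)
    return struct.pack(f">{len(body)}I", *body)  # big-endian on the wire
```

For instance, `build_packet(1, 0, INSTR_MEM_BASE, program_words)` would produce the UDP payload that writes `program_words` into consecutive Instruction Memory locations; a production setup would more likely use the existing IPbus software suite than hand-built packets.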

### Test Flow in the Pulsar IIb

- 1. Load Configuration Data Memory (patterns)
- 2. Load Instruction Memory (configuration mode)
- 3. Configure the AM ASICs and the Pattern Memory (PM), i.e. load the patterns
- 4. Load Input Data Memory (Hits)
- 5. Load Instruction Memory (run mode)
  - a. Reset
  - b. Send N data
  - c. Send End\_Of\_Event
  - d. Wait

Very flexible and portable methodology: the test is a sequence of instructions, each mapped into control and data signals for all the FW components

- 6. Run test
- 7. Read Output Data Memory (Tracks)
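The run-mode program of step 5 (Reset, Send N data, Send End_Of_Event, Wait) can be pictured with a small interpreter. This is a software sketch under assumptions: the opcode encoding, the `EOE` marker value and the one-word-per-cycle timing are hypothetical, and the real Data Flow Manager is firmware driving control and data signals rather than Python lists.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Op(Enum):
    RESET = 0      # clear the read pointer and the outgoing link
    SEND_DATA = 1  # operand: number N of hit words to send
    SEND_EOE = 2   # append the End_Of_Event marker
    WAIT = 3       # operand: number of idle clock cycles


EOE = -1  # hypothetical End_Of_Event word


@dataclass
class DataFlowManager:
    hit_memory: List[int]
    link: List[int] = field(default_factory=list)  # words sent to the PRM
    read_ptr: int = 0
    cycles: int = 0

    def run(self, program: List[Tuple[Op, int]]) -> None:
        """Execute each instruction in order, as mapped onto link traffic."""
        for op, arg in program:
            if op is Op.RESET:
                self.read_ptr = 0
                self.link.clear()
            elif op is Op.SEND_DATA:
                chunk = self.hit_memory[self.read_ptr:self.read_ptr + arg]
                self.link.extend(chunk)
                self.read_ptr += len(chunk)
                self.cycles += len(chunk)  # assumed: one word per cycle
            elif op is Op.SEND_EOE:
                self.link.append(EOE)
                self.cycles += 1
            elif op is Op.WAIT:
                self.cycles += arg
```

Running `[(Op.RESET, 0), (Op.SEND_DATA, 3), (Op.SEND_EOE, 0), (Op.WAIT, 5)]` on a four-word Hit Memory sends the first three hits followed by the End_Of_Event marker. Changing the test then only means loading a different instruction list, which is the portability argument made above.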

#### **PRM Firmware**



- Parallel port (LVDS signals) for JTAG and W/R access to Control and Status Registers
- The Data Flow Manager manages the data transfers to/from AM ASICs and Pattern Memory and the modules performing data buffering and track reconstruction
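The PRM data flow, from matched patterns to track candidates, can be sketched as follows, under stated assumptions: the AM ASICs return pattern addresses, the Pattern Memory maps each address back to its Super-Strips (one per layer), buffered stubs are indexed by (layer, Super-Strip), and a candidate is any combination of one stub per layer. The combinatorial expansion and all names here are illustrative, not the firmware's actual algorithm.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Stub = Tuple[int, int]   # (layer, strip)
STRIPS_PER_SS = 8        # same example grouping as on the earlier slide


def buffer_stubs(stubs: Sequence[Stub]) -> Dict[Tuple[int, int], List[Stub]]:
    """Index the event's stubs by (layer, Super-Strip)."""
    buf: Dict[Tuple[int, int], List[Stub]] = defaultdict(list)
    for layer, strip in stubs:
        buf[(layer, strip // STRIPS_PER_SS)].append((layer, strip))
    return buf


def track_candidates(roads: Sequence[int],
                     pattern_memory: Dict[int, Tuple[int, ...]],
                     buf: Dict[Tuple[int, int], List[Stub]]) -> List[Tuple[Stub, ...]]:
    """Expand each matched road and combine one stub per layer."""
    candidates: List[Tuple[Stub, ...]] = []
    for addr in roads:
        superstrips = pattern_memory[addr]  # Pattern Memory lookup
        per_layer = [buf.get((layer, ss), []) for layer, ss in enumerate(superstrips)]
        candidates.extend(product(*per_layer))  # an empty layer yields no candidate
    return candidates
```

With two stubs in the same Super-Strip of one layer, a single road expands into two candidates, which is why the number of fitted tracks can differ from the number of matched roads.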

### Data Flow in the PRM



### **High-Speed Link Test**

- Pulsar ⇔ PRM
  - PRBS-7 Sequence
  - Up to 12.5 Gbps
  - BER < 2 x 10<sup>-15</sup> (both directions)

- PRM ⇔ AM ASICs
  - PRBS-7 Sequence
  - 2.0 Gbps
  - BER < 8 x 10<sup>-15</sup> (both directions)
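The link tests above rely on a PRBS-7 sequence and quote BER upper limits. The sketch below shows a common PRBS-7 generator (polynomial x^7 + x^6 + 1, period 127) and the standard estimate of how many error-free bits are needed to claim BER < b at a given confidence level, N = -ln(1 - CL) / b. The 95% confidence level is an assumption, since the slide does not state one; the function names are hypothetical.

```python
import math
from itertools import islice
from typing import Iterator, List


def _lfsr7(state: int) -> Iterator[int]:
    """Fibonacci LFSR for x^7 + x^6 + 1: feedback from stages 7 and 6."""
    while True:
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        yield bit


def prbs7(seed: int = 0x7F) -> Iterator[int]:
    """Infinite PRBS-7 bit stream; the all-zero state would lock up."""
    if (seed & 0x7F) == 0:
        raise ValueError("PRBS-7 seed must be non-zero")
    return _lfsr7(seed & 0x7F)


def prbs7_bits(n: int, seed: int = 0x7F) -> List[int]:
    """First n bits of the sequence."""
    return list(islice(prbs7(seed), n))


def bits_for_ber_limit(ber: float, cl: float = 0.95) -> float:
    """Error-free bits needed to claim BER < ber at confidence level cl."""
    return -math.log(1.0 - cl) / ber
```

Under these assumptions, `bits_for_ber_limit(2e-15)` gives about 1.5 × 10^15 bits, i.e. roughly 1.2 × 10^5 s (about 33 hours) of error-free running at 12.5 Gbps.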



#### **Firmware Implementation**

Post-implementation utilization:

| Resource | Utilization |
|----------|-------------|
| LUT      | 56%         |
| LUTRAM   | 21%         |
| FF       | 46%         |
| BRAM     | 53%         |
| DSP      | 18%         |
| IO       | 76%         |
| GT       | 100%        |
| BUFG     | 3%          |
| MMCM     | 17%         |
| PLL      | 8%          |

Implemented Power Report:

| Power                               | Value            |
|-------------------------------------|------------------|
| Total On-Chip Power                 | 15.273 W         |
| Junction Temperature                | 46.1 °C          |
| Thermal Margin                      | 38.9 °C (26.7 W) |
| Effective θJA                       | 1.4 °C/W         |
| Power supplied to off-chip devices  | 0.214 W          |
| Confidence level                    | Low              |

Implemented Power Report:

| Power                               | Value            |
|-------------------------------------|------------------|
| Total On-Chip Power                 | 3.934 W          |
| Junction Temperature                | 28.3 °C          |
| Thermal Margin                      | 56.7 °C (64.7 W) |
| Effective θJA                       | 0.8 °C/W         |
| Power supplied to off-chip devices  | 0.083 W          |
| Confidence level                    | Low              |

#### Pulsar IIb

- Xilinx Virtex-7
- Xc7vx690tffq1927-2
- Clock => 125 MHz
- GTH to ATCA Backplane (Ethernet)
  - Ref. Clock => 156.25 MHz
  - Rate => 1.0 Gbps
- GTH to PRM
  - Ref. Clock => 156.25 MHz
  - Rate => 2.5 Gbps

#### PRM06

- Xilinx Kintex UltraScale
- Xcku060-ffva1156-1-c
- GTH to/from Pulsar IIb
  - Ref. Clock => 125.00 MHz
  - Rate => 2.5 Gbps
- GTH to/from AM chips
  - Ref. Clock => 100.00 MHz
  - Rate => 2.0 Gbps
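The reference clocks and line rates listed for the two FPGAs fix the transceiver PLL multiplication, i.e. how many serial bits go out per reference-clock period. The sketch below computes that ratio and the usable payload rate, assuming 8b/10b line coding; the slides do not name the encoding, so the default is an assumption and the helper names are hypothetical.

```python
from typing import Dict, Tuple


def bits_per_ref_cycle(line_rate_bps: float, ref_clk_hz: float) -> float:
    """Serial bits transmitted per reference-clock period."""
    return line_rate_bps / ref_clk_hz


def payload_rate(line_rate_bps: float, enc_data: int = 8, enc_total: int = 10) -> float:
    """Usable data rate after line coding (8b/10b assumed by default)."""
    return line_rate_bps * enc_data / enc_total


# (line rate, reference clock) as listed on the slides
LINKS: Dict[str, Tuple[float, float]] = {
    "Pulsar IIb side, to PRM": (2.5e9, 156.25e6),
    "PRM06 side, to/from Pulsar IIb": (2.5e9, 125.00e6),
    "PRM06 side, to/from AM chips": (2.0e9, 100.00e6),
}

if __name__ == "__main__":
    for name, (rate, ref) in LINKS.items():
        print(f"{name}: {bits_per_ref_cycle(rate, ref):.0f} bits/ref cycle, "
              f"{payload_rate(rate) / 1e9:.1f} Gbps payload")
```

Both PRM06 links come out at 20 serial bits per reference-clock period, while the Pulsar IIb side of the same 2.5 Gbps link uses 16, since each end derives the line rate from its own reference clock.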

#### Hardware Test Results



$t\bar{t}$ event (PU = 140) => max stubs/layer = 138 => 19 matched roads => 15 fitted tracks

- AM06 clock => 100 MHz
- DO clock => 200 MHz
- TCB clock => 100 MHz
- TF clock => 200 MHz



#### Simulation Results



- TCB clock => 300 MHz
- TF clock => 500 MHz

#### Conclusions

- A real-time processor based on custom high-density Associative Memory ASICs and high-performance FPGAs has been proposed for the CMS L1-trigger
- A demonstrator based on the state-of-the-art components of the Track Trigger system (Pulsar IIb and PRM boards) has been developed
- The demonstrator has been used to test the custom components and to validate the track finding algorithms (proc. time => 3.3 µs)
- Simulations indicate that further significant improvements are possible (proc. time => 1.1 µs) => <u>We are working on this!</u>