





## System Design and Prototyping for the CMS Level-1 Trigger at the High-Luminosity LHC

Piyush Kumar & Bhawna Gomber (on behalf of the CMS collaboration)

CASEST, School of Physics, University of Hyderabad, Hyderabad, Telangana, India





23<sup>rd</sup> Real Time Conference

## **High-Luminosity LHC (HL-LHC)**




- Luminosity: a measure of an accelerator's performance
  - Proportional to the number of collisions that occur in a given amount of time
  - The higher the luminosity, the more data the experiments can gather
- Aim: to deliver a much larger dataset for physics to the LHC experiments
- Pile-up: number of simultaneous proton-proton interactions (~200)
  - With high pile-up, more advanced selection algorithms are needed at the L1 trigger
- The larger dataset will help in high-precision measurements of:
  - Standard Model (SM)
  - new territories beyond the SM (BSM)

|                   | Instantaneous luminosity                               | Pile-up (average) | Integrated luminosity      |
|-------------------|--------------------------------------------------------|-------------------|----------------------------|
| Run-2             | $2.1 \times 10^{34}\ \text{cm}^{-2}\,\text{s}^{-1}$    | 55                | 160 fb$^{-1}$ (4 years)    |
| HL-LHC (baseline) | $5 \times 10^{34}\ \text{cm}^{-2}\,\text{s}^{-1}$      | 140               | 3000 fb$^{-1}$ (10 years)  |
| HL-LHC (ultimate) | $7.5 \times 10^{34}\ \text{cm}^{-2}\,\text{s}^{-1}$    | 200               | 4000 fb$^{-1}$ (10 years)  |

Fig: HL-LHC timeline (Run 3 through Run 5)




## **CMS HL-LHC upgrade**

- Planned upgrades of the CMS detector for the HL-LHC era:
  - New pixel and strip tracking detector
  - New high-granularity calorimeter (HGCAL) in the endcap
  - New frontend/backend electronics for the:
    - Barrel calorimeter
      - Electromagnetic calorimeter (ECAL)
      - Hadronic calorimeter (HCAL)
    - Muon system
      - Drift tubes (DT)
      - Cathode strip chambers (CSC)
  - 40 MHz scouting system
    - can be used to scrutinize the collision events and identify potential signatures unreachable through standard trigger selection processes
  - L1 trigger:
    - Inclusion of the tracker information
    - Extensive usage of:
      - large FPGAs (Virtex UltraScale+/Kintex UltraScale)
      - high-speed optical links (28 Gbps)

## Summary of CMS HL-LHC Upgrades



#### Fig: CMS detector HL-LHC upgrade



## L1 trigger principle

- At design parameters the LHC produces:
  - ~10<sup>9</sup> events/second in the CMS detector
  - each event is ~1 MB
- 10<sup>9</sup> events/s x 1 MB/event = 10<sup>15</sup> bytes/s = 1 PB/s (1 petabyte/second)
- Problem:
  - It is impossible to store and process this amount of data
- Solution:
  - A drastic rate reduction has to be achieved
    - Level-1: 40 MHz to 750 kHz
    - High-level trigger (HLT): 750 kHz to 7.5 kHz
- A trigger is designed to reject the uninteresting events and keep the interesting ones for physics.
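
The arithmetic on this slide can be checked in a few lines of Python. This is only a back-of-the-envelope sketch using the round numbers quoted above; the constant and function names are illustrative and are not taken from any CMS software.

```python
# Back-of-the-envelope check of the data-rate numbers quoted above.
EVENT_RATE_HZ = 1e9      # ~10^9 events/s at design parameters
EVENT_SIZE_BYTES = 1e6   # ~1 MB per event

raw_rate_bytes_per_s = EVENT_RATE_HZ * EVENT_SIZE_BYTES  # 1e15 B/s = 1 PB/s


def reduction_factor(rate_in_hz: float, rate_out_hz: float) -> float:
    """Number of input events per accepted event for one trigger stage."""
    return rate_in_hz / rate_out_hz


l1_factor = reduction_factor(40e6, 750e3)    # Level-1: 40 MHz -> 750 kHz
hlt_factor = reduction_factor(750e3, 7.5e3)  # HLT: 750 kHz -> 7.5 kHz

if __name__ == "__main__":
    print(f"raw data rate: {raw_rate_bytes_per_s / 1e15:.0f} PB/s")
    print(f"L1 reduction:  ~{l1_factor:.0f}x")
    print(f"HLT reduction: ~{hlt_factor:.0f}x")
    print(f"combined:      ~{l1_factor * hlt_factor:.0f}x")
```

With these inputs the Level-1 stage keeps roughly one bunch crossing in 53, and the HLT keeps one in 100 of those.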



Modern large-scale experiments are really BIG

- e.g. LHC experiments (ATLAS/CMS)
  - ~100M channels
  - ~1-2 MB of RAW data per measurement

... and really FAST

- ~40 MHz measurement rate (every 25 ns @ the LHC)



Data volume is a *key issue* in modern large-scale experiments



## L1 trigger architecture

- The HL-LHC L1 trigger receives input from the backend electronics of:
  - Calorimeters
  - Muon spectrometers
  - Track finder
- Calorimeter trigger (creates clusters from the energy deposited by particles in the calorimeter):
  - Regional calorimeter trigger (RCT)
    - Barrel ECAL and HCAL
  - Global calorimeter trigger (GCT)
    - RCT, forward hadronic (HF), and HGCAL
- Correlator trigger (CT) receives input from all the trigger sub-systems:
  - Aim: identify and reconstruct all the particles with a particle-flow algorithm
- Global trigger:
  - Aim: issue the final L1 trigger decision

28 July, 2022

- Input rate: 40 MHz
- Increased output rate: 100 kHz => 750 kHz
- Increased latency: $3.8\ \mu\text{s} \Rightarrow 12.5\ \mu\text{s}$
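
To put the latency figures in context, they can be converted into bunch crossings (BX), which is also the number of crossings whose data must be held in buffers while the L1 decision is pending. The sketch below assumes the nominal 25 ns bunch spacing; the function name is illustrative.

```python
BX_PERIOD_NS = 25.0  # nominal LHC bunch spacing (assumed: 25 ns)


def latency_in_bx(latency_us: float) -> float:
    """Convert a trigger latency in microseconds into bunch crossings."""
    return latency_us * 1000.0 / BX_PERIOD_NS


if __name__ == "__main__":
    for label, latency_us in (("previous L1", 3.8), ("HL-LHC L1", 12.5)):
        print(f"{label}: {latency_us} us = {latency_in_bx(latency_us):.0f} BX")
```

Under this assumption the latency budget grows from about 152 BX to 500 BX.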






## L1 trigger architecture



Fig: CMS Phase-2 L1 trigger design, showing the time-multiplexing (TMUX) period, the regional (RS) and functional (FS) segmentation, and the number of FPGAs for each architecture component.






## **Technology R&D examples**

- ATCA-based electronics
  - Generic high-I/O (> 100 links) processing boards
  - One or two Virtex UltraScale+/Kintex UltraScale FPGAs from Xilinx
- Wide range of tests and prototypes
  - Extensive link tests @ 28 Gb/s
  - Endurance test (< 10<sup>-12</sup> BER) of the FPGA quads
  - Thermal performance test and simulation
    - Heat sink test (to keep the operating temperature below 100 °C)
  - Algorithm firmware
  - Infrastructure firmware



Fig: Prototype boards (Ocean, BMT-L1, APx, Serenity) and heat sink

- APx 25G quad eye scans at 25.78125 Gbps
  - Using a pseudorandom binary sequence (PRBS31)
  - Clock and data recovery (CDR) ON



Fig: APxF 25G eye scans, quads 121-135





## **Trigger Algorithms Development**

- The trigger algorithms are implemented using the Xilinx Vivado-HLS (high-level synthesis) tool
  - Rapid prototyping
  - Code is written in C++
  - HLS synthesizes the code to generate the RTL and provides an early estimate of latency and resource utilization
  - Increased ease of collaboration and code sharing for algorithm design
- **Downstream:**
  - Integration of the algorithm with the firmware shell (orange box), which provides:
    - MGT link instantiation
    - Timing and Control Distribution System (TCDS) connectivity
    - DAQ support
    - an AXI interface to the controlling system
  - Uses an HDL wrapper for integration (magenta box)

#### Aim is to write HLS algorithms in a framework-agnostic way



Performance estimates

Timing (ns):

| Clock  | Target | Estimated | Uncertainty |
|--------|--------|-----------|-------------|
| ap_clk | 4.17   | 2.917     | 1.25        |

Latency (clock cycles):

| Latency min | Latency max | Interval min | Interval max | Pipeline type |
|-------------|-------------|--------------|--------------|---------------|
| 32          | 33          | 6            | 6            | function      |

Utilization estimates:

| Name                | BRAM_18K | DSP48E | FF      | LUT     | URAM |
|---------------------|----------|--------|---------|---------|------|
| DSP                 | -        | -      | -       | -       | -    |
| Expression          | -        | -      | 0       | 4       | -    |
| FIFO                | -        | -      | -       | -       | -    |
| Instance            | -        | -      | 49827   | 78752   | -    |
| Memory              | -        | -      | -       | -       | -    |
| Multiplexer         | -        | -      | -       | 56      | -    |
| Register            | 0        | -      | 3360    | 32      | -    |
| Total               | 0        | 0      | 53187   | 78844   |      |
| Available SLR       | 1440     | 2280   | 788160  |         |      |
| Utilization SLR (%) |          |        |         |         |      |
| Available           | 4320     | 6840   | 2364480 | 1182240 | 960  |
| Utilization (%)     | 0        | 0      | 2       | 6       | 0    |

Fig: Vivado-HLS performance estimates of trigger algorithm implementation

## **Barrel calorimeter segmentation**



Fig: Barrel calorimeter segmentation




## L1 Trigger Algorithms

#### **Calorimeter Trigger**

#### RCT geometry for the FPGA processing: $17\eta \times 4\phi$ of the barrel (total 36 APx cards)


#### The Regional Calorimeter Trigger (RCT) creates electron/photon energy clusters and towers and sends them to the Global Calorimeter Trigger (GCT)

- The Xilinx UltraScale+ XCVU9P FPGA has 3 super logic regions (SLRs).
- For efficient implementation, the algorithm is partitioned SLR-wise across 2 SLRs (SLR2 and SLR1).
- The RCT algorithm is divided into three parts:
  - RCT8x4:
    - implemented in SLR1
    - processes the $8\eta \times 4\phi$ RCT region
    - ECAL only
  - RCT9x4:
    - implemented in SLR2
    - processes the $9\eta \times 4\phi$ RCT region
      - ECAL
      - $16\eta \times 4\phi$ HCAL data
  - RCTSUM:
    - implemented in SLR2
    - combines the outputs of both algorithms and sends the result to the GCT

#### Fig: RCT algorithm organisation and dataflow

Timing (ns):

| Clock  | Target | Estimated | Uncertainty |
|--------|--------|-----------|-------------|
| ap_clk | 4.17   | 3.491     | 1.25        |

Latency (clock cycles):

| Latency min | Latency max | Interval min | Interval max | Type     |
|-------------|-------------|--------------|--------------|----------|
| 230         | 230         | 6            | 6            | function |

Utilization estimates:

| Name                | BRAM_18K | DSP48E | FF      | LUT     | URAM |
|---------------------|----------|--------|---------|---------|------|
| DSP                 | -        | -      | -       | -       | -    |
| Expression          | -        | -      | 0       | 24202   | -    |
| FIFO                | -        | -      | -       | -       | -    |
| Instance            | 8        | 0      | 303544  | 464948  | -    |
| Memory              | -        | -      | -       | -       | -    |
| Multiplexer         | -        | -      | -       | 16292   | -    |
| Register            | 30       | -      | 23821   | 1813    | -    |
| Total               | 38       | 0      | 327365  | 507255  | 0    |
| Available           | 4320     | 6840   | 2364480 | 1182240 | 960  |
| Available SLR       | 1440     | 2280   | 788160  | 394080  | 320  |
| Utilization (%)     | ~0       | 0      | 13      | 42      | 0    |
| Utilization SLR (%) | 2        | 0      | 41      | 128     | 0    |

#### Fig: RCT algorithm HLS results



Fig: e/gamma cluster making in RCT algorithm

The implementation scales to a $17\eta \times 6\phi$ region (using all 3 SLRs), which would reduce the number of RCT APx boards from 36 to 24.
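
The SLR partitioning and the board count follow from simple ratios, sketched below with the numbers from the RCT HLS results. The sketch assumes that each card covers the full $17\eta$ range so that cards tile the barrel only in $\phi$, and it ignores routing and placement headroom, so it is a lower bound and not an implementation rule. All names are illustrative.

```python
import math

LUT_PER_SLR = 394080    # "Available SLR" LUTs on the XCVU9P (from the HLS report)
RCT_LUT_TOTAL = 507255  # total LUTs estimated for the RCT algorithm


def slrs_needed(used: int, available_per_slr: int) -> int:
    """Minimum number of SLRs whose combined resources cover the usage."""
    return math.ceil(used / available_per_slr)


def cards_needed(total_phi: int, phi_per_card: int) -> int:
    """Cards needed to tile the barrel in phi (assumes full eta per card)."""
    return math.ceil(total_phi / phi_per_card)


if __name__ == "__main__":
    single_slr_pct = 100.0 * RCT_LUT_TOTAL / LUT_PER_SLR
    print(f"LUT usage vs one SLR: {single_slr_pct:.1f}%")
    print(f"SLRs needed: {slrs_needed(RCT_LUT_TOTAL, LUT_PER_SLR)}")

    total_phi = 36 * 4  # 36 cards, each covering 4 phi units
    print(f"cards at 6 phi per card: {cards_needed(total_phi, 6)}")
```

The LUT estimate exceeds a single SLR, which is why the algorithm is split across two, and widening each card from $4\phi$ to $6\phi$ gives the quoted reduction from 36 to 24 boards.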



## **RCT to GCT slice test**

- The GCT algorithm (merging the energies between the RCT cards in the phi direction) is synthesized in Vivado-HLS
- The RCT (SLR2 and SLR1) and GCT (SLR0) are implemented together in one XCVU9P FPGA.
- Tested on a single card:
  - The 4 RCT output links are replicated 5 times (20 inputs), emulating a GCT that processes 5 RCT cards
- Implementation details
  - XCVU9P-FLGC2104-1-E FPGA
  - Clock: 240 MHz
  - Link bandwidth: 16 Gbps




#### Fig: RCTTDR and GCT algorithm implementation in three SLRs

## Performance Estimates

#### Timing (ns)

| Clock  | Target | Estimated | Uncertainty |
|--------|--------|-----------|-------------|
| ap_clk | 4.17   | 2.909     | 1.25        |

#### Latency (clock cycles)

| Latency min | Latency max | Interval min | Interval max | Type     |
|-------------|-------------|--------------|--------------|----------|
| 120         | 120         | 6            | 6            | function |

#### Utilization Estimates

#### Summary

| Name                | BRAM_18K | DSP48E | FF      | LUT     | URAM |
|---------------------|----------|--------|---------|---------|------|
| DSP                 | -        | -      | -       | -       | -    |
| Expression          | -        | -      | 0       | 1444    | -    |
| FIFO                | -        | -      | -       | -       | -    |
| Instance            | -        | -      | 27703   | 146555  | -    |
| Memory              | -        | -      | -       | -       | -    |
| Multiplexer         | -        | -      | -       | 56      | -    |
| Register            | 0        | -      | 82036   | 36864   | -    |
| Total               | 0        | 0      | 109739  | 184919  | 0    |
| Available           | 4320     | 6840   | 2364480 | 1182240 | 960  |
| Available SLR       | 1440     | 2280   | 788160  | 394080  | 320  |
| Utilization (%)     | 0        | 0      | 4       | 15      | 0    |
| Utilization SLR (%) | 0        | 0      | 13      | 46      | 0    |

Fig: GCT algorithm HLS results
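
The HLS reports quote latency and initiation interval in clock cycles. The sketch below converts them to time using the 4.17 ns target clock period from the reports, and compares the 6-cycle interval with the nominal 25 ns bunch spacing (an assumption of this sketch). The function names are illustrative.

```python
TARGET_PERIOD_NS = 4.17  # ap_clk target period from the HLS reports
BX_PERIOD_NS = 25.0      # nominal LHC bunch spacing (assumed)


def clock_mhz(period_ns: float) -> float:
    """Clock frequency in MHz for a given period in ns."""
    return 1000.0 / period_ns


def cycles_to_ns(cycles: int, period_ns: float = TARGET_PERIOD_NS) -> float:
    """Convert a cycle count into nanoseconds at the given clock period."""
    return cycles * period_ns


if __name__ == "__main__":
    print(f"target clock: {clock_mhz(TARGET_PERIOD_NS):.1f} MHz")
    print(f"interval of 6 cycles: {cycles_to_ns(6):.2f} ns "
          f"(bunch spacing: {BX_PERIOD_NS} ns)")
    for name, cycles in (("RCT", 230), ("GCT", 120)):
        print(f"{name} latency: {cycles} cycles = {cycles_to_ns(cycles):.1f} ns")
```

At a clock of about 240 MHz, an interval of 6 cycles corresponds to accepting new input roughly once per bunch crossing, and the 230-cycle and 120-cycle latencies correspond to roughly 0.96 µs and 0.5 µs.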



## **RCT to GCT slice test**

- The bitstream is generated and the project meets the timing constraints.
- Device placement of the algorithms:
  - RCT8x4: SLR1
  - RCT9x4: SLR2
  - RCTSUM: SLR2
  - GCT: SLR0
- Post-implementation device utilization is within the available resources.
- The bitstream was successfully tested on the APd1 (APx demonstrator) board
  - Test vectors were generated from Monte Carlo physics simulations for different physics models.




Implemented timing report (setup):

| Timing                      | Value    |
|-----------------------------|----------|
| Worst Negative Slack (WNS)  | 0.019 ns |
| Total Negative Slack (TNS)  | 0 ns     |
| Number of Failing Endpoints | 0        |
| Total Number of Endpoints   | 1434128  |

Fig: Utilization and timing summary (setup)



Fig: GCT device implementation

#### $F_{max} = 1/(4.167\ \text{ns} - 0.019\ \text{ns}) \approx 241\ \text{MHz}$
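
The maximum-frequency estimate above can be reproduced as follows. The sketch takes the 4.167 ns clock period and the 0.019 ns worst negative slack from the timing summary; the function name is illustrative.

```python
def fmax_mhz(clock_period_ns: float, wns_ns: float) -> float:
    """Estimate the maximum clock frequency from the period and setup slack.

    A positive worst negative slack (WNS) means the critical path is shorter
    than the clock period by that amount.
    """
    return 1000.0 / (clock_period_ns - wns_ns)


if __name__ == "__main__":
    print(f"F_max = {fmax_mhz(4.167, 0.019):.2f} MHz")
```

The small positive slack leaves about 1 MHz of margin over the 240 MHz algorithm clock.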



## **Muon trigger**

- The function of the muon trigger:
  - Identify the muon tracks
  - Measure their momenta
- Inputs are in the form of muon stubs (32-64 bits each)
- Inputs (stubs) are relayed through the electronics of various regions:
  - **Barrel:**
    - Drift tubes (DT)
    - Resistive plate chambers (RPC)
  - **Endcap:**
    - very forward extension iRPC
    - cathode strip chambers (CSC)
    - gas electron multipliers (GEM)
- Full implementation of the barrel algorithm
  - Tested on a small KU040 FPGA
  - Algorithm clock: 160 MHz
  - BMT latency: 2.25 µs



Fig: Muon trigger architecture

| DSP | FF  | LUTs | BRAM |
|-----|-----|------|------|
| 10% | 17% | 37%  | 46%  |




Fig: barrel algorithm implementation

#### Stubs: position, bend angle, and timing information of the muons



#### KU040 FPGA


## **Track trigger**

#### **Global track trigger algorithm:**

- Aim:
  - Reconstruction of the primary vertices
  - Identify track-only objects
- Uses 6 APx and 6 Serenity boards
- **Primary Vertex (PV) Finding:**
  - Origin of tracks constrained to ~1 mm
  - Remove pileup to maintain manageable rates
- Track-Vertex Association:
  - Select tracks consistent with the PV
- **Track-based Jet Finding**
- Track-based missing transverse energy (MET)
- Track-based missing $H_T$\*

\*$H_T$: scalar sum of the $p_T$ of jets





## **Correlator trigger**

- Aim: collect and combine information from the calorimeters, muon systems, and tracker
  - reconstruct the particles and identify them
- Employs algorithms for:
  - Particle Flow (PF) and PileUp Per Particle Identification (PUPPI) (barrel + endcap)
  - Jets/missing transverse energy (MET)/H<sub>T</sub>
  - Taus, isolation, NN MET, electron/photon (egamma)

- Correlator Layer-1: performs full PF+PUPPI to create particle-flow candidates
- Correlator Layer-2: uses PF candidates to reconstruct physics objects





#### Fig: Layer-1 barrel




- Full working PF+PUPPI
- Barrel/endcap implemented using VU9P-2

| VU9P   | DSP | FF  | LUTs | BRAM |
|--------|-----|-----|------|------|
| Barrel | 33% | 36% | 46%  | 38%  |
| Endcap | 24% | 24% | 30%  | 32%  |

## **Global trigger**

- Final stage of the Level-1 trigger
- Aim: implements the trigger menu
- Based on the Serenity board (XCVU9P FPGA)
- Flexible design:
  - can be adapted for future algorithms
- 480 MHz algorithm clock
- Total latency of the GT algorithm:
  - ~250 ns (10 bunch crossings, BX)
  - Budget: 40 BX (1000 ns)









Fig: 39 algorithms placed in 1 SLR (total 117 algorithms for 3 SLRs)

Resource distribution in the GT algorithm board with 234 algorithms:

Flexible system

Fig: 78 algorithms placed in 1 SLR (total 234 algorithms for 3 SLRs)




## **Slice test**

#### Track finder (backend) => Global track trigger (GTT)

- VU7P Apollo => KU15P Serenity test
  - Apollo algo firmware: final subcomponent of the track finder
  - Serenity algo firmware: vertexing algorithm
  - Tracks sent over 18 links
- Inputs are injected into the buffers on the Apollo
  - Generated via CMS software (CMSSW)
- Outputs are captured in the Serenity buffers
  - Compared with expectations: 100% agreement

#### Correlator Layer-1 (Serenity) => Layer-2 (Serenity)

- Layer-1 algo input: HGCAL => jets
- Layer-2 algo output: electron/photon (egamma)
- 100% agreement with emulator




TIF crate (Apollo connected to Serenity)



#### Fig: Track finder and GTT board placement in the TIF crate



## Summary

- Key technological choices to cope with the high data rates of the HL-LHC environment:
  - High-speed optical links (from ~10 Gbps to ~28 Gbps)
  - Large FPGAs (from Virtex-7 to Xilinx Virtex UltraScale+/Kintex UltraScale)
  - Modular and scalable algorithm firmware
- Several FPGA boards are being developed and various tests have been performed, such as:
  - Link eye scans (@ 25 Gbps)
  - Endurance test (< $10^{-12}$ BER) of the FPGA quads
  - FPGA thermal tests to explore various heat sink options
- The following trigger algorithms have been prepared and tested successfully on their corresponding prototype boards:
  - RCT and GCT
  - Barrel muon trigger and global muon trigger (GMT)
  - Global track trigger (GTT)
  - Correlator Layer-1 and Layer-2
  - Global trigger (GT)
- The latency and resource utilization are well within the desired limits.
- All testing and development is on track with the HL-LHC schedule.




#### X2O, DTH, Ethernet switch

Fig: L1 trigger crate installed at CERN that houses three Serenity, X2O, and DTH (DAQ and TCDS hub) boards (for multi-board testing)

Reference: "The Phase-2 Upgrade of the CMS Level-1 Trigger", Technical Report CERN-LHCC-2020-004, CMS-TDR-021, CERN, Geneva, 2020.






# Thank you

## Acknowledgement

• Piyush Kumar and Bhawna Gomber acknowledge the support from IOE, University of Hyderabad, through Grant Number UOH-IOE-RC2-21-006





CASEST: Centre for Advanced Studies in Electronics Science & Technology





## BACKUP...

- Stacked silicon interconnect (SSI) technology integrates multiple super logic region (SLR) components placed on a passive silicon interposer (Fig 3).
- Each SLR contains the active circuitry common to most Xilinx FPGA (field-programmable gate array) devices. This circuitry includes large numbers of:
  - 6-input LUTs (look-up tables)
  - Registers
  - I/O components
  - Gigabit transceivers (GT)
  - Block memory
  - DSP blocks
  - Other blocks
- The device used for our synthesis and implementation is based on Xilinx SSI technology and has three SLRs:
  - Xilinx Virtex UltraScale+ xcvu9p-flgc2104-1-e FPGA





Fig 3: Xilinx FPGA Enabled by SSI Technology\*

\*: UG872 Large FPGA Methodology Guide



## **Barrel Calorimeter Segmentation (New)**



Fig 2: Barrel calorimeter segmentation (new)





| Parameter                | Value   |
|--------------------------|---------|
| LHC BC Clock [MHz]       | 40.08   |
| Word Bit Size            | 66      |
| Line Rate [Gbps]         | 16.0000 |
| Max Theoretical Words/Bx | 6.04851 |

|                               | TM1     | TM1     | TM1    | TM6     | TM6     | TM6    | TM18    | TM18    | TM18   |
|-------------------------------|---------|---------|--------|---------|---------|--------|---------|---------|--------|
| Bx Frame Length (TM interval) | 1       | 1       | 1      | 6       | 6       | 6      | 18      | 18      | 18     |
| Words/Frame                   | 4       | 5       | 6      | 24      | 30      | 36     | 72      | 90      | 108    |
| Equiv. Words/Bx               | 4.00    | 5.00    | 6.00   | 4.00    | 5.00    | 6.00   | 4.00    | 5.00    | 6.00   |
| Equiv. Bits/Bx                | 256     | 320     | 384    | 256     | 320     | 384    | 256     | 320     | 384    |
| Data Rate [Gbps]              | 10.58   | 13.23   | 15.87  | 10.58   | 13.23   | 15.87  | 10.58   | 13.23   | 15.87  |
| Filler Rate [Gbps]            | 5.42    | 2.77    | 0.13   | 5.42    | 2.77    | 0.13   | 5.42    | 2.77    | 0.13   |
| Average Filler Words/Bx       | 2.05    | 1.05    | 0.05   | 2.05    | 1.05    | 0.05   | 2.05    | 1.05    | 0.05   |
| Average Filler Words/Orbit    | 7300.89 | 3736.89 | 172.89 | 7300.89 | 3736.89 | 172.89 | 7300.89 | 3736.89 | 172.89 |
| Average Filler Words/Frame    | 2.05    | 1.05    | 0.05   | 12.29   | 6.29    | 0.29   | 36.87   | 18.87   | 0.87   |
| Payload Bits/Frame            | 256     | 320     | 384    | 1536    | 1920    | 2304   | 4608    | 5760    | 6912   |
| Algo Clock @ 64b i/f[MHz]     | 160.32  | 200.4   | 240.48 | 160.32  | 200.4   | 240.48 | 160.32  | 200.4   | 240.48 |



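| Parameter                | Value    |
|--------------------------|----------|
| LHC BC Clock [MHz]       | 40.08    |
| Word Bit Size            | 66       |
| Line Rate [Gbps]         | 25.78125 |
| Max Theoretical Words/Bx | 9.74613  |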

|                               | TM1     | TM1     | TM1     | TM6     | TM6     | TM6     | TM18    | TM18    | TM18    |  |
|-------------------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|--|
| Bx Frame Length (TM interval) | 1       | 1       | 1       | 6       | 6       | 6       | 18      | 18      | 18      |  |
| Words/Frame                   | 7       | 8       | 9       | 42      | 48      | 54      | 126     | 144     | 162     |  |
| Equiv. Words/Bx               | 7.00    | 8.00    | 9.00    | 7.00    | 8.00    | 9.00    | 7.00    | 8.00    | 9.00    |  |
| Equiv. Bits/Bx                | 448     | 512     | 576     | 448     | 512     | 576     | 448     | 512     | 576     |  |
| Data Rate [Gbps]              | 18.52   | 21.16   | 23.81   | 18.52   | 21.16   | 23.81   | 18.52   | 21.16   | 23.81   |  |
| Filler Rate [Gbps]            | 7.26    | 4.62    | 1.97    | 7.26    | 4.62    | 1.97    | 7.26    | 4.62    | 1.97    |  |
| Average Filler Words/Bx       | 2.75    | 1.75    | 0.75    | 2.75    | 1.75    | 0.75    | 2.75    | 1.75    | 0.75    |  |
| Average Filler Words/Orbit    | 9787.22 | 6223.22 | 2659.22 | 9787.22 | 6223.22 | 2659.22 | 9787.22 | 6223.22 | 2659.22 |  |
| Average Filler Words/Frame    | 2.75    | 1.75    | 0.75    | 16.48   | 10.48   | 4.48    | 49.43   | 31.43   | 13.43   |  |
| Payload Bits/Frame            | 448     | 512     | 576     | 2688    | 3072    | 3456    | 8064    | 9216    | 10368   |  |
| Algo Clock @ 64b i/f [MHz]    | 280.56  | 320.64  | 360.72  | 280.56  | 320.64  | 360.72  | 280.56  | 320.64  | 360.72  |  |
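
The entries of the two link-budget tables follow from the line rate, the 66-bit word size, and the LHC bunch-crossing clock. The sketch below reproduces the per-BX quantities. It assumes 64 payload bits per 66-bit word (as in 64b/66b encoding) and 3564 bunch crossings per LHC orbit, a value not stated in the tables but consistent with their "Filler Words/Orbit" row. The function and key names are illustrative.

```python
LHC_BC_CLOCK_MHZ = 40.08  # from the tables
WORD_BITS = 66            # transmitted bits per word
PAYLOAD_BITS = 64         # payload bits per word (assumes 64b/66b encoding)
BX_PER_ORBIT = 3564       # bunch crossings per LHC orbit (assumption)


def link_budget(line_rate_gbps: float, words_per_bx: int) -> dict:
    """Per-BX link budget for a given line rate and payload words per BX."""
    # Bandwidth consumed by sending one 66-bit word every bunch crossing.
    gbps_per_word = WORD_BITS * LHC_BC_CLOCK_MHZ / 1000.0
    max_words = line_rate_gbps / gbps_per_word
    data_rate = words_per_bx * gbps_per_word
    filler_words = max_words - words_per_bx
    return {
        "max_words_per_bx": max_words,
        "data_rate_gbps": data_rate,
        "filler_rate_gbps": line_rate_gbps - data_rate,
        "filler_words_per_bx": filler_words,
        "filler_words_per_orbit": filler_words * BX_PER_ORBIT,
        "payload_bits_per_bx": words_per_bx * PAYLOAD_BITS,
        "algo_clock_mhz": words_per_bx * LHC_BC_CLOCK_MHZ,
    }


if __name__ == "__main__":
    for rate, words in ((16.0, 4), (16.0, 6), (25.78125, 9)):
        budget = link_budget(rate, words)
        print(f"{rate} Gbps, {words} words/BX:")
        for key, value in budget.items():
            print(f"  {key}: {value:.2f}")
```

The time-multiplexed columns (TM6, TM18) scale the per-frame quantities by the frame length, while the per-BX quantities computed here stay the same.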



## **Project hierarchy and floor planning**



Fig 22: Project hierarchy in Vivado

Fig 23: Project floor planning



## **APx Firmware shell**

Iridis: 64b/66b-based optimized signaling method and firmware cores for CMS Trigger applications







## **APx test**

Fig: Serial I/O link tests on the APx board (TX and RX pattern: PRBS 31-bit)

| Link    | RX           | Status      | Bits     | Errors | BER       |
|---------|--------------|-------------|----------|--------|-----------|
| Link 0  | MGT_X1Y0/RX  |             | 1.535E14 |        | 6.513E-15 |
| Link 1  | MGT_X1Y1/RX  |             | 1.535E14 |        | 6.514E-15 |
| Link 2  | MGT_X1Y2/RX  |             | 1.535E14 |        | 6.515E-15 |
| Link 3  | MGT_X1Y3/RX  |             | 1.534E14 |        | 6.517E-15 |
| Link 4  | MGT_X1Y40/RX |             | 1.632E14 |        | 6.126E-15 |
| Link 5  | MGT_X1Y41/RX |             | 1.632E14 |        | 6.127E-15 |
| Link 6  | MGT_X1Y42/RX |             | 1.632E14 |        | 6.129E-15 |
| Link 7  | MGT_X1Y43/RX |             | 1.63E14  |        | 6.133E-15 |
| Link 8  | MGT_X1Y44/RX | 25.784 Gbps | 1.621E14 |        | 6.169E-15 |
| Link 9  | MGT_X1Y45/RX | 25.781 Gbps | 1.612E14 |        | 6.203E-15 |
| Link 10 | MGT_X1Y46/RX | 25.781 Gbps | 1.603E14 |        | 6.238E-15 |
| Link 11 | MGT_X1Y47/RX | 25.781 Gbps | 1.595E14 | 0E0    | 6.271E-15 |
| Link 12 | MGT_X1Y48/RX | 25.781 Gbps | 1.584E14 | 0E0    | 6.313E-15 |
| Link 13 | MGT_X1Y49/RX | 25.785 Gbps | 1.571E14 | 0E0    | 6.364E-15 |
| Link 14 | MGT_X1Y50/RX | 25.781 Gbps | 1.561E14 | 0E0    | 6.405E-15 |
| Link 15 | MGT_X1Y51/RX | 25.781 Gbps | 1.553E14 | 0E0    | 6.44E-15  |

Link Group 0 (12 links; only part of the list is legible):

| RX           | Status      | Bits     | Errors | BER     |
|--------------|-------------|----------|--------|---------|
| MGT_X1Y28/RX | 25.781 Gbps | 1.333E14 | 0E0    | 7.504E… |
| MGT_X1Y29/RX | 25.775 Gbps | 1.333E14 | 0E0    | 7.504E… |
| MGT_X1Y30/RX | 25.776 Gbps | 1.333E14 | 0E0    | 7.504E… |
| MGT_X1Y31/RX | 25.781 Gbps | 1.333E14 | 0E0    | 7.504E… |
| MGT_X1Y36/RX | 25.776 Gbps | 1.333E14 |        |         |
| MGT_X1Y38/RX | 25.781 Gbps | 1.333E14 |        |         |
| MGT_X1Y39/RX | 25.781 Gbps | 1.333E14 |        |         |
| N Link 6                  | MGT_X1Y31/D                |                |               | 1.333E14<br>1.333E14 |        | 7.504E    | Reset      | Reset    | Reset    |         | PRBS 31-bit<br>PRBS 31-bit |    | PRBS 31<br>PRBS 31 |
| 🕤 Link 6<br>🕤 Link 7      | MGT_X1Y31/D<br>MGT_X1Y32/D | X MGT_X1Y31/RX | 25.781 Gbps   |                      | 0E0    |           |            |          |          |         |                            | v  |                    |

0E0 7.504E

- Using both Firefly 25X12 Alpha module sets
- 515.625 MHz refclk frequency (zero remainder)
- All 124 paths tested to ≥ 1E14 bits of PRBS31 data with zero errors
- Some tweaking of fiber connections necessary for 25X12 modules

## **THERMAL PERFORMANCE (APX)**

- At 16 W/cm, a 12.5 cm heatsink provides 200 W of cooling potential, assuming no significant ducting of air within the card
- APxF has 3.4 W/°C heat sink performance, so a 200 W load will increase temperature by 59 °C (25 °C to 84 °C) with cooling at full power
- Observations:
  - 200 W FPGA power limit feasible at full fan power

  - Care required when balancing design tradeoffs, e.g. heat sink dimensions vs MGT route length
  - Lower die temperature → capacity to reduce fan speed
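The thermal figures quoted above follow from two simple relations: cooling potential is the per-length rating times the heatsink length, and temperature rise is the load divided by the heat sink performance in W/°C. A minimal sketch, using only the numbers on the slide (the function names are illustrative, not from the source):

```python
def cooling_potential_w(rating_w_per_cm, heatsink_length_cm):
    """Cooling potential of a heatsink rated per unit length."""
    return rating_w_per_cm * heatsink_length_cm


def temperature_rise_c(power_w, performance_w_per_c):
    """Temperature rise for a given load and heat sink performance."""
    return power_w / performance_w_per_c


def final_temperature_c(ambient_c, power_w, performance_w_per_c):
    """Resulting temperature starting from the ambient temperature."""
    return ambient_c + temperature_rise_c(power_w, performance_w_per_c)


if __name__ == "__main__":
    potential = cooling_potential_w(16.0, 12.5)      # 16 W/cm, 12.5 cm heatsink
    rise = temperature_rise_c(200.0, 3.4)            # 200 W load, 3.4 W/degC
    final = final_temperature_c(25.0, 200.0, 3.4)    # 25 degC ambient
    print(f"cooling potential: {potential:.0f} W")
    print(f"temperature rise:  {rise:.1f} degC")
    print(f"final temperature: {final:.1f} degC")
```

Rounded to whole degrees, this gives the 59 °C rise and the 84 °C end point quoted for the APxF heat sink.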





Designing to maximize slot airflow utilization

1. Low restriction airflow path to FPGA heat sink
2. Low-profile, low load flyover zone
3. VU13P FPGA heat sink: 12.5 × 12.5 cm, 16% fill fin pattern; measured 3.4 W/°C (0.29 °C/W) at full 450 W fan power (lidded A2577 package)
4. Significant airflow obstructions
5. Optical module heat sinks located for pressure balance
6. FPGA exhaust heat zone



#### VU13P Lidless Package Option



- Xilinx data for A2577 Θ<sub>JC</sub> (die to case):
  - FLGA (lidded): 0.05 °C/W
  - FSGA (lidless): 0.01 °C/W
- At 200 W, up to ΔT ≈ 8 °C savings versus the lidded package
- Comments:
  - Lidless interface is a more exacting design; APx has a lidless heat sink design on file
  - Would optimize other thermal design aspects first (board layout, heat sink geometries)
  - When a device is operating near the thermal limit, small °C improvements → a large % increase in thermal margin
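The 8 °C figure is the difference in die-to-case temperature rise between the two packages at 200 W. The sketch below reproduces it and then illustrates the last comment with a purely hypothetical operating point: the 100 °C limit and 95 °C junction temperature are assumptions chosen for the example, not measurements from the slides, and the function names are illustrative.

```python
THETA_JC_LIDDED_C_PER_W = 0.05   # FLGA (lidded), quoted above
THETA_JC_LIDLESS_C_PER_W = 0.01  # FSGA (lidless), quoted above


def die_to_case_rise_c(power_w, theta_jc_c_per_w):
    """Temperature drop between die and case for a given power."""
    return power_w * theta_jc_c_per_w


def lidless_saving_c(power_w):
    """Reduction in die temperature from switching lidded to lidless."""
    lidded = die_to_case_rise_c(power_w, THETA_JC_LIDDED_C_PER_W)
    lidless = die_to_case_rise_c(power_w, THETA_JC_LIDLESS_C_PER_W)
    return lidded - lidless


def margin_increase_percent(limit_c, junction_c, saving_c):
    """Relative growth in thermal margin when the die runs `saving_c` cooler."""
    margin_before = limit_c - junction_c
    margin_after = margin_before + saving_c
    return 100.0 * (margin_after - margin_before) / margin_before


if __name__ == "__main__":
    saving = lidless_saving_c(200.0)
    print(f"lidless saving at 200 W: {saving:.1f} degC")
    # Hypothetical operating point: 100 degC limit, die at 95 degC (assumed).
    gain = margin_increase_percent(100.0, 95.0, saving)
    print(f"thermal margin increase: {gain:.0f} %")
```

With only 5 °C of margin in the assumed case, an 8 °C saving more than doubles the headroom, which is the sense in which small improvements matter most near the limit.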



## **Serenity tests**

## **THERMAL PERFORMANCE (SERENITY)**

- Explored using heat pipes and vapour chambers to allow "small" heatsinks
- Vapour chambers allow 200 W dissipation at the expected fan speed (10 of 15) with a compact 90 mm × 90 mm heatsink





#### See large variation depending on heatsink

- Want to keep FPGA temperature at 100 °C or lower

