### TEPX as a high-precision luminosity detector for CMS at HL-LHC: BRIL Trigger Board and Online Pixel Clustering on FPGA

### Mykyta Haranko

#### TWEPP 2022 Bergen, Norway, 23rd September 2022







# **BRIL Project**

- measurement for the CMS experiment

#### **BRIL in HL-LHC**

particles



- TEPX will require dedicated timing and processing infrastructure



# Upgraded TEPX

#### **TEPX is part of the CMS Inner Tracker**

- In preparation for the high-luminosity operation phase of the LHC, the CMS Inner Tracker will be fully replaced
  - Sensors with a 25 × 100 µm pitch
  - Two types of modules: 2x1 and 2x2 chips
    - Only 2x2 chip modules in TEPX
    - Each chip connects to 432 columns × 336 rows of a sensor
  - The new system is designed to operate at a trigger rate of **1000 kHz**, whereas the expected physics trigger rate is around **750 kHz**
    - Plenty of headroom for extending the functionality
    - Luminosity will be measured by sending an additional 75 kHz of triggers (10% of the physics rate)
  - A part of the system (Disk 4 Ring 1, D4R1) will not be used for tracking
    - Luminosity and beam-induced background will be measured by sending up to 1000 kHz of triggers
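The trigger budget and the per-chip pixel count quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above; the variable names are illustrative and not taken from any BRIL software.

```python
# Trigger budget of the upgraded Inner Tracker readout (numbers from the slide).
MAX_TRIGGER_RATE_KHZ = 1000   # design trigger rate of the system
PHYSICS_RATE_KHZ = 750        # expected physics trigger rate
LUMI_PERCENT = 10             # luminosity triggers relative to the physics rate

lumi_rate_khz = PHYSICS_RATE_KHZ * LUMI_PERCENT // 100
total_rate_khz = PHYSICS_RATE_KHZ + lumi_rate_khz
headroom_khz = MAX_TRIGGER_RATE_KHZ - total_rate_khz

# Pixel count per chip and per 2x2 module (432 columns x 336 rows per chip).
pixels_per_chip = 432 * 336
pixels_per_module = 4 * pixels_per_chip

print(f"luminosity triggers:  {lumi_rate_khz} kHz")
print(f"physics + luminosity: {total_rate_khz} kHz")
print(f"remaining headroom:   {headroom_khz} kHz")
print(f"pixels per chip:      {pixels_per_chip}")
print(f"pixels per module:    {pixels_per_module}")
```

With physics and luminosity triggers combined, the tracking part of TEPX stays below the design rate, while D4R1 can use the full 1000 kHz for luminosity and background triggers.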







# **Back-End Systems**

- Front-end modules are optically connected to Inner Tracker Data, Trigger and Control (IT-DTC) boards in ATCA format
  - IT-DTC will be based on the Apollo platform
  - Each board will perform book-keeping of physics and luminosity triggers
    - Data triggered for **physics** will be sent to the central **CMS DAQ**
    - Data triggered for **luminosity** will be sent to **Luminosity Processor Boards** (also Apollo)
- The talk will cover two "back-end" topics
  - Timing distribution and generation of luminosity triggers
  - Luminosity processor firmware



#### Apollo prototype



# **Timing Distribution**

- In CMS, timing and control are distributed through the **Timing and Control Distribution System (TCDS)**
  - Its head is the **TCDS Captain**
  - The DAQ and Timing Hub (DTH) located in each ATCA crate receives the TCDS stream and forwards it to the other ATCA cards
  - Outside of stable beams the CMS clock is **not locked** to the LHC clock
- The **BRIL Trigger Board (BTB)** is introduced to
  - Generate **luminosity triggers** for TEPX and TEPX D4R1
  - Overcome the clocking limitation mentioned above
- The BTB will serve as TCDS Captain for the TEPX D4R1 crate
  - This will allow D4R1 to be operated outside of stable beams and to measure beam-induced background when CMS is not running
  - The synchronisation to global CMS commands will be preserved

# **BRIL Trigger Board**

- BRIL Trigger Board will be based on the **Serenity platform**
- In preparation for the BRIL Phase-2 TDR (CMS-TDR-023), a proof of concept for the BTB was demonstrated
  - Implemented transcoding of the CMS TCDS2 stream into the LHC clock
  - Performed a set of measurements with a ramping LHC clock to prove the stability of the front end and back end
- Recently
  - The Serenity collaboration has been working on a new revision of Serenity which, among other features, will be able to receive an external LHC clock and transmit the transcoded TCDS stream to the D4R1 crate
  - Development of the **BTB firmware** has started at BRIL



### Serenity prototype





## **BTB Firmware**

- The firmware is developed in the EMP framework
- Many interfaces are provided by EMP and CMS-TCDS2 firmware
- Core BRIL-specific elements are
  - BPTX recovery
    - Generic LVDS receiving module has been implemented
  - TCDS2 transcoding (synchronisation)
    - Implemented and tested (presented last time)
  - Trigger generation
    - Implemented general infrastructure and two basic trigger algorithms
- The above elements are in the verification stage (simulation + hardware)





## **BPTX Recovery**

- Beam Pick-up Timing for the eXperiments (BPTX) provides Beam 1 and Beam 2 signals from the LHC pickups
  - The signals will be recovered by the BPTX back-end system and forwarded to the BRIL Trigger Board over **LVDS lines**
- A generic LVDS recovery interface has been implemented for UltraScale (UltraScale+) devices
  - Tunes the phase of the incoming data using I(O)DELAYE3 elements (automatic or manual)
    - Configurable cascading of N I(O)DELAYE3 elements allows maximum delays of N × 1.25 ns (N × 1.1 ns on UltraScale+)
  - 8× oversampling of the incoming data (if supplied with 40 MHz and 160 MHz clocks, this results in a 320 MHz effective sampling rate)
  - Word alignment (**automatic** or **manual**)
- The automatic phase tuner operates by finding a bit transition and adjusting the delay to avoid it
- The interface module is being verified
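The idea behind the automatic phase tuner can be illustrated with a small software model. This is a minimal sketch and not the firmware: it assumes that each delay tap is sampled several times, that a tap whose repeated samples disagree sits on a bit transition, and that the safest setting is the middle of the longest run of stable taps. The function name `pick_delay_tap` and the data layout are hypothetical.

```python
def pick_delay_tap(samples_per_tap):
    """Return the tap in the middle of the longest run of stable taps.

    samples_per_tap[i] holds repeated samples of the input taken at delay tap i.
    A tap counts as stable when all of its samples agree (no transition seen).
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for tap, samples in enumerate(samples_per_tap):
        if len(set(samples)) == 1:
            if run_len == 0:
                run_start = tap
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # transition found: the current stable run ends here
    if best_len == 0:
        raise ValueError("no stable delay tap found")
    return best_start + best_len // 2


# Taps 1 and 5 see a transition; taps 2 to 4 form the longest stable window.
scan = [[0, 0, 0], [0, 1, 0], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1]]
print(pick_delay_tap(scan))
```

Choosing the centre of the stable window maximises the margin to the transitions on either side, which is the usual motivation for this kind of scan.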











# **Trigger Generation**

- The target of BRIL is to sample LHC orbits in the most efficient way with only 75 (1000) kHz of triggers available
- Different trigger algorithms are being considered for the final application
  - Uniform: trigger for *seq\_length*, then *interval* without triggers, with an *offset* applied on the next orbit
  - RAM: up to 64 orbits (configurable) can be pre-calculated in software, stored in BRAM and replayed
    - For example, prioritise filled bunches over empty ones
    - Prime candidate for the final algorithm
  - Multi-scaler: implements an OR trigger on various conditions (each can be prescaled)
    - Allows prioritising certain events and checking functionality
- The trigger generator top-level module implements a prescaler, rate measurement and trigger rule

Figure: uniform trigger configuration (*seq\_length*, *interval*, *offset*) shown against bunch ID for orbits N, N+1 and N+2
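A software model helps to see how a uniform pattern walks across the bunch IDs from orbit to orbit. The sketch below is an interpretation of the uniform algorithm, not the firmware: it assumes that the pattern repeats with a period of `seq_length + interval` bunch crossings and that its starting point is shifted by `offset` on every new orbit. The function name and this exact semantics are assumptions; the 3564 bunch crossings per orbit is the standard LHC value.

```python
ORBIT_LENGTH = 3564  # bunch crossings per LHC orbit


def uniform_triggers(orbit, seq_length, interval, offset, orbit_length=ORBIT_LENGTH):
    """Return the bunch IDs triggered in the given orbit (assumed semantics)."""
    period = seq_length + interval
    start = (orbit * offset) % period  # pattern start shifts by `offset` each orbit
    return [bx for bx in range(orbit_length) if (bx - start) % period < seq_length]


# Toy example with an 8-bunch orbit: 1 trigger, 3 idle crossings, shift of 1 per orbit.
for orbit in range(4):
    print(orbit, uniform_triggers(orbit, seq_length=1, interval=3, offset=1, orbit_length=8))
```

In the toy example every bunch ID is sampled once after four orbits, which is the property that makes a shifting uniform pattern attractive when the trigger rate is far below the bunch-crossing rate.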









## **BTB Summary**

#### The BRIL Trigger Board is at a fairly advanced development stage

- Operation on a ramping clock has been shown for earlier front-end and back-end revisions
- BRIL-specific algorithms are being finalised
- Interfaces to most systems will be provided by the EMP framework and TCDS2 firmware
- Outlook
  - Lab tests with the new Serenity revision will have to be established
  - Discussions about increasing the luminosity trigger rate during VdM/emittance scans (negotiation with Global Trigger) are ongoing



# **Processing Architecture**

- Earlier BRIL studies have shown good linearity of per-event cluster counts versus pile-up
  - Lumi processor boards were proposed to perform **pixel hit clustering and counting**
- Same processing structure for TEPX and TEPX D4R1
  - TEPX: up to 704 chips per IT-DTC, planned trigger rate 75 kHz @ PU200
  - TEPX D4R1: up to 80 chips per IT-DTC, planned trigger rate 1000 kHz @ PU200
- This results in rates of 24 Gbps (80 Gbps) between the IT-DTCs and the Lumi processor boards for TEPX (D4R1)
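To get a feeling for these numbers, one can work out the average data volume per chip and per trigger that they imply. The sketch below assumes that the quoted link rates are totals per IT-DTC at the maximum chip count, which the slide does not state explicitly; the function name is illustrative, and the quoted rates may include protocol overhead or safety margin.

```python
def bits_per_chip_event(link_rate_gbps, chips, trigger_rate_khz):
    """Average number of bits per chip and per triggered event implied by a link rate."""
    return link_rate_gbps * 1e9 / (chips * trigger_rate_khz * 1e3)


tepx_bits = bits_per_chip_event(link_rate_gbps=24, chips=704, trigger_rate_khz=75)
d4r1_bits = bits_per_chip_event(link_rate_gbps=80, chips=80, trigger_rate_khz=1000)

print(f"TEPX: about {tepx_bits:.0f} bits per chip per trigger")
print(f"D4R1: about {d4r1_bits:.0f} bits per chip per trigger")
```

Under this reading, D4R1 carries far fewer chips per IT-DTC but still needs the higher link rate because every chip is read out at 1000 kHz.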











# **BRIL Pixel Clustering**

- A test clustering algorithm has been developed as a proof of concept
  - Each instance of the algorithm decodes events from one or more chips
  - Arranged as a set of processors (state machines), each containing a shallow input buffer
  - Cluster candidates are forwarded from one processor to another depending on their properties
  - The algorithm operates at a **320 MHz clock**
  - Charge information is ignored; the position and size of the final clusters are not calculated, only cluster counts
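What "only cluster counts" means can be shown with a small software reference model. This is not the FPGA processor chain described above: it is a plain flood fill over the hit pixels, assuming that pixels touching along an edge or a corner belong to the same cluster (8-connectivity is an assumption). Charge, position and size are deliberately not computed.

```python
def count_clusters(hits):
    """Count groups of touching hit pixels; hits is an iterable of (column, row)."""
    remaining = set(hits)
    n_clusters = 0
    while remaining:
        # Start a new cluster from any unvisited hit and absorb all its neighbours.
        stack = [remaining.pop()]
        n_clusters += 1
        while stack:
            col, row = stack.pop()
            for dcol in (-1, 0, 1):
                for drow in (-1, 0, 1):
                    neighbour = (col + dcol, row + drow)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        stack.append(neighbour)
    return n_clusters


# Two diagonal neighbours form one cluster; the isolated hit forms a second one.
print(count_clusters([(0, 0), (1, 1), (5, 5)]))
```

A model of this kind is the sort of reference against which per-event cluster counts from firmware can be compared.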





## Performance

- The algorithm has shown excellent performance on CMSSW simulation data (compared against the offline algorithm)
  - A small fraction of mismatches is being investigated

#### **Resource utilisation**

- Successfully placed **176 instances** (to process 352 chips) per VU13P FPGA
- No timing failures
- Less than 50% of FPGA resources

| Resource type          | Absolute utilisation | Available on VU13P |
|------------------------|----------------------|--------------------|
| Logic slices           | 96150                | 216000             |
| Block RAM (BRAM) tiles | 704                  | 2688               |

Figure: VU13P floorplan with SLR0 to SLR3





## **BRIL Verification - RTL Simulation**

- The cocotb framework (Python) is used to implement the simulation test bench for BRIL algorithms

Figure: test bench (including stimulus generator, model and test routines) connected to the Aldec Riviera (ModelSim) simulator





## **BRIL Verification - Hardware**

- BRIL algorithms are verified on a dedicated hardware platform
- Aldec HES-XCVU9P-ZU7EV board purchased in 2020
  - Data flow similar to that of the BRIL Lumi Processor (Apollo) or the BRIL Trigger Board (Serenity)
  - Convenient desktop solution

















## **Verification - Clustering Example**

### • **Simulation** (slower, low statistics)

- Verifying cluster counts directly
- Measuring and optimising buffer occupancies and sizes

### • **Hardware** (fast, high statistics, less debug information)

- Count checker on FPGA with an error buffer
- Error buffer stores only
  - Mismatched events
  - Events with errors or warnings





Events that fail on hardware can be injected directly into the simulation and studied
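The behaviour of a count checker with an error buffer can be modelled in a few lines. The sketch below is a software illustration under stated assumptions, not the FPGA module: the class name, the tuple layout and the policy of keeping the first failures once the buffer is full are all hypothetical.

```python
class CountChecker:
    """Compare measured cluster counts with expected ones, storing only failures."""

    def __init__(self, buffer_depth=16):
        self.buffer_depth = buffer_depth
        self.error_buffer = []   # entries: (event_id, expected, measured, flags)
        self.n_checked = 0
        self.n_mismatched = 0
        self.n_dropped = 0       # failures that did not fit into the buffer

    def check(self, event_id, expected, measured, flags=0):
        self.n_checked += 1
        mismatch = expected != measured
        if mismatch:
            self.n_mismatched += 1
        # Store the event only if the counts differ or an error/warning flag is set.
        if mismatch or flags:
            if len(self.error_buffer) < self.buffer_depth:
                self.error_buffer.append((event_id, expected, measured, flags))
            else:
                self.n_dropped += 1


checker = CountChecker(buffer_depth=2)
checker.check(event_id=0, expected=3, measured=3)            # good event, not stored
checker.check(event_id=1, expected=5, measured=4)            # mismatch, stored
checker.check(event_id=2, expected=2, measured=2, flags=1)   # warning flag, stored
checker.check(event_id=3, expected=7, measured=8)            # mismatch, buffer full
print(checker.n_checked, checker.n_mismatched, checker.error_buffer)
```

Because good events are only counted, the checker can run at full rate for high statistics, while the few stored events are exactly the ones worth replaying in simulation.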





# **Clustering Summary**

- The clustering algorithm has shown
  - Good agreement with the sophisticated offline reconstruction algorithm
  - Capability of handling data at 1.33 MHz (1 MHz required)
  - Sufficiently low resource utilisation, with margin for extending the functionality
- **Current** activities
  - Regenerating CMSSW simulation data in order to include the most recent geometry and data format updates
- Outlook
  - A certain fraction of unidentified mismatched events needs to be studied
  - Implementing threshold filtering on hits for a closer match to the offline algorithm
  - There are overlaps between different layers of TEPX
    - Coincidence counting can provide an even more linear response
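The quoted event rates can be turned into a clock-cycle budget per event, using the 320 MHz algorithm clock stated earlier in the talk. The sketch below is plain arithmetic on those numbers; treating the rate as the throughput of a single algorithm instance is an assumption.

```python
CLOCK_MHZ = 320            # algorithm clock
ACHIEVED_RATE_MHZ = 1.33   # demonstrated event processing rate
REQUIRED_RATE_MHZ = 1.0    # required event processing rate

cycles_per_event_achieved = CLOCK_MHZ / ACHIEVED_RATE_MHZ
cycles_per_event_required = CLOCK_MHZ / REQUIRED_RATE_MHZ
rate_margin = ACHIEVED_RATE_MHZ / REQUIRED_RATE_MHZ - 1

print(f"average cycles per event at 1.33 MHz: {cycles_per_event_achieved:.1f}")
print(f"cycles available per event at 1 MHz:  {cycles_per_event_required:.0f}")
print(f"rate margin over the requirement:     {rate_margin:.0%}")
```

In other words, an event is processed in about 240 clock cycles on average, while the requirement would allow 320.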







# **BRIL Projects**

#### • Histogramming module

- Two flavours (beam synchronous and asynchronous)
- Fully generic and parametrisable
- Supports Xilinx 7-series or newer
- Already collects data at CMS (demonstrator systems)

#### • Pixel hit clustering algorithm

- 1.33 MHz event processing rate (requirement: 1 MHz)
- Operates at **320 MHz** (in a constrained FPGA region)
- Dynamic buffer load balancing
- Extensive status monitoring
- Fully operational and validated against the advanced software algorithm

#### • Validation of algorithms on Aldec HES-XCVU9P-ZU7EV

- Bring-up of CentOS 8 on Zynq MPSoC
- IPbus interface between FPGAs (AXI C2C)
- Automated test benches running on VU9P (GitLab CI)
- Hardware access queue

#### Clustering illustration

Figure: example hit patterns at Column 0, Row 0 and Column 0, Row 1



#### 176 instances on VU13P FPGA



## Input Data

- Digitally, the CROC is split into **cores** (64 pixels each)
- Data is transferred in **quarter cores**
  - On chip, each quarter core is **2 rows × 8 columns** of **50 × 50 µm** pixels
  - The CMS sensor pitch is **25 × 100 µm**
  - A specific chip-sensor bump-bond mapping will be applied
- As a consequence, from the processing point of view quarter cores are 4 × 4 pixels
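The step from 2 × 8 chip pixels to 4 × 4 sensor pixels follows from the geometry: both cover the same 100 µm × 400 µm area. The sketch below checks this and shows one possible re-indexing. The mapping in `chip_to_sensor` is hypothetical and only illustrates that a one-to-one assignment exists; the actual bump-bond mapping is not given in this talk.

```python
CHIP_ROWS, CHIP_COLS = 2, 8        # quarter core on the chip
CHIP_PITCH_UM = (50, 50)           # (row pitch, column pitch) of chip pixels
SENSOR_ROWS, SENSOR_COLS = 4, 4    # quarter core as seen by the processing
SENSOR_PITCH_UM = (25, 100)        # (row pitch, column pitch) of sensor pixels

# Both views must cover the same physical area.
chip_size_um = (CHIP_ROWS * CHIP_PITCH_UM[0], CHIP_COLS * CHIP_PITCH_UM[1])
sensor_size_um = (SENSOR_ROWS * SENSOR_PITCH_UM[0], SENSOR_COLS * SENSOR_PITCH_UM[1])


def chip_to_sensor(chip_row, chip_col):
    """Hypothetical mapping: two neighbouring chip columns share one sensor column."""
    sensor_col = chip_col // 2
    sensor_row = 2 * chip_row + chip_col % 2
    return sensor_row, sensor_col


mapping = {(r, c): chip_to_sensor(r, c) for r in range(CHIP_ROWS) for c in range(CHIP_COLS)}
print(chip_size_um, sensor_size_um)
print(len(set(mapping.values())), "distinct sensor pixels")
```

Any mapping with this property lets the clustering logic treat a quarter core as a regular 4 × 4 grid of sensor pixels.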








## Hardware Platforms

- Serenity
  - Will act as OT-DTC, **BRIL Trigger Board (BTB)**
  - Targets **connectivity**
- DAQ and Timing Hub (DTH)
  - Name is self-descriptive
  - Will be placed in every ATCA crate to communicate with the TCDS Captain
- Apollo
  - Will act as IT-DTC, OT Track Finder, ATLAS L0MDT, BRIL Lumi Processor
  - Targets **processing power**



![](_page_20_Picture_15.jpeg)

![](_page_20_Picture_16.jpeg)

![](_page_20_Picture_17.jpeg)

![](_page_20_Picture_18.jpeg)

### **BRIL Lumi Processor**

- On-board data flow
  - Lumi boards are fully dedicated to **pixel clustering** and cluster count **histogramming**
  - Several modules (quarter ring) are accumulated in a histogram

![](_page_21_Figure_4.jpeg)

BRIL Lumi Processor FPGA

**BRIL Lumi Processor CPU** 

# Processing Example

![](_page_22_Figure_1.jpeg)

![](_page_22_Picture_2.jpeg)

# **Known Algorithm Peculiarities**

- Does not calculate the position of the cluster
- Does not use ToT
- Cannot distinguish merged (touching) clusters
- Rare fail condition:
  - In the row merger, first #1 and #2 get merged, then #3 has to be appended, but it is longer, which causes addressing issues
  - This should never happen in fact: normally the longer cluster comes earlier (so first #3, then #1), but an error flag is still raised just in case

![](_page_23_Figure_6.jpeg)

![](_page_23_Figure_7.jpeg)

Most of these will be addressed

![](_page_23_Picture_10.jpeg)

![](_page_23_Picture_11.jpeg)