

**24<sup>th</sup> IEEE REAL TIME CONFERENCE** Quy Nhon, Vietnam

Optimization of the Upgraded Timing Distribution System of the LHCb experiment at CERN







Maurício Féo – TU Dortmund / CERN



Maurício Féo - m.feo@cern.ch

https://indico.cern.ch/event/1109460/contributions/4893298/

https://ieeexplore.ieee.org/document/10115510



### The LHCb Upgrade



- When: LHC Long Shutdown 2 (by 2022), for Runs 3 & 4 (2022 - 2029)
- Why: To increase statistics
  - 9 fb<sup>-1</sup> (Runs 1-2) $\rightarrow$ 50 fb<sup>-1</sup> (Runs 1-4)
- How:
  - Increasing instantaneous luminosity: 5x higher $\rightarrow L_{inst} = 2 \times 10^{33} \text{ cm}^{-2}\text{s}^{-1}$
  - Increasing readout rate: $1 \text{ MHz} \rightarrow 40 \text{ MHz}$

This requires some **main changes**:

- Replace many sub-detectors:
  - New Tracking System (VeLo, UT, SciFi)
  - Partially new Particle ID System (RICH1 + RICH2)
- Replace ALL the electronics:
  - No more hardware trigger
  - Event selection in software
  - Completely new DAQ system



### From Collision to Memory



The data path LHC

- LHC beams are divided into bunches that cross in synchrony with a clock signal of ~40MHz (one crossing every 25ns)
- Not all bunches are filled → no collision
  - We need to select which bunch crossings to save
- Particles arrive at different times depending on the subdetector
  - We need to phase-align the clocks per subdetector and have fixed latency


### The TFC System: A Real-Time Architecture







### The TTC-PON Project



Timing, Trigger and Control for Passive Optical Networks (Link to project)

- **FPGA-Based System for TFC**
- Master/Follower architecture: OLT = Master, ONU = Follower
- Implemented distribution with fixed latency
- Everything controllable from the master
- 9.6 Gbps downstream with FEC (8.0 Gbps for the user): 200b **USER PAYLOAD** plus FEC per BC period (25ns)
- 2.4 Gbps upstream, 8b10b encoded (Time Division Multiplexing between ONUs)
  - Upstream burst made of preamble, data and gap; figure labels: 58.3ns (140b), 41.7ns (100b), 25.0ns (60b)
  - Upstream data: USER PAYLOAD (56b) and HDR; total: 80b (100b with 8b10b)

[Figure: TFC distribution diagram with labels RF2TTC (BC, orbit), Trigger Unit, Slow Control, OLT, ONU1, ONU2, GBT-FPGA, GBTx, FE chip; off-detector / detectors]

OLT: Optical Line Terminal / ONU: Optical Network Unit
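
The link figures quoted for TTC-PON can be cross-checked with simple arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the quoted rates are exact and that the 140b/100b/60b labels refer to segments of a 2.4 Gbps upstream burst.

```python
from fractions import Fraction

BC_PERIOD_NS = 25  # one bunch-crossing period, as quoted on the slides


def bits_in(rate_mbps: int, interval_ns: int) -> Fraction:
    """Bits transferred at rate_mbps during interval_ns (1 Mbps * 1 ns = 1e-3 bit)."""
    return Fraction(rate_mbps * interval_ns, 1000)


def duration_ns(bits: int, rate_mbps: int) -> Fraction:
    """Time in ns needed to send `bits` at rate_mbps."""
    return Fraction(bits * 1000, rate_mbps)


def encoded_8b10b(bits: int) -> int:
    """Size after 8b10b encoding: every 8 data bits become 10 line bits."""
    return bits * 10 // 8


# Downstream: 9.6 Gbps line rate, 8.0 Gbps available to the user.
downstream_bits = bits_in(9600, BC_PERIOD_NS)  # bits on the line per BC period
user_bits = bits_in(8000, BC_PERIOD_NS)        # user payload bits per BC period
overhead_bits = downstream_bits - user_bits    # left for FEC and framing

# Upstream: 2.4 Gbps; duration of each labelled segment size.
upstream_segments_ns = {b: duration_ns(b, 2400) for b in (140, 100, 60)}

print(f"downstream: {downstream_bits} b per BC, {user_bits} b user, {overhead_bits} b overhead")
for b, t in upstream_segments_ns.items():
    print(f"upstream: {b} b -> {float(t):.1f} ns")
print(f"80 b of upstream data -> {encoded_8b10b(80)} b after 8b10b")
```

Exact fractions are used instead of floats so that the per-period bit counts come out as whole numbers rather than rounded approximations.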

### **GBT** Project: The GigaBit Transceiver

- The GBT Project provides a radiation-hard chipset for handling control and data acquisition on frontend boards
- Designed at CERN, it is widely used in the upgrades of its experiments
- It also provides a firmware component (GBT-FPGA) that allows FPGAs to interface directly with the GBTx chip.



For more info:



### Optimization of the TFC System

- Clock recovery uncertainty
  - Among backend electronics: ~70ps
  - Between backend and frontend: ~4ns
- The optimization aims at reducing the timing uncertainty between the Control cards and the Frontend electronics by an order of magnitude
  - For all ~2000 FEE GBT control links
  - From 4ns to less than ~500ps





### **Control Card: Previous Clock Architecture**





### Alternative #1: keeping all clocks inside


#### AGAINST RECOMMENDATIONS!!

Inject the 240MHz CDR clock into the global clock network and use it for all GBT XCVR banks

- Phase uncertainty falls to ~15ps !!
- Poor clock quality causes some FEE links to lose lock eventually
- Metastability on the CDC 40→240 at the GBT banks



Recovered clock phase at the FEE after a clock loss



#### Control Card (SOL40)

### Alternative #2: PLLs in Zero-Delay Mode

- Configure the Si5345 PLLs to receive 40MHz in Zero-Delay Mode (ZDM).
- Borrow an unused clean 40MHz output to be used as SYS clock.
- Carefully apply timing constraints
- Excellent phase uncertainty:
  - ~15ps after clock loss
  - ~30ps after FPGA reprogramming
- TIE jitter significantly reduced across the whole system
- Worked perfectly!
  - until it did not...







### Si5345 Rev B presenting weird symptoms...

- The ZDM option worked well across LHCb until a major test in which the FE control links of a specific subdetector stopped working.
- FE links lost lock and showed other symptoms of too many bit errors.
- It happens after configuring the Si5345 in ZDM.
- Only in the PCIe40s of a specific server (identical to the working ones).
- The PLL remains responsive: it can be reset, reconfigured, etc. It appears normal, but it only works again after a full power cycle.
- Due to lack of time and the difficulty of debugging, the idea was put on hold.





AN1006: Differences Between Si534x/8x Revision B and Revision D Silicon

#### SUMMARY

Compared to Revision B, Revision D silicon for Si534x/8x fixes several errata, supports higher maximum output frequency ranges, and offers several new features. This document outlines those differences.

### Alternative #3: Measure and Shift the Clocks

- We measure the clock phases using a DDMTD (Digital Dual Mixer Time Difference) implemented in firmware and shift the PLL clocks to a specific setpoint.
- The setpoint is the middle of the stability windows found when scanning the phase space.
- The phase measurement needs to compensate for internal delays in the FPGA.
  - Timing reports need to be generated for every compilation.
- Complex operation when compared to previous alternatives.
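
To make the measurement principle concrete, here is a minimal software model of a DDMTD. It is only a sketch of the general technique, not the LHCb firmware: the 25 ns clock period, the factor `n = 1000` and the function name are assumptions chosen for illustration. Two clocks of equal period are sampled by a helper clock whose period is longer by a factor (n+1)/n, so each helper sample advances by 1/n of a period and a small phase offset is stretched into a countable number of helper cycles.

```python
from fractions import Fraction


def ddmtd_measure(offset_ps, period_ps=25_000, n=1000):
    """Model a DDMTD phase measurement between two clocks of equal period.

    offset_ps : true delay of the measured clock w.r.t. the reference (0 <= offset < period)
    period_ps : clock period (assumed 25 ns here, i.e. 40 MHz)
    n         : helper-clock factor; the measurement resolution is period / n
    Returns the measured offset in ps, quantized to period / n.
    """
    period = Fraction(period_ps)
    helper_period = period * (n + 1) / n  # helper clock is slightly slower
    step = period / n                     # phase advance per helper sample

    def sampled(offset, k):
        # Value of a 50% duty-cycle clock delayed by `offset`, seen at helper edge k.
        return (k * helper_period - offset) % period < period / 2

    def first_rising_edge(offset):
        # Index of the first helper sample where the sampled (beat) signal rises.
        for k in range(1, 2 * n + 2):
            if sampled(offset, k) and not sampled(offset, k - 1):
                return k
        raise RuntimeError("no rising edge found")

    k_ref = first_rising_edge(Fraction(0))
    k_sig = first_rising_edge(Fraction(offset_ps))
    # The difference in helper cycles, times the step, is the phase offset.
    return ((k_sig - k_ref) % n) * step


print(float(ddmtd_measure(1000)))  # offset that is a multiple of the 25 ps step
print(float(ddmtd_measure(1010)))  # offset that falls between two steps
```

In this model the result is quantized to `period / n`, so a larger `n` gives finer resolution at the cost of a longer measurement. A real implementation also has to deal with metastability and jitter at the sampling flip-flops, which the model ignores.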

| PLL          | Phase shift resolution |
|--------------|------------------------|
| Internal PLL | 104ps                  |
| Si5345       | 72ps                   |
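
Choosing the setpoint as the middle of a stability window can be sketched as follows. This is an assumed illustration, not the production procedure: the scan data is made up, the function names are hypothetical, and the phase space is treated as circular (the last step is adjacent to the first). Only the 72ps step size is taken from the table above.

```python
SI5345_STEP_PS = 72  # phase shift resolution of the Si5345, from the table


def widest_stable_window(stable):
    """Return (start, length) of the longest circular run of True in `stable`.

    `stable[i]` is True if the link was stable at phase step i of the scan.
    """
    n = len(stable)
    if n == 0 or not any(stable):
        return 0, 0
    if all(stable):
        return 0, n
    # Start right after an unstable step so a window wrapping around the
    # end of the scan is counted as a single run.
    first_bad = stable.index(False)
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i in range(1, n + 1):
        idx = (first_bad + i) % n
        if stable[idx]:
            if run_len == 0:
                run_start = idx
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def setpoint_step(stable):
    """Phase step in the middle of the widest stable window."""
    start, length = widest_stable_window(stable)
    if length == 0:
        raise ValueError("no stable phase found in the scan")
    return (start + (length - 1) // 2) % len(stable)


# Hypothetical scan result: unstable at steps 2 and 3, stable elsewhere.
scan = [True, True, False, False, True, True, True, True, True, True]
step = setpoint_step(scan)
print(f"setpoint at step {step}, i.e. {step * SI5345_STEP_PS} ps of Si5345 shift")
```

Centering the setpoint in the widest window maximizes the margin to the nearest unstable phase on either side, which is what makes the chosen phase robust against small drifts.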






### Alternative #3: Measure and Shift the Clocks

- Phase uncertainty within the experiment requirements
- FE control links stable and reliable
- Automating the phase-shifting mechanism took a lot of effort, but it works
- Currently in use at LHCb



Recovered clock phase at the FEE after a clock loss









### Conclusions

- Initially, any clock loss would cause the LHCb subdetectors to lose time alignment due to a phase change in the clock propagated to the FEE.
- Different implementations were studied to provide fixed latency and deterministic phase after a clock loss.
- Alternative #2 (PLLs in ZDM) showed the best results; however, an unexplained problem when configuring the PLLs did not allow us to use it with all subdetectors.
- Alternative #3 (clock phase shifting) proved to work reliably and has been chosen as the solution.
- Since then, the LHCb subdetectors have been able to keep proper time alignment even after an occasional loss of clock or a similar problem.

# Thank you!

Questions are welcome :)

### Alternative #2: TIE Jitter



