



International School of Trigger and Data Acquisition



# Towards pico-seconds Time Digitization in FPGAs

### Jinhong Wang (jinhongwang@ustc.edu.cn)

June 22, 2024



#### 2008 Beijing Olympics, 100m results (men)



World records to date, 100 m: 9.58 s (men); 10.49 s (women) → ~10 m/s

Milliseconds!



Particles travel at close to the speed of light!

- 1 ms ($10^{-3}$ s) → ~300 km
- 1 ns ($10^{-9}$ s) → 0.3 m (30 cm)
- 10 ps ($10^{-11}$ s) → 0.003 m (3 mm)





 $\sim 3 \times 10^8 \, m/s$ 
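The distances above follow directly from distance = speed × time. A minimal sketch, assuming the particle moves at exactly the speed of light (the helper name `distance_m` is only illustrative):

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum, m/s


def distance_m(dt_s: float, beta: float = 1.0) -> float:
    """Distance covered in dt_s seconds at speed beta * c."""
    return beta * C_M_PER_S * dt_s


for label, dt_s in [("1 ms", 1e-3), ("1 ns", 1e-9), ("10 ps", 1e-11)]:
    print(f"{label:>5} -> {distance_m(dt_s):.3g} m")
```

A 10 ps timing error therefore corresponds to roughly 3 mm along the flight path.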



BES III (Beijing Spectrometer III): separating K/$\pi$ with 2$\sigma$ at 1 GeV/c $\rightarrow$ 100 ps timing precision for the TOF (time of flight)

$$\sigma^{2} = \sigma_{detector}^{2} + \sigma_{bunch}^{2} + \sigma_{Z}^{2} + \sigma_{elec.}^{2} + \sigma_{time-walk}^{2}$$

$$\sigma_{detector} \sim 80 \ ps, \ \sigma_{Z} \sim 10 \ ps, \ \sigma_{bunch} \sim 35 \ ps, \ \sigma_{time-walk} \sim 10 \ ps$$

$$\sigma_{elec.} < 25 \ ps$$
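As a quick check of the budget, the contributions can be added in quadrature. This is only a sketch using the numbers quoted above, with $\sigma_{elec.}$ set to its 25 ps upper bound; the function name `quadrature_sum` is illustrative.

```python
import math


def quadrature_sum(*sigmas_ps: float) -> float:
    """Combine independent timing uncertainties (ps) in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas_ps))


# Contributions quoted on the slide, in ps.
contributions_ps = {"detector": 80.0, "bunch": 35.0, "z": 10.0, "time_walk": 10.0}
sigma_elec_ps = 25.0  # upper bound allowed for the electronics

without_elec = quadrature_sum(*contributions_ps.values())
total = quadrature_sum(*contributions_ps.values(), sigma_elec_ps)

print(f"without electronics: {without_elec:.1f} ps")
print(f"with 25 ps electronics: {total:.1f} ps")
```

With these inputs the total stays below the 100 ps target, and the detector term dominates.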

#### Measuring muon momentum through its bending trajectories



- Magnetic field bends the trajectory of the muons
- Momentum is measured through "inner-middle-outer" layers



#### Trajectory reconstruction



**3D** imaging





# **Introduction:** Examples of timing techniques



See "Introduction to FPGAs" by Hannes Sakulin

# Introduction: Essentials of timing techniques



# **Time Digitization in FPGAs: Counter**



## Time Digitization in FPGAs : dual Counter



# Time Digitization in FPGAs : phase interpolation

- Four phases of a 320 MHz clock (0°, 90°, 180°, 270°), equivalent to ~780 ps resolution



- Limited by the number of finer clock phases (generated from clock managers)
   → plus, difficulty in keeping the clock edges aligned (cons)
- Pros: simple, and easy to implement with macro blocks
- Achieved resolution: ~ hundreds of picoseconds

Progress by now: ~ hundreds of ps
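The resolution of phase interpolation is simply the clock period divided by the number of phases. A minimal sketch, assuming the timestamp is built from a coarse counter value plus the index of the first phase that sees the hit (the names `phase_bin_ps` and `timestamp_ps` are hypothetical):

```python
def phase_bin_ps(f_clk_mhz: float, n_phases: int) -> float:
    """Bin size in ps when one clock period is split into n_phases phases."""
    period_ps = 1e6 / f_clk_mhz
    return period_ps / n_phases


def timestamp_ps(coarse_count: int, phase_index: int,
                 f_clk_mhz: float = 320.0, n_phases: int = 4) -> float:
    """Combine the coarse counter and the phase index into a time in ps."""
    period_ps = 1e6 / f_clk_mhz
    return coarse_count * period_ps + phase_index * period_ps / n_phases


print(phase_bin_ps(320.0, 4))   # bin size for four phases of 320 MHz
print(timestamp_ps(2, 3))       # hit seen in counter cycle 2, phase 270 degrees
```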

# **Pico-seconds**?



1: identify finer time intervals (e.g., if a "component"/unit/cell with ~50 ps delay is found, and there are many of them to be added together uniformly)



Pioneered by Jinyuan Wu @ Fermilab: cascade chain (2003) https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1352025

Inspired by Wu's idea, a carry-chain version was invented @ USTC https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1610982

Nowadays, the delay of a typical carry-chain cell is around tens of picoseconds

(e.g., chain as many components as needed to cover 3.125 ns, the "counter" clock period)
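A rough estimate of the chain length follows from dividing the clock period by the cell delay. The sketch below assumes perfectly uniform cells, which real carry chains are not; `cells_needed` is an illustrative name.

```python
import math


def cells_needed(clock_period_ps: float, cell_delay_ps: float) -> int:
    """Minimum number of identical delay cells covering one clock period."""
    return math.ceil(clock_period_ps / cell_delay_ps)


# 3.125 ns counter clock period, with ~50 ps and ~10 ps cells.
print(cells_needed(3125.0, 50.0))
print(cells_needed(3125.0, 10.0))
```

In practice some margin beyond this minimum is usually added, since cell delays vary with process, voltage and temperature.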

# With Carry-chain as the "Delay"





b) Rout in a SLICE

# **Forming a chain of "Delay"**



### In Xilinx (AMD) FPGAs





# **Principle:** tapped-delay-line approach



https://ieeexplore.ieee.org/document/5446507

# Determine delay time of "Delay"











Average bin size: $T_{clk}/n$, with $n$ the number of bins covering one clock period

Or, since the width of each bin is proportional to its counts, calculate the width of each bin individually
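The per-bin estimate can be sketched as a code density test: hits uncorrelated with the clock fill a histogram, and each bin width is the clock period scaled by that bin's share of the counts. The bin widths and hit count below are made-up values for illustration only.

```python
import bisect
import random


def simulate_counts(true_widths_ps, n_hits, seed=0):
    """Histogram n_hits uniformly random hits into bins of the given widths."""
    rng = random.Random(seed)
    edges, acc = [], 0.0
    for w in true_widths_ps:
        acc += w
        edges.append(acc)  # upper edge of each bin
    counts = [0] * len(true_widths_ps)
    for _ in range(n_hits):
        t = rng.uniform(0.0, acc)
        idx = min(bisect.bisect_right(edges, t), len(counts) - 1)
        counts[idx] += 1
    return counts


def bin_widths_ps(counts, t_clk_ps):
    """Width of each bin: T_clk times the fraction of hits in that bin."""
    total = sum(counts)
    return [t_clk_ps * c / total for c in counts]


true_widths = [30.0, 70.0, 50.0, 50.0]   # hypothetical, non-uniform bins
counts = simulate_counts(true_widths, n_hits=100_000)
estimated = bin_widths_ps(counts, t_clk_ps=sum(true_widths))
print([round(w, 1) for w in estimated])
```

The estimated widths always sum to the clock period, so their average is $T_{clk}/n$ by construction.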

# **Timing Performance Evaluation**

- Generation of a stable time interval, pico-seconds stability
  - $\rightarrow$  High resolution signal generator (e.g., AWG)
  - $\rightarrow$  Delay line approach



- Power splitter splits input pulse into two branches
- The two branches share the same source timing uncertainty, so it cancels when their relative time difference is measured
- A/B are two "identical" channels in the FPGA (TDC) → timing uncertainty of a single channel: $\sigma_{\Delta t}/\sqrt{2}$
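The factor $1/\sqrt{2}$ can be checked with a toy simulation. It assumes two channels with equal, independent Gaussian uncertainty and a common source jitter; all numbers are illustrative, not measurements.

```python
import math
import random
import statistics


def single_channel_rms(delta_t_rms_ps: float) -> float:
    """Single-channel RMS from the RMS of the A-B time difference."""
    return delta_t_rms_ps / math.sqrt(2)


rng = random.Random(1)
sigma_channel_ps = 20.0   # assumed per-channel uncertainty
cable_delay_ps = 5000.0   # assumed fixed delay between the two branches

diffs = []
for _ in range(50_000):
    t0 = rng.gauss(0.0, 100.0)                     # source jitter, common to both
    t_a = t0 + rng.gauss(0.0, sigma_channel_ps)
    t_b = t0 + cable_delay_ps + rng.gauss(0.0, sigma_channel_ps)
    diffs.append(t_b - t_a)                        # common jitter cancels here

delta_rms = statistics.pstdev(diffs)
recovered = single_channel_rms(delta_rms)
print(f"delta-t RMS {delta_rms:.1f} ps -> single channel {recovered:.1f} ps")
```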

### Cable Delay Test: Mean=5097.9ps



- Bin size (average)
- Nonlinearity (DNL/INL)
- Timing uncertainty (RMS)
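DNL and INL can be derived from the measured bin widths. The sketch below uses one common convention (both in units of the average bin size, with INL as the running sum of DNL); other conventions exist, and the widths are made-up values.

```python
def dnl_inl(widths_ps):
    """Return (DNL, INL) in LSB, where 1 LSB is the average bin width."""
    mean_width = sum(widths_ps) / len(widths_ps)
    dnl = [w / mean_width - 1.0 for w in widths_ps]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d          # INL is the accumulated DNL up to this bin
        inl.append(acc)
    return dnl, inl


dnl, inl = dnl_inl([8.0, 12.0, 10.0, 10.0])
print("DNL:", [round(d, 2) for d in dnl])
print("INL:", [round(i, 2) for i in inl])
```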

# The WaveUnion Approach



Ref: https://ieeexplore.ieee.org/document/4775079

# WaveUnion A (Jinyuan Wu @Fermilab)






- Plain TDC:
  - Δt RMS width: 40 ps
  - 25 ps single hit
- Wave Union TDC A:
  - Δt RMS width: 25 ps
  - 17 ps single hit

# WaveUnion B (Jinyuan Wu @Fermilab)



Does more oscillation cycles always mean better performance?

### **Principle of the 10-ps FPGA TDC**



https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5941022

### Signal Processing of the Raw TDC time



### Signal Processing of the multi-averaging TDC

### → RMS timing precision ($\sigma_{delay}$) vs. N

- Non-uniform distribution of the carry-chain cell delay ($\sigma_{cell}$)
- Random uncertainty of the oscillation period ($\sigma_{osc}$)
- Other contributors, e.g. the stability of the clock ($\sigma_{other}$)

$$\sigma_{\text{delay}} = \sqrt{\frac{N-1}{4} \times \sigma_{\text{osc}}^2 + \frac{1}{N} \times \sigma_{\text{cell}}^2 + \sigma_{\text{other}}^2}$$

#### Three possible cases:

- Case 1: $\sigma_{osc} \ll \sigma_{cell}$: $\sigma_{delay} \approx \frac{1}{\sqrt{N}} \times \sigma_{cell}$
- Case 2: $\sigma_{osc} \approx \sigma_{cell}$: the best timing is at $N = \left[ 2 \times \frac{\sigma_{cell}}{\sigma_{osc}} \right]$
- Case 3: $\sigma_{osc} \gg \sigma_{cell}$: $\sigma_{delay} \approx \frac{\sqrt{N-1}}{2} \times \sigma_{osc}$
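The three cases can be explored numerically by evaluating the $\sigma_{delay}$ formula over $N$. The values of $\sigma_{cell}$ and $\sigma_{osc}$ below are arbitrary illustrations, not measured numbers.

```python
import math


def sigma_delay(n, sigma_cell, sigma_osc, sigma_other=0.0):
    """RMS timing precision after averaging n measurements (formula above)."""
    return math.sqrt((n - 1) / 4 * sigma_osc ** 2
                     + sigma_cell ** 2 / n
                     + sigma_other ** 2)


def best_n(sigma_cell, sigma_osc, sigma_other=0.0, n_max=64):
    """Integer N in 1..n_max that minimizes sigma_delay."""
    return min(range(1, n_max + 1),
               key=lambda n: sigma_delay(n, sigma_cell, sigma_osc, sigma_other))


sigma_cell_ps, sigma_osc_ps = 20.0, 5.0   # hypothetical values
n_opt = best_n(sigma_cell_ps, sigma_osc_ps)
print("scan optimum N:", n_opt)
print("closed form 2*sigma_cell/sigma_osc:", 2 * sigma_cell_ps / sigma_osc_ps)
for n in (1, 4, n_opt, 16, 32):
    print(n, round(sigma_delay(n, sigma_cell_ps, sigma_osc_ps), 2))
```

Beyond the optimum, the oscillation term grows with $N$ and the precision degrades again.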



### **Simulation and Test**



The actual implementation falls into Case 2







### → Bin size vs. N

Effective Bin size:

 $\mathbf{C}_{\mathrm{eff}} = \sum_{i=1}^{N} \mathbf{C}_{i}$ 

Scales as 1/N





### Pros and Cons

 $\checkmark$  Larger N results in a smaller bin size, but beyond the optimum N it degrades the timing precision



Trade-off should be made between TDC timing performance and N

# Outlook

- Sub-10 ps resolution/precision time digitization was already achieved in FPGAs (15 years ago)
   → is it possible to overcome the 1 ps barrier?
- The TDL approach is the most popular one for achieving sub-10 ps resolution, but not the only one; the choice of architecture is a trade-off among power, resource utilization ...
- Timing resolution/precision is just one measure of timing performance, alongside linearity, power, resource utilization ...

→ Nowadays, FPGAs are typically not limited by configurable logic resources, so more attention to power efficiency might be necessary.



#### Top secrets before class...



### Bin-by-Bin Code Density Calibration

- The TDC digitizes hits with evenly spread arrival times.
- A histogram is booked.
- Number of counts in each bin is proportional to the width of the bin.



- In the auto calibration process, a bin width histogram (DNL histogram) is first booked.
- More counts are accumulated in wider bins.







- The random hits have statistical fluctuation, and the variation is large with limited calibration events.
- Hits with evenly spread arrival times are more desirable for calibration.
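The effect of limited calibration statistics can be illustrated with a toy comparison between random hits and hits stepped evenly across the clock period. The bin widths and hit count are invented for the example, and the helper names are illustrative.

```python
import bisect
import random


def histogram(times_ps, widths_ps):
    """Count hits per bin, given the bin widths in ps."""
    edges, acc = [], 0.0
    for w in widths_ps:
        acc += w
        edges.append(acc)  # upper edge of each bin
    counts = [0] * len(widths_ps)
    for t in times_ps:
        counts[min(bisect.bisect_right(edges, t), len(counts) - 1)] += 1
    return counts


def max_width_error_ps(times_ps, widths_ps):
    """Largest error of the code-density width estimate, in ps."""
    period = sum(widths_ps)
    counts = histogram(times_ps, widths_ps)
    estimate = [period * c / len(times_ps) for c in counts]
    return max(abs(e - w) for e, w in zip(estimate, widths_ps))


widths = [30.0, 70.0, 50.0, 50.0]   # hypothetical true bin widths
period = sum(widths)
n_hits = 1000                        # deliberately few calibration events

rng = random.Random(0)
random_hits = [rng.uniform(0.0, period) for _ in range(n_hits)]
even_hits = [(k + 0.5) * period / n_hits for k in range(n_hits)]

err_random = max_width_error_ps(random_hits, widths)
err_even = max_width_error_ps(even_hits, widths)
print(f"random hits: {err_random:.2f} ps, evenly spread hits: {err_even:.2f} ps")
```

With the same number of events, the evenly spread hits are limited only by the step size, while the random hits carry counting fluctuations.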

Jun. 2021, Wu Jinyuan, Fermilab jywu168@fnal.gov

### Generating Clocks with Smooth Phase Drift Using Cascaded PLL



- Two stages of PLL circuits are cascaded together.
  - f(CK250a) = 250 MHz
  - f(CK251c) = 250.06 MHz
  - f(CK251c) = (4096/4095)\*f(CK250a)
  - T(CK250a) − T(CK251c) = 0.97 ps.
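The phase step per cycle follows from the frequency ratio. A short check of the arithmetic, using only the numbers listed above (function and variable names are illustrative):

```python
def period_ps(f_mhz: float) -> float:
    """Clock period in ps for a frequency in MHz."""
    return 1e6 / f_mhz


f_ck250a_mhz = 250.0
f_ck251c_mhz = f_ck250a_mhz * 4096 / 4095

# Each cycle, the two clocks slide against each other by this amount.
step_ps = period_ps(f_ck250a_mhz) - period_ps(f_ck251c_mhz)

# Number of CK251c cycles needed for the phase to drift through one full period.
cycles_per_sweep = period_ps(f_ck250a_mhz) / step_ps

print(f"f(CK251c) = {f_ck251c_mhz:.3f} MHz")
print(f"period difference = {step_ps:.4f} ps")
print(f"cycles per full sweep = {cycles_per_sweep:.0f}")
```

The hits generated this way step through the clock period in sub-picosecond increments, which gives the evenly spread arrival times wanted for calibration.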


# What if TDL is not long enough?







- ACTEL FPGA
  - Flash
    - IGLOO, ProASIC3(E)
  - Anti-fuse
    - Axcelerator, SX-A, RTAX-S/SL

### TDC

- Flash Buffer
  - Bin size ~ 440 ps
- Anti-fuse
  - Bin size ~ 80 ps



#### ACTEL FPGA : A3PE1500

