#### Advanced technique for high accuracy tunable ring oscillator vernier TDC in FPGAs and ASICs



*E.* Bechetoille, <u>C. Girerd</u>, H. Mathez Université de Lyon 1, Villeurbanne ; CNRS/IN2P3, Institut de Physique Nucléaire de Lyon.









Workshop on picosecond photon sensors for physics and medical applications


# Outline

- Basics of vernier ring oscillator TDCs
- FPGA implementation (ALTERA)
- Tunable ring oscillator design
- Test results
- Limitations
- Hybrid architecture
- ASIC implementation
- Conclusion

### **Ring-Oscillator based TDC Architecture**



- Simple architecture: low area, low power consumption, can be implemented in standard cells
- The TDC resolution is given by the period difference between the two oscillators
- In theory the resolution can be made very small, as small as the period difference that can be set between the two oscillators

| ASIC       | [1] Youngmin Park, Wentzloff | [2] Jianjun Yu |
|------------|------------------------------|----------------|
| Process    | 65 nm                        | 130 nm         |
| Resolution | 1 ps *                       | 8 ps           |
| DNL/INL    | 0.5/0.8 **                   | 0.5/0.8        |

\* Range 0 to 130 ps \*\* Simulation

| FPGA       | [3] Sachin S. Junnarkar | This work   |
|------------|-------------------------|-------------|
| Process    | ALTERA Stratix II       | Cyclone III |
| Resolution | 11.8 ps (rms)           | ~ 10 ps *   |
| DNL/INL    | 0.5 / 1                 | -           |

\* under test

## Ring oscillators Vernier TDC timing



Step 1: The slow oscillator is started on the START signal

Step 2: The fast oscillator is started on the STOP signal

Step 3: At each period the fast clock gains $\Delta t = T_0 - T_1$ on the slow clock

Step 4: The two oscillators are stopped when they are in phase, and the counters are latched

With $T_1 < T_0$, and $N_0$, $N_1$ the latched counter values of the slow and fast oscillators:

$$T = (N_0 \cdot T_0) - (N_1 \cdot T_1)$$

The delay measurement is

$$T = T_0 \cdot (N_0 - N_1) + N_1 \cdot \Delta t$$

The TDC resolution is given by: $T_0 - T_1 = \Delta t$
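The two expressions for the delay above can be checked numerically. The following is a minimal sketch, assuming hypothetical calibration values (a 3300 ps slow period and a 10 ps resolution); the function names are illustrative and not taken from the presented design.

```python
def vernier_delay_ps(n0: int, n1: int, t0_ps: float, dt_ps: float) -> float:
    """Delay T = T0*(N0 - N1) + N1*dt, all times in picoseconds."""
    return t0_ps * (n0 - n1) + n1 * dt_ps


def vernier_delay_direct_ps(n0: int, n1: int, t0_ps: float, dt_ps: float) -> float:
    """Same delay written as N0*T0 - N1*T1, with T1 = T0 - dt."""
    t1_ps = t0_ps - dt_ps
    return n0 * t0_ps - n1 * t1_ps


# Hypothetical values, for illustration only.
T0_PS = 3300.0  # slow oscillator period
DT_PS = 10.0    # resolution T0 - T1

# Delay shorter than one slow period: both counters hold the same value.
print(vernier_delay_ps(12, 12, T0_PS, DT_PS))
# Delay longer than one slow period: the slow counter is one count ahead.
print(vernier_delay_ps(13, 12, T0_PS, DT_PS))
```

When both counters are equal the first term vanishes, so the result depends only on $N_1$ and $\Delta t$.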

### Simple Ring oscillator



- The number of stages must be odd to allow oscillation
- The number of stages determines the oscillator period
- The oscillator is gated by an AND for start and stop

#### Phase detector

This is the commonly used phase detector in this type of TDC



# ALTERA FPGA DEFINITIONS



#### ALTERA FPGA implementation of ring oscillator vernier TDCs



#### TUNABLE RING OSCILLATORS IN FPGA



Goals:

- Modifying the frequency online, by digital control.
- Being able to target a specific  $\Delta t$  between two oscillators

#### Techniques:

- Using the propagation delay variations of logic cells
- Modifying the path of the signal in the chain
- Preference for structures with low variations



Condition: some cells must be different (e.g.  $T_{pass}(n) \neq T_{pass}(n+1)$ )

- Unintentionally, due to silicon dispersion.
- Intentionally, by choosing inverters of different strength (propagation delay)

#### TUNEABLE OSCILLATORS Moving inverters ©



The basic element can be replaced by an XOR cell



The control word can change the position and the number of inverters



The number of selected inverters must be odd (oscillation condition)

For a chain of n elements, the number of possibilities Y is:

$$Y = \sum_{\substack{p=2k+1\\(k \in \mathbb{N},\ p \le n)}} C_n^p$$

For example, with n = 8:

$$Y = C_8^1 + C_8^3 + C_8^5 + C_8^7 = 128$$

The number of TDCs (combinations of 2 oscillators) is then:

$$128^2 = 16384$$
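The count above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name is illustrative.

```python
from math import comb


def odd_selections(n: int) -> int:
    """Number of ways to choose an odd number of inverting cells among n."""
    return sum(comb(n, p) for p in range(1, n + 1, 2))


y = odd_selections(8)  # configurations of one 8-element oscillator
tdc_count = y * y      # pairs (slow oscillator, fast oscillator)
print(y, tdc_count)
```

For any n >= 1 the sum equals 2^(n-1), so each additional element in the chain doubles the number of oscillator configurations.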

But how can we quantify the differences between all these combinations?



#### TUNEABLE OSCILLATORS Moving inverters ©

More generally, if **N** families of XOR gates are available, the number of possible differences for one change in one oscillator is **N.(N-1)**:

$$\boxed{(\Delta_{a,b})_{a \neq b \in [1,n]} = (Tpass_a - Tinv_a) - (Tpass_b - Tinv_b)}$$

These differences depend on the XOR family (a vs. b) and on the difference between the passing and inverting functions within the same family (pass/inv).

CONCLUSION: In order to obtain frequency variations, we must use as many different XOR gates as possible in each oscillator.

In ASICs this is easy, as the technology offers several versions of XOR gates with different strengths.

In FPGAs it is less obvious, because combinational logic is implemented in look-up tables.
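The list of differences $\Delta_{a,b}$ defined above can be enumerated as follows. This is a minimal sketch: the family names and the (Tpass, Tinv) delays are hypothetical values chosen for illustration, not measurements from the slides.

```python
from itertools import permutations

# Hypothetical (Tpass, Tinv) propagation delays in ps for three XOR families.
FAMILIES = {
    "X1": (300, 280),
    "X2": (350, 310),
    "X3": (410, 400),
}


def delta_list(cells):
    """Return {(a, b): (Tpass_a - Tinv_a) - (Tpass_b - Tinv_b)} for all a != b."""
    own = {name: tpass - tinv for name, (tpass, tinv) in cells.items()}
    return {(a, b): own[a] - own[b] for a, b in permutations(own, 2)}


deltas = delta_list(FAMILIES)
for pair, value in sorted(deltas.items()):
    print(pair, value)
```

With N families the dictionary holds N.(N-1) ordered pairs, and swapping a and b only changes the sign of the difference.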

#### Propagation delay characteristics of LCELL in ALTERA FPGA

- The combinational part is a 4-input look-up table (65536 possible logic functions of 4 inputs)

![](_page_14_Figure_2.jpeg)

We observed that in the ALTERA FPGA the XOR propagation delay depends on which input is used. Example with 4 XOR gates using different inputs:

| Input | RR (rising input / rising output), ps | FF (falling input / falling output), ps |
|-------|---------------------------------------|-----------------------------------------|
| A     | 471                                   | 481                                     |
| B     | 494                                   | 496                                     |
| C     | 324                                   | 316                                     |
| D     | 177                                   | 155                                     |

# Example of chain with 8 XORs using inputs B, C, D (input A is used for selection)

Timing report for a chain of 8 XOR. Routing has been manually modified to use 3xC inputs, 3xD inputs and 2xB inputs.

| Total (ns) | Incr (ns) | RF | Type | Fanout | Location            | Element              |
|------------|-----------|----|------|--------|---------------------|----------------------|
| 0          | 0         | FF | CELL | 1      | FF_X4_Y28_N17       | inst11 q             |
| 0          | 0         |    |      | 1      | FF_X4_Y28_N17       | inst11               |
| 0.509      | 0.509     | FF | IC   | 1      | LCCOMB_X5_Y28_N12   | inst2 inst21 datac   |
| 0.812      | 0.303     | FF | CELL | 1      | LCCOMB_X5_Y28_N12   | inst2 inst21 combout |
| 1.222      | 0.41      | FF | IC   | 1      | LCCOMB_X5_Y28_N18   | inst2 inst20 datac   |
| 1.504      | 0.282     | FR | CELL | 1      | LCCOMB_X5_Y28_N18   | inst2 inst20 combout |
| 1.735      | 0.231     | RR | IC   | 1      | LCCOMB_X5_Y28_N16   | inst2 inst17 datad   |
| 1.884      | 0.149     | RF | CELL | 1      | LCCOMB_X5_Y28_N16   | inst2 inst17 combout |
| 2.178      | 0.294     | FF | IC   | 1      | LCCOMB_X5_Y28_N6    | inst2 inst24 datab   |
| 2.527      | 0.349     | FF | CELL | 1      | LCCOMB_X5_Y28_N6    | inst2 inst24 combout |
| 2.942      | 0.415     | FF | IC   | 1      | LCCOMB_X5_Y28_N24   | inst2 inst30 datac   |
| 3.224      | 0.282     | FR | CELL | 1      | LCCOMB_X5_Y28_N24   | inst2 inst30 combout |
| 3.456      | 0.232     | RR | IC   | 1      | LCCOMB_X5_Y28_N2    | inst2 inst22 datad   |
| 3.605      | 0.149     | RF | CELL | 1      | LCCOMB_X5_Y28_N2    | inst2 inst22 combout |
| 3.901      | 0.296     | FF | IC   | 1      | LCCOMB_X5_Y28_N20   | inst2 inst19 datab   |
| 4.25       | 0.349     | FF | CELL | 1      | LCCOMB_X5_Y28_N20   | inst2 inst19 combout |
| 4.657      | 0.407     | FF | IC   | 1      | LCCOMB_X5_Y28_N10   | inst2 inst18 datac   |
| 4.939      | 0.282     | FR | CELL | 1      | LCCOMB_X5_Y28_N10   | inst2 inst18 combout |
| 5.169      | 0.23      | RR | IC   | 1      | LCCOMB_X5_Y28_N0    | inst2 inst4 datad    |
| 5.318      | 0.149     | RF | CELL | 1      | LCCOMB_X5_Y28_N0    | inst2 inst4 combout  |
| 5.831      | 0.513     | FF | IC   | 1      | DDIOOUTCELL_X5_Y29_ | inst8 d              |
| 6.306      | 0.475     | FF | CELL | 1      | DDIOOUTCELL_X5_Y29_ | inst8                |
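To see how much of this open-loop path is cell delay and how much is interconnect, the incremental delays of the timing report can be summed by type. The sketch below copies the non-zero increments from the report; the variable and function names are illustrative.

```python
# (incremental delay in ns, type) for each non-zero row of the timing report.
# IC = interconnect delay, CELL = cell delay.
REPORT = [
    (0.509, "IC"), (0.303, "CELL"),
    (0.410, "IC"), (0.282, "CELL"),
    (0.231, "IC"), (0.149, "CELL"),
    (0.294, "IC"), (0.349, "CELL"),
    (0.415, "IC"), (0.282, "CELL"),
    (0.232, "IC"), (0.149, "CELL"),
    (0.296, "IC"), (0.349, "CELL"),
    (0.407, "IC"), (0.282, "CELL"),
    (0.230, "IC"), (0.149, "CELL"),
    (0.513, "IC"), (0.475, "CELL"),
]


def sum_by_type(rows):
    """Accumulate the incremental delays separately for each delay type."""
    totals = {}
    for incr_ns, kind in rows:
        totals[kind] = totals.get(kind, 0.0) + incr_ns
    return totals


totals = sum_by_type(REPORT)
path_total_ns = sum(totals.values())
print({kind: round(value, 3) for kind, value in totals.items()})
print(round(path_total_ns, 3))
```

The path includes the launching register and the output cell, so the total is the open-loop path delay and not the oscillation period.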

#### **TDC** block diagram

![](_page_16_Figure_1.jpeg)

#### Calibration results on hardware

The calibration consists of sweeping all combinations of slow and fast oscillators (in this case with 3 inverters among 8); we obtained 3146 combinations.

![](_page_17_Figure_2.jpeg)

The calibration result is a list of resolutions with the corresponding TDC selection. We can choose any TDC among these 3146.

## DNL histogram (50 ps TDC selected)

This DNL histogram is obtained by sweeping the delay between the input signals from 0 to 1 ns in steps of 10 ps (100 measurements per step).

![](_page_18_Figure_2.jpeg)

![](_page_18_Figure_3.jpeg)

Differential nonlinearity: ±15 ps max (< 0.3 LSB)

![](_page_18_Figure_5.jpeg)

### Test of 10 ps TDC

DNL histogram: the input delay is varied in steps of 10 ps, with 100 measurements for each step. The range is 830 ps.

Mean bin width = 11.8 ps

![](_page_19_Figure_3.jpeg)

![](_page_19_Figure_4.jpeg)


#### Problems and limitations e.g. cumulated jitter on a 37 ps TDC selection

![](_page_20_Figure_1.jpeg)

We observed that the TDC jitter increases with the delay range. This means that for fine-resolution TDCs (small LSB) the delay range must be reduced to keep the jitter low relative to the resolution.

Visualization of the cumulated jitter for a ring oscillator with T = 3.3 ns (7 inverters): 1 ns/div, period jitter = 9.4 ps, cycle-to-cycle jitter = 12.5 ps (40 GS/s).

![](_page_21_Figure_1.jpeg)

# Cumulated jitter vs. number of cycle periods for different ring oscillators

![](_page_22_Figure_1.jpeg)

#### HYBRID Architecture: Vernier + Delay Chain

Goal: divide the measurement range into small parts in order to reduce the impact of jitter.

![](_page_23_Figure_2.jpeg)

#### Hybrid architecture (preliminary results)

![](_page_24_Figure_1.jpeg)

The maximum number of oscillation cycles is about 12 (except for the widest bin).

![](_page_25_Figure_0.jpeg)

- The loop chain by itself measures 75 µm x 6 µm (considering only the NAND and the 10 XORs)
- Power consumption = $\sum(C_{load}) \cdot U^2 \cdot freq$
- An ASIC has less interconnect than an FPGA, so $C_{load}$ is smaller

Choice : *Interleaved* layout. But could have been *folded* easily

![](_page_25_Figure_3.jpeg)

![](_page_25_Figure_4.jpeg)

http://micrhau.in2p3.fr/spip/spip.php?article117

#### Schematic Simulation verified

- The XOR is either an inverter or a buffer, depending on the selection bit.
- Rising and falling edges are different for each selection (inv or buf)
- The resulting oscillation periods are around 2.8 and 2.9 ns.

![](_page_26_Figure_4.jpeg)

Period histogram: period range ~ [2.725, 2.825] ns

![](_page_27_Figure_0.jpeg)

## Conclusions

- We presented a new technique for tunable oscillators, to be used in vernier TDCs or in other applications requiring high sensitivity in frequency adjustment.
- We found that the cumulated jitter prevents us from exploiting the full potential of these oscillators.
- We started to test a hybrid architecture with a delay chain and multiple phase detector stages to reduce the effect of jitter.
- An ASIC implementation has been designed and sent to the foundry. The main difference with the FPGA is that more XOR families can be inserted in the ring oscillator chain and that the routing can be optimized. The tests will allow us to quantify the performance gain, especially for jitter, that can be achieved in this configuration.

## References

- [1] Implementation of sub-nanoseconds TDC in FPGA: applications to time-of-flight analysis in muon radiography
- [2] FPGA-Based High Area Efficient Time-To-Digital IP Design
- [3] Performance and area tradeoffs in space-qualified FPGA-based time-of-flight systems
- [4] An FPGA wave union TDC for time-of-flight applications
- [5] FPGA-Based Self-Calibrating Time-to-Digital Converter for Time-of-Flight Experiments
- [6] Upgrading of Integration of Time to Digit Converter on a Single FPGA
- [7] Area efficient time to digital converter (TDC) architecture with double ring-oscillator technique on FPG
- [8] FPGA based self calibrating 40 picosecond resolution, wide range Time to Digital Converter
- [9] The 10-ps wave union TDC: Improving FPGA TDC resolution beyond its cell delay
- [10] Implementation of High-Resolution Time-to-Digital Converters on two different FPGA devices
- [11] Several Key Issues On implementing delay line Based TDCs using FPGAs

12-14 March 2014, Workshop on picosecond photon sensors for physics and medical applications

## ANNEXES

# Calibration results minimum frequency differences

![](_page_31_Figure_1.jpeg)

We observed that period differences (between the slow and fast oscillators) below 1 ps can be obtained with the moving-inverter method in the FPGA.

But it does not guarantee that TDC will work well for these very low resolutions.

#### Consequence on period variations in the ring oscillator

As seen before, the list of possible variations is in theory given by

$$(\Delta_{a,b})_{a\neq b\in[1,n]} = (Tpass_a - Tinv_a) - (Tpass_b - Tinv_b)$$

Remark: it is possible to obtain an estimate using the min/max delays, as the timing analyser performs a transition analysis.

Figure: Tinv-Tpass vs. cell family, i.e. $(T_{max} - T_{min})$ in ns for each of the 8 cells in the chain (8-XOR chain, cell families B, C and D).

Then we calculate all the possible delays: N.(N-1) = 7\*8 = 56.

Figure: all possible dT (in ns) for 8 XORs (56 possibilities), reordered delays.


### Test of 10 ps TDC

RMS jitter of TDC measurement (range 1 ns)

![](_page_33_Figure_2.jpeg)

Conclusion: the major limitation is due to jitter:

- The measurement range must be limited (it is proportional to resolution)
- The oscillator frequency must be as high as possible

# Example of chain with 8 XORs using inputs B,C,D

![](_page_34_Figure_1.jpeg)

We observed that the difference between CELL families is relatively constant. This confirms that the input choice is a good way to control the CELL propagation delay.

### Test of 10 ps TDC

TDC response: the input delay is varied in steps of 10 ps over a range of 1 ns.

![](_page_35_Figure_2.jpeg)

The cumulated jitter effect limits the range of measurement.

#### Hybrid architecture (preliminary results)

![](_page_36_Figure_1.jpeg)

The histogram of the latch values gives the width of each delay chain element

# Other example of TDC response with input step of 10 ps (50 ps TDC selected)

![](_page_37_Figure_1.jpeg)

The TDC measures the time difference between two clock signals provided by an external generator (Agilent 81150 A, input jitter ~ 25 ps) over a range of 1 ns in steps of 10 ps.

## Test setup

![](_page_38_Figure_1.jpeg)

Remark: for the moment, the tests are made with a non-optimal setup

- Standard FPGA board not dedicated for fast timing measurement
- TDC inputs are LVCMOS 3.3V
- Output jitter of generator (25 ps rms)

#### BASIC PRINCIPLE OF VERNIER

Example:

$$T_{slow} - T_{fast} = \Delta t, \qquad T_{slow} = 8 \cdot \Delta t$$

![](_page_39_Figure_2.jpeg)

- Starting from 0, at each period the FAST clock gains $\Delta t$ on the SLOW clock
- In this example, after 8 clock periods the two clocks are in phase.
- But the FAST clock has completed one more period.
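The example above can be replayed with integer arithmetic. The following is a minimal sketch in which periods are expressed in units of $\Delta t$ (so the slow period is 8 and the fast period is 7); the function name is illustrative.

```python
from typing import Tuple


def periods_until_in_phase(t_slow: int, t_fast: int) -> Tuple[int, int]:
    """Both clocks start at t = 0. Return (n_fast, n_slow), the number of
    periods each clock has completed when their edges first coincide again."""
    n_fast, n_slow = 1, 1
    while n_fast * t_fast != n_slow * t_slow:
        # Advance whichever clock has the earlier pending edge.
        if n_fast * t_fast < n_slow * t_slow:
            n_fast += 1
        else:
            n_slow += 1
    return n_fast, n_slow


# Periods in units of delta_t: T_slow = 8, T_fast = T_slow - 1 = 7.
n_fast, n_slow = periods_until_in_phase(8, 7)
print(n_fast, n_slow)
```

The fast clock needs 8 periods and the slow clock 7 to realign, which is the "one period more" noted in the last bullet.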

#### TDC CASE

![](_page_40_Figure_1.jpeg)

- In the TDC case the two clocks start with an arbitrary phase shift
- The initial phase shift is proportional to the counter values
- The dead time depends on the T0/dT ratio (8 in the example)
- In this example the dead time is 5\*T1.
- A delay measurement smaller than T0 depends only on dT
  - $T_0$  and  $\Delta t$  are obtained by calibration

![](_page_40_Figure_8.jpeg)

![](_page_40_Figure_9.jpeg)

## Jitter (50 ps TDC selected)

![](_page_41_Figure_1.jpeg)

Remark: this includes the input jitter, which is about 30 ps.

#### **FPGA IMPLEMENTATION**

- Example of Ring oscillator schematic

![](_page_42_Figure_2.jpeg)

- For the synthesis tool it is a COMBINATIONAL LOOP, which is considered bad design practice
- Also, without explicit LCELL instantiation, the synthesis tool will optimize the design and reduce the inverter chain to a single inverter
- LCELL buffers prevent the synthesis tool from optimizing the design
- A VHDL description is also possible

In practice: it is better to synthesize the ring oscillator in open loop, so that timing analysis is possible.

![](_page_43_Figure_0.jpeg)

## Post-compilation editing

It is possible to change which look-up table input (A, B, C or D) is used, and thereby slightly modify the propagation delay.

![](_page_44_Figure_2.jpeg)


#### **REGIONS IMPORTATION in TOP DESIGN**

![](_page_45_Figure_1.jpeg)

## Clock reference as stop signal

![](_page_46_Figure_1.jpeg)

#### CLOCK REFERENCE FASTER THAN SLOW OSCILLATOR

![](_page_47_Figure_1.jpeg)

## $T_{slow}$ calibration

![](_page_48_Figure_1.jpeg)

- Tslow can be calibrated using an external and stable clock reference

$$\boxed{N_{calib} \cdot T_{slow} + E = N_{ref} \cdot T_{ref}} \begin{cases} T_{slow} = \frac{T_{ref} \cdot N_{ref}}{N_{calib}} - \left(\frac{E}{N_{calib}}\right) \\ Err_{max} = \frac{T_{slow}}{N_{calib}} \end{cases}$$

- The error on Tslow is minimized by increasing  $\mathrm{N}_{\mathrm{calib}}$ 
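The calibration relation can be turned into a short calculation. This is a sketch under stated assumptions: the residual term E is neglected, and the counter values and the 25 ns reference period are hypothetical numbers chosen for illustration.

```python
from typing import Tuple


def calibrate_t_slow(n_calib: int, n_ref: int, t_ref_ps: float) -> Tuple[float, float]:
    """Estimate T_slow from N_calib * T_slow + E = N_ref * T_ref.

    The residual E is neglected, so the estimate is T_ref * N_ref / N_calib.
    The returned bound Err_max = T_slow / N_calib is the largest error
    this approximation can introduce.
    """
    t_slow_ps = t_ref_ps * n_ref / n_calib
    err_max_ps = t_slow_ps / n_calib
    return t_slow_ps, err_max_ps


# Hypothetical run: 10000 slow periods counted during 1320 reference periods
# of 25000 ps each.
t_slow_ps, err_max_ps = calibrate_t_slow(n_calib=10000, n_ref=1320, t_ref_ps=25000.0)
print(t_slow_ps, err_max_ps)
```

Multiplying both counts by ten leaves the estimate unchanged and divides the error bound by ten, which is why a large $N_{calib}$ is preferred.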

![](_page_49_Figure_0.jpeg)

- If SLOW and FAST are free running, they will periodically reach a minimum phase shift
- The number of clock periods between these coincidences gives the period difference

With $N_1$ the number of clock periods counted between two coincidences, the vernier definition gives

$$\Delta t = \frac{T_{slow}}{N_1}$$

We can also measure Tfast directly with the clock reference, as for Tslow.
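As a worked example of the vernier relation $\Delta t = T_{slow}/N_1$, the sketch below derives the period difference and the fast period from a hypothetical slow period and coincidence count. It assumes that $N_1$ counts fast-clock periods between two coincidences; the values and function name are illustrative.

```python
def period_difference_ps(t_slow_ps: float, n1: int) -> float:
    """Vernier relation: delta_t = T_slow / N1."""
    return t_slow_ps / n1


# Hypothetical values: calibrated slow period and measured coincidence count.
T_SLOW_PS = 3300.0
N1 = 330

dt_ps = period_difference_ps(T_SLOW_PS, N1)
t_fast_ps = T_SLOW_PS - dt_ps
print(dt_ps, t_fast_ps)

# Consistency check: N1 fast periods span the same time as N1 - 1 slow periods.
print(N1 * t_fast_ps, (N1 - 1) * T_SLOW_PS)
```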

![](_page_50_Figure_0.jpeg)

## Calibration results Phase detector measurement

![](_page_51_Figure_1.jpeg)

The consistency is checked by comparing the difference between the slow and fast oscillator frequencies (blue curve) with the measurement obtained with the phase detector (red curve).

## **Dispersion between FPGAs**

![](_page_52_Figure_1.jpeg)

## Cumulated jitter vs Time

![](_page_53_Figure_1.jpeg)

- The cumulated jitter does not depend on the oscillator frequency
- Cumulated jitter increases linearly with time.
- A periodic component can exist, depending on the setup configuration (i.e. in our case a 40 MHz oscillator entering the FPGA)