A Low-Power Wave Union TDC Implemented in FPGA

Wu, Jinyuan
Fermilab
Yanchen Shi and Douglas Zhu
Illinois Mathematics and Science Academy
Sept. 2011
Imperfect designs considerably degrade the performance of ICs, including CPUs and GPUs.

ASIC devices are built using older technology and suffer similar design degradation.

The internal structure of an FPGA causes extra performance degradation in addition to design degradation.

Design modification is easier in an FPGA, so design degradation can be minimized.

A carefully designed FPGA may perform better than a typical ASIC.
A 32-channel Wave Union TDC firmware has been implemented in an Altera Cyclone III FPGA device (EP3C25F324C6N, $73.90) and has been tested on a Cyclone III evaluation card.

Low-power design practice has been applied for applications in vacuum.

The time measurement function has been tested on 16 channels; the typical delta t RMS resolution between two channels is 25-30 ps.

Power consumption is measured for 32 channels at ~27 mW/channel.
The Wave Union TDC Implemented in FPGA
This scheme uses current FPGA technology 😊

- A low-cost chip family can be used (e.g., EP2C8T144C6, $31.68). 😊

- Fine TDC precision can be implemented in slow devices (e.g., 20 ps in a 400 MHz chip). 😊
Two Major Issues In a Free Operating FPGA

1. Bin widths differ and vary with supply voltage and temperature.
2. Some bins are ultra-wide due to LAB boundary crossing.
Auto Calibration Using Histogram Method

- It provides a bin-by-bin calibration at a given temperature.
- It is a turn-key solution (bin in, ps out).
- It is semi-continuous (the LUT is updated automatically every 16K events).
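
The histogram method can be sketched as follows. This is a minimal illustration, not the firmware: it assumes hits arrive uniformly in time, so each bin's count is proportional to its width, and it maps every bin to the time at its center. The function name `build_lut`, the 4000 ps period, and the sample counts are hypothetical.

```python
def build_lut(hist, period_ps):
    """Map each bin index to the time (in ps) at the center of that bin.

    hist: hit counts per bin collected from uniformly distributed hits.
    period_ps: time span covered by all bins together (assumed known).
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram is empty")
    lut = []
    elapsed = 0.0
    for count in hist:
        # A bin that collects more hits is proportionally wider.
        width = period_ps * count / total
        lut.append(elapsed + width / 2.0)
        elapsed += width
    return lut


# Hypothetical 4-bin histogram; a real delay chain has many more bins.
hist = [100, 300, 100, 500]
lut = build_lut(hist, 4000.0)
```

In a semi-continuous scheme, the histogram would be cleared and the LUT rebuilt after each batch of events (16K in the text above).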

![Diagram showing auto calibration using histogram method](image)

- In (bin)
- DNL Histogram
- LUT
- Out (ps)
- 16K Events

Sept. 2011, Wu Jinyuan, Fermilab jyw168@fnal.gov

Good, However

- Auto calibration solved some problems 😊
- However, it won’t eliminate the ultra-wide bins 😞
Wave Union Launcher A

Wave Union TDC records multiple transitions.

Regular TDC records only one transition.

Diagram: Wave Union Launcher A, with inputs In and CLK. Control: 0 = Hold, 1 = Unleash.

Wave Union Launcher A: 2 Measurements/hit
Sub-dividing Ultra-wide Bins

Device: EP2C8T144C6
- **Plain TDC:**
  - Max. bin width: 160 ps.
  - Average bin width: 60 ps.
- **Wave Union TDC A:**
  - Max. bin width: 65 ps.
  - Average bin width: 30 ps.
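
Why a second transition sub-divides wide bins can be shown with a simplified model. It assumes the two edges of the wave union are a fixed delay apart and are both digitized by the same delay line, so the pair of bin numbers changes whenever either edge crosses a bin boundary. The bin edges, the 90 ps offset, and the 400 ps period below are made-up numbers, not measured values.

```python
def merged_bin_widths(edges_ps, offset_ps, period_ps):
    """Effective bin widths when two edges, offset_ps apart, share one delay line.

    edges_ps: start time of each bin within one period.
    The combined code changes at every original boundary and at every
    boundary shifted by the offset between the two edges.
    """
    boundaries = set()
    for e in edges_ps:
        boundaries.add(e % period_ps)
        boundaries.add((e - offset_ps) % period_ps)
    ordered = sorted(boundaries)
    widths = [b - a for a, b in zip(ordered, ordered[1:])]
    widths.append(period_ps - ordered[-1] + ordered[0])  # wrap-around bin
    return widths


# Hypothetical delay line with one ultra-wide (160 ps) bin.
edges = [0, 40, 200, 250, 300, 360]
plain = merged_bin_widths(edges, 0, 400)    # single edge: original bins
union = merged_bin_widths(edges, 90, 400)   # two edges 90 ps apart
```

In this toy case the widest bin shrinks from 160 ps to 70 ps and the number of effective bins doubles from 6 to 12.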

Low-power Design Practices
Intrinsically the Wave Union TDC is a low-power scheme.

Multiple measurements are made with one set of delay line, register, encoder, etc., yielding a finer resolution that would otherwise require several regular TDC blocks.
The Sampling Register Arrays are clocked at 250 MHz.
All other stages are clocked at 62.5 MHz.
When a valid hit is sampled, the Sampling Register Array is disabled so that the registered pattern is stable for 64 ns.
The Data Load/Transfer Registers are enabled to load input every 64 ns (i.e., 4 clock cycles at 62.5 MHz), so that a valid hit is guaranteed to be loaded once and only once.

The Data Load/Transfer Registers transfer data from other channels when they are not enabled to load.

Four channels share an Encoder and a Buffer with Zero Suppression.
The hit time for each of the 16 channel inputs is digitized and encoded.

Data from 4 channels are buffered and data from 4 groups of 4 channels are merged together.

Raw hit times are converted to fine time through the automatic calibration block.

Data from all 16 channels are buffered and sent out via 4 pairs of LVDS ports @250 M bits/s.
Test Results
## The Test Hardware

### 2008
- **Altera Cyclone II + VME (~$1k)**
  - FPGA: EP2C8T144C6 ($28.80)
  - 16 channel: 25 ps
  - 2 channel: 10 ps
  - 81 mW/channel

Ref: Search “Wave Union TDC”

### 2011
- **Altera Cyclone III Starter Kit ($211+$50)**
  - FPGA: EP3C25F324C6N ($73.90)
  - 32 channel: 30 ps (25 ps with linear power supply)
  - 27 mW/channel

[www.altera.com](http://www.altera.com)
Test Setup
Output Raw Data and Typical Delta T Histogram Between Two Channels

![Histogram Image]

- RMS of this histogram is 25 ps.

<table>
<thead>
<tr>
<th>CH[3..0]</th>
<th>Coarse Time, LSB=4ns, TC[11..0]=96-4095, full range = 4000*4ns=16us</th>
<th>Fine Time, LSB=4000 ps/256=15.625ps</th>
</tr>
</thead>
<tbody>
<tr>
<td>23</td>
<td>22</td>
<td>21</td>
</tr>
<tr>
<td>00003C</td>
<td>C064A6</td>
<td>F064B8</td>
</tr>
</tbody>
</table>
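
The LSB values in the table header imply the following arithmetic. This sketch assumes that the fine time simply adds to the coarse time and that the coarse counter wraps from 4095 back to 96; neither is stated explicitly here, and the bit layout of the raw words is not decoded. All names are hypothetical.

```python
COARSE_LSB_PS = 4000.0        # coarse time LSB = 4 ns
FINE_LSB_PS = 4000.0 / 256    # fine time LSB = 15.625 ps
COARSE_RANGE = 4000           # TC = 96..4095, i.e. 4000 counts = 16 us


def hit_time_ps(coarse, fine):
    """Hit time in ps, assuming fine time adds to coarse time."""
    return coarse * COARSE_LSB_PS + fine * FINE_LSB_PS


def delta_t_ps(hit_a, hit_b):
    """Time difference b - a, folded into +/- half of the 16 us range."""
    full = COARSE_RANGE * COARSE_LSB_PS
    diff = hit_time_ps(*hit_b) - hit_time_ps(*hit_a)
    return (diff + full / 2) % full - full / 2


# Two hits in the same coarse period, two fine counts apart.
same_period = delta_t_ps((100, 128), (100, 130))
# A hit just before the counter wraps and one just after it.
across_wrap = delta_t_ps((4095, 0), (96, 0))
```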
Delta T Between NIM Inputs

- TDC channels internally ganged together have the smallest standard deviation of time differences.
- Typical channel pairs sharing the same fan-out unit have 30 ps RMS.
- Timing jitter of the fan-out units adds to the measurement errors.
Time Measurement Errors Due to Power Supply Noise

- Typical RMS resolution is 25-30 ps.
- Measurements with cleaner power (diamonds) are better than those with noisy power (squares).
## Specifications

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>RMS Resolution (Delta T between two channels)</td>
<td>25 to 30 ps</td>
</tr>
<tr>
<td>Same channel re-hit time interval</td>
<td>64 ns</td>
</tr>
<tr>
<td>Temporary buffer capacity</td>
<td>128 hits/(4 ch)/(16 us)</td>
</tr>
<tr>
<td>LVDS output port rate</td>
<td>250 M bits/s/port</td>
</tr>
<tr>
<td>Output capacity in each LVDS output port:</td>
<td>128 hits/(16 ch)/(16 us)</td>
</tr>
<tr>
<td>Number of LVDS output ports:</td>
<td>1, 2, 3, 4/(16 ch)</td>
</tr>
<tr>
<td>Power Consumption (Core only)</td>
<td>9.3 mW/channel</td>
</tr>
<tr>
<td>Power Consumption (Total)</td>
<td>27 mW/channel</td>
</tr>
</tbody>
</table>
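
A quick check of what the figures in the table imply, using only numbers quoted in this talk. How the bits available per hit are split between data and framing is not stated, so the result is only an upper bound on the word length. The variable names are hypothetical.

```python
LVDS_RATE_MBIT_S = 250     # per port
WINDOW_US = 16             # time window used in the capacity figures
HITS_PER_WINDOW = 128      # output capacity per port per window
CHANNELS = 32

# Mbit/s multiplied by microseconds gives bits.
bits_per_window = LVDS_RATE_MBIT_S * WINDOW_US
bits_per_hit = bits_per_window / HITS_PER_WINDOW

# Power for a 32-channel device at the quoted per-channel figures (mW).
total_power_mw = CHANNELS * 27
core_power_mw = CHANNELS * 9.3

print(bits_per_window, bits_per_hit, total_power_mw, round(core_power_mw, 1))
```

Each port therefore has at most 31.25 bit times available per hit, and a fully populated 32-channel device draws roughly 0.86 W in total.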
Other Applications: Single Slope ADC

If You Want to Try

- The FPGA on the Starter Kit is fairly powerful.
- More than 16 LVDS I/O pairs can be accessed via the daughter card.
- The FPGA can fit 32 channels, but implementing 16 channels is more practical given the available I/O pairs.
- TDC data are stored in the RAM on the board and can be read out via USB.
- A good solution for small experiment systems as well as student labs.

www.altera.com
DK-START-3C25N
Cyclone III FPGA Starter Kit
$211

www.altera.com
THDB-H2G
(HSMC to GPIO Daughter Board)
$50
The End

Thanks
Timing Uncertainty Confinement
Historical Implementation in ASIC TDC

Unnecessary Challenges = Extra Efforts + Reduced Performance

- Deadtime is unavoidable.
- Coarse time recording needs special care.
- Two array + encoder sets are needed for rising edge and falling edge.
- The register array must be reset for next event.
- The encoder must be re-synchronized with system clock in order to interface with readout stage.
Historically, Gray code counters, double counters, and dual registers + MUX have been used in ASIC TDC coarse time counter schemes.

These are unnecessary if the TDC is designed appropriately.

In FPGA, a plain binary counter is sufficient.
Deadtimeless operation is possible.
No special care is needed for coarse time.
Both rising and falling edges are digitized with a single array + encoder set.
No resetting is needed for the register array.
The output is synchronized with the system clock and is ready to interface with readout stage.
The timing uncertainty between HIT and CLK is confined in the sampling register array.

All the remaining logic is driven by the CLK signal.

No special care, such as a Gray code counter, is needed for the coarse time counter.
## Comparison

<table>
<thead>
<tr>
<th>Historical Scheme: HIT-&gt; CK; (c0..c31)-&gt;D;</th>
<th>Preferable Scheme: HIT-&gt; D; (c0..c31)-&gt;CK;</th>
</tr>
</thead>
<tbody>
<tr>
<td>Deadtime is unavoidable.</td>
<td>Deadtimeless operation is possible.</td>
</tr>
<tr>
<td>Coarse time recording needs special care.</td>
<td>No special care is needed for coarse time.</td>
</tr>
<tr>
<td>Two array + encoder sets are needed for rising edge and falling edge.</td>
<td>Both rising and falling edges are digitized with a single array + encoder set.</td>
</tr>
<tr>
<td>The register array must be reset for next event.</td>
<td>No resetting is needed for the register array.</td>
</tr>
<tr>
<td>The encoder must be re-synchronized with system clock in order to interface with readout stage.</td>
<td>The output is synchronized with the system clock and is ready to interface with readout stage.</td>
</tr>
</tbody>
</table>
More Measurements

- Two measurements are better than one.
- Let’s try 16 measurements?
Wave Union Launcher B: *16 Measurements/hit*

1 Hit
16 Measurements
@ 400 MHz
Delay Correction

The raw data contains:
- U-Type Jumps: [48-63] \( \rightarrow \) [16-31]
- V-Type Jumps: other small jumps.
- W-Type Jumps: [16-31] \( \rightarrow \) [48-63]

Delay Correction Process:
- Raw hits TN(m) in bins are first calibrated into TM(m) in picoseconds.
- Jumps are compensated for in the FPGA so that the TM(m) become T0(m), which have the same value for each hit.
- Take the average of the T0(m) to get better resolution.

The processes are all done in FPGA.
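
The correct-then-average step can be sketched as below. This is a simplified software model, not the FPGA logic: it assumes the delay of each measurement relative to the first is already known (`offsets_ps`), whereas the real design has to identify the U-, V-, and W-type jumps. The function name and the numbers are hypothetical, and only 4 of the 16 measurements are shown.

```python
def corrected_hit_time(tm_ps, offsets_ps, period_ps):
    """Average several calibrated measurements of one hit.

    tm_ps: calibrated times TM(m) in ps, each taken modulo period_ps.
    offsets_ps: assumed known delay of measurement m relative to m = 0.
    """
    # Remove each measurement's own delay so all values describe the same hit.
    t0 = [(tm - off) % period_ps for tm, off in zip(tm_ps, offsets_ps)]
    # Unwrap around the first value so results that straddle the period
    # boundary are not averaged as if they were far apart.
    ref = t0[0]
    half = period_ps / 2
    unwrapped = [ref + ((t - ref + half) % period_ps) - half for t in t0]
    return sum(unwrapped) / len(unwrapped)


# Hypothetical hit at 1000 ps, one 400 MHz clock period (2500 ps),
# four measurements with small errors of +10, -10, +20, -20 ps.
tm = [1010, 1590, 2220, 280]
offsets = [0, 600, 1200, 1800]
t_hit = corrected_hit_time(tm, offsets, 2500)
```

With independent errors, averaging N measurements would be expected to improve the RMS by up to a factor of sqrt(N); the gain in practice depends on how correlated the errors are.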
Test Result
- Signal path: NIM inputs, LeCroy 429A NIM fan-out, NIM/LVDS, Wave Union TDC B.
- BNC adapters add delays in 140 ps steps (0, 1, 2).
- RMS: 10 ps.
A Preferable Scheme

DLL Clock Chain

- Minimum setup time between the multi-sampling register array stage and the clock domain transfer stage: 17 clock taps.
- Setup time between the clock domain transfer stage and the encoder register: 32 or 16 clock taps.
- All outputs including TC are aligned with c0.
- Supports both rising and falling edges.

32-bit Encoder with Registered Outputs

HIT

Multi-Sampling Register Array

Clock Domain Transfer

Coarse Time Counter

Outputs: DV, EG, T4..T0, TC
- EG: Edge, =1: rising or =0: falling.
- T4..T0: Time.
- DV: Data Valid, =1: valid edge detected. It is used as the PUSH signal for a FIFO or as Write Enable for other memory buffers.
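
A behavioral sketch of what the encoder outputs mean is given below. It models the register array as 32 samples of HIT taken at clock taps c0..c31 and reports only the first transition in a clock period. The function name `encode` and the `prev_level` argument are assumptions for illustration; the actual encoder is combinational logic with registered outputs.

```python
def encode(samples, prev_level):
    """Locate the first edge in one clock period of sampled HIT values.

    samples: 32 values (0 or 1) in tap order c0..c31.
    prev_level: level seen by the last tap of the previous period,
                needed to detect an edge at tap 0.
    Returns (dv, eg, t): data valid, edge polarity, tap number (T4..T0).
    """
    if len(samples) != 32:
        raise ValueError("expected 32 samples")
    for t, s in enumerate(samples):
        if s != prev_level:
            # The new level is 1 for a rising edge and 0 for a falling edge.
            return 1, s, t
    return 0, 0, 0  # no edge in this period: DV = 0


rising = encode([0] * 5 + [1] * 27, prev_level=0)
falling = encode([1] * 10 + [0] * 22, prev_level=1)
idle = encode([1] * 32, prev_level=1)
```

Because the same array sees both polarities, one array + encoder set serves rising and falling edges, and nothing needs to be reset between hits.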
Wave Union Launcher B

Diagram: Wave Union Launcher B, with inputs In and CLK. Control: 0 = Hold, 1 = Oscillate.
The wave union launcher creates multiple logic transitions after receiving an input logic step.

The wave union launchers can be classified into two types:
- Finite Step Response (FSR)
- Infinite Step Response (ISR)

This is similar to the classification of filters or other linear systems:
- Finite Impulse Response (FIR)
- Infinite Impulse Response (IIR)
Wave Union?