

# picoTDC: Pico-second TDC for HEP

Jorgen Christiansen, Moritz Horstmann, Lukas Perktold (Now AMS), Jeffrey Prinzie (KU Leuven) CERN/PH-ESE



### **HPTDC**

- History
  - Architecture initially developed at CERN for ATLAS MDT (design transferred to KEK)
  - CMS Muon and ALICE TOF needed similar TDC with additional features / increased resolution
- Features
  - 32 channels (100ps binning), 8 channels (25ps binning)
  - 40MHz time reference (LHC clock)
  - Leading, trailing edge and TOT
  - Triggered or non triggered
  - Highly flexible data driven architecture with extensive data buffering and different readout interfaces
- Used in large number of applications:
  - More than 20 HEP applications: ALICE TOF, CMS muon, STAR, BES, KABES, HADES, NICA, NA62, AMS, Belle and others
    - We still supply chips from current stock.
  - Other research domains: Medical imaging
  - Commercial modules from 3 companies: CAEN, Cronologic, Bluesky
  - ~50k chips produced
- 250nm technology (~10 years ago for LHC)
  - Development: ~5 man-years + 500kCHF.
  - Cannot be produced any more
- http://tdc.web.cern.ch/TDC/hptdc/docs/hptdc_manual_ver2.2.pdf









## Full 65nm picoTDC ASIC



Channels: 64

Binning: 3ps, 12ps, (400ps)

RMS: 1-2ps, ~4ps

Reference: 40MHz clock

Dynamic range: 100us (12bit@40MHz)

Leading, Trailing, TOT

Hit rate: < 320MHz/channel

Data buffers per channel (~256 hits per channel)

Triggered/un-triggered

4 readout FIFOs

Flexible readout interface

Power: ~1W to ~1/4W
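The dynamic range quoted above follows from the coarse counter width and the reference clock. A minimal sketch, assuming the 12-bit counter counts periods of the 40MHz reference (the function name is illustrative, not from the design):

```python
def dynamic_range_s(counter_bits: int, clock_hz: float) -> float:
    """Time span covered before the coarse counter wraps around."""
    return (2 ** counter_bits) / clock_hz


span_s = dynamic_range_s(12, 40e6)
print(f"{span_s * 1e6:.1f} us")  # 102.4 us, quoted as ~100us on the slide
```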



#### Time Measurement Chain





#### **TDC Trends**



**New detectors and sensors require new TDC** 



- 3ps binning (1-2ps RMS)
- High integration
- Flexible



#### TDC Architecture Prototyped in 130nm






- External time reference (clock).
- 3 stage time measurement:
  - Counter: 800ps, Delay locked loop: 25ps, Resistive interpolation: 6.25ps
- Self calibrating using Delay Locked Loop (DLL)
- Design: Lukas Perktold
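The three-stage measurement can be read as three fields of decreasing weight that add up to one timestamp. A minimal sketch using the step sizes listed above; the field names and the range checks are assumptions for illustration, not the chip's actual data format:

```python
COUNTER_LSB_PS = 800.0  # coarse counter period
DLL_LSB_PS = 25.0       # delay locked loop tap spacing
FINE_LSB_PS = 6.25      # resistive interpolation step

DLL_TAPS = int(COUNTER_LSB_PS / DLL_LSB_PS)   # 32 taps per counter period
FINE_STEPS = int(DLL_LSB_PS / FINE_LSB_PS)    # 4 steps per DLL tap


def timestamp_ps(coarse: int, dll_tap: int, fine: int) -> float:
    """Combine the three measurement stages into one time in picoseconds."""
    if not 0 <= dll_tap < DLL_TAPS:
        raise ValueError("dll_tap out of range")
    if not 0 <= fine < FINE_STEPS:
        raise ValueError("fine out of range")
    return coarse * COUNTER_LSB_PS + dll_tap * DLL_LSB_PS + fine * FINE_LSB_PS


print(timestamp_ps(coarse=2, dll_tap=3, fine=1))  # 1681.25
```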



### Resistive Interpolation (130nm)





#### Measured Performance



#### **Code Density Test**

$\mathrm{INL} = \pm 1.3~\mathrm{LSB}$

RMS < 0.43 LSB (2.2 ps)

#### **Expected RMS resolution from circuit simulations:** including quantization noise, INL & DNL

$$2.3~ps\text{-RMS} < \sigma_{qDNL/wINL} <~2.9~ps\text{-RMS}$$

INL can be corrected for in software

DNL, noise and jitter cannot be corrected (single shot measurements)
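A code density test feeds hits that are uniformly distributed in time and histograms the output codes; bin widths then follow from the bin populations. The sketch below shows one common way to extract DNL and INL and to build a software correction table of calibrated bin centres. The function names and the toy histogram are illustrative assumptions, not the analysis used for the measurements above.

```python
from typing import List, Tuple


def dnl_inl(hist: List[int]) -> Tuple[List[float], List[float]]:
    """DNL and INL in LSB from a code density histogram."""
    ideal = sum(hist) / len(hist)          # expected count for an ideal bin
    dnl = [count / ideal - 1.0 for count in hist]
    inl, acc = [], 0.0
    for d in dnl:                          # INL is the running sum of DNL
        acc += d
        inl.append(acc)
    return dnl, inl


def bin_centres_lsb(hist: List[int]) -> List[float]:
    """Calibrated bin centres in LSB, usable as a software lookup table."""
    ideal = sum(hist) / len(hist)
    centres, edge = [], 0.0
    for count in hist:
        width = count / ideal              # measured bin width in LSB
        centres.append(edge + width / 2.0)
        edge += width
    return centres


hist = [50, 150, 100, 100]                 # toy data, not a measurement
dnl, inl = dnl_inl(hist)
print(dnl)                     # [-0.5, 0.5, 0.0, 0.0]
print(inl)                     # [-0.5, 0.0, 0.0, 0.0]
print(bin_centres_lsb(hist))   # [0.25, 1.25, 2.5, 3.5]
```

Replacing each raw code by its calibrated bin centre removes the systematic INL contribution; the width of each bin still limits a single measurement.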



### Single Shot Precision

- Three measurement series using cable delays
  - Both hits arrive within one reference clock cycle
  - Second hit arrives one clock cycle later
  - Second hit arrives multiple clock cycles later (~5ns)
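With a fixed cable delay, the spread of the measured interval reflects the precision of both hit measurements. A minimal sketch, assuming the two hits contribute equal and independent errors so that the per-hit precision is the interval spread divided by √2 (this assumption and the sample values are illustrative, not taken from the measurements):

```python
import math
import statistics
from typing import Sequence


def single_shot_precision_ps(intervals_ps: Sequence[float]) -> float:
    """Per-hit RMS precision from repeated measurements of a fixed interval."""
    sigma_interval = statistics.pstdev(intervals_ps)  # RMS spread of the interval
    return sigma_interval / math.sqrt(2)              # two equal, independent hits


samples = [100.0, 102.0, 98.0, 100.0]  # toy data in ps
print(single_shot_precision_ps(samples))
```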



TWEPP2013 slides and paper: https://indico.cern.ch/event/228972/session/6/contribution/61

ESE seminar: https://indico.cern.ch/event/225547/material/slides/0.pdf



picoTDC

### Mapping to TSMC 65nm

- Uncertain long term availability of IBM 130nm (now Globalfoundries)
- 2x time performance: -> 3ps binning
- Lower power consumption: < ~½
  - ~1/8 if DLL binning of 12ps enough (RMS ~4ps).
- Larger data buffers
- More channels
- Smaller chip
- But higher development costs



#### Low Jitter PLL

- Clock multiplication from 40MHz to 2.56GHz for coarse time counter and time interpolator
  - Low jitter critical: ~400fs
  - Jitter filtering of 40MHz clock to the extent possible
    - 40MHz reference MUST be very clean
  - LC based oscillator
- Internal clock for logic and readout: 320MHz
- Design: Jeffrey Prinzie, KU Leuven
- Status:
  - PLL circuit analysed and simulated
  - Detailed layout and optimization
  - Prototype submitted May 2015 (Synergy with LPGBT PLL)







#### DLL

- 32 taps, 12.2ps delay
- Self-Calibrating
- Jitter not as critical, doesn't pile up
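The tap delay follows from the PLL output period divided over the DLL taps. A short sketch of that arithmetic; the interpolation factor of 4 between the 12.2ps DLL bins and the ~3ps bins is an assumption inferred from the two quoted bin sizes:

```python
REF_CLOCK_HZ = 40e6
PLL_MULTIPLIER = 64        # 40MHz x 64 = 2.56GHz
DLL_TAPS = 32
INTERPOLATION_FACTOR = 4   # assumed: 12.2ps / 4 gives the ~3ps bins

pll_hz = REF_CLOCK_HZ * PLL_MULTIPLIER
period_ps = 1e12 / pll_hz                        # one 2.56GHz clock period
dll_bin_ps = period_ps / DLL_TAPS                # delay per DLL tap
fine_bin_ps = dll_bin_ps / INTERPOLATION_FACTOR  # after resistive interpolation

print(f"period {period_ps:.3f} ps, DLL bin {dll_bin_ps:.1f} ps, "
      f"fine bin {fine_bin_ps:.2f} ps")
```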









### Resistive Interpolation and Drivers

- Get down to 3ps bins
- Drivers: tapered buffers, each driving 32 FFs
- Separate calibration for each group of 32 channels







### Capture Flip Flops

- New design: the 130nm version consumed too much power for the intended 64 channels
- Static M/S Flip Flop followed by dynamic Flip Flop for metastability resolution
- Monte Carlo simulations show a mismatch of 1.25ps RMS







### Hit Decoding

- Three level pipelined logic
- First two levels happen in the capture register array: 128 -> 28 bits
- Timing vs. power very critical, running @2.56GHz, including ~25k Flip Flops








### **Full Timing Macro**





- 64 channels, DLL and resistive interpolator in the center
- Hit signal input on the left, 28 bits output on the right



### Post Layout Power Consumption

DLL + resistive interpolation: 40mW

Time distribution + calibration: 260mW

Capture registers: 250mW

Decoding: 50mW

Total @ 3ps bins: 600mW

Total @ 12ps bins: 180mW
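The 3ps total is the sum of the four contributions listed above. A small sketch of that budget; the per-channel figure is simply the total divided by the 64 channels and is not quoted on the slide:

```python
power_mw = {
    "DLL + resistive interpolation": 40,
    "Time distribution + calibration": 260,
    "Capture registers": 250,
    "Decoding": 50,
}

total_mw = sum(power_mw.values())
per_channel_mw = total_mw / 64  # averaged over the 64 channels of the macro

print(f"total {total_mw} mW, {per_channel_mw:.1f} mW per channel")
```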



#### Sources of Measurement Errors

- Bin size 3.2ps -> 880fs RMS
- PLL: 400fs RMS phase Jitter
- DLL: 400fs RMS phase Jitter, INL/DNL can be calibrated
- Capture FFs: 1.25ps mismatch
- Additional sources: input clock jitter, receivers, signal preprocessing
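If the listed contributions are treated as independent, they add in quadrature. The sketch below makes that assumption explicit; it ignores the additional sources in the last bullet, so it is a lower bound rather than a prediction. The quantization helper uses the usual bin/√12 rule for a uniform bin, which gives about 880fs for a bin of roughly 3.05ps.

```python
import math


def quantization_rms_ps(bin_ps: float) -> float:
    """RMS error of an ideal uniform bin of the given width."""
    return bin_ps / math.sqrt(12)


# Contributions from the list above, in femtoseconds.
sources_fs = {
    "quantization": 880,
    "PLL jitter": 400,
    "DLL jitter": 400,
    "capture FF mismatch": 1250,
}

# Assumption: the sources are uncorrelated, so variances add.
total_rms_fs = math.sqrt(sum(v ** 2 for v in sources_fs.values()))

print(f"quantization for a 3.05 ps bin: {quantization_rms_ps(3.05):.2f} ps")
print(f"combined: {total_rms_fs / 1000:.2f} ps RMS")
```

Under these assumptions the combined value falls in the 1-2ps RMS range quoted for the 3ps mode.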










## TDC Logic

- Synthesized logic from Verilog RTL
- Based on data driven architecture from HPTDC
  - Simplifications with individual buffers per channel
  - Clocking: 320, 160, 80, 40 MHz (hit rates and power consumption)
  - Trigger matching based on time measurements
- Reuse of HPTDC verification environment
  - This is ~½ of the design effort!
- New interfaces to be defined and implemented
  - Control/monitoring, Trigger, Readout
- SEU/radiation tolerance
  - 65nm technology TID tolerant
  - SEU detection, and minimized SEU effects where they can have major consequences (system sync)
    - As done in HPTDC
  - Not classified as rad hard
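Trigger matching based on time measurements means selecting the buffered hits whose time tags fall inside a window placed relative to the trigger time. A minimal sketch in coarse counter units; the `latency` and `window` parameters, the 12-bit counter width and the function name are assumptions for illustration, not the picoTDC register interface:

```python
from typing import List


def match_hits(hit_times: List[int], trigger_time: int,
               latency: int, window: int, counter_bits: int = 12) -> List[int]:
    """Return hits inside [trigger_time - latency, trigger_time - latency + window)."""
    modulus = 1 << counter_bits
    start = (trigger_time - latency) % modulus
    # Modular difference keeps the comparison valid across counter wrap-around.
    return [t for t in hit_times if (t - start) % modulus < window]


print(match_hits([10, 20, 4090, 5], trigger_time=30, latency=25, window=10))
# [10, 5]
print(match_hits([10, 4090, 5, 0], trigger_time=5, latency=20, window=16))
# [4090, 0]
```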



### picoTDC architecture



64 channels, 3ps or 12ps time binning

64 channels, 3ps: ~1W; 64 channels, 12ps: ~0.5W; 32 channels, 12ps: ~0.3W



#### Interfaces

- Power: 1.2V
  - ~1.0W (64ch, 3ps)
  - ~0.5W (64ch, 12ps)
  - ~0.3W (32ch, 12ps)
  - (Not yet defined if 1.5V/2.5V for LVDS IO)
- Hits: Differential SLVS (LVDS)
- Time reference: 40MHz SLVS
  - Other clock frequencies required?
  - Low jitter reference critical for high time resolution (especially for time measurements across many channels/chips/modules in large systems)
- Trigger/BX-reset/reset: Sync Yes/No, Encoded protocol
- Control/monitoring: GBT E-link and I2C
- Readout SLVS: 4 readout ports of 1-10 signals
- (JTAG boundary scan + production test?)
- Packaging: ~250 FPBGA





#### Readout

- 1 or 4 readout ports
  - 4 ports: High rate applications (e.g. non triggered), 16 TDC channels per port
  - 1 port: Low-medium rate, 64 channels (or 32 channels in 32 channel mode)
- Readout data: 32bit words
  - Headers, trailers, TDC data, status, etc.
- Readout ports interface
  - Byte wise:
    - 40, 80, 160, 320 MHz
  - Serial:
    - 8B/10B or 64B/66B encoding
    - Low speed: 40, 80, 160, 320 Mbits/s
    - High speed: 2.56 Gbits/s
- TDC readout bandwidth:
  - Max:
    - 320MHz x 8 x 4 = 10Gbits/s (~4Mhits/s per channel without triggering)
    - 2.56Gbits/s x 4 = 10Gbits/s
  - Min: 1 x 40Mbits/s = 40Mbits/s
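The bandwidth figures above can be checked with a few lines of arithmetic. The sustained hit rate per channel is reproduced here by assuming one 32-bit word per hit, 64 active channels and 8B/10B overhead; the slides do not state how the ~4Mhits/s figure was derived, so treat that part as one possible reading.

```python
WORD_BITS = 32
CHANNELS = 64
ENCODING_EFFICIENCY = 8 / 10  # assumed 8B/10B overhead


def bytewise_bandwidth_bps(clock_hz: float, ports: int) -> float:
    """Byte-wise port: 8 bits per clock cycle per port."""
    return clock_hz * 8 * ports


def serial_bandwidth_bps(line_rate_bps: float, ports: int) -> float:
    """Serial port: one bit stream per port."""
    return line_rate_bps * ports


max_bytewise = bytewise_bandwidth_bps(320e6, 4)
max_serial = serial_bandwidth_bps(2.56e9, 4)
min_serial = serial_bandwidth_bps(40e6, 1)
hits_per_channel = max_serial * ENCODING_EFFICIENCY / WORD_BITS / CHANNELS

print(f"max byte-wise {max_bytewise / 1e9:.2f} Gbit/s")
print(f"max serial    {max_serial / 1e9:.2f} Gbit/s")
print(f"min           {min_serial / 1e6:.0f} Mbit/s")
print(f"hit rate      {hits_per_channel / 1e6:.1f} Mhits/s per channel")
```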



### Schedule

| Milestone                         | Status    |
|-----------------------------------|-----------|
| Interpolator circuit prototype    | Done      |
| Technology choice                 | Done      |
| Final specifications              | 95%       |
| Finalize TDC macro                | 95%       |
| PLL prototype                     | Submitted |
| Final RTL model                   | Q4 2015   |
| P&R and prototype submission      | Q1 2016   |
| Prototype test                    | Q2 2016   |
| Final production masks/prototype  | Q3 2016   |
| Production lot                    | Q4 2016   |



#### Resources

- R&D
  - 2-3 man-years chip design
  - Main designer: Moritz Horstmann (CERN fellow)
  - PLL: Jeffrey Prinzie, Leuven (synergy LPGBT)
  - Supervision: Jorgen Christiansen
  - Low jitter/power SLVS differential: Synergy with LPGBT
  - Contribution from others?
    - Interfaces/RTL/FPGA test board: Paul Davids, Alberta?
    - Testing/characterization?
  - Prototyping, packaging, testing: ~Funded
- Put in production
  - NRE, packaging, test
  - Shared engineering run?
  - Funding from clients/users/projects required
    - No large user that can pay it all
    - To be defined in detail in 2016, when the full prototype is available.
      - Entry price to get access to chips
      - Pro-rata to number of required chips





# Backup Slides



#### Voltage Controlled Delay Cell (130nm)



- Fully differential cell
- Voltage controlled
- Single ended output

Approximate propagation delay

$$\delta \propto \frac{C_{eff} \cdot V_{Osc}}{I_{Bias}}$$

Post layout extracted simulation

@VDD = 1.2 V



### Delay Cell Simulation Results





### Drive Line Simulation (32 FF)





### Capture Scheme





#### **Synchronous**

#### **Asynchronous**



#### Time Measurements

#### **Start - Stop Measurement**

- Measure relative time interval between two local events
- Small local systems and low power applications



#### **Time Tagging**

- Measure "absolute" time of an event (Relative to a time reference: clock)
- For large scale systems with many channels all synchronized to the same reference
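With time tagging, a start-stop interval can still be recovered afterwards as the difference of two tags taken against the same reference. A minimal sketch, assuming tags expressed in picoseconds that wrap at the 12-bit, 40MHz dynamic range; the names are illustrative:

```python
RANGE_PS = 4096 * 25_000.0  # 12-bit counter of 25ns periods, in ps


def interval_ps(start_tag_ps: float, stop_tag_ps: float) -> float:
    """Interval between two time tags, valid across one counter wrap-around."""
    return (stop_tag_ps - start_tag_ps) % RANGE_PS


print(interval_ps(100.0, 350.0))            # 250.0
print(interval_ps(RANGE_PS - 100.0, 50.0))  # 150.0, stop tag after the wrap
```

The result is only unambiguous for intervals shorter than the dynamic range, and it relies on both channels sharing the same clock reference.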



