# A multi-channel 24.4 ps bin size Time-to-Digital Converter for HEP applications

C. Mester, C. Paillard and P. Moreira

CERN, 1211 Geneva 23, Switzerland

Christian.Mester@cern.ch

#### Abstract

A multi-channel time-tagging Time-to-Digital Converter (TDC) ASIC with a resolution of 24.4 ps (bin size) has been implemented and fabricated in a 130 nm CMOS technology. An on-chip PLL is used to generate an internal timing reference from an external 40 MHz clock source. The circuit is based on a 32 element Delay Locked Loop (DLL) which performs the time interpolation. The 32 channel architecture of the TDC is suitable for both triggered and non-triggered applications. The prototype contains test structures such as a substrate noise generator. The paper describes the circuit architecture and its principles of operation.

## I. INTRODUCTION

Detectors in HEP applications often require high precision timing measurements. For example, the ALICE Time of Flight (TOF) detector, which provides information for particle identification, requires a TDC bin size of 25 ps on 160 704 channels. This leads to an overall resolution of the full TOF detector of 100 ps. Together with information from other ALICE subdetectors, the mass of a charged particle can be calculated, which makes it possible to distinguish $\pi$, K and p.

Figure 1: HPTDC's RC interpolation scheme in 8 channel mode

The HPTDC, an 8/32 channel high resolution time-to-digital converter that was previously developed at CERN in a 250 nm CMOS technology [1], is now in use in the LHC experiments ALICE [2, 3], ATLAS [4], CMS [5] and LHCb [6]. Its resolution is programmable and can be set to 100 ps or 25 ps (bin size) by trading the number of measurement channels for resolution. The high resolution mode uses an RC interpolation scheme (fig. 1) that combines four channels into one high-resolution channel.
The principle of interpolation is to use four channels to perform four conversions with 100 ps resolution, each delayed by 25 ps with respect to the previous one, giving an effective resolution of 25 ps. This, however, reduces the number of usable channels from 32 per chip to 8 per chip.

Simulations show that in a 130 nm CMOS technology, a basic resolution of 25 ps can be achieved with a non-interpolating architecture. This makes it possible to integrate a higher number of high resolution channels per chip, reducing the number of chips required for high resolution applications by a factor of at least 4. A new TDC, the TDC130, has been planned to profit from the speed and integration potential of this technology. A prototype chip has been fabricated to evaluate the timing properties of such a TDC.

The paper describes the architecture of the prototype and of the planned TDC, focussing on the time base. A novel interpolation scheme resulting in bin sizes smaller than a logic gate delay is presented.

## II. TDC130 ARCHITECTURE

Figure 2: TDC130 prototype architecture

To evaluate the timing precision that can be reached in the 130 nm technology, a prototype (fig. 2) was fabricated. It contains a Phase Locked Loop (PLL), a Delay Locked Loop (DLL), the hit registers of 32 channels and a band-gap voltage reference for biasing. A programmable noise generator with an independent clock input is also implemented. It allows the sensitivity of the circuit to substrate and power supply noise, such as will be generated by the final chip's synchronous logic, to be evaluated.

Figure 3: Full TDC130 architecture

The core of the planned TDC130 (fig. 3) is a delay locked loop that provides phase interpolation. Its reference signal, a clock with a period of about 780 ps (781.25 ps, i.e. 1.28 GHz), is generated by an on-chip clock multiplying phase-locked loop from an external (LHC standard) 40 MHz clock source. The 32 element DLL covers one such clock cycle, leading to a bin size of 24.4 ps (nominally 25 ps).
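
The relation between the 40 MHz input clock, the DLL clock and the bin size can be checked with a few lines of arithmetic. The sketch below is illustrative only: the multiplication factor of 32 is inferred from the 40 MHz input and the roughly 780 ps DLL clock period, and the variable names are not taken from the design.

```python
from fractions import Fraction

# Values stated in the text; PLL_FACTOR is inferred (40 MHz * 32 = 1.28 GHz).
INPUT_CLOCK_PERIOD_PS = Fraction(25_000)  # 40 MHz input clock -> 25 ns
PLL_FACTOR = 32                           # assumed clock multiplication factor
DLL_ELEMENTS = 32                         # delay elements in the DLL

dll_clock_period_ps = INPUT_CLOCK_PERIOD_PS / PLL_FACTOR
bin_size_ps = dll_clock_period_ps / DLL_ELEMENTS
bins_per_input_period = PLL_FACTOR * DLL_ELEMENTS

print(float(dll_clock_period_ps))  # 781.25
print(float(bin_size_ps))          # 24.4140625
print(bins_per_input_period)       # 1024
```

The exact bin size of 25 ns / 1024 is the 24.4 ps quoted in the abstract; the text rounds it to 25 ps.
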
A counter, clocked by the DLL's input clock, can be used to extend the dynamic range of the TDC according to the requirements of the application.

The DLL is the global timing reference for the 32 TDC channels, which therefore have identical timing properties. Sharing the time base reduces the power consumption per channel.

The time stamp, the digital representation of the time of the event, is relative to the 40 MHz input clock. In LHC applications it gives the time within a bunch-crossing interval. As both the PLL clock multiplication factor and the number of DLL elements are powers of 2, the bin size is a binary fraction of 25 ns, the input clock period. Encoding of the measurement is thus simplified.

As the TDC130 is targeted at High Energy Physics (HEP) applications, it supports a high hit rate (3 MHz per channel). Every channel will have a dedicated level 1 buffer which is fully independent of the other channels. Once an event is signalled at the chip's input, the value of the 32 phase-shifted DLL outputs is stored in a bank of registers, called hit registers. Data from the hit registers can be transferred to the level 1 buffer once per clock cycle. This is in contrast to the HPTDC, where a buffer was shared among 8 channels and access was subject to arbitration depending on the other channels' activity.

Triggering is a well established technique to reduce the required readout bandwidth. The time stamps of all events are stored in the level 1 buffer. A trigger processor selects those events which might be interesting and signals them to all detectors. The trigger signal arrives at each TDC with a fixed latency. Only after this latency is data read out from the level 1 buffers if a trigger has been received, or discarded otherwise. As the level 1 buffers are dedicated to individual channels, data in the buffers are always in perfect time order, simplifying the trigger logic compared to the HPTDC.
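
The per-channel buffering and trigger selection described above can be sketched in software. This is a behavioural illustration, not the chip's logic: the class name `Level1Buffer`, the matching window and the use of integer time stamps counted in bins are assumptions made for the example.

```python
from collections import deque


class Level1Buffer:
    """Behavioural model of one channel's level 1 buffer (hypothetical)."""

    def __init__(self, latency, window):
        self.latency = latency  # fixed trigger latency, in bins (assumed)
        self.window = window    # matching window width, in bins (assumed)
        self.hits = deque()     # time stamps, always in time order

    def store(self, timestamp):
        # Hits of a single channel arrive in time order.
        self.hits.append(timestamp)

    def on_trigger(self, trigger_time):
        """Return the hits of the triggered event and discard older ones."""
        start = trigger_time - self.latency
        # Time order means stale hits only ever sit at the front.
        while self.hits and self.hits[0] < start:
            self.hits.popleft()
        matched = []
        while self.hits and self.hits[0] < start + self.window:
            matched.append(self.hits.popleft())
        return matched


buf = Level1Buffer(latency=100, window=10)
for t in (3, 42, 45, 60):
    buf.store(t)
print(buf.on_trigger(trigger_time=140))  # [42, 45]
print(list(buf.hits))                    # [60]
```

Because each buffer holds a single channel's hits in time order, matching reduces to removing entries from the front of the queue. In hardware, hits older than the latency would be discarded continuously; here the discard is folded into `on_trigger` for brevity.
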
The data of all channels of a chip are merged by common processing logic and readout circuitry. As the trigger rate is typically much lower than the hit rate, the common circuits can run synchronously with the 40 MHz reference clock.

To enable the use of the TDC130 in a large variety of applications, it can also be configured for non-triggered operation. In mass spectrometer applications, for example, all measurements have to be processed off-chip and thus read out. Consequently, no data can be discarded on-chip since no trigger signal is available.

## III. TIME BASE ARCHITECTURE

The core of the time base is a DLL (fig. 4). It consists of a Voltage Controlled Delay Line (VCDL), composed of 32 differential buffer delay elements. A clock signal is permanently propagating in this line. Its control logic ensures that the propagation delay of the complete line is always equal to one clock cycle. A D flip-flop serves as a bang-bang phase detector, comparing the VCDL's input with its output. A charge pump and a filter capacitor convert the digital phase detector output into a control voltage, which changes the bias current of the VCDL's delay elements and thus their propagation delay. The effects of temperature and voltage variations are consequently compensated for automatically. A start-up state machine prevents the DLL from locking to the wrong delay.

Assuming perfect matching of the delay elements, each individual element's delay is equal to one clock cycle $T_{\text{clk}}$ divided by the number of elements $N$. Let the leading edge arrive at the first delay element's input at time $t_0$. If at a later time $t$ the leading edge is at the input of the $n^{\text{th}}$ element, $t$ is within $\pm \frac{T_{\text{clk}}}{2N}$ of $t_n = t_0 + \frac{T_{\text{clk}}}{N} \left( n + \frac{1}{2} \right)$. Fine tuning of the individual delay elements' bias currents is used to reduce the effects of delay cell mismatch.

The hit register banks are connected to the VCDL outputs.
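
The relation for $t_n$ can be illustrated with an idealised model of the delay line. The sketch assumes a 50 % duty cycle clock, perfectly matched elements, zero-based tap indices (tap $k$ is the input of element $k$) and $t_0 = 0$. The function names are invented for the example and metastability is ignored.

```python
N = 32                    # delay elements in the VCDL
T_CLK_PS = 25_000 / 32    # DLL clock period: 781.25 ps
TAU_PS = T_CLK_PS / N     # ideal delay of one element


def snapshot(t_ps):
    """Ideal tap values at time t_ps after a rising edge entered the line."""
    taps = []
    for k in range(N):
        phase = (t_ps - k * TAU_PS) % T_CLK_PS  # clock phase seen at tap k
        taps.append(1 if phase < T_CLK_PS / 2 else 0)
    return taps


def leading_edge_index(taps):
    """Index n of the last tap the rising edge has reached (1 -> 0 boundary)."""
    for n in range(N):
        if taps[n] == 1 and taps[(n + 1) % N] == 0:
            return n
    raise ValueError("no leading edge found in snapshot")


def bin_centre_ps(n):
    """t_n - t_0 = (T_clk / N) * (n + 1/2)."""
    return TAU_PS * (n + 0.5)


t_hit = 100.0
n = leading_edge_index(snapshot(t_hit))
print(n, bin_centre_ps(n))  # 4 109.86328125
```

In this model a hit at 100 ps falls into bin 4, whose centre lies within half a bin (about 12.2 ps) of the true time, as the relation above requires.
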
Once an event is signalled at the hit register input, the state of the DLL is stored and the position of the leading edge can be determined by digital logic at a later stage.

Figure 4: A phase-interpolating DLL

The dynamic range of a DLL is always limited to one clock cycle. Thus, in order to achieve a useful dynamic range, either the delay line must be very long (high number of delay elements), leading to linearity problems due to mismatch, or the input clock frequency must be very low, limiting the resolution. An alternative is to expand the dynamic range using a clock synchronous counter while using a short VCDL with a high clock frequency. This has proven to be a good solution in the past [1].

One requirement for the TDC is the use of the LHC standard 40 MHz clock frequency but, as discussed above, a high frequency clock is required for high resolution and high linearity. A clock multiplying PLL is used to generate the 1.28 GHz DLL clock from the 40 MHz input clock. The synchronous logic does not need to run at high frequency and uses the 40 MHz clock.

## IV. DELAY ELEMENTS

In the asynchronous domain of the TDC, the timing of signals needs to have a precision comparable to a gate delay. The design is very sensitive to parasitics and needs to be very symmetric. As long as related signals are equally delayed, the conversion linearity is not affected. On the other hand, nonlinearities are caused by, e.g., unequal propagation delays between the VCDL outputs and the hit registers.

Global process variations affect PMOS and NMOS transistors independently. Therefore, related signals, such as the VCDL tap outputs, must have the same polarity. This means that the VCDL's delay elements all have to be non-inverting.

To ease data processing, a constraint put on the TDC is that the bin size must be a binary fraction of the 40 MHz LHC reference clock period, 25 ns. Acceptable bin sizes are either 25 ps or 50 ps.
A bin size of 100 ps has already been achieved by the HPTDC in 250 nm and does not justify the use of a 130 nm technology.

A buffer can be used as a delay element, while a single inverter cannot. A single-ended buffer consists of two inverters in series, thus two elementary gates. A differential buffer can be realized in one stage.

For the 130 nm CMOS technology used, the minimum delay simulated for a single-ended buffer implemented with low $V_{\rm t}$ transistors is 45 ps. To use such buffers in a DLL, their delay must be adjustable, usually by employing a current starving technique. This further increases the minimum achievable delay. The nominal operating point must leave a margin towards both higher and lower delays, leading to a nominal delay considerably higher than 45 ps.

Figure 5: Delay element with inductive peaking including an adjustable bias current source for mismatch compensation

Differential buffers with adjustable delay may consist of a differential pair, a current source and load elements, usually diode connected transistors [7]. The delay of the buffer depends on the current provided by the tail current source. In the DLL, the bias voltage is not constant, but generated by the DLL control logic. Simulations show a delay of about 32 ps under nominal conditions and below 45 ps in the worst case. By replacing the diode connected transistor load elements with transistors with series gate resistors (active inductors, fig. 5), such that they show inductive properties around the operating frequency of the delay element, this delay can be further reduced, reaching a bin size of 25 ps.

For such a small buffer delay, mismatch effects can significantly degrade the linearity of the TDC. To compensate for this, the tail current of each VCDL delay element is trimmed by using an additional, individually configurable current source, as shown in fig. 5.
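
To illustrate why mismatch matters at such small delays, the sketch below computes the differential and integral non-linearity (DNL, INL) of a delay line from a set of element delays. The delay values are made up for the example and are not simulation or measurement results. The rescaling step models the DLL forcing the total line delay to one clock cycle.

```python
def linearity(element_delays_ps, t_clk_ps):
    """Return (DNL, INL) in LSB for a locked delay line.

    element_delays_ps: relative delays of the elements (hypothetical values).
    t_clk_ps: clock period to which the DLL locks the total line delay.
    """
    n = len(element_delays_ps)
    scale = t_clk_ps / sum(element_delays_ps)  # lock: total delay == T_clk
    bins = [d * scale for d in element_delays_ps]
    lsb = t_clk_ps / n                         # ideal bin size
    dnl = [b / lsb - 1.0 for b in bins]
    inl, acc = [], 0.0
    for x in dnl:
        acc += x                               # INL is the running sum of DNL
        inl.append(acc)
    return dnl, inl


# Four-element toy line with +/-10 % mismatch on two elements.
dnl, inl = linearity([25.0, 27.5, 22.5, 25.0], t_clk_ps=100.0)
print([round(x, 3) for x in dnl])  # [0.0, 0.1, -0.1, 0.0]
print([round(x, 3) for x in inl])  # [0.0, 0.1, 0.0, 0.0]
```

In a locked line the INL returns to zero at the last element, since the total delay is fixed. Trimming the individual bias currents aims to shrink the intermediate DNL values.
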
Using differential delay elements implies a conversion from differential to single-ended at some point in the asynchronous domain. The conversion is conveniently placed either immediately after the VCDL tap outputs or after the hit registers. In the latter case, the differential signals need to be propagated from the VCDL to the hit registers and the hit registers have to be differential.

Both single-ended and differential hit registers have been simulated and their performance compared. The simulations were done under the constraint that the differential and single-ended registers have the same recovery time from the metastable state. This leads to a supply current of 1 mA for a differential register, while the single-ended implementation only takes 54 $\mu$A in nominal operating conditions.

The noise, i.e. the standard deviation of the supply current, generated by the differential register is lower than that of the single-ended one only in relative terms compared to the average supply current. In absolute terms, the differential register's noise is slightly higher than the noise generated by the single-ended registers.

For these two reasons, lower power consumption and lower supply noise, it has been decided to use single-ended hit registers and perform the conversion immediately after the VCDL.

## V. PERFORMANCE ENHANCEMENT

For even higher resolution ($\approx$ 6 ps) a novel interpolation scheme (fig. 6) is planned. The reference clock signal is propagated through a second DLL with an $M$ element VCDL which, in contrast to the $N$ element main DLL, locks to a fraction $\frac{m}{N}$ of the reference clock period. The phase detector is connected to the end of the $M$ element DLL and to the input of the $m^{\text{th}}$ element of the main DLL.
The second DLL generates a control voltage $V_{\rm Ctrl2}$ such that the delay of each of its elements is $t_{\rm M}=\frac{m}{M}t_{\rm N}$. An interpolation factor of $F=4$ can be reached with $M=4$ and $m=5$: $t_{\rm M}=\frac{5}{4}t_{\rm N}=\left(1+\frac{1}{4}\right)t_{\rm N}$. Note that integer multiples of $t_{\rm N}$ correspond to a shift of the time stamp in the hit register and can therefore be disregarded.

If the control voltage $V_{\rm Ctrl2}$ is propagated to other delay elements which are identical to those inside the DLL, their delays are (to a first approximation) identical. The incoming hit signals are propagated through a delay line composed of $F - 1 = 3$ elements with a delay $t_{\rm M}$, with tap outputs before every element and after the last. A bank of hit registers is connected to each tap output. The reference clock signal propagates through the two DLLs, while the delay lines in the channels carry the hit signals. As a result, an interpolation scheme similar to the RC fine interpolation of the HPTDC can be achieved while taking advantage of the autocalibration property of DLLs.

Figure 6: Fine interpolation scheme

Arrays of DLLs [8] can also provide sub-element delays using auto-calibrating DLLs without the need to distribute an analogue control voltage across the channels. Unfortunately, they cannot be built with a number of elements which is a power of 2. In addition, the reference clock signal is permanently propagating through all DLLs, increasing the power consumption. Furthermore, a large number of signals needs to be distributed across the chip, requiring large buffers. Each buffer dissipates roughly as much power as one delay element. The architecture proposed here is thus more efficient from both the area and the power consumption point of view.

## VI. POWER CONSUMPTION

The prototype's power consumption is estimated to be 300 mW.
For comparison, the previous HPTDC, including synchronous logic not present in the TDC130 prototype, consumes 1300 mW in high resolution (100 ps) mode.

## VII. SUMMARY

A high resolution TDC in a standard 130 nm technology has been planned and a prototype fabricated to evaluate the resolution of the proposed VCDL circuit. Experimental verification is being prepared. A novel interpolation scheme has been described and will be implemented in a future prototype.

## REFERENCES

- [1] M. Mota, J. Christiansen, A High-Resolution Time Interpolator Based on a Delay Locked Loop, IEEE Journal of Solid-State Circuits, vol. 34, no. 10, pp. 1360–1366, Oct. 1999
- [2] ALICE Addendum to the Technical Design Report of the Time of Flight System (TOF), CERN/LHCC 2002-016, ISBN 92-9083-192-8, Apr. 2002
- [3] M. Bondila et al., ALICE T0 detector, IEEE Transactions on Nuclear Science, vol. 52, no. 5, pp. 1705–1711, Oct. 2005
- [4] P. B. Amaral et al., The ATLAS level-1 central trigger system, IEEE Nuclear Science Symposium Conference Record, 2004, vol. 3, pp. 1673–1677, 16-22 Oct. 2004
- [5] A. Parenti, The CMS Muon System and Its Performance in the CMS Cosmic Challenge, IEEE Transactions on Nuclear Science, vol. 55, no. 1, pp. 113–121, Feb. 2008
- [6] LHCb Outer Tracker Technical Design Report, CERN-LHCC-2001-024, ISBN 92-9083-200-2, 2001
- [7] J. G. Maneatis, Low-jitter process-independent DLL and PLL based on self-biased techniques, IEEE Journal of Solid-State Circuits, vol. 31, no. 11, pp. 1723–1732, Nov. 1996
- [8] M. Mota, J. Christiansen, A four channel, self-calibrating, high resolution, Time To Digital Converter, Proceedings of the 5th IEEE International Conference on Electronics, Circuits and Systems (ICECS'98), Lisbon, Portugal, Sept. 1998

This research project has been supported by a Marie Curie Early Stage Research Training Fellowship of the European Community's Sixth Framework Programme under contract number MEST-CT-2004-007307 MITELCO.