Charge Pump Clock Generation PLL for the Data Output Block of the Upgraded ATLAS Pixel Front-End in 130 nm CMOS


a University of Bonn, Physics Department, Nussallee 12, 53115 Bonn, Germany
b CPPM, Aix-Marseille Universite Marseille, CNRS/IN2P3, Marseille, France
c INFN, Genova via Dodecaneso 33, IT-16146 Genova, Italy
d LBNL, 1 Cyclotron Road, Berkeley, CA 94720, USA
e NIKHEF, Science Park 105, 1098 XG Amsterdam, Netherlands

kruth@physik.uni-bonn.de

Abstract

FE-I4 is the 130 nm ATLAS pixel IC currently under development for upgraded Large Hadron Collider (LHC) luminosities. FE-I4 is based on a low-power analog pixel array and digital architecture concepts tuned to higher hit rates [1]. An integrated Phase Locked Loop (PLL) has been developed that locally generates a clock signal for the 160 Mbit/s output data stream from the 40 MHz bunch crossing reference clock. This block is designed for low power, low area consumption and recovers quickly from loss of lock related to single-event transients in the high radiation environment of the ATLAS pixel detector. After a general introduction to the new FE-I4 pixel front-end chip, this work focuses on the FE-I4 output blocks and on a first PLL prototype test chip submitted in early 2009. The PLL is nominally operated from a 1.2 V supply and consumes 3.84 mW of DC power. Under nominal operating conditions, the control voltage settles to within 2 % of its nominal value in less than 700 ns. The nominal operating frequency for the ring-oscillator based Voltage Controlled Oscillator (VCO) is $f_{VCO} = 640$ MHz.

The last sections deal with a fabricated demonstrator that provides the option of feeding the single-ended 80 MHz output clock of the PLL as a clock signal to a digital test logic block integrated on-chip. The digital logic consists of an eight bit pseudo-random binary sequence generator, an eight bit to ten bit coder and a serializer. It processes data with a speed of 160 Mbit/s. All dynamic signals are driven off-chip by custom-made pseudo-LVDS drivers.

I. INTRODUCTION TO THE NEW PIXEL DETECTOR FRONT-END CHIP

FE-I3 is the pixel detector front-end chip of the current ATLAS experiment at the LHC. Simulations have shown that, due to its architecture, this chip will suffer from various sources of inefficiency and its performance will degrade significantly with increased LHC luminosities [2]. Furthermore, the sensors of the innermost pixel layers will suffer from severe performance degradation after a few years of operation in the hostile radiation environment close to the interaction point. For these reasons, an international collaboration is already working on a new silicon detector front-end chip called FE-I4, suitable for LHC upgrades scheduled for 2013 or later. The first upgrade will be the Insertable B-Layer (IBL). Because disassembling the present detector would require a complex engineering effort, a new layer of pixels will instead be inserted into the present tracker at a radius of $r \approx 3.7 \text{ cm}$. A second upgrade, in about 2020, will be a full replacement of the complete tracker using four to five pixel layers between $\approx 3.7 \text{ cm}$ and $\approx 25 \text{ cm}$ together with silicon strips at larger radii. FE-I4 is meant to serve both upgrades. Among its new features are an increased die area of $18.8 \text{ mm} \times 20.2 \text{ mm}$ and smaller individual pixels of $50 \mu \text{m} \times 250 \mu \text{m}$. One front-end chip consists of $336 \times 80$ pixels. The active area of the front-end pixel chip has been increased from 75 % to 90 %. To match the clustered nature of physical hits, the new architecture groups four pixels into one digital region with a five-deep buffer for local hit storage. The hit processing logic does not send every hit to the periphery of the chip. Instead, hits are stored locally in the pixel region until the decision about the relevance of the hit is made. This reduces the traffic on the double column bus by a factor of 400.

FE-I4 will be manufactured in a 130 nm standard CMOS process technology. The thin SiO$_2$ gates of the 130 nm technology node give natural radiation hardness to the transistor devices despite high radiation levels. Enclosed layout transistors are therefore no longer a hard requirement, which helps to increase the packing density.

The output stages of the FE-I4 are located in the periphery of the chip. The clock signal for the data processing at 160 Mbit/s is locally generated on-chip by a single ring-oscillator based PLL and is used in the FE-I4 data output block.

II. PHASE LOCKED LOOP

Figure 1 depicts the block diagram of the PLL with its main building blocks: Phase Frequency Detector (PFD), Charge Pump (CP), Loop Filter (LF), differential VCO, Frequency Divider (FD) and Output Buffers (BUF). The architecture is that of a classic type II charge pump PLL. The advantage of a type II PLL over a type I PLL is that it provides better correction of the PLL output for errors at the input. Additionally, the loop gain and stability properties are set independently of each other, and the PFD of a type II PLL detects not only phase mismatch but also frequency mismatch [3].

The nominal VCO oscillation frequency is $f_{VCO} = 640$ MHz. At the time the design of the PLL started, it had not been decided whether the 160 Mbit/s front-end output data would be processed at 160 MHz single-edge or 80 MHz double-edge. The PLL prototype can provide both clock frequencies derived from $f_{VCO}$. Moreover, the choice of a higher frequency $f_{VCO}$ eases the task of generating lower frequency outputs with the clean 50% duty cycle required for double-edge data processing. Furthermore, the physical dimensions of the capacitive elements required in the LF are smaller (cf. Eq. 1) and the devices consume less die area. This enables an on-chip integration of the complete LF without external components. Due to synergy with other projects, the PLL also provides higher frequency clocks at $f_{OUT} = 320$ MHz and $f_{OUT} = 640$ MHz. The mentioned benefits come at the price of a slightly increased power consumption for the VCO and the high frequency divider stages.

The loop transfer function (neglecting higher order terms) is

$$H(s) = \frac{I_{CP} K_{VCO}}{2\pi C_{notch}} \, \frac{1 + s R_{notch} C_{notch}}{s^2 + s \, \frac{I_{CP} K_{VCO} R_{notch}}{2\pi N} + \frac{I_{CP} K_{VCO}}{2\pi N C_{notch}}} \tag{1}$$

where $I_{CP}$ is the charge pump current (cf. Fig. 3), $K_{VCO}$ is the VCO gain, $R_{notch}$ and $C_{notch}$ are loop filter elements (cf. Fig. 4) and $N = 16$ is the frequency division factor of the loop.
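Comparing the denominator of the loop transfer function with the standard second-order form $s^2 + 2\zeta\omega_n s + \omega_n^2$ gives $\omega_n = \sqrt{I_{CP} K_{VCO} / (2\pi N C_{notch})}$ and $\zeta = R_{notch} C_{notch} \omega_n / 2$. The following Python sketch evaluates these two quantities. The component values in the example call are hypothetical placeholders chosen only for illustration; they are not the values used in the FE-I4 PLL, and the function name is likewise an assumption.

```python
import math


def loop_parameters(i_cp, k_vco, r_notch, c_notch, n_div):
    """Natural frequency (rad/s) and damping factor of the second-order loop model.

    i_cp    : charge pump current in A
    k_vco   : VCO gain in rad/s/V
    r_notch : loop filter resistor in ohm
    c_notch : loop filter capacitor in F
    n_div   : frequency division factor of the loop
    """
    omega_n = math.sqrt(i_cp * k_vco / (2.0 * math.pi * n_div * c_notch))
    zeta = 0.5 * r_notch * c_notch * omega_n
    return omega_n, zeta


# Hypothetical example values (NOT taken from the paper).
omega_n, zeta = loop_parameters(
    i_cp=20e-6,
    k_vco=2.0 * math.pi * 500e6,
    r_notch=20e3,
    c_notch=8e-12,
    n_div=16,
)
print(f"omega_n = {omega_n:.3e} rad/s, zeta = {zeta:.3f}")
```

In this model a larger $C_{notch}$ lowers $\omega_n$, while $R_{notch}$ changes the damping without changing $\omega_n$.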

### A. Phase Frequency Detector and Loss of Lock Detection

The PFD uses a classical architecture with an additional loss of lock detection circuitry (see Fig. 2). To generate the Fb2Fast signal, the loss of lock detection latches the DN output of the PFD with the rising edge of the $f_{FB}$ signal coming from the feedback branch of the control loop, delayed by a certain time $T$. Correspondingly, the Ref2Fast signal is generated by latching the UP output with the rising edge of the $f_{REF}$ reference clock signal, delayed by $T$. This delay time $T$ determines the sensitivity of the loss of lock detection. A loss of lock resulting in DN = high (respectively UP = high) for longer than $T$ (neglecting the propagation delay of a D-flipflop) will cause the signal Fb2Fast (respectively Ref2Fast) to go high, indicating severe changes in $V_{CTRL}$. The value for $T$ has to be chosen large enough to prevent the loss of lock detection signals from going permanently high due to process variations.
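The decision rule of the loss of lock detection can be summarized in a small behavioural sketch in Python. It is an idealized model under stated assumptions: the PFD pulse widths are given directly as numbers, flip-flop propagation delay and setup time are neglected, and the function name and the example value of $T$ are hypothetical.

```python
def loss_of_lock_flags(up_width, dn_width, t_delay):
    """Idealized loss of lock decision for one reference cycle.

    up_width, dn_width : width of the UP / DN pulse of the PFD in seconds
    t_delay            : sensitivity delay T in seconds
    Returns (ref2fast, fb2fast).
    """
    # A flag is set when the pulse is still high at the delayed clock edge,
    # i.e. when the pulse lasts longer than T.
    ref2fast = up_width > t_delay
    fb2fast = dn_width > t_delay
    return ref2fast, fb2fast


# Hypothetical delay of 1 ns: narrow pulses near lock set no flag,
# a 5 ns UP pulse flags the reference as too fast.
print(loss_of_lock_flags(0.2e-9, 0.1e-9, 1e-9))
print(loss_of_lock_flags(5e-9, 0.1e-9, 1e-9))
```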

### B. Charge Pump

The charge pump uses a differential architecture with a complementary dummy branch (see Fig. 3). Thus the charging and the discharging current sources provide an almost constant current without switching on or off. While the main branch is controlled by the UP and DN signals coming from the PFD, the complementary branch is controlled by the inverted signals $\overline{\text{UP}}$ and $\overline{\text{DN}}$. The inverted signals are delayed by the propagation delay of the inverters used. The switching transistors M1 to M4 in the charge pump are minimum size devices, and thus the charge injected into the loop filter upon breaking the current path is minimized. As a consequence, spikes on $V_{CTRL}$ due to charge injected from the transistor channels are reduced [4].

Figure 1: Schematic block diagram of the PLL.

Figure 2: Schematic of the phase frequency detector and the loss of lock detection.

Figure 3: Schematic of the charge pump with its dummy branch.

### C. Loop Filter

The first branch of the LF (cf. Fig. 4) with the capacitance \( C_{pole} \) gives a low-pass characteristic to the control loop. However, the control loop is unstable with the associated frequency pole alone. The second branch of the LF (\( R_{notch}, C_{notch} \)) creates a frequency notch in order to increase the phase margin of the open-loop transfer function. As a rule of thumb, \( C_{notch} \) should be larger than \( 10 \times C_{pole} \) in order to ensure sufficient phase margin. The third branch of the LF (\( R_{ripple}, C_{ripple} \)) forms another non-dominant frequency pole that filters high frequency noise on \( V_{CTRL} \). The characteristic frequency response of the overall control loop can still be considered a second order system. The sum of all the capacitance values in the LF is \( C_{SUM} \approx 10 \text{ pF} \). All capacitors are vertical natural caps fully integrated on chip. The die area consumption of the PLL core is dominated to a large extent by these capacitor devices.
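The rule of thumb can be illustrated with the first two branches only. For \( C_{pole} \) in parallel with the series connection of \( R_{notch} \) and \( C_{notch} \), the filter impedance has a zero at \( 1/(R_{notch} C_{notch}) \) and a pole at \( (C_{notch} + C_{pole})/(R_{notch} C_{notch} C_{pole}) \), so their ratio is \( 1 + C_{notch}/C_{pole} \). The Python sketch below evaluates these corner frequencies. It neglects the ripple branch, and the component values are hypothetical placeholders rather than the values of the fabricated filter.

```python
import math


def filter_corner_frequencies(r_notch, c_notch, c_pole):
    """Zero and pole frequency (Hz) of C_pole || (R_notch + C_notch)."""
    f_zero = 1.0 / (2.0 * math.pi * r_notch * c_notch)
    f_pole = (c_notch + c_pole) / (2.0 * math.pi * r_notch * c_notch * c_pole)
    return f_zero, f_pole


def satisfies_rule_of_thumb(c_notch, c_pole):
    """True if C_notch exceeds 10 x C_pole."""
    return 10.0 * c_pole < c_notch


# Hypothetical values (NOT taken from the paper).
f_zero, f_pole = filter_corner_frequencies(20e3, 8e-12, 0.7e-12)
print(f"zero at {f_zero / 1e6:.2f} MHz, pole at {f_pole / 1e6:.2f} MHz")
print("rule of thumb met:", satisfies_rule_of_thumb(8e-12, 0.7e-12))
```

With the rule of thumb satisfied, the pole lies more than a factor of 11 above the zero, which leaves room for the phase lead of the zero around the loop crossover.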


### D. Differential Voltage-Controlled Oscillator

The VCO consists of three inverters connected as a ring oscillator and a fourth inverter that serves as a buffer. The inverters are differential pairs loaded with PFET active loads and cross-coupled stages for rail-to-rail hard switching behavior (see Fig. 5).


### E. Frequency Dividers and Output Buffers

The FDs consist of four custom-made divide by two toggle-flipflops. The VCO output frequency of \( f_{VCO} = 640 \text{ MHz} \) is consecutively divided down to 320 MHz, 160 MHz, 80 MHz and finally to 40 MHz equaling a total frequency division factor of \( N = 16 \).
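The divider chain can be sketched behaviourally in Python. The model below is an idealized illustration with hypothetical function names, not a description of the custom flip-flop circuit: each stage toggles on the rising edge of its input, which halves the frequency and yields a 50% duty cycle as long as the input period is constant.

```python
def divider_chain(f_vco_hz, stages=4):
    """Output frequency of each divide-by-two stage, starting from f_VCO."""
    frequencies = []
    f = f_vco_hz
    for _ in range(stages):
        f /= 2
        frequencies.append(f)
    return frequencies


def toggle_flipflop(clock_samples):
    """Idealized toggle flip-flop: the output flips on every rising input edge."""
    output, state, previous = [], 0, 0
    for sample in clock_samples:
        if sample == 1 and previous == 0:
            state ^= 1
        output.append(state)
        previous = sample
    return output


print([f / 1e6 for f in divider_chain(640e6)])  # frequencies in MHz
print(toggle_flipflop([0, 1, 0, 1, 0, 1, 0, 1]))
```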

In the output buffering stages, the differential clock signals from the dividing chain are converted to single-ended clock signals. Before the clock signals are sent out of the chip, the lower frequency clock signals are all gated with the 640 MHz clock for clock alignment. It is also possible to disable the lower frequency clocks in order to save dynamic power consumption. The periphery of the test chip includes silicon proven LVDS drivers integrated into the pads that send the dynamic signals off chip [1].

III. INTEGRATED DIGITAL TEST LOGIC

The digital test logic integrated on the fabricated PLL test chip consists of an eight bit pseudo random binary sequence generator, an eight bit to ten bit coder and a serializer. The clock signal for the test logic can either be an external clock or the 80 MHz single-ended output of the PLL core. The output data of the serializer is a 160 Mbit/s double data rate bit stream. The integration of the test logic on-chip provides a built-in self-test for the PLL output signal integrity. The test logic implemented resembles a large part of the future FE-I4 data output block.
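A simplified Python sketch of the data path is given here for orientation. Several points are assumptions made only for illustration: the feedback polynomial of the sequence generator is not stated in the text, so a common maximal-length eight bit polynomial ($x^8 + x^6 + x^5 + x^4 + 1$) is used; the eight bit to ten bit coder is omitted; and the bit order of the serializer (least significant bit first) is arbitrary.

```python
def prbs8_states(seed=0x01):
    """Yield successive states of an 8-bit Fibonacci LFSR (assumed polynomial)."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        yield state
        # Taps 8, 6, 5, 4 correspond to right shifts by 0, 2, 3 and 4.
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (feedback << 7)


def sequence_period(seed=0x01):
    """Number of clock cycles until the LFSR returns to its seed state."""
    states = prbs8_states(seed)
    first = next(states)
    count = 1
    for state in states:
        if state == first:
            return count
        count += 1


def serialize_lsb_first(words, width=8):
    """Flatten parallel words into a bit stream (bit order is an assumption)."""
    for word in words:
        for i in range(width):
            yield (word >> i) & 1


def ddr_bit_rate(f_clk_hz):
    """Double data rate: one bit on each clock edge."""
    return 2 * f_clk_hz


print(sequence_period())           # length of the pseudo-random sequence
print(ddr_bit_rate(80e6) / 1e6)    # bit rate in Mbit/s for an 80 MHz clock
```

The last call reflects the relation used on the test chip: an 80 MHz clock with data launched on both edges gives the 160 Mbit/s stream.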

IV. SIMULATION RESULTS

Figure 6 illustrates the settling of \( V_{CTRL} \) under 3\( \sigma \) process variations.


The simulation is based on a parasitic extraction of the PLL core with layout parasitic capacitances included. The control voltage $V_{CTRL}$ of the PLL settles in less than $t_{settle} = 1.5\ \mu\text{s}$ in all process corners. Under nominal conditions $V_{CTRL}$ settles in $t_{settle} \approx 650 \text{ ns}$ to an accuracy of 2% of its final value. In order to investigate the PLL response to single-event transients, charges of 3 pC in 1.5 ns pulses [5] have been injected into various nodes of the control loop. Figure 7 shows the settling of $V_{CTRL}$ being interrupted by a charge injection at $t = 900 \text{ ns}$ into the very node that controls the oscillation frequency of the VCO. Furthermore, Fig. 7 sketches the reaction of the loss of lock detection. While $V_{CTRL}$ is rising, the VCO is oscillating too slowly. Consequently the Ref2Fast signal is high, indicating that the reference clock is too fast, i.e. that $f_{VCO}$ is too low. When the charge injection takes place, $V_{CTRL}$ increases drastically and speeds up the VCO. The Fb2Fast signal therefore changes to high, indicating that the frequency of the signal coming from the feedback branch is higher than that of the input reference clock signal.
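The 2% settling criterion can be applied to a sampled waveform as in the following Python sketch. It is a generic illustration and not the evaluation script used for the simulations: the function name is hypothetical, the last sample is assumed to represent the final value, and the example data are made up.

```python
def settling_time(times, values, tolerance=0.02):
    """Time of the first sample after which the signal stays inside the band.

    The band is +/- tolerance * |final value|, with the final value taken
    from the last sample (an assumption of this sketch).
    """
    final = values[-1]
    band = tolerance * abs(final)
    last_outside = -1
    for i, v in enumerate(values):
        if abs(v - final) > band:
            last_outside = i
    return times[last_outside + 1]


# Made-up example waveform (time in arbitrary units).
t = [0, 1, 2, 3, 4]
v = [0.0, 0.5, 1.05, 0.99, 1.0]
print(settling_time(t, v))
```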

From noise simulations the VCO phase noise is $-83.3 \text{ dBc/Hz @ 1 MHz}$ offset and the noise is dominated by flicker noise of the bias current sources. The phase noise can be significantly improved to $-90.0 \text{ dBc/Hz @ 1 MHz}$ offset by enlarging the area of the devices in these bias circuits. The enlargement of these devices does not affect the total die area consumption of the PLL core and will be incorporated in future designs.

V. MEASUREMENT RESULTS

Figure 8 shows the PCB designed for the measurements of the PLL demonstrator. The trim potentiometers on the right allow for a flexible adjustment of bias currents and voltages. The input reference clock is fed to the SMA connector at the bottom. Next to the SMA connector on the right, jumpers can be used to enable or disable the different outputs of the test chip. The connection points for the probe heads are located at the top. The demonstrator itself is bonded onto the PCB close to a custom made LVDS transceiver chip that is also bonded onto the PCB in between the SMA connector and the connectors for the probes.

For all measurements, the input reference clock has been supplied by an Agilent 81134A pulser with an rms jitter of 2 ps according to the data sheet. The oscilloscope used in the measurements is a Tektronix TDS5104B 5 GS/s, 1 GHz scope with active differential probes of 1 GHz bandwidth. The equipment used limits the measurement accuracy for signals with frequencies higher than 160 MHz. However, it needs to be kept in mind that the lower frequency clocks are internally generated from the higher frequency clocks. Thus the encouraging results for the lower frequency clocks indicate well-functioning higher frequency clocks. As the output clock measurements are performed on the PCB, these measurements always include the performance characteristics of the LVDS drivers integrated into the output pads of the test chip.

Table 1 summarizes the results obtained for the PLL demonstrator. The results have been obtained by triggering the scope on one edge and measuring the time jitter resp. frequency jitter on the consecutive edge (cycle-to-cycle jitter) with the built-in measurement functions of the scope. The duty cycle has also been acquired with the measurement functions of the scope.

Table 1: Measurement data for the PLL operated from a 1.2 V supply.

<table>
<thead>
<tr>
<th>Equipm.</th>
<th>PLL</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency [MHz]</td>
<td>40</td>
</tr>
<tr>
<td>Jitter pk-pk [ps]</td>
<td>44</td>
</tr>
<tr>
<td>$\sigma$-Frequency [kHz]</td>
<td>6.5</td>
</tr>
<tr>
<td>$\sigma$-Period [ps]</td>
<td>4.1</td>
</tr>
<tr>
<td>Duty Cycle Deviation [%]</td>
<td>x</td>
</tr>
</tbody>
</table>
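The $\sigma$-Frequency and $\sigma$-Period rows of Table 1 are related by first-order error propagation of $f = 1/T$, which gives $\sigma_f \approx f^2 \sigma_T$. The Python sketch below shows this relation together with a simple period statistic computed from edge timestamps. It is an illustration under stated assumptions and not the built-in measurement function of the scope; the function names are hypothetical.

```python
import statistics


def period_statistics(edge_times):
    """Mean period, standard deviation and peak-to-peak spread of the periods.

    edge_times : timestamps of consecutive edges of the same polarity
    """
    periods = [b - a for a, b in zip(edge_times, edge_times[1:])]
    mean_period = statistics.mean(periods)
    sigma_period = statistics.pstdev(periods)
    pk_pk = max(periods) - min(periods)
    return mean_period, sigma_period, pk_pk


def sigma_frequency(f_hz, sigma_period_s):
    """First-order propagation of f = 1/T: sigma_f ~ f^2 * sigma_T."""
    return f_hz ** 2 * sigma_period_s


# Values listed in Table 1: 40 MHz and a sigma-period of 4.1 ps.
print(sigma_frequency(40e6, 4.1e-12) / 1e3, "kHz")
```

For the 40 MHz entry this gives roughly 6.56 kHz, close to the 6.5 kHz listed in the table.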

Figure 9 shows the eye diagram of a 160 Mbit/s data stream sent out by the digital test block. The test logic uses the chip internal single-ended 80 MHz clock output of the PLL core. The shift of the crossing points indicates a deviation of the duty cycle from the ideal 50%. The deviation is attributed to an asymmetry in the circuit behaviour outside the PLL core.


Figure 9: 160 Mbit/s serialized output data stream of the on-chip digital test logic using the PLL 80 MHz clock output.

The opening of the eye diagram is \( \geq 6.0 \) ns on the time axis and 284 mV on the voltage axis. The reduction of signal level on the voltage axis is not related to the PLL characteristics but to signal overshoot due to off-chip impedance mismatch. The tracking range of the VCO is \( 336 \text{ MHz} \leq f_{VCO} \leq 976 \text{ MHz} \). Below this range the Ref2Fast signal, and above it the Fb2Fast signal, goes permanently high.
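These figures can be put into perspective with two short calculations, sketched in Python. They assume the locked relation \( f_{VCO} = N \cdot f_{REF} \) with \( N = 16 \) and a unit interval of one bit period at 160 Mbit/s; the function names are hypothetical.

```python
def reference_range(f_vco_min_hz, f_vco_max_hz, n_div=16):
    """Reference clock range that corresponds to a VCO tracking range."""
    return f_vco_min_hz / n_div, f_vco_max_hz / n_div


def eye_opening_fraction(opening_s, bit_rate_bps):
    """Horizontal eye opening as a fraction of the unit interval."""
    return opening_s * bit_rate_bps


f_ref_min, f_ref_max = reference_range(336e6, 976e6)
print(f_ref_min / 1e6, f_ref_max / 1e6)        # reference range in MHz
print(eye_opening_fraction(6.0e-9, 160e6))     # fraction of the 6.25 ns bit period
```

Under these assumptions the tracking range corresponds to reference clocks between 21 MHz and 61 MHz, and the 6.0 ns opening covers about 96% of the bit period.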

VI. CONCLUSION

A new ATLAS front-end chip, FE-I4, is being developed in a 130 nm standard CMOS technology for use at upgraded LHC luminosities, both for the Insertable B-Layer project and for Super-LHC. FE-I4 is based on a low-power analog pixel array and new digital architecture concepts. After a short introduction to the new features of the FE-I4 chip, the focus is on the output stages. In order to handle the expected hit rate, the front-end will stream data out at 160 Mbit/s. A type-II PLL has been developed to generate the necessary clock signal with a well-defined duty cycle from the available 40 MHz bunch crossing reference clock. The PLL core draws a low current of 3.2 mA from a 1.2 V supply and occupies a die area of only 255 \( \mu \text{m} \times 225 \mu \text{m} \). The VCO of the PLL is based on a three-stage differential ring oscillator working at a nominal frequency of 640 MHz. The design trade-offs involved with the choice of a ring oscillator in terms of area, noise and locking range are discussed. Choosing a VCO oscillation frequency higher than the output frequency reduces the area consumption of the LF capacitors and provides a well-defined duty cycle, at the expense of slightly increased power consumption for the VCO and the four-stage dividing chain. In the ATLAS experiment, the PLL will be placed in a hostile radiation environment. In case of single-event transients due to severe charge injections, a short settling time to recover from a loss of lock is important. The presented PLL recovers from any given upset in less than 1.5 \( \mu \text{s} \).

A stand-alone PLL test chip has been submitted for fabrication early in 2009. Among its outputs are clock signals with 80 MHz for double edge data transfer and 160 MHz for single edge data stream out at 160 Mbit/s. The differential clock output lines are driven by integrated LVDS drivers. Simulation results as well as performance measurements for this test chip are presented and discussed.

The PLL is equipped with on-chip loss-of-lock detection circuits. Furthermore, the demonstrator includes a digital block for 160 Mbit/s double data rate output streaming, consisting of an eight bit pseudo random binary sequence generator, an eight bit to ten bit coder and a serializer. The integrity of the serialized 160 Mbit/s double data rate bit stream generated by the test logic has been investigated and has been found acceptable. The first prototype of the complete FE-I4 IC is scheduled for tape out at the end of 2009.

REFERENCES


