

# **Low-Power Clock Distribution Circuits for the Macro Pixel ASIC**

L. Gaioni<sup>1</sup>, F. De Canio<sup>2,3</sup>, M. Manghisoni<sup>1,3</sup>, L. Ratti<sup>2,3</sup>, V. Re<sup>1,3</sup>, G. Traversi<sup>1,3</sup>, A. Marchioro<sup>4</sup>, K. Kloukinas<sup>4</sup>

<sup>1</sup> University of Bergamo

<sup>2</sup> Department of Electrical, Computer and Biomedical Engineering

<sup>3</sup> INFN Sezione di Pavia, Pavia, Italy

<sup>4</sup> CERN, European Organization for Nuclear Research, Geneva, Switzerland

## Introduction

The innermost part of the CMS tracker at the HL-LHC is based on a combination of pixelated and short strip sensors, the so-called *Pixel-Strip* (PS) module. The short strip layer is read out as a classical strip detector by means of the Strip Sensor ASIC (SSA), while the pixelated layer is read out by means of the **Macro Pixel ASIC** (MPA), bump bonded to the sensor, as normally done in hybrid pixels. Clock distribution circuits account for a significant fraction of the **power** dissipated in the chip, since the clock has to be distributed all over the chip with minimum possible skew: while keeping the skew at a minimum, clock distribution networks waste a significant amount of power. This work reviews different CMOS circuit architectures envisioned for **low power clock distribution** in the MPA. Two main topologies are discussed, one based on the standard supply voltage, the other on an auxiliary, reduced supply. Circuit performance, in terms of power consumption, is evaluated and compared with that of standard CMOS drivers.

## The Macro Pixel ASIC

**MPA main characteristics**

- 16x120 matrix featuring a pixel size of 1500 μm x 100 μm
- Designed in a 65 nm CMOS technology
- Limited power budget (~200 mW) available to carry out the complex functions integrated in the chip
- 40 MHz clock operating frequency

Two routing schemes for the clock distribution have been investigated: the column distribution (CD) and the row distribution (RD).

*Figure: clock routing in the row distribution and column distribution schemes.*

**Low Power Architectures**



- Review and comparison of different architectures already published in the literature
- Simulations of buffers able to reduce the clock swing to a predetermined value
- Each architecture evaluated in terms of power and propagation delay
- Possible benefits evaluated by a comparison with conventional full swing (FS) buffers, supplied with 1.2 V
- The simulated structures include a clock driver distributing a 40 MHz clock signal to 16 receivers through a clock line featuring a lumped parasitic capacitance of 5 pF (this emulates one column in the CD scheme of the MPA)
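As a rough cross-check of this setup, the dynamic power of an ideal full-swing driver that charges the lumped clock-line capacitance once per cycle is $P = f\,C\,V_{DD}^2$. The sketch below assumes that the 5 pF lumped capacitance dominates and neglects short-circuit and leakage currents. Under these assumptions it lands close to the 290 µW quoted for the FS buffer.

```python
# First-order dynamic power of a full-swing (FS) CMOS clock driver.
# Assumption: the lumped 5 pF line capacitance dominates; short-circuit
# and leakage currents are neglected.

def full_swing_power(f_hz: float, c_farad: float, vdd: float) -> float:
    """Return P = f * C * VDD^2 in watts."""
    return f_hz * c_farad * vdd ** 2

F_CLK = 40e6     # 40 MHz clock
C_LINE = 5e-12   # 5 pF lumped clock-line capacitance (one CD column)
VDD = 1.2        # standard supply [V]

p_fs = full_swing_power(F_CLK, C_LINE, VDD)
print(f"FS driver power: {p_fs * 1e6:.0f} uW")  # prints "FS driver power: 288 uW"
```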

#### Charge redistribution



- Vin low:  $C_1 \rightarrow V_{DD}$  and  $C_2 \rightarrow gnd$
- $V_{upper} = V_{lower} = \frac{C_1 V_{DD}}{C_1 + C_2}$
- Analog receiver: CMOS differential stage with active load followed by two inverters
- Power of the driver: 30 µW (FS buffer 290 µW)
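The shared level follows from charge conservation. Before redistribution the total charge is $C_1 V_{DD}$, because $C_2$ sits at ground. Once the two capacitors are connected, that charge spreads over $C_1 + C_2$. The poster does not give $C_1$ and $C_2$, so the values in this minimal sketch are hypothetical.

```python
def shared_voltage(c1: float, c2: float, vdd: float) -> float:
    """Voltage after connecting C1 (precharged to VDD) and C2 (at ground)."""
    q_total = c1 * vdd + c2 * 0.0   # total charge before redistribution
    return q_total / (c1 + c2)      # same charge spread over C1 + C2

# Hypothetical values: equal capacitors split VDD in half.
v_eq = shared_voltage(c1=1e-12, c2=1e-12, vdd=1.2)
print(f"{v_eq:.2f} V")  # prints "0.60 V"
```

The ratio $C_1/(C_1+C_2)$ is therefore what sets the reduced level in this scheme.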

A CD scheme for the MPA would consume a huge amount of power due to the large number of columns.

The proposed solution features an RD architecture, where a central buffer column distributes the clock along the matrix and one clock line per row distributes the clock to the 120 pixel cells in the row.

A study of the optimum number and dimension of repeaters to be placed on the central column (laid out with the ultra-thick metal M9) has been carried out.

Two possible implementations have been investigated, based on CMOS buffers supplied with a reduced VDD and on low swing drivers. Such solutions are modified versions of the schemes shown in the previous section, and have been chosen also taking into account circuit reliability.

We evaluated each architecture in terms of total power consumption (including the contribution from receivers and repeaters) and of the maximum skew between pixels. A comparison with conventional full swing buffers (1.2 V) has been carried out.
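A back-of-the-envelope figure shows why a CD scheme is costly. The sketch assumes that each of the 120 columns behaves like the simulated column (5 pF, 16 receivers) and is driven by an FS buffer at the quoted 290 µW. It ignores receivers and any repeaters, so it is only an order-of-magnitude estimate.

```python
N_COLUMNS = 120          # columns in the 16x120 MPA matrix
P_FS_COLUMN = 290e-6     # FS buffer power for one 5 pF column [W]
POWER_BUDGET = 200e-3    # approximate MPA power budget [W]

# Assumption: every column is identical to the simulated one.
p_cd = N_COLUMNS * P_FS_COLUMN
print(f"CD drivers: {p_cd * 1e3:.1f} mW "
      f"({100 * p_cd / POWER_BUDGET:.0f}% of the budget)")
# prints "CD drivers: 34.8 mW (17% of the budget)"
```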

#### Controlled charge time



- Output swing controlled by the delay of the A $\rightarrow$ B chain, the V<sub>th</sub> of the MOSFETs, the W of the output stage and the bus capacitance C<sub>L</sub>
- $V_{OUT\_L} < V_{th,N}$ and $V_{OUT\_H} > V_{DD} - |V_{th,P}|$
- The power consumption is about $f\,C_L\,V_{DD}\,V_{SW}$, where $V_{SW} = V_{OUT\_H} - V_{OUT\_L}$
- Delay: 960 ps (FS buffer 975 ps)
- Power of the driver: 92 µW (FS buffer 290 µW)
- A cascade of two inverters can be used for full swing signal recovery
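Inverting the approximate expression $P \approx f\,C_L\,V_{DD}\,V_{SW}$ gives the swing implied by the simulated driver power. The sketch assumes the 40 MHz clock, the 5 pF line of the review setup and a 1.2 V driver supply. The result is an estimate only and is not a value reported here.

```python
def implied_swing(p_watt: float, f_hz: float, c_load: float, vdd: float) -> float:
    """Solve P = f * C_L * VDD * V_SW for V_SW (first-order estimate)."""
    return p_watt / (f_hz * c_load * vdd)

# Assumed setup: 40 MHz, 5 pF line, 1.2 V supply; 92 uW simulated driver power.
v_sw = implied_swing(p_watt=92e-6, f_hz=40e6, c_load=5e-12, vdd=1.2)
print(f"Implied swing: {v_sw:.2f} V")  # prints "Implied swing: 0.38 V"
```

Setting $V_{SW} = V_{DD}$ recovers the full-swing expression $f\,C_L\,V_{DD}^2$, so the swing reduction appears directly as a linear power saving.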

### Drivers with two extra supplies

- Two-inverter driver: the second inverter uses two extra reference voltages (REF\_H, REF\_L) and LVT MOSFETs
- A cascade of two inverters can be used as a receiver, although a differential amplifier can also be implemented if needed
- Delay: 965 ps (FS buffer 975 ps)
- Power of the driver: 37 µW (FS buffer 290 µW)
- Strong reduction of the dissipated power with about the same delay
- Main issue is to generate (low power) REF\_L and REF\_H





- The driver limits the interconnect swing from $|V_{th,P}|$ to $V_{DD} - V_{th,N}$
- The receiver is composed of a transmission gate and a cross-coupled latch circuit
- Transistors P1 and N1 provide positive feedback to completely cut off P2 and N2
- Delay: 700 ps (FS buffer 975 ps)
- Power of the driver: 123 µW (FS buffer 290 µW)
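For the driver described above, the interconnect swing is $V_{DD} - V_{th,N} - |V_{th,P}|$. The threshold voltages below are hypothetical placeholders and are not values of the 65 nm process used for the MPA. The power estimate reuses the first-order $f\,C_L\,V_{DD}\,V_{SW}$ expression with the assumed 40 MHz / 5 pF setup, and it neglects receiver and short-circuit currents.

```python
def low_swing_range(vdd: float, vth_n: float, vth_p_abs: float):
    """Return (V_low, V_high, swing) for a swing bounded by |Vth,P| and VDD - Vth,N."""
    v_low = vth_p_abs
    v_high = vdd - vth_n
    return v_low, v_high, v_high - v_low

# Hypothetical thresholds, for illustration only.
v_lo, v_hi, swing = low_swing_range(vdd=1.2, vth_n=0.4, vth_p_abs=0.4)
p_est = 40e6 * 5e-12 * 1.2 * swing   # approximate f * C_L * VDD * V_SW
print(f"{v_lo:.1f} V to {v_hi:.1f} V, swing {swing:.1f} V, ~{p_est * 1e6:.0f} uW")
# prints "0.4 V to 0.8 V, swing 0.4 V, ~96 uW"
```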

#### Static Reduced-Swing



- This scheme reduces the swing of the driver between $|V_{th,P}|$ and $V_{DD} - V_{th,N}$
- Poor rise and fall times
- Delay: 1150 ps (FS buffer 975 ps)
- Power of the driver: 110 µW (FS buffer 290 µW)
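The sketch below collects the figures quoted in the reviewed blocks and normalizes them to the FS buffer (290 µW, 975 ps), so the drivers can be compared side by side. It includes only the architectures whose own block lists both a power and a delay figure. The label for the transmission-gate/latch driver is a descriptive name chosen here, not one taken from the poster.

```python
FS_POWER_UW = 290.0   # FS buffer power [uW]
FS_DELAY_PS = 975.0   # FS buffer delay [ps]

# Driver power [uW] and delay [ps] as quoted for each reviewed architecture.
# "Driver with TG + latch receiver" is a descriptive label chosen here.
drivers = {
    "Controlled charge time": (92.0, 960.0),
    "Two extra supplies": (37.0, 965.0),
    "Driver with TG + latch receiver": (123.0, 700.0),
    "Static reduced-swing": (110.0, 1150.0),
}

def normalize(table):
    """Return {name: (P / P_FS, delay / delay_FS)}."""
    return {name: (p / FS_POWER_UW, d / FS_DELAY_PS)
            for name, (p, d) in table.items()}

ratios = normalize(drivers)
# Print from lowest to highest normalized power.
for name, (p_ratio, d_ratio) in sorted(ratios.items(), key=lambda kv: kv[1][0]):
    print(f"{name:32s} P/P_FS = {p_ratio:.2f}   t/t_FS = {d_ratio:.2f}")
```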

Delay: 935 ps (FS buffer 975 ps)

- Reduced supply voltage: standard CMOS buffers as TXs and RXs, supplied with V<sub>DDL</sub> = 800 mV
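For the reduced supply solution, the first-order $P \propto V_{DD}^2$ scaling of switching power gives a quick expectation. The sketch takes the 5.41 mW total reported in the table of this section for the 1.2 V solution as reference. It assumes that all consumption is dynamic and that switched capacitance and activity are unchanged between 1.2 V and 0.8 V, which the actual repeater sizing need not satisfy.

```python
def scale_dynamic_power(p_ref: float, v_ref: float, v_new: float) -> float:
    """Scale switching power as P ~ V^2 at fixed frequency and capacitance."""
    return p_ref * (v_new / v_ref) ** 2

# Reference: 1.2 V solution, 5.41 mW total (assumed fully dynamic).
p_08 = scale_dynamic_power(p_ref=5.41e-3, v_ref=1.2, v_new=0.8)
print(f"Expected at 0.8 V: {p_08 * 1e3:.2f} mW")  # prints "Expected at 0.8 V: 2.40 mW"
```

The estimate is close to the simulated 2.34 mW, which is consistent with switching power dominating the total.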

| Solution      | Skew @ rising edge [ps] | Skew @ falling edge [ps] | Average skew [ps] | Total power consumption [mW] | Power-skew product [fJ] |
|---------------|-------------------------|--------------------------|-------------------|------------------------------|-------------------------|
| 0.8V Solution | 632                     | 660                      | 646               | 2.34                         | 1512                    |
| Low-swing     | 489                     | 636                      | 562               | 4.34                         | 2441                    |
| 1.2V Solution | 435                     | 447                      | 441               | 5.41                         | 2386                    |
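The power-skew product in the table is the average skew multiplied by the total power, because 1 mW × 1 ps = 1 fJ. The sketch recomputes it from the rising- and falling-edge skews. It uses the unrounded average, (489 + 636)/2 = 562.5 ps, for the low-swing case. It prints 1512, 2441 and 2386 fJ, matching the table.

```python
# (skew @ rising edge [ps], skew @ falling edge [ps], total power [mW]) from the table
solutions = {
    "0.8V Solution": (632.0, 660.0, 2.34),
    "Low-swing": (489.0, 636.0, 4.34),
    "1.2V Solution": (435.0, 447.0, 5.41),
}

def power_skew_product_fj(rise_ps: float, fall_ps: float, power_mw: float) -> float:
    """Average skew times total power; 1 mW * 1 ps = 1e-15 J = 1 fJ."""
    avg_skew_ps = (rise_ps + fall_ps) / 2
    return avg_skew_ps * power_mw

for name, (rise, fall, p_mw) in solutions.items():
    print(f"{name:14s} {power_skew_product_fj(rise, fall, p_mw):6.0f} fJ")
```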

*Figure: power-skew product as a function of the M9 line width [µm].*

Topical Workshop on Electronics for Particle Physics – Aix-en-Provence, France, September 22-26, 2014

**Clock distribution architectures for the MPA** 

### Low swing driver

