

# CMS microstrip tracker readout at the SLHC

Imperial College London

#### **OUTLINE**

brief review of LHC strip readout architecture

possible architectures for SLHC FE chip

power estimates

triggering architectures

summary

Mark Raymond and Geoff Hall, Imperial College London, UK.

Topical Workshop on Electronics for Particle Physics, Naxos, Greece / September 2008



# **CMS LHC Si strip readout system**



APV25: 0.25 μm CMOS FE chip

APV outputs analog samples @ 20 Ms/s

APVMUX interleaves 2 APVs onto 1 line @ 40 MHz

Laser Driver modulates laser current to drive optical link @ 40 Ms/s / fibre

O/E conversion on FED and digitization @ ~ 9 bits (effective)

### LHC control / readout chain overview





#### no zero-suppression (sparsification) on detector

all 75,000 APVs operate synchronously (all FE chips doing the same thing at the same time). Advantages:

APV buffer state can be emulated externally (APVE) to prevent buffer overflows

no need to timestamp data on the front end

data volume independent of occupancy

easy to identify upset chips (digital header)

pedestal and common mode (CM) subtraction and zero suppression done on the FED; raw data also available for setup, performance monitoring and fault diagnosis

### **SLHC** challenges for CMS tracker

#### power - the big issue

higher luminosity, higher granularity => more FE chips

electronics-related material (cabling, cooling) dominates the material budget

#### triggering

not possible to keep L1 trigger rate at 100 kHz without contribution from tracker

=> new features needed, and existing architectures need re-design

can make best use of advances in:

#### electronics technology

finer feature sizes, lower supply voltages
=> reduced power consumption
but savings depend on any additional FE functionality

#### off-detector link technology

high speed digital, ~ multi-Gbps, but more channels, so power consumption is an issue here => digitization on front end if we want to retain pulse height info

will examine pros and cons of different FE chip architectures



# front end chip architectures







existing LHC architecture – APV25

slow 50 nsec CR-RC FE amplifier, analog pipeline, 2.7 mW/channel

peak and deconvolution pipeline readout modes

peak mode -> 1 sample -> normal CR-RC pulse shape

deconvolution -> weighted sum of 3 consecutive samples combined to give single BX resolution
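The deconvolution step can be illustrated with a minimal sketch. It assumes an ideal CR-RC pulse with τ = 50 ns sampled every 25 ns, and uses the three-weight inverse filter derived for that ideal shape (the weights w1, w2, w3 below are a standard textbook result for an ideal CR-RC pulse, not values taken from the APV25 design). Applied to the samples of a single pulse, the weighted sum is 1 in the bunch crossing where the signal arrived and 0 in all others.

```python
import math


def crrc_pulse(t_ns, tau_ns=50.0):
    """Ideal CR-RC pulse, normalised to 1 at its peak (t = tau), zero before t = 0."""
    if t_ns < 0:
        return 0.0
    return (t_ns / tau_ns) * math.exp(1.0 - t_ns / tau_ns)


def deconvolution_weights(dt_ns=25.0, tau_ns=50.0):
    """Weights for samples (s[k+1], s[k], s[k-1]) that invert the ideal CR-RC shape."""
    x = dt_ns / tau_ns
    w1 = math.exp(x - 1.0) / x
    w2 = -2.0 * math.exp(-1.0) / x
    w3 = math.exp(-x - 1.0) / x
    return w1, w2, w3


def deconvolve(samples, weights):
    """Weighted sum of 3 consecutive samples, one output per interior sample."""
    w1, w2, w3 = weights
    return [
        w1 * samples[k + 1] + w2 * samples[k] + w3 * samples[k - 1]
        for k in range(1, len(samples) - 1)
    ]


# Signal arrives at t = 0; samples every 25 ns from t = -50 ns to t = 175 ns.
samples = [crrc_pulse(25.0 * k) for k in range(-2, 8)]
output = deconvolve(samples, deconvolution_weights())
```

Here `output[1]` corresponds to the sample at t = 0. The long CR-RC tail is cancelled exactly, which is why deconvolution gives single bunch crossing resolution even with slow shaping.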

all analog approach, not compatible with digital off-detector data transmission; moving to SLHC, if we want to retain pulse height information, where to digitise?

### "digital APV" architecture



digitization after analog mux => only one ADC per chip, so ADC power becomes ~negligible, e.g. 6.4 mW (0.13 μm, 8 bits) / 128 = 50 μW / channel

analog pipeline remains, so could retain the slow shaping + analog deconvolution approach

pipeline implementation with gate capacitance still possible for 0.13 μm? (probably not for finer processes)


**but** rather complicated chip – all the complexity of current APV + more (e.g. sparsification)

### binary architecture – un-sparsified

#### what about binary un-sparsified?

much simpler than "digital APV" particularly for pipeline and readout side

need fast front end and comparator => more power here

but no ADC power and simpler digital functionality will consume less

#### allows retention of features we like

simpler synchronous system

no FE timestamping

data volume known and occupancy independent (so no trigger-to-trigger variation)



**but** fewer diagnostics (in the present system the front end pulse shape can be measured on every channel), and some loss of position resolution and of common mode immunity

binary, un-sparsified is an option we are considering

### front end amplifier power

#### 0.25 $\mu$ m APV25 was designed for long strips ~ 12 – 19 cm (15 – 25 pF)

needed high input device g<sub>m</sub> for noise and speed

noise $\propto C_{SENSOR}/\sqrt{g_m}$

risetime $\propto C_{SENSOR}/g_m$

-> led to large I<sub>DS</sub> = 400 μA, for $g_m$ = 8 mA/V

#### APV25 uses 3 power rails

middle voltage rail introduced to save power at expense of PSU complexity

#### at SLHC

e.g. if strip length ↓ factor 2 (or more)

=>  $C_{SENSOR}$  ↓ factor 2 =>  $g_m$  ↓ factor 4 for same noise

0.13  $\mu m$  simulations ->  $g_m$  ~ 2 mA/V achievable for ~ 100  $\mu A$ 

supply rail halves for 0.13 μm, so a factor of 8 power saving in the input device is possible over APV25 (current ÷ 4 at half the voltage)

can choose to sacrifice some of this gain to simplify PSU system, by going to 2 rail design
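The scaling argument above can be checked with a small sketch using only the numbers on this slide. Since noise ∝ C<sub>SENSOR</sub>/√g<sub>m</sub>, keeping the noise fixed means g<sub>m</sub> scales as the square of the capacitance ratio. The supply voltages enter only as a ratio (the rail halves), so relative values 1.0 and 0.5 are assumed rather than real rail voltages.

```python
def required_gm(gm_ref, c_ref, c_new):
    """g_m keeping noise ∝ C_SENSOR / sqrt(g_m) unchanged when C_SENSOR changes."""
    return gm_ref * (c_new / c_ref) ** 2


def input_device_power_ratio(i_ref, v_ref, i_new, v_new):
    """Power saving factor P_ref / P_new for the input device (P = I * V)."""
    return (i_ref * v_ref) / (i_new * v_new)


# APV25 reference: g_m = 8 mA/V at I_DS = 400 uA; strip length (and C) halved.
gm_new = required_gm(gm_ref=8.0, c_ref=1.0, c_new=0.5)  # mA/V
# 0.13 um: ~100 uA for ~2 mA/V; supply rail halved (only the ratio matters).
saving = input_device_power_ratio(i_ref=400.0, v_ref=1.0, i_new=100.0, v_new=0.5)
print(f"required g_m = {gm_new} mA/V, input device power saving = x{saving}")
```

The required 2 mA/V matches the 0.13 μm simulation figure, and the factor 8 is the product of the ×4 current reduction and the ×2 supply reduction.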



### simulated FE amplifier performance

#### **0.13** μm simulation example

### 0.13μm preamp/shaper – 2 supply rails only

for short strips ( $C_{SENSOR} \sim 5$  pF) choose preamp and shaper input device currents (and Rfs) to achieve 50 and 20 nsec CR-RC pulse shapes

| peaking<br>time | 50 ns | 20 ns |
|-----------------|-------|-------|
| IPRE [μA]       | 40    | 90    |
| IPSF [μA]       | 15    | 15    |
| ISHA [μA]       | 10    | 30    |
| ISSF [μA]       | 35    | 15    |
| total [μA]      | 100   | 150   |
| power [μW]      | 120   | 180   |
| noise [e]       | 800   | 890   |


=> for short (~ few cm) strips can get quite good preamp/shaper noise performance for > factor 5 less power than the APV25 preamp/shaper (~ 1 mW), even with only 2 rails
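As a quick consistency check on the simulation table, the sketch below sums the four bias currents for each peaking time and converts them to power. The 1.2 V supply is an assumption inferred from the table itself (power / total current), not a value stated on the slide. The APV25 preamp/shaper figure of 1050 μW is taken from the power breakdown.

```python
# Bias currents in uA from the simulation table above.
CURRENTS_UA = {
    "50 ns": {"IPRE": 40, "IPSF": 15, "ISHA": 10, "ISSF": 35},
    "20 ns": {"IPRE": 90, "IPSF": 15, "ISHA": 30, "ISSF": 15},
}
SUPPLY_V = 1.2  # assumption: inferred from power / total current in the table
APV25_PREAMP_SHAPER_UW = 1050  # from the APV25 power breakdown


def summarise(currents=CURRENTS_UA, supply_v=SUPPLY_V):
    """Return {shaping: (total current uA, power uW, APV25/new power ratio)}."""
    result = {}
    for shaping, branches in currents.items():
        total_ua = sum(branches.values())
        power_uw = total_ua * supply_v
        result[shaping] = (total_ua, power_uw, APV25_PREAMP_SHAPER_UW / power_uw)
    return result


for shaping, (total_ua, power_uw, ratio) in summarise().items():
    print(f"{shaping}: {total_ua} uA, {power_uw:.0f} uW, {ratio:.1f}x below APV25")
```

Both shaping options come out more than a factor 5 below the APV25 preamp/shaper, as stated above.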



### simulated pulse shapes (C<sub>SENSOR</sub> = 5 pF)



# **SLHC FE chip overall power estimates**

#### APV25 [μW/channel]

| block               | power [μW/channel] |
|---------------------|--------------------|
| preamp/shaper       | 1050               |
| inverter            | 500                |
| APSP                | 200                |
| mux & output stages | 550                |
| digital             | 400                |
| total               | 2700               |

plenty of uncertainty in many of the 0.13 μm numbers (simulations, estimates, guesses), particularly the digital consumption; binary (unsparsified) likely to offer the lowest FE chip power

target ~ 500  $\mu W$  / channel for short strip readout chip @ SLHC

#### 0.13 μm pipeline chip with pulse ht. info – "digital APV"

| block              | power [μW/channel] | basis                                                                         |
|--------------------|--------------------|-------------------------------------------------------------------------------|
| preamp/shaper      | 120                | 50 ns shaping, C<sub>DET</sub> ~ 5 pF, simulations                            |
| pipe readout       | 50                 | APV25 / 4 (guess)                                                             |
| ADC                | 50                 | 1 ADC / chip (ITRS estimate)                                                  |
| digital            | 120                | (APV25 / 10) x 3 (/10 for technology, x3 for SEU)                             |
| fast serial output | 230                | 30 mW / 128 (guestimate for fast LVDS; maybe could do better with diff. current?) |
| total              | 570                |                                                                               |

#### 0.13 μm binary chip - non-sparsified readout

| block              | power [μW/channel] | basis                                          |
|--------------------|--------------------|------------------------------------------------|
| preamp/shaper      | 180                | 20 ns, C<sub>DET</sub> ~ 5 pF, fast FE required |
| comparator         | 20                 | simulations                                    |
| digital            | 60                 | much simpler than above                        |
| fast serial output | 230                | just guess same as above                       |
| total              | 490                |                                                |
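The three per-channel budgets can be compared against the ~500 μW target with a short sketch; all block values are copied from the tables above, and the dictionary layout is just one convenient way to hold them.

```python
TARGET_UW = 500  # per-channel target for a short strip readout chip @ SLHC

BUDGETS_UW = {
    "APV25 (0.25 um)": {
        "preamp/shaper": 1050, "inverter": 500, "APSP": 200,
        "mux & output stages": 550, "digital": 400,
    },
    "digital APV (0.13 um)": {
        "preamp/shaper": 120, "pipe readout": 50, "ADC": 50,
        "digital": 120, "fast serial output": 230,
    },
    "binary non-sparsified (0.13 um)": {
        "preamp/shaper": 180, "comparator": 20, "digital": 60,
        "fast serial output": 230,
    },
}


def totals(budgets=BUDGETS_UW):
    """Sum each per-channel budget and report whether it meets the target."""
    return {name: (sum(parts.values()), sum(parts.values()) <= TARGET_UW)
            for name, parts in budgets.items()}


for name, (total_uw, ok) in totals().items():
    print(f"{name}: {total_uw} uW/channel, within target: {ok}")
```

Only the binary non-sparsified estimate falls inside the target, though given the uncertainties noted above the margin over the digital APV (80 μW) is small.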

# system architectures

system architecture depends a lot on FE chip architecture

data volume determines ratio of FE chips to off-detector link

data volume depends on:

sparsification or not

pulse height info

sparsification increases the complexity of the data-merging stage between FE chips and link

e.g. need an extra stage of buffering to combine occupancy-dependent data volumes in a sparsified system

unsparsified simplifies merging architecture

link power / sensor channel depends on no. of FE chips/link



### estimated link power contribution

no. of chips / link depends on estimations of data volume – some details in backup slides

|                                      | link<br>speed         | # of 128 chan.<br>chips/link | power<br>per link | link power/<br>sensor chan. |
|--------------------------------------|-----------------------|------------------------------|-------------------|-----------------------------|
| LHC unsparsified analog              | 0.36 Gb/s (effective) | 2 / analog<br>fibre          | 60 mW             | 230 μW                      |
| SLHC digital APV no sparsification   | 2.5 Gb/s              | 32 / GBT                     | ~ 2W              | 490 μW                      |
| SLHC digital APV with sparsification | 2.5 Gb/s              | 256 / GBT                    | ~ 2W              | 60 μW                       |
| SLHC binary unsparsified             | 2.5 Gb/s              | 128 / GBT                    | ~ 2W              | 120 μW                      |
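The last column of the table is simply the link power shared among all sensor channels served by one link. A minimal sketch, assuming 128 channels per chip and the per-link powers and chip counts listed above:

```python
def link_power_per_channel_uw(link_power_w, chips_per_link, channels_per_chip=128):
    """Link power shared among all sensor channels served by one link, in uW."""
    return link_power_w * 1e6 / (chips_per_link * channels_per_chip)


# (link power [W], 128-channel chips per link) from the table above.
OPTIONS = {
    "LHC unsparsified analog": (0.060, 2),
    "SLHC digital APV, no sparsification": (2.0, 32),
    "SLHC digital APV, sparsified": (2.0, 256),
    "SLHC binary unsparsified": (2.0, 128),
}

for name, (power_w, chips) in OPTIONS.items():
    print(f"{name}: {link_power_per_channel_uw(power_w, chips):.0f} uW / sensor channel")
```

The exact values (234, 488, 61 and 122 μW) round to the figures quoted in the table.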

#### LHC unsparsified analog

230 μW / sensor channel is ~ 10% of the overall APV25 channel budget (2.7 mW); need to do better at SLHC (e.g. 10% of 0.5 mW = 50 μW)

#### **SLHC** digital APV without sparsification not viable

link power contribution too high (no. of channels will increase at SLHC)

#### **SLHC** digital APV with sparsification appears best

**but** can only be achieved with extra buffering between FE chips and link: more chips to develop, some additional power

#### **SLHC** binary unsparsified next best

has strong system advantages

### **Triggering**

CMS can't keep the L1 trigger rate at 100 kHz at SLHC without P<sub>T</sub> information from the tracker

major new feature for the CMS tracker; ideas on how to do it are still developing

current assumption is that there will probably be dedicated P<sub>T</sub> layers providing prompt trigger info, i.e. different from more conventional, triggered pipeline chip layers

will summarise a few ideas for triggering layers here



one possible "strawman" layout: cross-section through one quarter of the tracker

### some possible approaches

#### stacked tracking

correlate hits from tracks in closely spaced layers: a high P<sub>T</sub> track passes through pixels directly above each other; needs a separate chip to perform the correlation

#### cluster width discrimination

high P<sub>T</sub> track -> narrow cluster width

basic concepts clear, but need to understand the issues associated with practical implementations (e.g. power, construction, cost, ...)
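Both approaches rely on the same geometry: a low P<sub>T</sub> track crosses a layer at a larger angle, so it is displaced more in r-φ, either across the sensor thickness (wider cluster) or across the gap between stacked layers (hits not directly above each other). The sketch below uses the standard relations R[m] = p<sub>T</sub>[GeV] / (0.3 B[T]) and sin φ = r / (2R) for a track from the beam line. The 3.8 T field, the 0.25 m radius, the 300 μm sensor thickness and the 2 mm layer separation are illustrative assumptions, not values from this talk.

```python
import math


def lateral_offset_um(pt_gev, radius_m, depth_um, b_tesla=3.8):
    """Transverse (r-phi) shift of a track across a radial depth at radius r.

    Radius of curvature R[m] = pT[GeV] / (0.3 * B[T]); a track from the beam
    line crosses radius r at angle phi to the radial direction, sin(phi) = r / (2R).
    """
    curvature_radius_m = pt_gev / (0.3 * b_tesla)
    sin_phi = radius_m / (2.0 * curvature_radius_m)
    if sin_phi >= 1.0:
        raise ValueError("track curls up before reaching this radius")
    return depth_um * math.tan(math.asin(sin_phi))


# Hypothetical geometry: layer at r = 0.25 m, 300 um thick sensor
# (cluster width) and 2 mm separation between stacked layers.
for pt in (1.0, 2.0, 10.0):
    width = lateral_offset_um(pt, 0.25, 300.0)
    stack = lateral_offset_um(pt, 0.25, 2000.0)
    print(f"pT = {pt:4.1f} GeV: {width:5.1f} um across sensor, {stack:6.1f} um between layers")
```

The offset falls roughly as 1/p<sub>T</sub>, and the larger lever arm of a stacked pair gives a correspondingly larger, easier to resolve displacement than the sensor thickness alone.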



R-Φ plane, "ideal" barrel layer

Track momentum discrimination using cluster width in Si strip sensors, *G.Barbagli, F.Palla, G. Parrini,* TWEPP07



Stacked Tracking for CMS at Super-LHC, *J.Jones et al,* 12<sup>th</sup> LHC Workshop, 2006



### possible PT module for inner layer



<sup>\*</sup>http://indico.cern.ch/getFile.py/access?contribId=15&sessionId=2&resId=1&materiaIId=slides&confId=36581



<sup>\*</sup> http://indico.cern.ch/getFile.py/access?contribId=3&sessionId=0&resId=0&materiaIId=0&confId=36580

### summary

a snapshot of where CMS SLHC tracker readout is at the moment – things will change

have started to think about the pros and cons of different architectures: trade-offs between power, FE chip and system complexity, system robustness, and performance

#### timescales

3 year readout chip development programme about to start:

year 1: test structures for different sensor options (polarity, strip length, DC coupling)

year 2: full chip prototype

year 3: final prototype

need clearer system level definition here, e.g. sensor choices, powering scheme (serial/parallel), analog/binary, sparsify or not

binary, non-sparsified could be preferred for short strip pipeline type readout: simpler chip and simpler system, which frees up resources to tackle ...

#### ... triggering

this is the most challenging aspect of the CMS tracker for SLHC

dedicated triggering layers probably the way to go

ideas still developing, need further investigation (simulation)

could be several more chips to develop here

# extra slides

### data volume calculation details

### LHC unsparsified analog

raw link data bandwidth 9 bits (effective) x 40 Ms/s = 0.36 Gbps

actual triggered data rate = 280 samples per 2 APVs (per data frame) @ 100 kHz L1 trigger rate (data from the 2 APVs interleaved at 40 Ms/s on one fibre)

 $= 280 \times 9 \text{ bits } \times 100 \text{ kHz} = 0.25 \text{ Gbps}$ 

so link use efficiency factor ~ 70% (0.25/0.36)
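The LHC link efficiency above reduces to two products and a ratio; this sketch just reproduces them from the numbers on the slide.

```python
# LHC analog link: 9 effective bits at 40 Ms/s on one fibre.
RAW_LINK_BPS = 9 * 40e6  # 0.36 Gbps

# Each L1 trigger reads one frame of 280 samples from the 2 interleaved APVs.
SAMPLES_PER_FRAME = 280
L1_RATE_HZ = 100e3
TRIGGERED_BPS = SAMPLES_PER_FRAME * 9 * L1_RATE_HZ  # ~0.25 Gbps

efficiency = TRIGGERED_BPS / RAW_LINK_BPS
print(f"raw {RAW_LINK_BPS/1e9:.2f} Gbps, used {TRIGGERED_BPS/1e9:.3f} Gbps, "
      f"efficiency {efficiency:.0%}")
```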



### **SLHC** unsparsified "analog" readout

raw GBT data BW 2.56 Gbps organized as up to 30 x 80 Mbps lanes assume 2 W / GBT

raw data volume per 128 chan. chip for 6 bit ADC @ 100 kHz L1 trigger rate

$= 128 \times 6 \text{ bits } \times 100 \text{ kHz} \approx 77 \text{ Mbps}$

=> only 1 chip / GBT lane

=> 32 chips / GBT

=> $2 \text{ W} / (128 \times 32) \approx 490\ \mu\text{W}$ / sensor channel

factor ~ 2 higher than LHC figure (230 μW)

actually it would be infeasible to fit 77 Mbps onto an 80 Mbps lane

link use factor too high: buffer depth on FE would have to be very deep

would need higher BW link or only 5 bit ADC
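A minimal sketch of the lane occupancy for unsparsified readout with pulse height, assuming (as in the estimate above) that every one of the 128 channels sends its full ADC value on every L1 trigger and ignoring any header bits:

```python
LANE_BPS = 80e6  # one GBT lane
L1_RATE_HZ = 100e3


def unsparsified_rate_bps(adc_bits, channels=128, l1_rate_hz=L1_RATE_HZ):
    """Every channel's ADC value is read out on every L1 trigger."""
    return channels * adc_bits * l1_rate_hz


for bits in (6, 5):
    rate = unsparsified_rate_bps(bits)
    print(f"{bits} bit ADC: {rate/1e6:.1f} Mbps per chip, lane use {rate/LANE_BPS:.0%}")
```

Dropping one ADC bit brings the lane use from 96% down to 80%, which is the trade-off mentioned above.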

### **SLHC** sparsified "analog" readout

data volume determined by occupancy (average no. of hits above threshold / BX)

assume 4% occupancy (higher luminosity compensated by higher granularity) => 5 hits / 128 channel chip on average

assume 6 bit ADC for pulse height info

data volume / L1 trigger: assume each FE chip produces a data packet in response to an L1 trigger, comprising:

8 bits individual chip address

12 bits timestamp (LHC orbit)

7 bits channel address + 6 bits ADC value for each hit (13 bits / hit)

= 85 bits for a data packet containing 5 hits

=> average raw data rate per chip = 85 bits x 100 kHz = 8.5 Mbps

=> ~ 8 chips / GBT lane

=> 256 chips / GBT

=> $2 \text{ W} / (128 \times 256) \approx 61\ \mu\text{W}$ / sensor channel

but 8.5 Mbps x 8 = 68 Mbps - **85%** of 80 Mbps / GBT lane - rather high use of link BW
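The sparsified packet model above can be written as a short function; the field widths are the ones assumed on this slide, and rounding the 4% occupancy to a whole number of hits is a simplification of the "average" figure.

```python
L1_RATE_HZ = 100e3
LANE_BPS = 80e6


def packet_bits(n_hits, chip_addr_bits=8, timestamp_bits=12,
                chan_addr_bits=7, adc_bits=6):
    """Size of one chip's sparsified data packet for a given number of hits."""
    return chip_addr_bits + timestamp_bits + n_hits * (chan_addr_bits + adc_bits)


occupancy = 0.04
mean_hits = round(occupancy * 128)  # 5.12 -> 5 hits per chip
bits = packet_bits(mean_hits)       # average packet size
rate_bps = bits * L1_RATE_HZ        # average rate per chip
chips_per_lane = 8
lane_use = chips_per_lane * rate_bps / LANE_BPS
print(f"{bits} bits/packet, {rate_bps/1e6:.1f} Mbps/chip, lane use {lane_use:.0%}")
```

Because each extra hit adds 13 bits, packets from busy events are much longer than the average (e.g. `packet_bits(10)` is 150 bits), which is why a sparsified system needs the extra buffering stage before the link.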

### binary, non-sparsified, data volumes

only 1 bit / hit and occupancy irrelevant: this is a significant advantage of not sparsifying

raw data rate per 128 chan. chip = (128 + 16) bits x 100 kHz L1 rate = **14.4 Mb/s** (16 bits for digital header information, e.g. error bits and triggered pipeline location, like APV)

=> ~ 4 chips / GBT lane

=> 128 chips / GBT

=> 2 W / (128 x 128) = 122 μW / sensor channel

14.4 Mbps x 4 = 58 Mbps - only 73 % of 80 Mbps / GBT lane - comfortable use of link BW
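To show why the binary frame size is occupancy independent, here is a minimal packing sketch: 16 header bits followed by one bit per channel. The split of the header into an 8-bit pipeline location and 8 error/status bits is a hypothetical layout chosen for illustration; the slide only says the 16 bits carry error bits and the triggered pipeline location.

```python
CHANNELS = 128
HEADER_BITS = 16
L1_RATE_HZ = 100e3


def build_frame(pipeline_addr, error_bits, hit_channels):
    """Pack one unsparsified binary frame into an int; returns (frame, n_bits).

    Hypothetical header layout: 8-bit pipeline location, 8 error/status bits.
    """
    header = ((pipeline_addr & 0xFF) << 8) | (error_bits & 0xFF)
    hits = 0
    for ch in hit_channels:
        if not 0 <= ch < CHANNELS:
            raise ValueError(f"channel {ch} out of range")
        hits |= 1 << ch  # 1 bit per channel, set if above threshold
    return (header << CHANNELS) | hits, HEADER_BITS + CHANNELS


_, quiet = build_frame(12, 0, [])
_, busy = build_frame(12, 0, range(0, CHANNELS, 3))
rate_bps = quiet * L1_RATE_HZ
print(f"{quiet} and {busy} bits/frame, {rate_bps/1e6:.1f} Mbps per chip")
```

An empty event and a busy one produce the same 144-bit frame, so the link load is fixed at 14.4 Mb/s per chip regardless of occupancy.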

# PT module for inner layer (1)



#### use stacked tracking approach - 2 layers

but with long pixels (2.5 mm x 100 μm), which allows wire bonding and easy prototyping

readout chip ideas (see \* for more details)

each chip deals with 2 x 128 channel columns

each column divided into 32 x 4 channel groups

transmit 5 bit group address and 4 bit hit pattern to correlator, which provides more info than single channel addresses

can also use cluster width discrimination to reduce the number of valid patterns: 1000, 0100, 0010, ... 1100, 0110, ... but not 1110, 0111
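The pattern filter can be sketched as a simple rule on the 4-bit group pattern. The rule below, a single contiguous cluster of 1 or 2 hit channels, is an assumption that matches the accepted and rejected examples listed above; it also rejects patterns with two separate clusters (e.g. 1010), which the text notes will need extra handling.

```python
def is_valid_pattern(pattern, max_width=2):
    """pattern: 4-bit int, MSB = first channel of the group.

    Hypothetical acceptance rule: exactly one contiguous run of
    1..max_width hit channels (narrow cluster => high pT candidate).
    """
    runs = [run for run in format(pattern, "04b").split("0") if run]
    return len(runs) == 1 and len(runs[0]) <= max_width


valid = [format(p, "04b") for p in range(16) if is_valid_pattern(p)]
print(valid)
```

With this rule only 7 of the 15 non-empty patterns survive, so the correlator has fewer combinations to compare.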



#### correlator

compares the hit pattern and address from both layers (no address decoding required); if they match, shifts the result off-detector

note: not quite as simple as this; will need extra features to cope with:

hits in adjacent groups

more than one (or two) cluster groups (should be rare)
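The basic correlator step can be sketched as a direct comparison of (group address, hit pattern) pairs from the two layers, with no address decoding. The match rule used here (same group address and at least one channel hit in both layers) is a hypothetical choice; as noted above, hits in adjacent groups and multiple clusters are not handled.

```python
from typing import List, Tuple

Hit = Tuple[int, int]  # (5-bit group address, 4-bit hit pattern)


def correlate(inner: List[Hit], outer: List[Hit]) -> List[Hit]:
    """Return (group, common pattern) for groups hit in both layers.

    Hypothetical rule: same group address and overlapping hit patterns,
    i.e. a track passing through pixels directly above each other.
    """
    matches = []
    for addr_in, pat_in in inner:
        for addr_out, pat_out in outer:
            if addr_in == addr_out and (pat_in & pat_out):
                matches.append((addr_in, pat_in & pat_out))
    return matches


# High pT: aligned hits in group 3 -> match; low pT: shifted pattern -> none.
print(correlate([(3, 0b0100)], [(3, 0b0110)]))
print(correlate([(3, 0b1000)], [(3, 0b0010)]))
```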


### PT module for inner layer (2)



<sup>\*</sup>http://indico.cern.ch/getFile.py/access?contribId=15&sessionId=2&resId=1&materiaIId=sIides&confId=36581

### **ADC** power consumption

### ADC Scaling \*

A/D performance figure of merit:

$\text{FoM} = 2^{\text{ENOB}} \cdot f_{\text{sample}} / P$

| Year                          | 2003 | 2006 | 2009    | 2012  | 2015 |
|-------------------------------|------|------|---------|-------|------|
| Tech [nm]                     | 130  | 90   | 65      | 45    | 32   |
| FoM [GHz/W] ×10<sup>3</sup>   | 0.8  | 1.2  | 1.6-2.5 | 2.5-5 | 4-10 |

From ITRS roadmap 2003

ADC on every channel hard to do

6 bits @ 20 MHz -> 1.6 mW  $(0.13\mu m)$ 

ADC on every chip quite possible

8 bits @ 20 MHz -> 6.4 mW / 128 -> 50 μW/chan

**International Technology Roadmap for Semiconductors** (ITRS-2003)

(forecast from the semiconductor industry with 15 year perspective)

based on general considerations (individual designs are architecture dependent)

ADC power given by process (through the FoM), effective no. of bits (ENOB) and conversion frequency

ADC power @ 20 MHz [mW]

APV25 power 2.7 mW / chan.

|       | 130nm | 65nm |
|-------|-------|------|
| 8bits | 6.4   | 2.5  |
| 6bits | 1.6   | 0.6  |

<sup>★</sup> from A. Marchioro talk at 2<sup>nd</sup> CMS SLHC workshop
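Rearranging the figure of merit gives P = 2<sup>ENOB</sup> · f<sub>sample</sub> / FoM. The sketch below uses the 2003 (130 nm) FoM of 0.8 × 10<sup>3</sup> GHz/W from the ITRS table and reproduces the 130 nm column of the power table.

```python
def adc_power_w(enob, f_sample_hz, fom_ghz_per_w):
    """ADC power from the figure of merit FoM = 2**ENOB * f_sample / P."""
    return (2 ** enob) * (f_sample_hz / 1e9) / fom_ghz_per_w


FOM_130NM = 0.8e3  # GHz/W, ITRS-2003 value for 130 nm

p8 = adc_power_w(8, 20e6, FOM_130NM)
p6 = adc_power_w(6, 20e6, FOM_130NM)
print(f"8 bits: {p8*1e3:.1f} mW per chip -> {p8*1e6/128:.0f} uW/channel")
print(f"6 bits: {p6*1e3:.1f} mW per channel ADC")
```

The 65 nm column is consistent with a FoM of about 2 × 10<sup>3</sup> GHz/W, inside the quoted 1.6-2.5 range.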

### **APV25** power breakdown



(digital ~0.4 mW)

input amplifier power is the largest component for APV25 at LHC; the preamp dominates amplifier power (input device current)

inverter power not relevant to SLHC

APV25 designed to cope with 2 sensor polarities

| APV25 power breakdown | [mW/channel] |
|-----------------------|--------------|
| preamp/shaper         | 1.05         |
| inverter              | 0.5          |
| APSP                  | 0.2          |
| mux & output stages   | 0.55         |
| digital               | 0.4          |
| total                 | 2.7          |

# L = 10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup> muon L1 trigger rate

