

Alessandro Caratelli

on behalf of the CMS Tracker ASICs designers team

# Requirements for the high luminosity tracker upgrade [1]

SYSTEM STUDIES

## Phase-2 upgrade tracker requirements:

- Higher luminosity
- From 20 to 200 pileup events per BX
- Increase radiation tolerance
- Reduced material budget
- Participate in the L1 trigger
- Improve trigger performance



- Increase granularity
- Introduction of a pixelated sensor in OT
- Radiation tolerance up to 100 Mrad
- Quick and on-chip particle discrimination
- Higher trigger rate (1 MHz) and longer latency (12.5 $\mu$s)
- Power density < 100 mW/cm<sup>2</sup>
- Add tracking information to the Level-1 trigger decision



# A novel particle detector electronic system


The outer tracker detector can provide for every event additional information for the trigger decision leading to a significant improvement of the particle recognition efficiency



The complete real time tracker readout is not feasible



HOW? The readout electronics in the detector can send pre-selected information for the Level-1 event reconstruction





**Intelligent pixel particle detector** capable of locally self-selecting the particle signatures of interest for physics, without relying on an external trigger system



This approach significantly improves the data reduction efficiency

Detector capable of providing particle transverse momentum information in addition to simple geometrical positioning and energy measurements

# An intelligent particle tracking system based on p<sub>T</sub> discrimination



# The CMS Outer Tracker [1, 2]



[1] CMS collaboration. "The phase-2 upgrade of the CMS tracker." CMS-TDR-014 (2017).

[2] Abbaneo, Duccio. "Upgrade of the CMS Tracker with tracking trigger." Journal of Instrumentation 6.12 (2011): C12065.


# The Pixel-Strip module



13296 Modules
44 Mstrips + 174 Mpixels
200 m<sup>2</sup> of silicon area



16 × SSA ASICs (strip ROC) [3]

16 × MPA ASICs (pixel ROC + stub finding) [4]



[3] Caratelli, Alessandro, et al. Characterization of the first prototype of the Silicon-Strip readout ASIC (SSA). No. CMS-CR-2018-286. 2018.

[4] Ceresa, Davide, et al. Characterization of the MPA prototype, a 65 nm pixel readout ASIC with on-chip quick transverse momentum discrimination capabilities. No. CMS-CR-2018-279. 2018.

[5] Moreira, Paulo. "The LpGBT project status and overview." ACES. 2016.

[6] Nodari, Benedetta, et al. A 65 nm data concentration ASIC for the CMS outer tracker detector upgrade at HL-LHC. No. CMS-CR-2018-278. 2018.

# MPA and SSA ASICs: system level architecture choices


### Initial open design choices:

- Define which functionalities are implemented in the SSA and which in the MPA
- Minimize system power requirements
- Minimize bandwidth requirements
- Maximize the particle recognition efficiency
- Bandwidth among ASICs
- Data encoding
- Transmission FIFOs depth
- Data compression
- Particle hit clustering at SSA level
- Several others

Functionality and efficiency depend on physics statistics, particle rates and hit occupancy (no simple test vectors)

A simulation framework becomes necessary, capable of providing:

**System studies and performance evaluation**

- **Study and compare** different system implementations
- Evaluate the trade-off between performance and power optimization
- Report efficiency parameters by comparison with a system reference model
- Evaluate the efficiency of the particle recognition and of the data readout
- Realistic stimuli generation from Monte-Carlo simulations of complex interactions in high-energy particle collisions
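The comparison against a reference model listed above can be sketched in a few lines of Python. This is only an illustration, not the framework's actual code: the `Stub` record, its fields and the function name are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stub record: bunch-crossing ID, position and bend code
Stub = namedtuple("Stub", ["bx", "position", "bend"])

def recognition_efficiency(reference, dut):
    """Return (efficiency, fake_rate) of the DUT output against a reference model.

    efficiency = fraction of reference stubs also produced by the DUT
    fake_rate  = fraction of DUT stubs that are absent from the reference
    """
    ref_set, dut_set = set(reference), set(dut)
    matched = ref_set & dut_set
    efficiency = len(matched) / len(ref_set) if ref_set else 1.0
    fake_rate = len(dut_set - ref_set) / len(dut_set) if dut_set else 0.0
    return efficiency, fake_rate

# Toy data: the DUT misses one reference stub and produces one extra stub
reference = [Stub(0, 12, 1), Stub(0, 40, -2), Stub(1, 7, 0), Stub(2, 99, 3)]
dut = [Stub(0, 12, 1), Stub(1, 7, 0), Stub(2, 99, 3), Stub(2, 5, 0)]
eff, fake = recognition_efficiency(reference, dut)
print(f"efficiency = {eff:.2f}, fake rate = {fake:.2f}")
```

In a real study the reference list would come from the reference model fed with the same Monte-Carlo stimuli as the simulated design.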


**Design Verification** 

- Verify the RTL implementation and the chip-set functionalities
- Generation of realistic activity information for precise power analysis
- Verify post-layout netlist
- Verify, with clock-cycle precision, the subsystem integration and the communication among the modules and the ASICs

# System level simulation framework [7]

Implemented in: SystemVerilog HDL / UVM + Python



[7] Caratelli, Alessandro, et al. "System Level simulation framework for the ASICs development of a novel particle physics detector." 2018 14th Conference on Ph. D. Research in Microelectronics and Electronics (PRIME). IEEE, 2018.

1.6 Gb/s - latency: < 500 ns

# System architecture definition [8]




[8] A. Caratelli, D. Ceresa, S. Kloukinas, S. Scarfi et al. Readout architecture for the Pixel-Strip module of the CMS Outer Tracker Phase-2 upgrade. No. CMS-CR-2016-405.

320 Mb/s



[9] D. Ceresa, A. Caratelli, G. Bergamin, J. Kaplon, K. Kloukinas, S. Scarfì "MPA-SSA, design and test of a 65nm ASIC-based system for particle tracking at HL-LHC featuring on-chip particle discrimination." 2019 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC). IEEE, 2019.




- Periphery operating at the bunch-crossing event rate
  - Fast combinatorial **clustering** at event rate to limit the cross-talk effect
  - Wide clusters do not represent interesting events and are filtered out to optimize bandwidth and processing power
  - Correction of the parallax error introduced by approximating the cylindrical geometry with planar pixel-strip sensors
  - Up to 8 cluster-centroid coordinates are transmitted per event to the MPA coincidence logic for stub generation and transverse momentum discrimination

[Block diagram labels: periphery at 320 MHz; clustering; centroid extraction; parallax correction; coordinate ordering; data formatting; stub data encoding and serialiser logic, 2.56 Gb/s (8 stubs); control and configuration]
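The clustering and filtering steps can be illustrated with a short behavioural sketch in Python. It is not the SSA logic itself: the maximum cluster width of 4 strips and the half-strip centroid encoding are assumptions chosen for the example, and the parallax correction is left out.

```python
MAX_CLUSTER_WIDTH = 4   # assumed cut: wider clusters are discarded
MAX_CENTROIDS = 8       # from the text: up to 8 centroids per event

def find_centroids(hits, max_width=MAX_CLUSTER_WIDTH, max_centroids=MAX_CENTROIDS):
    """Group adjacent hit strips into clusters and return their centroids.

    hits: sequence of 0/1 values, one per strip, for a single bunch crossing.
    Centroids are expressed in half-strip units (first + last strip index)
    so that they remain integers.
    """
    centroids = []
    start = None
    for i, hit in enumerate(list(hits) + [0]):  # trailing 0 closes a cluster at the edge
        if hit and start is None:
            start = i                            # first strip of a new cluster
        elif not hit and start is not None:
            width = i - start
            if width <= max_width:               # drop wide clusters
                centroids.append(start + (i - 1))
            start = None
    return centroids[:max_centroids]             # bandwidth limit per event

hits = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0]
print(find_centroids(hits))
```

Here strips 1-2 give a centroid of 3 half-strips, strip 5 gives 10, and the 6-strip-wide cluster is rejected.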


# Design for Testability

## Memory Built-In-Self-Test

- Test full memory functionality in <10 ms
- Results saved in internal registers accessible via slow-control
- Little additional hardware: a self-contained hierarchical block
- Clock gating during normal operation (only leakage power)

## Periphery Scan Chain

- FSM Easy to access standardized approach with TRL
- 92% fault coverage in SSA (300 ms)
- ~95% fault coverage with 25k flip-flops in MPA (750 ms)





## Logic Built-In-Self-Test for pixel array

- FSM embedded in Pixel Array logic and vectors from configuration
- Requires compression / decompression logic
- ~90% coverage

# Power Reduction Methodology

## Power optimization

- Clock gating in all configuration registers and logic
- Architecture studies to minimize power consumption
- Use Multi-VT standard cells
- Use gated SRAM blocks
- Multi-supply voltage (1.0 V and 1.2 V)
- Find power-hungry and low-activity blocks and optimize their implementation



## Power study:

- Static and Dynamic power analysis
- Voltage drop analysis in different scenarios
- Power-Grid-View (PGV) for macros



# Total Ionizing Dose effects hardening

### Digital domain:

- 9-track library selected as a compromise between power consumption and radiation tolerance, considering the operating range -40°C / 0°C
- Characterization of the digital cell parameters (propagation delay, transition time, setup/hold, etc.) for the radiation corner
- Increased margins for TID degradation (setup uncertainty: jitter + an additional 8% of the clock period; reduced max transition; derate factors)
- Due to narrow channel effects → Removed minimum-width cells (D0, D1) and delay elements
- Only thin-oxide devices → 1.2V max (CMOS IO and SLVS)
- Custom latch-up-resistant ESD structures
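As a worked example of the TID timing margin, the sketch below reads the setup uncertainty as clock jitter plus 8% of the clock period. The 320 MHz value is the periphery clock quoted elsewhere in these slides; the 50 ps jitter is a placeholder chosen for illustration, not a design value.

```python
def setup_uncertainty_ps(clock_mhz, jitter_ps, extra_fraction=0.08):
    """Setup uncertainty = clock jitter + a fraction of the clock period (in ps)."""
    period_ps = 1e6 / clock_mhz          # 1 / MHz = us, converted to ps
    return jitter_ps + extra_fraction * period_ps

# 320 MHz clock -> 3125 ps period; 8% of the period is 250 ps
# jitter_ps=50.0 is a hypothetical value, not taken from the design
margin = setup_uncertainty_ps(clock_mhz=320.0, jitter_ps=50.0)
print(f"setup uncertainty = {margin:.0f} ps")
```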

### Memories:

- A custom memory compiler made it possible to generate an SRAM whose cell transistors feature nMOS W > 200 nm and pMOS W > 500 nm
- Protection against latch-up is achieved by placing p<sup>+</sup> guard bands between n<sup>-</sup> regions.

### Usage of ELT devices in input stage:

- To prevent the radiation induced drain-to-source leakage current increase due to the charge trapped in the shallow trench isolations (STI).
- To mitigate the 1/f noise increase on irradiated devices due to side-effects of the STI region in nMOS operated at low drain current.

# Digital library choice and delay corner comparison

- Supply voltage scaling
- 9-track library chosen as a compromise between power consumption and radiation tolerance
- The temperature-inversion effect prevents the SSA from using high-Vt library cells at 0.9 V.
- Mix of standard-Vt and low-Vt digital cells at 1.0 V + 10% as a compromise between power consumption, memory operation and propagation delay at $-40^{\circ}$C





# Single Event Effects tolerance

### **State machines**

- Triple modular redundancy (full)
- Triplicated clock trees
- Triplicated reset distribution
- Flip-flop minimum distance: 15 µm

### Latch FIFOs

- Control and header fields triplicated
- Data latches not protected

### Data pipeline

 No SEU protection applied due to limited power budget

### Clock tree

- Clock tree triplicated
- The non-triplicated logic uses the voted clock in critical areas
- The non-triplicated logic uses one of the branches in non-critical areas:
  - Simplify scan-chain insertion
  - Helps in reducing buffering for hold fix (power)
  - Allow for CPPR on the 3 branches

### Triplicated pads for

- Clock
- Reset
- Control
- Scan-chain IOs

### Configuration registers

- Triple modular redundancy with error detection and self-correction
- Clock enabled only during
  - asynchronous readout operations
  - configuration operations
  - self-correction
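The behaviour of a triplicated register with error detection and self-correction can be modelled as below. This is a behavioural Python sketch under assumed names (`TmrRegister`, `refresh`), not the RTL used in the ASICs.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)

class TmrRegister:
    """Triplicated register with error detection and self-correction."""

    def __init__(self, value=0):
        self.copies = [value, value, value]
        self.corrections = 0   # counts corrected upsets, like an SEU correction counter

    def write(self, value):
        self.copies = [value, value, value]

    def read(self):
        return majority(*self.copies)

    def refresh(self):
        """Rewrite all copies with the voted value if any copy disagrees."""
        voted = self.read()
        if any(copy != voted for copy in self.copies):
            self.copies = [voted, voted, voted]
            self.corrections += 1
        return voted

reg = TmrRegister(0b10110010)
reg.copies[1] ^= 0b00001000        # inject a single-bit upset in one copy
value_before_fix = reg.read()      # the voter already masks the upset
reg.refresh()                      # self-correction restores the corrupted copy
print(bin(value_before_fix), reg.copies, reg.corrections)
```

Gating the clock outside readout, configuration and self-correction, as listed above, means `refresh` would only be clocked when a mismatch is detected.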

### Glitch filters

- Reset inputs
- TEST-MODE signal
- Scan-chain TEST POINTS control (on the control of the system clock / test clock selection multiplexers)

# Single Event Effects tolerance

### Physical implementation

- Use of instance space groups among triplicated registers
- Avoid logic simplification by synthesis and P&R flow
- Spacing for clock and reset:
  - After CTS locate the critical cells and impose a minimum distance
  - Proceed with successive ECO placement and ECO routing steps



### **Functional simulation**

- SystemVerilog UVC to randomize the injection (constrained by the specific test case)
- The randomization is constrained by: error probability, average SEE rate, minimum time separation, etc.
- Injection of single event effects in multiple ASICs at the same time, to evaluate the consequences that SEEs in one ASIC have on the other ASICs of the chipset
- Possibility to focus the SEU injection on a particular module or subsystem and evaluate the effect at system level
- Possibility to inject SEUs in hundreds of cells per clock cycle (registers grouped in non-interacting categories)
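The constrained randomization described above can be pictured with a small Python model that draws an injection schedule. The actual environment is a SystemVerilog UVC; the function below, its parameter names and the register paths are hypothetical and only mirror the listed constraints (average rate, minimum time separation, restricted target set).

```python
import random

def seu_schedule(targets, n_cycles, avg_rate, min_gap, seed=0):
    """Return a list of (cycle, target) injections.

    targets:  names of the registers eligible for injection
    avg_rate: probability of an upset in any eligible clock cycle
    min_gap:  minimum number of cycles between two injections
    Note: enforcing min_gap lowers the effective rate below avg_rate.
    """
    rng = random.Random(seed)   # seeded so that a failing run can be replayed
    schedule = []
    last = -min_gap
    for cycle in range(n_cycles):
        if cycle - last >= min_gap and rng.random() < avg_rate:
            schedule.append((cycle, rng.choice(targets)))
            last = cycle
    return schedule

# Hypothetical register paths; focusing on one subsystem means shortening this list
targets = ["ssa.fsm.state", "ssa.cfg.reg3", "mpa.fifo.header"]
schedule = seu_schedule(targets, n_cycles=10_000, avg_rate=0.01, min_gap=50)
print(len(schedule), schedule[:3])
```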

### Additional checks

- Script to verify that no triplicated instance is optimized out
- Script to verify placement constraints after chip assembly
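A check that no triplicated instance has been optimized out could look like the sketch below. It assumes a purely hypothetical naming convention in which the three copies carry the suffixes `_tmrA`, `_tmrB` and `_tmrC`; the real scripts and naming are not described in these slides.

```python
import re

# Assumed naming convention for the three copies of a triplicated instance
TMR_SUFFIX = re.compile(r"^(?P<base>.+)_tmr(?P<copy>[ABC])$")

def missing_tmr_copies(instance_names):
    """Return {base_name: [missing copy letters]} for incomplete triplets."""
    found = {}
    for name in instance_names:
        match = TMR_SUFFIX.match(name)
        if match:
            found.setdefault(match.group("base"), set()).add(match.group("copy"))
    return {base: sorted(set("ABC") - copies)
            for base, copies in found.items() if copies != set("ABC")}

# Toy instance list: copy B of fsm/state has been removed
netlist = ["cfg/reg0_tmrA", "cfg/reg0_tmrB", "cfg/reg0_tmrC",
           "fsm/state_tmrA", "fsm/state_tmrC", "pipe/data0"]
print(missing_tmr_copies(netlist))
```

A triplet whose three copies have all disappeared would not be seen by this check, so a complete version would also compare against the list of registers expected from the RTL.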


# Physical Implementation flow



- Digital-on-top design flow
- Hierarchical implementation
- Multi-supply voltage: 1.0 V ± 10% and 1.2 V ± 10%
- 3 independent power and ground domains to reduce noise coupling with guard-ring isolation
- Multi-Vt design (Low-Vt used only in critical timing arcs)
- C4 bump floorplan + wirebond for wafer probing
- Complex CTS and timing closure due to triple clock tree balancing and SEU hardening
- Constraints for TMR and digital cells placement
- Skew balancing among triplicated and voted clock trees
- Strip cell sampling clock guarantees <200 ps skew in all corners
- Non-default CTS rules to mitigate cross-coupling
- QRC extracted information already at the optimization stage due to design size

# MPA-SSA-CIC Timeline



# The ASICs







# ASICs testing

- The SSA, the MPA and the CIC were produced in a full mask-set engineering run
- The first 6 wafers have been tested at wafer level
- Test routine includes:
  - Scan-chain test for production defects
  - Functional test of digital circuits
  - Analog bias parameter characterization
  - Front-end characterization
  - Noise analysis
  - Serial ID and trimming in e-fuses
- The wafers have been diced and the chips bonded on carrier boards for radiation tests and detailed characterization








## SSA test results

#### Shaper pulse reconstruction

Shaper pulse reconstructed by injecting a known charge via the SSA calibration system and acting on threshold, calibration pulse delay-line and clock deskewing DLL.

- Peaking time = 19.3 ns
- Peaking time < 25 ns
- Linear behavior up to 8 fC

[Figure: reconstructed shaper pulses, Threshold Voltage [mV] vs Time [ns], for injected charges of 1.29 fC, 1.72 fC, 2.15 fC and 2.58 fC]

#### Gain and offset distributions for a non-trimmed SSA ASIC

- 9000 chips tested
- $\overline{OFS_{FE}} = 5.54$ mV

[Figure: normalised distributions of FE Gain Mean [mV/fC] and Front-End Offset [mV]; legend entries: Wafer 1, Wafer 2, Wafer 3, Wafer 6; legend values: mean = 55.02 mV/fC, mean = 55.23 mV/fC, mean = 54.36 mV/fC, std = 1.29 mV/fC, 1.25 mV/fC]

### SSA Threshold trimming



Trimming performed at 2.0 fC.
Threshold spread evaluated at 2.0 fC and 1.25 fC

## SSA Threshold distribution after trimming



### SSA Analog Front-End Noise vs Temperature



### SSA Front-end noise measurements



Channel input noise evaluated as the standard deviation of the error function fitting the S-Curves (1.25 fC and 2.0 fC)
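The noise extraction from an S-curve can be illustrated as follows. To stay dependency-free, this sketch does not run a least-squares error-function fit; instead it estimates the same two parameters from the moments of the differentiated S-curve, which is a Gaussian when the curve is an ideal error function. The threshold range and the noise value of 1.5 LSB are made up for the example.

```python
import math

def s_curve(threshold, mu, sigma):
    """Ideal hit efficiency vs threshold for Gaussian noise (falling error function)."""
    return 0.5 * (1.0 - math.erf((threshold - mu) / (math.sqrt(2.0) * sigma)))

def estimate_noise(thresholds, efficiency):
    """Estimate (mu, sigma) from the moments of the differentiated S-curve.

    The drop in efficiency between adjacent thresholds is the probability that
    the pulse amplitude lies in that interval. The finite threshold step makes
    sigma slightly larger than the true value.
    """
    n = len(thresholds) - 1
    weights = [efficiency[i] - efficiency[i + 1] for i in range(n)]
    centres = [0.5 * (thresholds[i] + thresholds[i + 1]) for i in range(n)]
    total = sum(weights)
    mu = sum(w * c for w, c in zip(weights, centres)) / total
    var = sum(w * (c - mu) ** 2 for w, c in zip(weights, centres)) / total
    return mu, math.sqrt(var)

# Synthetic threshold scan in DAC units (illustrative values only)
thresholds = [float(t) for t in range(0, 121)]
eff = [s_curve(t, mu=60.0, sigma=1.5) for t in thresholds]
mu_hat, sigma_hat = estimate_noise(thresholds, eff)
print(f"mu = {mu_hat:.2f} LSB, noise = {sigma_hat:.2f} LSB")
```

On measured data a proper error-function fit is more robust, because the moment estimate is sensitive to statistical fluctuations in the tails of the scan.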

# Temperature Characterization summary

- No errors or timing issues observed on digital logic



- No errors or issues observed in analog FE
- Bias structures variation within compensating range
- FE noise change within expectation

- Full set of digital functionalities tests
- Tests of memories (with BIST) and configuration
- Characterization of all bias parameters
- S-Curve for FE Gain, Noise and Trimming
- ADC, E-Fuses, voltage sweeps and several others



### Memory Built-In-Self-Test

- Test full memory functionality in <1 ms
- Results saved in internal registers accessible via slow-control
- Clock gating during normal operation (only leakage power)



### Scan Chain

- 92% fault coverage in SSA ASIC
- Custom approach for triplicated design
- SHIFT, RESET and CAPTURE tests
- A total of ~950 test vectors required
- Full test duration < 300 ms
- Scan-chain in SSA operates correctly up to 20 MHz




## MPA-SSA test results

### SSA → MPA Communication

- No phase aligner at MPA input due to power restrictions
- The communication relies on a precise design of the timing
- SSA-MPA communication timing was verified in static timing analysis and simulated post-layout in all cross-corner combinations (UVM verification environment)













# Total Ionizing Dose characterization

# X-ray TID Characterization summary

- 4 chips have been irradiated up to 200 Mrad and 1 chip up to 350 Mrad
- No errors or timing issues observed on digital logic



- Bias structures variation within compensating range
- FE noise change within expectation
- ADC reference voltage variation larger than expected:
  - The target reference voltage had to be changed to keep the DAC output stable up to 200 Mrad.

### **TID Test routine:**

- Full set of digital functionalities tests
- Tests of memories (with BIST) and configuration
- Characterization of all bias parameters
- S-Curve for FE Gain, Noise and Trimming
- ADC, E-Fuses, voltage sweeps and several others



# SSA Front-End equivalent noise evolution with TID and temperature

SSA 2.1 average FE noise\* vs TID at -10°C



SSA 2.1 average FE noise\* vs TID at +20°C



<sup>\*</sup> FE noise evaluated on the S-Curves – 2 fC internal charge injection – Sensor inputs floating

# SSA Single-Event Effect tests with heavy ions

SEE testing carried out at UCL, Louvain-la-Neuve, Belgium

### No hard errors observed

- No loss of control observed
- No loss of synchronisation observed
- No chip lock-ups or control errors in general

### Configuration system error-free

- Verified by readout and comparison of full chip configuration at each test iteration (30 seconds)
- SEU correction counter monitoring

Limit cross-section for a 10 h test: $\sim 5 \cdot 10^{-9}$ cm<sup>2</sup>



# SSA Single-Event Effect tests with heavy ions

### Stub and L1 data SEE cross-section:



### Bit error rate estimation

(based on OT fluxes from FLUKA simulation)

|                           | Maximum SSA bit-error rate expected   |
|---------------------------|---------------------------------------|
| Stub data                 | $1.2 \cdot 10^{-11} \text{ BX}^{-1}$  |
| L1 data (12.6 µs latency) | $1.0 \cdot 10^{-11} \text{ BX}^{-1}$  |
| L1 data (2.5 µs latency)  | $9.7 \cdot 10^{-11} \text{ BX}^{-1}$  |



# SSA Wafer Probing - process and analog performance

## Ring oscillator frequency



The SSA includes different types of ring oscillator to monitor variations in: Process – Temperature – Total Ionizing Dose

### **FE Noise Performance Tests**



- Map of the average FE noise
- Cut criterion: noise < 1.7 LSB

## FE Threshold Trimming



- Map of the threshold spread after the trimming procedure
- Cut criterion: std(Th) < 0.5 LSB

# SSA Wafer Probing - yield


## Digital Tests summary map

- Stub data [0.9 V, 1.0 V, 1.1 V]
- L1 data [0.9 V, 1.0 V, 1.1 V]
- Memory BIST [0.8 V, 1.0 V, 1.2 V]
- Configuration and all other digital functionalities

## Analog Tests summary map

- Analog bias calibration
- FE functionality
- FE threshold trimming
- Noise analysis

## Total yield map

Overall yield (all tests) > 95%
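How per-chip results combine into an overall yield can be sketched as below. The two analog cuts are the ones quoted for wafer probing (noise < 1.7 LSB, threshold spread < 0.5 LSB); the record layout, the field names and the four example chips are invented for illustration.

```python
NOISE_CUT_LSB = 1.7       # cut criterion from the FE noise test
TH_SPREAD_CUT_LSB = 0.5   # cut criterion from the threshold trimming test

def chip_passes(chip):
    """A chip passes only if every digital test and both analog cuts pass."""
    return (all(chip["digital"].values())
            and chip["noise_lsb"] < NOISE_CUT_LSB
            and chip["th_std_lsb"] < TH_SPREAD_CUT_LSB)

def wafer_yield(chips):
    """Fraction of chips that pass all tests."""
    return sum(chip_passes(c) for c in chips) / len(chips)

# Hypothetical probing results for four chips
chips = [
    {"digital": {"stub": True, "l1": True, "bist": True}, "noise_lsb": 1.2, "th_std_lsb": 0.3},
    {"digital": {"stub": True, "l1": False, "bist": True}, "noise_lsb": 1.1, "th_std_lsb": 0.2},
    {"digital": {"stub": True, "l1": True, "bist": True}, "noise_lsb": 1.9, "th_std_lsb": 0.3},
    {"digital": {"stub": True, "l1": True, "bist": True}, "noise_lsb": 1.4, "th_std_lsb": 0.4},
]
print(f"yield = {wafer_yield(chips):.0%}")
```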

# Summary

- System-level studies made it possible to define the architecture of the PS-Module ASICs
- After testing the prototypes, the final versions of the ASICs (MPA2, SSA2 and CIC2) have been submitted to production in a full mask-set engineering run.
- The first wafers have arrived at CERN and have been diced and tested
- The tests on the final version of the chips show results in agreement with expectations
  - Front-end performance fulfils the specifications
  - X-ray TID test confirms radiation hardness up to 200 Mrad
  - Heavy-ion test confirms the functionality of the chosen hardening strategy
  - Climatic chamber tests show a parameter variation within the calibration range
- Wafer-level testing shows a high yield, which allows us to move to the next step: ordering the production wafers and defining the automated production test procedure.

# Summary



## CIC2 ASIC

L. Caponetto, G. Galbit, S. Scarfì, B. Nodari, S. Viret, A. Caratelli, D. Ceresa (IP2I Lyon university)



## SSA2 ASIC

### **DESIGNED AND TESTED BY:**

A. Caratelli, G. Bergamin, D. Ceresa, J. Kaplon, K. Kloukinas, C. Nedergaard, S. Scarfi



## MPA2 ASIC

### **DESIGNED AND TESTED BY:**

D. Ceresa, G. Bergamin, A. Caratelli, J. Kaplon, K. Kloukinas, A. Nookala, S. Scarfi (CERN EP-ESE)



Total of: ~ 185 000 chips