

### **RD53: Lessons Learned** A verification perspective

Stefano Esposito <stefano.esposito@cern.ch> on behalf of the RD53 collaboration

### Agenda

#### Section 1: An overview of RD53

- The RD53 Collaboration
- The RD53 Pixel Chip

#### Section 2: Verification

- Why Verification
- Verification is not Testing
- RD53 Verification

#### Section 3: Lessons Learned

- What we did well
- What we could have done better
- General Lessons from RD53



# Section 1 An overview of RD53



#### **The RD53 Collaboration**

| Role                                        | People                                                                   |
|---------------------------------------------|--------------------------------------------------------------------------|
| Collaboration board chair                   | Lino Demaria, Torino                                                     |
| Co-spokespersons (interface to experiments) | Jorgen Christiansen, CERN (CMS)<br>Maurice Garcia-Sciveres, LBNL (ATLAS) |
| Experiment observers                        | Duccio Abbaneo, CERN (CMS)<br>Kevin Einsweiler, LBNL (ATLAS)             |

| Area                               | Activity                      | Institutes                        |
|------------------------------------|-------------------------------|-----------------------------------|
| RD53 design framework              | Co-ordination                 | Bari                              |
| Floorplan/integration              |                               | Bari                              |
| Analog front-ends                  | CMS/linear                    | Bergamo/Pavia                     |
|                                    | ATLAS/differential            | LBNL                              |
| Monitoring                         |                               | CPPM                              |
| IO pad frame                       |                               | Bonn                              |
| Digital                            | RTL, design flow, P&R, timing | Torino, Pisa, CERN, LPNHE (Paris) |
|                                    | Simulation & verification     | CERN, Bergen, Oxford              |
|                                    | Design for testability        | Bari                              |
|                                    | SEU/SET simulations           | CERN, Seville                     |
| Serial powering                    |                               | Dortmund, ITAINNOVA               |
| IPs (support and possible updates) | Current DAC                   | Bari                              |
|                                    | Voltage DAC                   | Prague                            |
|                                    | Bandgaps                      | Bergamo                           |
|                                    | ADC, mux, temp, radiation     | CPPM                              |
|                                    | PLL & serializer              | Bonn                              |
|                                    | Differential IO               | Bergamo/Pavia                     |
|                                    | Power on reset                | Seville                           |
|                                    | Ring oscillator               | LAL                               |
|                                    | Analog buffer                 | RAL                               |

#### Testing

- Organization: ATLAS: LBNL; CMS: CERN
- Many RD53 and ATLAS/CMS groups: LBNL, Bonn, Oxford, CERN, CPPM, LAL, Torino, Aragon, ETH, Florence, Zurich
- RD53 test systems: YARR (LBNL), BDAQ53 (Bonn)



### **The RD53 Pixel Chip**

#### **Requirements**

| Parameter           | Value (ATLAS/CMS)                                             |
|---------------------|---------------------------------------------------------------|
| Max Hit Rate        | 3 GHz/cm <sup>2</sup> (12 GHz/chip)                           |
| Trigger Rate        | 1 MHz / 750 kHz                                               |
| Trigger Latency     | <b>12.5</b> μs                                                |
| Pixel size (chip)   | <b>50x50</b> μm²                                              |
| Pixel size (sensor) | <b>50x50</b> μm <sup>2</sup> or <b>25x100</b> μm <sup>2</sup> |
| Pixel array         | 400 x 384 pixels / 432 x 336 pixels                           |
| Chip dimensions     | 20 x 21 mm <sup>2</sup> / 21.6x18.6 mm <sup>2</sup>           |
| Min threshold       | 1000 e-                                                       |
| Radiation Tolerance | 1 Grad                                                        |
| Power delivery      | Serial powering                                               |
| Power               | < 1 W/cm <sup>2</sup>                                         |
| SEE tolerance       | SEU rate, innermost ~100 Hz/chip                              |
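
The per-chip hit rate and the trigger latency in the table can be cross-checked with a few lines of arithmetic. This is only a sketch under two assumptions that are not stated in the table: the active area is taken as the pixel count times the 50x50 μm² pixel size, and the bunch-crossing clock is taken as the nominal 40 MHz LHC rate.

```python
# Back-of-the-envelope check of the requirements table.
PIXEL_PITCH_CM = 50e-4        # 50 um pixel pitch expressed in cm
HIT_RATE_PER_CM2 = 3e9        # 3 GHz/cm^2, from the table
TRIGGER_LATENCY_S = 12.5e-6   # 12.5 us, from the table
BX_CLOCK_HZ = 40e6            # assumption: nominal LHC bunch-crossing rate


def active_area_cm2(cols: int, rows: int) -> float:
    """Active area of a pixel array, assuming square 50 um pixels."""
    return cols * rows * PIXEL_PITCH_CM ** 2


def chip_hit_rate_hz(cols: int, rows: int) -> float:
    """Maximum hit rate integrated over the whole pixel array."""
    return HIT_RATE_PER_CM2 * active_area_cm2(cols, rows)


atlas_rate = chip_hit_rate_hz(400, 384)   # ATLAS pixel array
cms_rate = chip_hit_rate_hz(432, 336)     # CMS pixel array
latency_bx = round(TRIGGER_LATENCY_S * BX_CLOCK_HZ)

print(f"ATLAS: {atlas_rate / 1e9:.2f} GHz/chip")
print(f"CMS:   {cms_rate / 1e9:.2f} GHz/chip")
print(f"Latency: {latency_bx} bunch crossings")
```

Under these assumptions the ATLAS array comes out at about 11.5 GHz/chip, consistent with the rounded 12 GHz/chip in the table, and the latency corresponds to 500 bunch crossings that the on-chip buffering has to cover.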

10/23/2023

#### Generations

| Generation | Role and submissions                                                                                                                           |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------|
| RD53A      | <ul> <li>Demonstrator Chip, half size</li> <li>Submitted in 2017</li> </ul>                                                                    |
| RD53B      | <ul> <li>Testing Chip</li> <li>ItkPix v1 submitted 3/2020</li> <li>ItkPix v1.1 submitted 10/2020</li> <li>CROC v1 submitted 06/2021</li> </ul> |
| RD53C      | <ul> <li>Final Chip</li> <li>ItkPix v2 submitted 3/2023</li> <li>CROC v2 submitted 10/2023</li> </ul>                                          |



### **The RD53 Pixel Chip**




























Figure: Digital core (8x8 pixels) and its latency buffer. Labels in the diagram: Hit LE, BXId, BxId-Latency, memory cells, Data, Trigger.

















# Section 2 Verification



### What is Verification

- Design activity to prove correctness
  - Verification is a resource-limited quest to find as many bugs as possible before shipping
- Hard problem
  - How to prove absence of bugs?



Chip Design and Manufacturing Cost under Different Process Nodes: Data Source from IBS\*



## **Why Verification**

- Reduce schedule risk
  - Silicon respin takes time
- Reduce financial risk
  - Masks cost millions
- First time silicon is the goal
  - Verification finds bugs before it is too late



Source: Wilson Research Group and Siemens EDA, 2022 Functional Verification Study (© Siemens 2022)



## Why don't we just do more testing?

- Verification is performed on the design
- Testing is performed on the product
- Debugging silicon is much harder than debugging code
  - Root-causing a bug in simulation takes days at most
  - Root-causing a bug in silicon takes weeks, if it is possible at all
- Complexity argues against this approach

|           | Logic<br>Gates | FF         | Transistors<br>(approx.) |
|-----------|----------------|------------|--------------------------|
| Matrix    | 56,389,284     | 10,523,520 | 601,423,704              |
| Periphery | 5,597,232      | 825,491    | 54,220,667               |
| Total     | 61,986,516     | 11,349,011 | 655,644,371              |

**RD53C gate and transistor counts**



### Why don't we just do more testing?

- A bug found in silicon costs
  - Redo masks \$\$\$
  - Wait again for the wafers to be ready
- The smaller the node, the higher the cost
- More functionality in the same area requires smaller nodes





### **RD53 Verification**

- Started with architectural exploration framework
  - Readapted for RD53B verification
- RD53C used a new approach
  - Unified verification methodology
  - Metric driven verification
- Complex Software design
  - Must interface with simulated hardware
  - Must consider HW design constraints
  - Translate from cycle-accurate simulation to transaction-level simulation
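
The last point, translating cycle-accurate activity into transactions, is typically the job of a monitor component. The sketch below illustrates the idea only: the `valid`/`data`/`last` signals and the `Transaction` type are hypothetical names chosen for this example, not the RD53 interfaces or the actual verification environment.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class CycleSample:
    """One clock cycle of a hypothetical output bus."""
    valid: bool   # data word present in this cycle
    data: int     # data word (ignored when valid is False)
    last: bool    # marks the final word of a packet


@dataclass(frozen=True)
class Transaction:
    """Transaction-level view: one packet and the cycle it started in."""
    start_cycle: int
    words: Tuple[int, ...]


def monitor(samples: Iterable[CycleSample]) -> Iterator[Transaction]:
    """Collapse per-cycle samples into packets, skipping idle cycles."""
    words: List[int] = []
    start: Optional[int] = None
    for cycle, sample in enumerate(samples):
        if not sample.valid:
            continue                      # idle cycle: nothing to record
        if start is None:
            start = cycle                 # first word of a new packet
        words.append(sample.data)
        if sample.last:
            yield Transaction(start, tuple(words))
            words, start = [], None


trace = [
    CycleSample(False, 0x0, False),
    CycleSample(True, 0xA, False),
    CycleSample(False, 0x0, False),       # gap inside a packet
    CycleSample(True, 0xB, True),
    CycleSample(True, 0xC, True),         # single-word packet
]
transactions = list(monitor(trace))
print(transactions)
```

Checkers and reference models can then work on `Transaction` objects and stay independent of cycle-level timing details such as gaps inside a packet.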





### The good of RD53 Verification

#### **Discovered some nasty bugs**

- Hit sampling issue causing 50% dead time
- SEU vulnerability causing unacceptable rates of chip stuck
  - Chip stuck: a chip that doesn't send any more data until soft-reset

#### **Avoided dangerous regressions**

- Regression: introduction of a new bug while adding a new feature or fixing a different bug
- Reorganization of the DM feature during RD53B to RD53C transition suppressed all data in a commonly used configuration
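
The chip-stuck definition above lends itself to an automatic check in simulation. The following is a minimal sketch of such a watchdog, not the checker used in RD53: the event names, the timeout, and the simplification that each trigger produces exactly one data event are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

# (cycle, kind) with kind in {"trigger", "data", "soft_reset"}; hypothetical labels.
Event = Tuple[int, str]


def first_stuck_cycle(events: Iterable[Event], timeout: int,
                      end_cycle: int) -> Optional[int]:
    """Return the cycle at which the chip is declared stuck, or None.

    Stuck means: at least one trigger is pending and no output data has
    been observed for more than `timeout` cycles.
    """
    pending = 0          # triggers still waiting for their data
    last_progress = 0    # cycle of the last sign of life
    for cycle, kind in sorted(events):
        if pending and cycle - last_progress > timeout:
            return last_progress + timeout + 1
        if kind == "trigger":
            if pending == 0:
                last_progress = cycle    # start timing from the first pending trigger
            pending += 1
        elif kind == "data":
            pending = max(pending - 1, 0)
            last_progress = cycle
        elif kind == "soft_reset":
            pending = 0                  # a soft reset clears the stuck condition
            last_progress = cycle
    if pending and end_cycle - last_progress > timeout:
        return last_progress + timeout + 1
    return None


healthy = [(10, "trigger"), (30, "data"), (40, "trigger"), (55, "data")]
stuck = [(10, "trigger"), (30, "data"), (40, "trigger"), (60, "trigger")]

print(first_stuck_cycle(healthy, timeout=100, end_cycle=200))
print(first_stuck_cycle(stuck, timeout=100, end_cycle=500))
```

Running a check of this kind in every regression run, including SEU injection runs, is one way to flag a stuck chip as soon as it appears rather than by inspecting output data afterwards.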






### The good of RD53 Verification

#### Allowed extensive simulation campaigns

- Sign-off simulations include more than 15k runs
  - Including SEE simulations
- Verification requires resources

| Type            | Number of simulations |
|-----------------|-----------------------|
| RTL simulations | 4442                  |
| GL simulations  | 9030                  |
| SEU simulations | 3028                  |
| Total           | 16500                 |



### The bad of RD53 Verification

#### Precise reference model

- Required a lot of effort
- Lack of design documents made it very hard to achieve

#### Late introduction of SEE simulations

- Required re-adapting parts of the verification environment
- Due to organizational issues

#### Lack of manpower

- Most of the effort for RD53C was a 1-person effort
- Key people left the project after the first RD53B submission (03/2020)



# Section 3 Lessons Learned





### What we did well

#### Effective Simulation found bugs impossible to find on silicon

- SEU issue caused chip stuck
  - Impossible to find root cause in beam testing
- Hit sampling issue caused 50% dead time
  - Finding the issue in testing would have required extensive calibration injection campaigns
  - Finding the root cause would have been impossible
- Reset propagation issue
  - Hard to identify in testing
  - Impossible to find root cause



### What we could have done better

#### Project reviews alone are not sufficient

- Need for technical rolling reviews by specialists
  - Would have found better ways to implement complex parts of the verification environment
  - Would have found better ways to implement some complex hardware modules

#### Avoid Single point of failure in teams

- If the team is one person, that person leaving is a disaster
- Collaborations should plan for the fact that key people are not easy to replace
- Documentation should be required
  - And its quality should be evaluated by specialists during project reviews

~1 year delay between RD53B and RD53C due to people leaving



### **General Lessons from RD53 Verification**

#### Verification is a complex problem

- Finding a bug requires much more time and effort than writing one
- Need for stable and expert teams
- Verification effort must start with the project
  - Treating verification as a last-minute "panic" activity creates more problems
- First-time silicon is the goal



### **General Lessons from RD53 Verification**

- Verification as the last step of the project is bad
  - Designs should be made considering the needs of the verification effort
    - Making verification easier means better products with fewer delays
  - Verification team should be involved as early as possible
    - Requirements refinement
    - Architectural specification
- No matter the effort, bugs can escape



Source: Wilson Research Group and Siemens EDA, 2022 Functional Verification Study (© Siemens 2022)






# Backup



### **ASIC Verification 101**

- Stimuli Generation
  - Constrained randomization
- Checkers
  - Reference Model to predict
  - Scoreboard to check
- Metrics
  - How good is verification?
  - Verification Goal





### **Requirements and Specifications Matter**

#### Any verification effort starts with requirements

- Verification Engineers should be involved in requirements refinement
- Specifications are key inputs for verification (and design)
  - Verification engineers rely on specifications to define the verification plan

#### Documentation is tradition

- People come and go
- Documentation stays



### The importance of being a Design Document

#### Design documents must be limited in scope

- A single document that covers everything is hard to maintain
- Broad-scope chip manual documents are a mixed blessing
  - Good for users
  - Bad for design and verification
  - Always out-of-date
- Design documents and the chip manual should stay separate

