# Reliability studies on uQDS, PDSU and PDSU-BIS interface for the IT protection

L. Felsberger, D. Westermann, D. Wollmann

Acknowledgements to R. Denz, C. Martin, T. Podzorny, I. Romera Ramirez, J. Steckert, J. Uythoven



### Introduction

#### Universal Quench Detection System (uQDS):

- Detect magnet quench
- Trigger PDSU
- Trigger FPA loop, Diagnostics

#### **Protection Device Supervision Unit (PDSU):**

- (Re-)Trigger magnet protection systems
- Trigger beam dump
- Detect spurious magnet protection firing
- Trigger FPA loop, Diagnostics

#### Beam Interlock System (BIS):

- Transmit beam dump request
- Diagnostics

#### Main failure modes:

- Missed magnet protection and beam dump (target for LHC systems: 1 in 1000 years)
- Spurious magnet protection and beam dump (target for LHC systems: 1 in 1 year)



→ Reliability analysis is crucial
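For a feel of what the two targets imply, the sketch below converts them into an average allowed event rate per protected instance. It uses the 7200 operational hours per year and the 128 (magnet protection) and 216 (beam dump) instances quoted in the results. Spreading the system-level target evenly over all instances is a simplifying assumption made only for illustration, and the function name is hypothetical.

```python
# Sketch only: spreads a system-level target evenly over all instances (an assumption).
OP_HOURS_PER_YEAR = 7200  # 1 operational year = 7200 h, as in the inspection policy


def per_instance_budget_fit(events_per_year: float, instances: int) -> float:
    """Allowed top-event rate per instance, in events per 1e9 operational hours."""
    rate_per_op_hour = events_per_year / (OP_HOURS_PER_YEAR * instances)
    return rate_per_op_hour * 1e9


# Missed magnet protection: 1 event in 1000 years, 128 instances.
missed_budget = per_instance_budget_fit(1 / 1000, 128)
# Spurious protection / beam dump: 1 event per year, 216 instances.
spurious_budget = per_instance_budget_fit(1.0, 216)

print(f"missed protection: {missed_budget:.3f} per 1e9 op hours per instance")
print(f"spurious firing:   {spurious_budget:.1f} per 1e9 op hours per instance")
```

The two budgets differ by more than two orders of magnitude. Note that the magnet-protection budget concerns missed demands, not raw hardware failures, so it cannot be compared directly with component failure rates.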

### **Reliability Analysis Methodology**

#### Risk identification and quantification

- Top-Level Failure Modes, Effects and Criticality Analysis (FMECA)
  - Identify system, functions, associated risks and hazards, and possible end-effects
- Accelerator Risk Matrix
  - Quantify reliability requirements to mitigate risks and hazards
- Top-Down reliability model
  - Capture system structure, redundancies, critical/non-critical parts, demand and inspection rates

#### Risk estimation and mitigation

- Component-Level FMECA
  - Analyse detailed sub-system design to identify failure probabilities for each end-effect

#### → Design qualification

- Feed results from the Component-Level FMECA into the Top-Down fault tree analysis (FTA) to qualify the design or require design improvements


















- Magnet quench
- uQDS detection
- 6 PDSUs triggered
- Beam dump & magnet protection activated
- PC stopped (beam dump via PIC not fast enough)















See also talks by C. Hernalsteens and T. Podzorny



Figure: HDS distribution between tunnel and USC/UJ (16 x HDS, 8 x HDS, 8 x HDS, 16 x HDS). Courtesy of J. Spasic





HL Annual Meeting, Genova, 10/10/2024, Lukas Felsberger

### **Top-Down Reliability Models**

Figure: top-down reliability models for Magnet Protection and for Beam Dump/Spurious Firing

- The magnet protection model ignores the beam dump functionality (covered in the spurious firing model)
- The spurious firing model ignores the magnet protection functionality (covered in the magnet protection model)



### uQDS & PDSU Hardware

- uQDS and PDSU share designs of hardware modules












Figure: hardware modules shared by uQDS and PDSU (Q1, Q2, Q3, BB, SCL; HDS/CLIQ CT/IFS/interface box; 20.0 FITs/channel to uQDS)



### **uQDS & PDSU FMECA**

- uQDS and PDSU share designs of hardware modules
- Detailed FMECA carried out for
  - Analogue monitoring channels (similar between uQDS and PDSU)
  - Digital Platform (identical between uQDS and PDSU)
- Approximate (pessimistic) FMECA for other modules & interfaces
- Relevant failure mode types for magnet protection & beam dump
  - Blind unsafe failure (detected upon commissioning or demand)
  - Blind unsafe failure (detected every fill/ramp)
  - Detected unsafe failure (visible in supervision)


#### FITs: failures per $10^9$ hours (~$10^5$ years)

Figure: FMECA failure rates along the protection chain (voltage taps, IFS, patch panel; uQDS and PDSU monitoring channels; heaters + CLIQ for Q1, Q2, Q3, BB, SCL). Labelled values include 13.8 FITs/channel, 11.4 FITs per channel (coils), 4.0 FITs (coils) and 9.9 FITs/path (to PDSU).
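Because all rates on this slide are in FITs, a small helper makes the unit concrete. The sketch below assumes constant failure rates (exponential model); it converts FITs into expected failures per year, into the probability of at least one failure in a given time, and adds the FITs of elements in series. The example reuses the 13.8 and 9.9 FIT values quoted on this slide purely for illustration; treating them as one series chain is an assumption.

```python
import math

HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per 1e9 device hours


def fit_to_per_year(fit: float, hours_per_year: float = 8760.0) -> float:
    """Expected failures per year for a constant failure rate given in FIT."""
    return fit / HOURS_PER_FIT * hours_per_year


def prob_failure(fit: float, hours: float) -> float:
    """P(at least one failure within `hours`) = 1 - exp(-lambda * t), computed stably."""
    lam = fit / HOURS_PER_FIT
    return -math.expm1(-lam * hours)


def series_fit(*fits: float) -> float:
    """Elements in series (any failure fails the chain): constant rates simply add."""
    return sum(fits)


# Hypothetical chain: one 13.8 FIT channel followed by one 9.9 FIT path.
chain = series_fit(13.8, 9.9)
print(f"chain rate: {chain:.1f} FIT")
print(f"expected failures per year: {fit_to_per_year(chain):.2e}")
print(f"P(failure within one op year of 7200 h): {prob_failure(chain, 7200):.2e}")
```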






### **uQDS & PDSU FMECA**

- **Component failure rates are based on the 217Plus electronics reliability prediction and the FMD-91/FMD-2016 standards**
  - Values apply to an indoor, stationary mission profile during the useful lifetime
- If the end effect is unclear, the pessimistic choice is taken
- Certain end-effect assignments should be validated by functional tests in hardware
  - E.g. behavior under 3.3 V voltage rail drift, ADC behavior under reference voltage drift
- The **FMECA** process identified parts of the design that may be optimized further for the QDS consolidation (CONS) design for the main dipole magnets
  - E.g. placement of additional pull-up/pull-down lines in the channel








### **Top-Down Reliability Model – Beam Dump/Spurious Firing**

- A few pessimistic simplifications were required
- HDS case shown, as CLIQ has additional redundancy in readout
  - Clear separation of redundant paths as PDSU retriggering does not retrigger between paths A & B
- BIS concentrator
  - New CIBFX design
  - Originally developed for EPC use cases
    - Reliability assessed as part of the BISv2 reliability study
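The structure described above (two separated redundant paths A and B feeding a shared BIS concentrator) can be evaluated with elementary fault-tree gates. The sketch assumes independent basic events; the element names and probabilities are purely hypothetical, and the real model contains more elements and the pessimistic simplifications listed above.

```python
import math


def or_gate(*probs: float) -> float:
    """Fails if any independent input fails: 1 - prod(1 - p), computed stably (p < 1)."""
    return -math.expm1(sum(math.log1p(-p) for p in probs))


def and_gate(*probs: float) -> float:
    """Fails only if all independent inputs fail: prod(p)."""
    return math.prod(probs)


# Hypothetical probabilities that an element is failed when a demand arrives.
PATH_ELEMENTS = {"pdsu_channel": 1e-4, "digital_platform": 5e-5, "bis_link": 2e-5}
CONCENTRATOR = 1e-7  # hypothetical, shared BIS concentrator (not redundant)

path_a = or_gate(*PATH_ELEMENTS.values())  # series elements of path A
path_b = or_gate(*PATH_ELEMENTS.values())  # path B: identical, independent (assumed)
both_paths = and_gate(path_a, path_b)
missed_dump = or_gate(both_paths, CONCENTRATOR)

print(f"single path:    {path_a:.3e}")
print(f"both paths:     {both_paths:.3e}")
print(f"missed request: {missed_dump:.3e}")
```

With these made-up numbers the single, non-redundant concentrator contributes more than the redundant pair of paths, which illustrates why shared elements such as the concentrator deserve their own reliability study.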






### **Results – Failure Rates**




**Repair/Inspection Policy:** 

- Commissioning: 1 operational (op) year = 7200 hours/300 days
- Ramp detection interval: 12 hours
- Reaction to Supervision: 12 hours

- Magnet protection: 128 instances that can have a single quench
- Beam dump: 216 instances that can have a spurious trigger

- Maximum number of failures when the demand interval approaches the commissioning interval
  - Magnet protection less reliable, mainly due to longer chain of systems in critical path
- → For both protection functions, the reliability target is comfortably met
  - → but only under the condition of regular, systematic testing
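A simplified single-component picture helps explain why the result depends on the ratio between demand and commissioning intervals. Assume (our assumption, not necessarily the model used in the study) a blind failure rate $\lambda$, demands arriving as a Poisson process with rate $d$, commissioning tests every $T$, and a failure that is revealed either by the next demand (a missed protection) or by the next commissioning, whichever comes first. For $\lambda T \ll 1$ the failure time is uniform within the interval, so the remaining time $u$ to commissioning is uniform on $[0, T]$. For $N$ instances over an operating time $t$:

$$
P_{\text{miss}}(d,T) = \frac{1}{T}\int_0^T \left(1 - e^{-d u}\right)\mathrm{d}u = 1 - \frac{1 - e^{-dT}}{dT},
\qquad
N_{\text{miss}} \approx N\,\lambda\,P_{\text{miss}}(d,T)\,t,
$$

$$
P_{\text{miss}} \approx \frac{dT}{2} \quad (dT \ll 1), \qquad P_{\text{miss}} \to 1 \quad (dT \gg 1).
$$

In this picture the expected number of missed protections grows with the demand rate and levels off at $N\lambda t$ once the demand interval $1/d$ becomes comparable to or shorter than $T$; for rare demands it scales linearly with $T$.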



### **Commissioning Interval – Magnet Protection**

Figure: failures per 1000 years in IT systems for different demand intervals



**Repair/Inspection Policy:** 

- Commissioning: 1 or 3 operational years
- Ramp detection interval: 12 hours
- Reaction to Supervision: 12 hours


- With a commissioning interval of 3 years instead of 1 year, the number of failures increases
  - Mainly due to the accumulating probability of blind failures, which are only revealed at commissioning or on demand
    - The difference is smaller if the demand rate is higher
- → With 3-year intervals, the reliability target is not met
- → A yearly quench test is recommended
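To illustrate the trend numerically, the sketch below uses a simplified single-component model (an assumption, not necessarily the study's model): blind failures occur at a constant rate, demands arrive as a Poisson process, and a failure is revealed by the next demand (a missed protection) or the next commissioning, whichever comes first. A demand then comes first with probability $1 - (1 - e^{-x})/x$, where $x$ is the commissioning interval divided by the demand interval. The failure rate `LAM` is hypothetical; only the ratio between the two commissioning intervals is meaningful.

```python
import math

OP_HOURS_PER_YEAR = 7200.0


def p_revealed_by_demand(demand_interval_h: float, commissioning_h: float) -> float:
    """P(next demand arrives before next commissioning) = 1 - (1 - e^-x)/x, x = d*T."""
    x = commissioning_h / demand_interval_h
    return 1.0 - (-math.expm1(-x)) / x


def missed_per_1000_years(lam_per_h, demand_years, commissioning_years, instances=128):
    """Expected missed protections in 1000 op years for `instances` identical systems."""
    p = p_revealed_by_demand(demand_years * OP_HOURS_PER_YEAR,
                             commissioning_years * OP_HOURS_PER_YEAR)
    return instances * lam_per_h * p * 1000 * OP_HOURS_PER_YEAR


LAM = 2e-8  # hypothetical blind-failure rate per instance and op hour
ratios = {}
for demand_years in (0.1, 1.0, 12.8, 100.0):
    one = missed_per_1000_years(LAM, demand_years, 1.0)
    three = missed_per_1000_years(LAM, demand_years, 3.0)
    ratios[demand_years] = three / one
    print(f"demand every {demand_years:>5} op years: 1 y -> {one:.2e}, "
          f"3 y -> {three:.2e} (x{three / one:.2f})")
```

The ratio between the 3-year and the 1-year case approaches 3 for rare demands and shrinks towards 1 for frequent demands, matching the qualitative behavior described above.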



### **System Monitoring & Testing – Magnet Protection**

Figure: failures in 1000 years for magnet protection with a demand every 12.8 years and different fill inspection intervals



**Repair/Inspection Policy:** 

- Commissioning: 1 operational (op) year
- Ramp detection interval: 12 hours → 7200 hours
- Reaction to Supervision: 12 hours → 7200 hours


- Strong impact of less frequent/imperfect testing
  - Only a small increase of about 1.1E-05 failures if failures are detected and repaired after 72 hours
  - A maximum of 6.8E-01 failures if failures are first detected during the yearly commissioning
    - This assumes an interlock of operation (SIS) if both critical paths lose supervision

### → Monitoring & ramp testing is crucial for the protection function!

- → Extending coverage yields additional reliability margins
- → Detected problems can be fixed after the fill; operations do not need to be stopped
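The strong sensitivity to the detection interval can be illustrated with the textbook approximation for periodically checked elements, assumed here: one element is failed on average a fraction $\lambda\tau/2$ of the time, and both of two independent elements checked at the same instant are failed a fraction $(\lambda\tau)^2/3$ (valid while $\lambda\tau \ll 1$). The block compares the intervals quoted on this slide (12 h, 72 h and one op year of 7200 h) for a hypothetical failure rate.

```python
def q_single(lam_per_h: float, interval_h: float) -> float:
    """Mean fraction of time one periodically checked element is failed: lam*tau/2."""
    return lam_per_h * interval_h / 2.0


def q_pair(lam_per_h: float, interval_h: float) -> float:
    """Mean fraction of time both of two independent elements, checked together, are failed.

    Average of (lam*t)^2 over one interval = (lam*tau)^2 / 3 (rare-event approximation).
    """
    return (lam_per_h * interval_h) ** 2 / 3.0


LAM = 1e-7  # hypothetical rate per hour of a failure that supervision or ramp tests reveal
for tau in (12.0, 72.0, 7200.0):  # detection intervals quoted on this slide, in hours
    print(f"tau = {tau:>6.0f} h: one path {q_single(LAM, tau):.1e}, "
          f"both paths {q_pair(LAM, tau):.1e}")
```

Going from 12 h to one op year multiplies the single-path term by 600 but the both-paths term by 360 000, which is one way to see why fill-by-fill checks carry so much weight for a redundant system.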



### **Conclusions & Next Steps**

- A reliability model for the quench protection and beam dump functions of the IT shows that
  - the foreseen uQDS, PDSU and BIS concentrator hardware design conforms to the reliability requirements,
  - under the condition that
    - yearly commissioning tests (IST) are performed to check the integrity of the system and all interfaces
    - an automated test during the ramp is executed in every LHC fill as part of a sequencer task to check the integrity of the system
- Follow-up of the study
  - The uQDS/PDSU FMECA results should be validated by selected HW functional tests/simulations
  - Availability aspects of the system should be quantified and checked against operational data
  - An analysis of critical firmware, software and configuration management should complement the hardware study
- In view of the consolidation of the LHC main dipole QDS system
  - The reliability model should be adapted and pessimistic assumptions refined
  - Design improvements triggered by the uQDS/PDSU FMECA should be implemented where possible





### **Protection System Life Cycle**

- Machine Protection systems development follows a defined life cycle
  - Inspired by IEC 61508 and adapted for the CERN context
  - Clear and exhaustive specifications of the project
  - Ensures that risks are mitigated

The scope of the uQDS & PDSU reliability analysis is to

- Identify risks and hazards and quantify requirements for their mitigation
- Qualify the detailed hardware design according to the defined requirements





### **Component-Level FMECA - Introduction**


Failure Modes, Effects, and Criticality Analysis (FMECA)

Purpose: identify potential failure modes of individual components within a system & quantify failure impact at system level

| Id  | Component | Description                                        | Failure Mode     | Alpha (%) | Component Failure Rate | Failure Mode Rate | End Effect          |
|-----|-----------|----------------------------------------------------|------------------|-----------|------------------------|-------------------|---------------------|
| 1.1 | C2        | ±10% 50V X7R SMD Multilayer Chip Ceramic Capacitor | Open             | 9         | 0.357                  | 0.032             | Spurious Protection |
| 1.1 | C2        | ±10% 50V X7R SMD Multilayer Chip Ceramic Capacitor | Parameter change | 61        | 0.357                  | 0.218             | No effect           |
| 1.1 | C2        | ±10% 50V X7R SMD Multilayer Chip Ceramic Capacitor | Short            | 30        | 0.357                  | 0.107             | Blind channel       |
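Each failure mode rate in the table is the component failure rate multiplied by the mode's apportionment α (in percent, summing to 100 per component). The sketch below reproduces the three rows for C2; the `FailureMode` data structure and `apportion` function are illustrative names, not part of the tool used.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    alpha_pct: float  # share of the component failure rate, in percent
    end_effect: str


def apportion(component_rate: float, modes: list[FailureMode]) -> dict[str, float]:
    """Split a component failure rate over its failure modes: rate_mode = rate * alpha/100."""
    total = sum(m.alpha_pct for m in modes)
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"alphas sum to {total}%, expected 100%")
    return {m.name: component_rate * m.alpha_pct / 100.0 for m in modes}


c2_modes = [
    FailureMode("Open", 9, "Spurious Protection"),
    FailureMode("Parameter change", 61, "No effect"),
    FailureMode("Short", 30, "Blind channel"),
]
for mode, rate in apportion(0.357, c2_modes).items():
    print(f"C2 {mode:<17} {rate:.3f}")  # 0.032, 0.218, 0.107 as in the table
```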



### **FMECA Process – Key Steps**

1. Using the Bill of Materials, perform a component-wise failure rate prediction.
   - Mainly based on the 217Plus standard (2015/RIAC; also available: Telcordia TR/SR, MIL-HDBK-217, NSWC), complemented by manufacturer and test data.
   - Requires definition of the mission profile/environment as well as the operating conditions of individual components.
2. Identification & apportionment of component failure modes
   - e.g., capacitor → {open, param. change, short}
   - Based on handbooks (MIL-HDBK-338, FMD-2016)
3. Assignment of end effects to each failure mode of every component of the system
   - e.g., capacitor C1: open → no effect, short → false dump, param. change → blind failure
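Step 3 turns the FMECA into per-end-effect rates: summing all failure mode rates that map to the same end effect gives, for example, the blind-channel and spurious-protection rates of a monitoring channel that feed the top-down model. In the mini bill of materials below only C2 follows the example table; R1, U1 and their failure modes are hypothetical placeholders.

```python
from collections import defaultdict

# (component id, component failure rate, [(failure mode, alpha %, end effect), ...])
# C2 reuses the example table; R1 and U1 are hypothetical placeholders.
BOM = [
    ("C2", 0.357, [("Open", 9, "Spurious Protection"),
                   ("Parameter change", 61, "No effect"),
                   ("Short", 30, "Blind channel")]),
    ("R1", 0.100, [("Open", 60, "Blind channel"),
                   ("Parameter change", 40, "No effect")]),
    ("U1", 2.000, [("Output stuck", 50, "Blind channel"),
                   ("Output toggling", 50, "Spurious Protection")]),
]


def rates_by_end_effect(bom):
    """Sum rate * alpha / 100 over all failure modes that lead to the same end effect."""
    totals = defaultdict(float)
    for _component, rate, modes in bom:
        for _mode, alpha_pct, end_effect in modes:
            totals[end_effect] += rate * alpha_pct / 100.0
    return dict(totals)


totals = rates_by_end_effect(BOM)
for end_effect, rate in sorted(totals.items()):
    print(f"{end_effect:<20} {rate:.4f}")
```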


Screenshot of Isograph Reliability Workbench (tool used for FMECA analysis)





### **Top-Down Reliability Model – Magnet Protection**

Figure: redundant quench detection for Q1A/Q1B: uQDS 1A and uQDS 1B front-end channels (and voltage taps) for coils P1 to P4 of Q1A and Q1B, 35 A trim, PDSU, heaters + CLIQ (Q1, Q2, Q3, BB, SCL) and BIS.

- Asymmetric detection: coil-coil comparison of neighboring coils (PA3 - PA2, PA4 - PA1, PB1 - PB4, PB2 - PB3)
- Magnet symmetric detection: comparison of magnet halves ((PA3 + PA4) - (PA4 + PA1), (PB1 + PB4) - (PB2 + PB3))
- Full symmetric detection: comparison of coil voltages between Q1A and Q1B
- The reliability model assumes a single coil quench

- Quench protection strategy is inherently redundant
- For a single coil quench: triple redundant detection method, each of them redundant in hardware
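Why this redundancy is so effective can be sketched with independent basic events: a single coil quench is missed only if all three detection methods miss it, and each method misses only if both of its redundant hardware channels are failed. The per-channel probability and the downstream series element are hypothetical. Independence also ignores shared elements (the methods partly use the same voltage taps) and common-cause failures, so this sketch overstates the benefit.

```python
def p_method_missed(p_channel: float, channels: int = 2) -> float:
    """A detection method misses only if all of its redundant hardware channels are failed."""
    return p_channel ** channels


def p_quench_missed(p_channel: float, methods: int = 3, p_series: float = 0.0) -> float:
    """Missed if every detection method misses (AND) or a downstream series element fails (OR)."""
    p_detection = p_method_missed(p_channel) ** methods
    # OR of two independent events, written without 1 - (1 - a)(1 - b) to avoid cancellation.
    return p_detection + p_series - p_detection * p_series


P_CH = 1e-3  # hypothetical probability that one channel is failed at demand
print(f"detection only:         {p_quench_missed(P_CH):.1e}")
print(f"plus a series element:  {p_quench_missed(P_CH, p_series=1e-6):.1e}")
```

Even a small non-redundant element after detection dominates the result, consistent with the observation that the longer chain of systems in the critical path drives the magnet protection figures.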





### **CIBFx + CIBF or only CIBFx?**

Figure: failures in 1000 years for Beam Dump/Spurious Firing with and without CIBF





- Depending on the demand rate, the additional CIBF reduces the number of failures per 1000 years by **0 to 2.20E-05**.
- In the relevant range of 0.0046 spurious firings per year per HDS/CLIQ (i.e. 1 spurious firing per year over all 216 instances), the difference is only about 3.24E-08 to 3.24E-09 and thus almost negligible.
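The rate of 0.0046 spurious firings per year per HDS/CLIQ follows from spreading one spurious firing per year over the 216 instances that can have a spurious trigger; the constant names below are illustrative.

```python
INSTANCES = 216            # HDS/CLIQ instances that can have a spurious trigger
FLEET_RATE_PER_YEAR = 1.0  # one spurious firing per year across the IT systems

per_instance = FLEET_RATE_PER_YEAR / INSTANCES
print(f"{per_instance:.4f} spurious firings per year per instance")  # 0.0046
print(f"i.e. one every {1 / per_instance:.0f} years per instance")   # 216
```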




### **Design qualification – Analytic Approach – Magnet Protection**



- An analytical approach was chosen over a simulation approach for time reasons; the results of both approaches are consistent
- The minimal cut set method was used to consider various inspection intervals and repair actions
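The minimal cut set method suits this because each basic event can carry its own detection interval. The sketch below uses a common textbook approximation, assumed here and not necessarily identical to the study's implementation. Each basic event gets a mean unavailability $q \approx \lambda(\tau/2 + t_r)$, with $\tau$ the interval at which the failure is revealed (commissioning, ramp test or supervision) and $t_r$ the reaction time. A cut set's unavailability is the product of its events' $q$, and the top event is approximated by the sum over cut sets (rare-event approximation). Multiplying time-averaged values is slightly optimistic for events checked at the same instant. All event types, rates and the two-path structure are hypothetical.

```python
from itertools import product
from math import prod

OP_YEAR_H = 7200.0

# Hypothetical event types: (failure rate per hour, revealing interval tau in h, reaction time in h).
EVENT_TYPES = {
    "tap": (2e-8, OP_YEAR_H, 0.0),    # blind, revealed only at yearly commissioning
    "channel": (1e-7, 12.0, 12.0),    # revealed by the ramp test every fill
    "pdsu": (5e-8, 0.0, 12.0),        # seen at once by supervision, fixed within 12 h
}

# Two redundant paths, each a series of three elements: the minimal cut sets of
# "path A fails AND path B fails" are all pairs with one element from each path.
PATH_A = [f"{t}_A" for t in EVENT_TYPES]
PATH_B = [f"{t}_B" for t in EVENT_TYPES]
CUT_SETS = [frozenset(pair) for pair in product(PATH_A, PATH_B)]


def mean_unavailability(lam, tau_h, reaction_h):
    """Time-averaged q ~ lam * (tau/2 + t_r); only valid while q << 1."""
    return lam * (tau_h / 2.0 + reaction_h)


def event_q(name):
    event_type = name.rsplit("_", 1)[0]
    return mean_unavailability(*EVENT_TYPES[event_type])


def top_unavailability(cut_sets):
    """Rare-event approximation: sum over cut sets of the product of their event q."""
    return sum(prod(event_q(e) for e in cs) for cs in cut_sets)


total = top_unavailability(CUT_SETS)
for cs in sorted(CUT_SETS, key=lambda c: -prod(event_q(e) for e in c))[:3]:
    share = prod(event_q(e) for e in cs) / total
    print(f"{' & '.join(sorted(cs)):<20} {share:6.1%}")
print(f"mean top-event unavailability: {total:.2e}")
```

Ranking cut sets by contribution shows which inspection interval dominates; with these made-up numbers, the pair of failures that are only found at commissioning dominates. Multiplied by the demand rate and the number of instances, such a value gives a rough estimate of missed protections, as long as demands are rare enough not to act as tests themselves.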

