



# Reliability Applied to KM3NeT

S. Colonges, CLB meeting, 30/01/2014

# Who am I?



- Stéphane COLONGES
- APC Laboratory
- Electronic Product Assurance Manager
- Activities:
  - Support engineering to improve reliability
  - Components & system qualification
- Projects:
  - Auger Observatory (1830 boards in harsh environment)
  - Space projects: Taranis, SimbolX, R&D anti-coincidence
  - CTA observatory (QAM)

## Table of contents

- Development process
- Reliability for KM3NeT
- Reliability analysis
- My contribution
- Conclusion



#### **Development process**

V Cycle



# Concept of operation



Product definition (scientific needs)
Think Needs before solutions!

- Constraints (environment, life cycle, maintenance, costs...)



#### **Requirements and architecture**

- Requirements (functional, performance, environmental, RAMS...)
- Functional analysis (<u>SADT method</u>)
- Applicable needs
- PBS



#### **Requirements and architecture**



- FMECA (using tools like FTA and RBD) → avoid single points of failure (SPF), reduce failure criticality

→ Iterative process!



### FMECA

| Project: |       |          | Version:        |                  |                  | Date:             |                  |   |   |     |                       |
|----------|-------|----------|-----------------|------------------|------------------|-------------------|------------------|---|---|-----|-----------------------|
| System:  |       |          |                 | Subsystem:       |                  |                   | Teamwork leader: |   |   |     |                       |
| Id.      | Comp. | Function | Failure<br>mode | Failure<br>cause | Local<br>effects | Global<br>effects | S                | O | D | RPN | Corrective<br>actions |
|          |       |          |                 |                  |                  |                   |                  |   |   |     |                       |
|          |       |          |                 |                  |                  |                   |                  |   |   |     |                       |
|          |       |          |                 |                  |                  |                   |                  |   |   |     |                       |
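The S, O, D and RPN columns of this worksheet can be tied together with a few lines of code. The sketch below assumes the common convention RPN = S × O × D, with each rating on a 1 to 10 scale; the scale, the class name `FmecaEntry` and the example rows are illustrative assumptions, not KM3NeT data.

```python
from dataclasses import dataclass


@dataclass
class FmecaEntry:
    """One row of an FMECA worksheet (hypothetical structure)."""
    item: str
    failure_mode: str
    severity: int    # S, assumed scale 1..10
    occurrence: int  # O, assumed scale 1..10
    detection: int   # D, assumed scale 1..10 (10 = hardest to detect)

    def __post_init__(self):
        # Reject ratings outside the assumed 1..10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: S x O x D."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(entries):
    """Return the entries sorted from highest to lowest RPN."""
    return sorted(entries, key=lambda e: e.rpn, reverse=True)


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        FmecaEntry("DC/DC converter", "no output", 9, 2, 5),
        FmecaEntry("Connector", "intermittent contact", 4, 3, 6),
    ]
    for row in rank_by_rpn(rows):
        print(row.item, row.failure_mode, row.rpn)
```

Sorting by RPN gives the order in which corrective actions are usually discussed; after each design change the ratings are re-evaluated, which is the iterative part of the process.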

# Design



Good design = high reliability

Parts selection, consider:

- Obsolescence (vs. LTA), maturity, wide distribution
- Environment: temperature, ESD, salt, humidity...
- MTBF (FIT)
- RoHS vs. whiskers?

- Improve reliability (derating per ECSS-Q-ST-30-11C, redundancy, ESD protection, ESR compromise...)

# Design

#### **Reliability analysis**

- MTTF :



- Acceptable failure rate?

- Spare quantities

- FMECA (identify failure modes and reduce effects).

Iterative process:

- Identify the weak points
- Identify the failure type and mechanism
- Change the design to improve reliability
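The MTTF, acceptable-failure-rate and spares questions on this slide are linked by simple arithmetic. The sketch below assumes a constant failure rate expressed in FIT (failures per 10^9 device-hours) and sizes the spares stock with a Poisson model; the function names, the 95% confidence target and the example figures are assumptions chosen for illustration, not project values.

```python
import math

HOURS_PER_YEAR = 8760.0


def mttf_hours(fit):
    """MTTF in hours for a constant failure rate given in FIT."""
    return 1e9 / fit


def expected_failures(fit, n_units, years):
    """Mean number of failures for n_units operated continuously for `years`."""
    return fit * 1e-9 * n_units * years * HOURS_PER_YEAR


def spares_needed(mean_failures, confidence=0.95):
    """Smallest k such that the Poisson CDF P(X <= k) reaches `confidence`."""
    term = math.exp(-mean_failures)  # P(X = 0)
    cdf = term
    k = 0
    while cdf < confidence:
        k += 1
        term *= mean_failures / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return k


if __name__ == "__main__":
    # Hypothetical example: 1000 boards at 500 FIT each, 10 years of operation.
    mean = expected_failures(fit=500, n_units=1000, years=10)
    print(f"MTTF per board: {mttf_hours(500):.0f} h")
    print(f"Expected failures: {mean:.1f}")
    print(f"Spares for 95% coverage: {spares_needed(mean)}")
```

The Poisson model ignores wear-out and early-life failures, so it only applies to the flat part of the bathtub curve.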

# Design

#### **Critical Design Review:**

#### - Objectives:

- Validate the detailed product design
- Prototype analysis and test results
- Check the product's conformity with the specifications

#### - Documentation:

- Definition justification document
- Manufacturing files, procedures and documents
- RAMS plan and FMECA (reliability analysis) & MTBF evaluation
- Interface Control Document
- Preliminary user guide



#### Pre-production (Detailed design) - Goal:

- Produce a small quantity of boards, adapt the design to the production process, harden the design

#### - Industrialization:

- PCB rules (IPC-2221, IPC-A-600...)
- IPC class 2
- Manufacturing and soldering processes
- Manufacturing test and inspection (in-situ tests, test bench...)
- HASS procedure (eliminates early-life failures)
- Environmental protection (coating, ESD suppressors...)
- Storage, packaging, handling...

## Pre-production (Detailed design)

- Production Readiness Review:
  - Manufacturing subcontractor, public tender... (instructions, CCTP...)
  - Industrial files



#### - Dividing production into batches makes it possible to:

- Detect weak points (PCB layout, production process)
- Change the layout or the production process (correction of non-conformities)

#### - Configuration follow-up:

- Non-conformities taken into account
- Modifications taken into account
- Document folder: customer and subcontractor use the same files and document versions

#### Document folder example

Document register (scanned example, originally in French):

- Designation: AUGER - IN2P3 board ("unified board"), revision K, last updated 08/10/03
- Each sub-document is listed with its reference, revision index, designation and remarks
- Component bill of materials (Excel document NomAuger04072003)
- Manufacturing instructions (Word document): wiring, conformal coating and packaging instructions
- Programming instructions (Word document)
- General specifications (Word document): minutes of the meeting of 24/10/2002
- Functional test procedure (Word document)
- Burn-in procedure (Word document)
- Electronic schematic of the Unified Board, including the test pads
- Internal schematic of the PLD
- Gerber files of the PCB, version 2.1, extended Gerber RS274-X (Gerber21.tgz): 4 layers (top and bottom routing, internal planes), top and bottom silkscreen, top and bottom solder mask, paste mask for reflow soldering of SMD components, drill plan
- Top and bottom placement plans (topUB1V4.pdf, botUB1V4.pdf)
- Particular specifications for manufacturing the printed circuit
- Fabmaster file for the in-situ test, relating to Gerber files version 2.0 (extract20.val.gz)

#### Define a common test and HASS strategy:

- Visual inspection
- Boundary scan, in-situ test (bed of nails), flying probe
- HASS or ESS with a light functional test
- Functional test → performed at the manufacturer's site







(From -20 to 67 °C, 3 °C/min slope, 8-hour burn-in, 7 temperature cycles, 30 min dwell time, total duration 23 hours, 20 UB in the oven)
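As a rough cross-check of this profile, the duration can be estimated from the listed parameters. The sketch below assumes that each cycle consists of one ramp up, one ramp down and a dwell at each temperature extreme, and that the burn-in is a separate plateau; that cycle structure is an assumption, since the slide only lists the parameters.

```python
def ramp_minutes(t_low_c, t_high_c, slope_c_per_min):
    """Time needed to sweep from t_low_c to t_high_c at a constant slope."""
    return (t_high_c - t_low_c) / slope_c_per_min


def profile_hours(t_low_c=-20.0, t_high_c=67.0, slope_c_per_min=3.0,
                  cycles=7, dwell_min=30.0, burn_in_hours=8.0):
    """Estimated duration of the thermal profile, in hours."""
    ramp = ramp_minutes(t_low_c, t_high_c, slope_c_per_min)
    # Assumption: two ramps and two dwells (hot and cold) per cycle.
    cycle_min = 2 * ramp + 2 * dwell_min
    return cycles * cycle_min / 60.0 + burn_in_hours


if __name__ == "__main__":
    print(f"One ramp: {ramp_minutes(-20.0, 67.0, 3.0):.0f} min")
    print(f"Estimated total: {profile_hours():.1f} h")
```

Under these assumptions the estimate comes out a little under 22 hours, close to the 23 hours quoted; the remaining time is not itemized on the slide.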



#### Auditing:

- Customer/supplier relationship → as flexible as possible
- Win-win relationship → the supplier has an interest in the science project (publicity); we want the highest-quality product



- The FIDES methodology (from page 259) identifies a list of recommendations which, if followed, facilitate the construction of a reliable product. This set of recommendations has been broken down into a set of questions.
- The answers that a company gives to these questions provide:
  - a measurement of its ability to make reliable products,
  - a quantification of the process factors used in the calculation models,
  - the possibility of identifying improvement actions.

#### Audit procedure

To conduct an audit, the auditor must:

- Identify the audit scope.
- Prepare the audit.
- Perform the audit.
- Collect evidence.
- Process the collected information.
- Draw conclusions.
- Write an audit report.
- Present the audit result.



| Level                 | Process                                                | $\Pi_{Process}$ | Process grade |
|-----------------------|--------------------------------------------------------|-----------------|---------------|
| Very high reliability | Process almost with no weakness                        | <1.7            | > 75%         |
| High reliability      | Controlled process, reliability<br>engineering         | 1.7 to 2.8      | 50% to 75%    |
| Standard              | Usual ISO 9001 version 2000<br>type quality procedures | 2.8 to 4.8      | 25% to 50%    |
| Unreliable            | Reliability problems not taken<br>into account         | >4.8            | <25%          |
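The two right-hand columns of this table are linked by an exponential relation. The sketch below assumes the form Π_Process = exp(δ2 · (1 − process grade)) with δ2 = 2.079, quoted from memory of the FIDES guide and worth checking against the handbook; it does reproduce the thresholds 1.7, 2.8 and 4.8 for grades of 75%, 50% and 25%. The function names are hypothetical.

```python
import math

# Assumption: sensitivity constant quoted from memory of the FIDES guide.
DELTA_2 = 2.079


def pi_process(process_grade):
    """Process factor for a process grade between 0.0 (worst) and 1.0 (best)."""
    if not 0.0 <= process_grade <= 1.0:
        raise ValueError("process_grade must be between 0 and 1")
    return math.exp(DELTA_2 * (1.0 - process_grade))


def process_grade_from_pi(pi):
    """Inverse relation: process grade that yields a given process factor."""
    return 1.0 - math.log(pi) / DELTA_2


def reliability_level(process_grade):
    """Level from the table above, using the process-grade column."""
    if process_grade > 0.75:
        return "Very high reliability"
    if process_grade >= 0.50:
        return "High reliability"
    if process_grade >= 0.25:
        return "Standard"
    return "Unreliable"


if __name__ == "__main__":
    for grade in (1.0, 0.75, 0.50, 0.25, 0.0):
        print(f"grade {grade:.0%}: Pi = {pi_process(grade):.2f}, "
              f"{reliability_level(grade)}")
```

With this relation an audit that scores nothing gives a factor of about 8, and a perfect score gives 1, so the audit result can change the predicted failure rate by up to a factor of eight.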

Evaluate (audit) the process influence with Process.xls, then enter the result (for example $\Pi_{Process} = 4.00$) in result1\_stress of FIDES Mill V2004A -2- Component.xls

[Figure: pie chart of the process failure distribution by phase. Legible labels: specification, design 16%, equipment production 23%, system integration 13%, support 20%.]

| Phase                | Contribution (of marks of the …) | Mark obtained | By max mark | Contribution (of the theoretic…) | Process grade | $\Pi$ factor                             | Percentage of failures of each phase |
|----------------------|----------------------------------|---------------|-------------|----------------------------------|---------------|------------------------------------------|--------------------------------------|
| specification        | #DIV/0!                          | 0.0           | 433.9       | 8.0%                             | 0%            | $\Pi_{\text{Specification}} = 1.18$      | 7.7%                                 |
| design               | #DIV/0!                          | 0.0           | 867.7       | 16.0%                            | 0%            | $\Pi_{\text{Design}} = 1.39$             | 14.3%                                |
| equipment production | #DIV/0!                          | 0.0           | 1301.6      | 24.0%                            | 0%            | $\Pi_{\text{Production}} = 1.65$         | 19.9%                                |
| system integration   | #DIV/0!                          | 0.0           | 650.8       | 12.0%                            | 0%            | $\Pi_{\text{Integration}} = 1.28$        | 11.2%                                |
| field operation &    | #DIV/0!                          | 0.0           | 1084.7      | 20.0%                            | 0%            | $\Pi_{\text{Field operation}} = 1.52$    | 17.2%                                |
| support              | #DIV/0!                          | 0.0           | 1084.7      | 20.0%                            | 0%            | $\Pi_{\text{Support}} = 1.52$            | 17.2%                                |
| Total process ==>    |                                  | 0.0           | ******      |                                  | 0%            | $\Pi_{\text{Process}} = 8.00$            | 87.5%                                |
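The per-phase factors in this table multiply to the total process factor, and each one can be reproduced from the phase weight in the contribution column. The sketch below assumes the form Π_phase = exp(δ2 · weight · (1 − grade)) with δ2 = 2.079; this is an inference from the numbers shown (all grades at 0%), not a formula stated on the slide, and the names are hypothetical.

```python
import math

# Assumption: constant inferred from the total of 8.00 at a 0% process grade.
DELTA_2 = 2.079

# Phase weights read from the "Contribution" column of the table.
PHASE_WEIGHTS = {
    "specification": 0.08,
    "design": 0.16,
    "equipment production": 0.24,
    "system integration": 0.12,
    "field operation": 0.20,
    "support": 0.20,
}


def phase_factor(weight, grade):
    """Assumed per-phase factor for a phase grade between 0.0 and 1.0."""
    return math.exp(DELTA_2 * weight * (1.0 - grade))


def process_factor(grades):
    """Product of the per-phase factors; `grades` maps phase name to grade."""
    return math.prod(
        phase_factor(weight, grades[phase])
        for phase, weight in PHASE_WEIGHTS.items()
    )


if __name__ == "__main__":
    zero_grades = {phase: 0.0 for phase in PHASE_WEIGHTS}
    for phase, weight in PHASE_WEIGHTS.items():
        print(f"{phase}: {phase_factor(weight, 0.0):.2f}")
    print(f"total: {process_factor(zero_grades):.2f}")
```

Because the weights sum to one, improving a heavily weighted phase such as equipment production lowers the total factor more than the same improvement in the specification phase.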






Helps to define the influence of the process in terms of reliability (questions and recommendations...)

FIDES Mill V2004A - Process.xls



#### Audit example using the FIDES Excel tool

# Installation (Integration, test and verification)

- Installation procedures:
  - 10% of failures = human error
- ESD and lightning protection
- Test bench (laboratory)
- Test facilities (to test systems on site before installing lines)

# Commissioning (System verification and validation)

- Verify functions and performance against the requirements
- Parameters useful for failure detection are correctly monitored
- Monitoring software (easy remote access to parameters)



## **Operation and maintenance**

- Evaluate maintenance resources (spares quantities, costs, people...)
- Recovery procedures
- Update FMECA and MTTF: iterative update using field feedback (failure data collection)
- Database
- Record maintenance activities
- Local staff training



## Reliability for KM3NeT: how to jump on the bandwagon?

- 1) Requirements and functional analysis
- 2) MTTF analysis and FMECA
- 3) Verify design rules (Derating...)
- 4) Documentation
- 5) Review
- 6) Then the next steps (pre-production, production, ...)



#### Reliability analysis - Tools



# **FIDES**

Based on the physics of failure and calibrated with test feedback and field failure data.

[Figure: diagram linking reliability to technologies, uses and process.]

#### **FIDES Begins**

Why FIDES ?

- Reliability data-book predictions are obsolete! (they don't cover current component technologies)
  - ➔ MIL-HDBK-217 has not been maintained since 1995
- FIDES → funded in 2001 by the DGA (French DoD) and 8 international companies (+ BOEING, JAXA, CNES, CNRS... interested)

Handbook and tools :

www.fides-reliability.org



### FIDES

Based on the physics of failure, acceleration factors and process contribution:
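For reference, the general FIDES model can be written as below. It is quoted from memory of the FIDES guide rather than taken from this slide, so it should be checked against the handbook:

$$\lambda = \lambda_{\text{Physical}} \times \Pi_{\text{PM}} \times \Pi_{\text{Process}}$$

Here $\lambda_{\text{Physical}}$ gathers the base failure rates weighted by the acceleration factors of each stress (thermal, cycling, humidity, mechanical...), $\Pi_{\text{PM}}$ reflects part manufacturing quality, and $\Pi_{\text{Process}}$ is the process factor obtained from the audit.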



#### MTTF / Bathtub curve



#### Conclusion

- People should be aware of the added value of QA!
- Your collaboration is necessary
- Keep in mind: reliability is an iterative process


And two more words: brainstorming and workgroups...



#### More slides...

# FMECA: an iterative process

[Figure: FMECA procedure flowchart.]



#### Methodology

PBS and Technical specifications

#### Reliability Block Diagram and/or Fault Tree Analysis

Single Point Failure identification (SPF)

#### FMECA :

Failure mode? Effects and criticality? Corrective actions?

Iterative process → continuous improvement

Who does it?

**Project Team** 

#### **CIL, FDIR**... Criticality reevaluation

➔ The goal is to lower the criticality

#### Reliability analysis - Methodology



## MTTF / Temperature

- Activation energy $E_A$:
  - Optical components: 0.8 eV
  - Bipolar ICs, transistors, diodes: 0.7 eV
  - MOS ICs: 0.6 eV
- $T_1$: reference temperature (the one the FIT value refers to), in kelvin
- $T_2$: new temperature, in kelvin

$$\lambda(T_2) = \lambda(T_1) \times \exp\left[\frac{E_A}{K} \times \left(\frac{1}{T_1} - \frac{1}{T_2}\right)\right]$$

With the Boltzmann constant: $K = 8.617 \times 10^{-5}$ eV/K
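The Arrhenius relation above is easy to evaluate numerically. The sketch below is a minimal illustration: the function names and the example temperatures are assumptions, the activation energy is the MOS IC value from the list, and the code uses the commonly tabulated Boltzmann constant of 8.617 × 10⁻⁵ eV/K with temperatures converted to kelvin.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def celsius_to_kelvin(t_celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return t_celsius + 273.15


def acceleration_factor(ea_ev, t1_celsius, t2_celsius):
    """Ratio lambda(T2) / lambda(T1) given by the Arrhenius relation."""
    t1 = celsius_to_kelvin(t1_celsius)
    t2 = celsius_to_kelvin(t2_celsius)
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t1 - 1.0 / t2))


def failure_rate_at(fit_t1, ea_ev, t1_celsius, t2_celsius):
    """Failure rate at T2 (same unit as fit_t1) from the rate known at T1."""
    return fit_t1 * acceleration_factor(ea_ev, t1_celsius, t2_celsius)


if __name__ == "__main__":
    # Hypothetical example: a MOS IC rated 10 FIT at 40 degC, operated at 60 degC.
    factor = acceleration_factor(0.6, 40.0, 60.0)
    print(f"Acceleration factor: {factor:.2f}")
    print(f"Failure rate at 60 degC: {failure_rate_at(10.0, 0.6, 40.0, 60.0):.1f} FIT")
```

For these example values a 20 °C rise multiplies the failure rate by roughly 3.8, which is why thermal derating has such a strong effect on the MTTF.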

# FMECA

| Identification number | Function | Failure modes and causes | Local effects | Next higher level | End effects | Failure detection method | Compensating provisions | Severity class | Remarks |
|---|---|---|---|---|---|---|---|---|---|
| 1 (no component id. available) | External power supply | Low voltage | | | System shutdown | IGONACUT signal is active | Shut down the system and charge the batteries, check the solar panel and the solar panel controller | IV | |
| 2 | Power protection and control | No shutdown when a low voltage occurs on the input, or constant shutdown | | Voltage too low, or no voltage | Power supply problems or no supply (fuse may blow) | Current consumption is higher if low voltage, or no power | Check the components described for this function in table 1 and repair as necessary | III or IV | |
| 21 | Comparator | Bad information on the output | No detection of voltage problem | Bad trigger for the timer | See line 2 | No trigger or constant trigger on the input of 22 | Check M21, change if necessary | III or IV | |
| 22 | Timers | No change on the output after a trigger; timing problem | | | See line 2 | No shutdown when a low voltage occurs, or repetitive shutdown | Check the components described in table 1 | III or IV | |