#### DE LA RECHERCHE À L'INDUSTRIE



# DEPENDABILITY OF SPACE BORNE INSTRUMENTATION



#### JEAN FONTIGNIE – PA/QA MANAGER AT CEA/IRFU

www.cea.fr

## OVERVIEW OF SPACE PROJECT ORGANIZATION

- Space agency (CNES-France, ESA-Europe) : Client
- Industrial contractor (Thales Alenia, Airbus D&S)
- Consortium of research labs for instruments



Example on Euclid Mission Thales is prime contractor Airbus Defense & Space is in charge of telescope and payload module



VIS (photometer) : Consortium managed by MSSL (UK), with contributions from UK / France / Switzerland / Italy

NISP (spectro) : Consortium managed by LAM (Marseille), with contributions from France / Italy / Spain

# Cea common reference – Gold Rule

Complex organization, mixing different cultures Space Agencies impose a common reference to all "contractors" European Coordination for Space Standards (doc tree : bonus slide / www.ecss.nl) Dependability, Electronic components procurement & Rad hardness Material & Mechanical parts & processes procurement, Software

Rule : Each lab in charge of a "flight deliverable" identifies a "PA/QA manager" in charge of those matters + Quality Assurance

Space mission duration

- 100% ON availability without maintenance
- > 100.000 hours

Compare to well known object

- A car : 150.000km @50km.h<sup>-1</sup>
- 3000 hr with maintenance

Gold rule for space instrumentation : Schmitt axiom (lesson learnt)

- part I: What is not identified as a requirement is not verified
- part II : What is not verified, don't expect it to work properly
- Part III : Early verification will lower failure consequence



#### Early verification needs early model/prototypes







STM

Dependability requirements :

- Implemented on the flight model
- Anticipated since the early model / studies
- Verified through all the program
- Concern all phase of mission



Quasi exclusive use of space qualified HiRel components

- Procurement spec

   [min-max] for critical parameters
   Test conditions for verification of parameters & frequency
- Adding Target for failure rates -> Quality level

| Passive components               | Active components                               |                                   |
|----------------------------------|-------------------------------------------------|-----------------------------------|
| ESA SCC – MIL S<br>MIL R         | ESA SCC MIL QML V / JAN S<br>MIL QML Q / JANTXV | High Proc cost                    |
| MIL P - Automotive<br>Commercial | MIL 883B<br>Commercial                          | High risk & up-<br>screening cost |

- Up-screening : procurement with lower Q level + testing
- For high volume, up-screening may be competitive
- High proc cost (passive 5€ to 100€; op amp 500€; FPGA 15 k€)
- Limited access to recent / high perfo EEE



Very integrated Hi-Energy particle detector Need for a internal filtering capacitor Volume/capacitance constraint

Decision : Study possibility of upscreening of an "automotive" ceramic capacitor (6.3V 10µF 0805)





#### **EEE - EXAMPLE OF UP-SCREENING**



Cea

#### **EEE – EXAMPLE OF UP-SCREENING**





How can I defer the wear-out zone of components?

#### **PART STRESS – DERATINGS REQUIREMENTS**

Any EEE component shall show margin between operating conditions and max ratings established by manufacturer (NON NEGOCIABLE) Margin detailed for each family (ref ECSS-Q-ST-30-11C) Applies to all modes, in standard conditions (not in fault conditions), steady & surge

Example : Solid tantalum capacitors



#### Requirement

The capacitor stress sum value of steady-state voltage, AC voltage shall not exceed the load ratios specified

Surge current shall be derated to 75 % of the Isurge max. Isurge max is defined as Vrated/(ESR+Rs). 100 % surge current screening shall be applied for all surface mounted capacitors types.

Reverse voltage shall not exceed 75 % of the manufacturer's specified maximum value for the reverse voltage.

The dV/dt rating capability of the capacitors shall be respected



#### PART STRESS – 2<sup>ND</sup> KIND



2<sup>ND</sup> Deratings applied (not required by a "standard"), "in house practice".

For passive components, as far as possible, we do not choose extreme value within a package Problem most often occurs on those extreme areas (limit of technology / construction)







Failure will occur, OK. But can the instrument survive?



## Cez

## **FMEA – FAILURE TOLERANCE REQUIREMENTS**

Failure tolerance is present in requirements (linked to space operations)

- No SPF (Single Point Failure) shall lead to mission loss (Consequence)
- No possible failure propagation between redounded units
- No possible failure propagation through cross-strapped IF
- Provide Housekeepings to allow ground diagnostic & on board FDIR





### FMEA - FMECA

Others – Lesson learnt – ground activities – failure segregation

- No possible failure propagation between units
- No possible failure propagation from test equipment to flight equipment
- Electronic board shall survive to an unintended loss of power
- (complete or partial eg +5V/+12V/-12V)
- "Power interface" shall incorporate V or I limiters





### FMEA - PRELIMINARY

Early project step – preliminary design Functional FMEA : identification of critical areas



| Failure<br>mode | Effect | Rank | Detection | Recovery | Reco |
|-----------------|--------|------|-----------|----------|------|
|                 |        |      |           |          |      |
|                 |        |      |           |          |      |
|                 |        |      |           |          |      |
|                 |        |      |           |          |      |
|                 |        |      |           |          |      |
|                 |        |      |           |          |      |



Some SPF are withdrawn through simple reco's (see bonus slide for example component level FMEA on SHU power amplifier)

Some Single Point Failure remain purpose of FMEA is to identify remaining SPFs and make associated risk accepted at mission level

For each subsystem : the Interface Control Document should identify maximum ratings on it's interfaces (not only operating conditions)

Some requirements are not accessible to verification through analysis ->Test on prototype / Early model

Example : "Electronic board shall survive to an unintended loss of power (complete or partial +5V/+12V/-12V)"

#### **RELIABILITY ASSESSMENT**



Failure will randomly occur, OK. But how often?



#### **Reliability apportionment – Early project phase**



Purpose : check adequacy of instrument architecture / reliability targets

DE LA RECHERCHE À L'INDUSTRI

#### **RELIABILITY DIAGRAM**

Target : Mission 6.25y, R=0,92 Subsystems failure rate in FITs ( $\lambda$  failure / 10<sup>9</sup>h) Reliability over period t : R=Exp(-  $\lambda$ .t)



Instrument reliability  $R = \prod R_{subsyst} = 0.581$  (see bonus slide for details)

Any possible science in degraded mode (10 or 11 ROE-CCD blocks)? R(11 blocks) = 0,857 / R(10 blocks) = 0,931

Cold redundancy for PMCU? R(11 blocks) = 0,904 R(10 blocks)=0,982



Updated architecture / operation scenario of instrument



Requirement document of subsystem :

- MCU : cold redundancy, <1000 FITs,
- no cross-strapping on DPU-MCU
- DPU : cold redundancy, < 1500 FITs
- ROE : < 800 FITs
- Cross strapping ROE\_DPU
   + change on mission scenario (less minimum active surface end of life)



#### Verification of compliance to reliability targets

- Summing  $\lambda$  of EEE parts when design stable
- EEE part  $\lambda$  estimation
- MIL-HDBK-217F reliability handbook (preferred)
- Alternative IEC-TR-6230, FIDES
- Example of using MIL-HDBK-217F -> see bonus slide
- Use of dedicated commercial tools or spreadsheets
- For complex / recent EEE, MIL-HDBK gives too high  $\lambda$
- Eg FPGA ACTEL RTAX2000 :  $\lambda_{die} > 1000$  Fits
- Use of extended life time testing data to predict reliability over mission and
- See example in bonus slide :  $\lambda_{die} = 7$  Fits

#### **PART STRESS - DERATINGS**



Some components will have some drifts. Is the instrument tolerant?



#### WORST CASE ANALYSIS

Phenomenon

- a) Random caracteristics of components inside the [min-max] range of the proc spec
- b) Drifts will occur along the mission with ageing & radiation.

Cumulation of those phenomenon may bring instrument out of performance requirement

Impact of a) will be detected during ground testing (but schmitt part iii) Impact of b) is only accessible to analysis (WCA)

Worst Case Analysis performed on focused area only (decision from design engineer / Dependability engineer) Criteria : critical function, long chain

### WORST CASE ANALYSIS – CASE STUDY

#### Method

- identify function/area : temperature probe bias
- Identify awaited min perfo : 0.5% accuracy
- Identify components and key parameters

| Component    | parameter    |
|--------------|--------------|
| U13 op amp   | Vos, Ib, Ios |
| U38 volt ref | Vo           |
| Res          | R            |

- For each paramater identify
  - Min-Max in procurement spec
  - Drifts due to radiation (radiation test reports)
  - Drifts due to ageing

For ageing : drifts from accelerated life test & Acc factor Acceleration factor = arrhénius law (125° C -> T op)

+ linear dependance to time



• Drifts for operational amplifier

| Op amp | initial   | Radiation | Ageing |
|--------|-----------|-----------|--------|
| Vos    | +/- 150µV |           | 1µV    |
| lb     | 3nA       | 20nA      | 30pA   |
| los    | 1nA       | 5nA       |        |

- Designer : provide a model (spreadsheet) of circuit perfo from key parameters & drifts
- Model used to predict mission drifts
  - Nominal 550µA
  - Initial +/- 11µA
  - Ageing & radiation +/- 0.2µA
- Initial drift out of 0.5% accuracy but compensated (calibration)
- Mission drift within specification



Space instrumentation community is a user of accelerator test facilities (calibration of instruments, radiation testing of EEE)



Single Event Latch-up / Heavy lons testing On a space detector IC @ Louvain - Belgium

We encourage accelerator community to continue their effort on reliability / dependability

Further question & clarification jean.fontignie@cea.fr

### Bonus slide - Material & process procurement

#### Materials / Mechanical parts :

- Known behaviour in launch & space environment through experience & database
- Procured through procurement specification (either int' standard or dedicated)
- Processes : special focus on
- Gluing, Surface treatment Process documented (controlled through a procedure) characteristics (mechanical or optical) verified after exposition to stress (thermal cycling / thermal shock)
- Electronic assembly processes
   HiRel / Hicost components need HiRel PCB & HiRel process for assembly
   on PCB (and HiRel PCBs)
   Verification of process through testing on samples (PCB + components)
   100% inspection, Agreed & documented success criteria

Cea

#### **Bonus slide – ECSS stanbdards doc tree**



DE LA RECHERCHE À L'INDUSTR



#### **FMEA - DETAILED**



| Component | Failure<br>mode             | Effect                                                          | Rank | Detection                               | Recovery                                              | Reco                                                   |
|-----------|-----------------------------|-----------------------------------------------------------------|------|-----------------------------------------|-------------------------------------------------------|--------------------------------------------------------|
| L3        | Short Circuit<br>(On mode)  | Short circuit of<br>Pha/PhB, no<br>current in<br>stepper motror | 1S   | PhaseA/B<br>current HK                  | Switch to redundant may be inefficient                | Change from common<br>mode filter to standard<br>coils |
|           | Short circuit<br>(Off mode) | Current loop in<br>L3<br>Loss of SHU<br>torque                  | 1S   | Loss of steps,<br>wrong end<br>position | Switch to<br>redundant to<br>redundant<br>inefficient | Change from common<br>mode filter to standard<br>coils |



## **FMEA – FAILURES MODE OF COILS**



#### ECSS-Q-ST-30-02C 6 March 2009

| Туре                                       | Failure modes                                                                                    | Remarks                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|--------------------------------------------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 07 01 RF coil<br>07 02 cores<br>07 03 chip | OC<br>SC between terminals<br>SC between turns<br>Any single terminal SC to<br>core or structure | SC between terminals or turns to be<br>considered except where specific provisions<br>other than enamel are taken (e.g.<br>specifically insulated wire, kapton layer or<br>specific design rules)<br>It is important to consider SC between<br>terminal and core or structure according to<br>technology for inductors mounted directly<br>on the structure<br>Breaking of the magnetic core is assimilated<br>to SC and is considered except where<br>specific provisions are taken (e.g. potting) |



R(t)=  $e^{-\lambda t}$ . [1+10.(1- $e^{-\lambda t}/10$ )], Assuming  $\lambda_{off} = \lambda_{on}/10$ 

Hot redundancy (10 among 12)

#### **RELIABILITY ESTIMATION -DETAIL**



|              | CCD    | ROE            | SHU    | MCU       | DPU            |
|--------------|--------|----------------|--------|-----------|----------------|
| Failure rate | 0 FITS | 800 FITS       | 0 FITS | 1000 FITS | 1500 FITS      |
| Reliability  |        | 0,591 (12 ROE) |        | 0,947     | 0,921<br>0,996 |

Instrument reliability  $R = \prod R_{subsyst} = 0.581$ 

## **RELIABILITY FEED BACK LOOP**

### Weakest element CCD-ROE

- a) Enhance ROE reliability? Would need lower than 150 FITs!
- b) any possible science with 10 or 11 CCD-ROE (degraded mode)? (see bonus slide for reliability formula of "n among m")

|              | CCD    | ROE      | SHU    | MCU       | DPU       | Instrument |
|--------------|--------|----------|--------|-----------|-----------|------------|
| Failure rate | 0 FITS | 800 FITS | 0 FITS | 1000 FITS | 1500 FITS |            |
| Reliability  |        | 0,591    |        | 0,947     | 0,996     | 0,558      |
| 11 CCD-ROE   |        | 0,909    |        | 0,947     | 0,996     | 0,857      |
| 10 CCD-ROE   |        | 0,987    |        | 0,947     | 0,996     | 0,931      |

Focus on MCU. Cold redundancy? MCU DPU CCD ROE CCD ROE SHU MCU DPU 11 CCD-ROE 0,909 0,991 0,904 10 CCD-ROE 0,987 0,991 0,982

### **RELIABILITY ASSESSMENT – DETAILED DESIGN**

#### Verification of compliance to reliability targets

Summing  $\lambda$  of EEE parts when design stable

EEE part  $\lambda$  estimation

- MIL-HDBK-217F reliability handbook (preferred)
- Alternative IEC-TR-6230, FIDES
- Example Ceramic capacitor  $\lambda_n = \lambda_h \pi_t \pi_c \pi_n \pi_{sr} \pi_a \pi_e$ •

| symbol      | contrib              | Value/formula               |                   |                                                                                              |
|-------------|----------------------|-----------------------------|-------------------|----------------------------------------------------------------------------------------------|
| $\lambda_b$ | Intrinsic            | 0.00099                     | $\pi_{T} = \exp$  | $\left(\frac{-Ea}{8.617 \times 10^{-5}} \left(\frac{1}{T+273} - \frac{1}{298}\right)\right)$ |
| $\pi_t$     | Temp                 | Arrhenius, Ea = 0.35        | •                 | $(8.617 \times 10^{-5} (1 + 273^{-230}))$                                                    |
| $\pi_c$     | Cap value            | (Cap Value) <sup>0.23</sup> |                   |                                                                                              |
| $\pi_v$     | Voltage              | (Op volt. / 0.6 Max volt.   | ) <sup>3</sup> +1 |                                                                                              |
| $\pi_q$     | Q level              | 0.001 to 1.5 -> 0.01        |                   |                                                                                              |
| $\pi_e$     | Environ <sup>t</sup> | 0.5 to 40 -> 0.5 (Space)    | )                 |                                                                                              |

Use of dedicated tools or spreadsheets ۲

Complex EEE would need to extrapolate beyond limits of MIL-HDBK

ACTEL RTAX2000 Rad tolerant FPGA 250Kgates, 0.15µm techno, Die 2.25cm<sup>2</sup>

#### 5.3 MICROCIRCUITS, VHSIC/VHSIC-LIKE AND VLSI CMOS

DESCRIPTION CMOS greater than 60,000 gates

 $\lambda_p = \lambda_{BD} \pi_{MFG} \pi_{T} \pi_{CD} + \lambda_{BP} \pi_{E} \pi_{O} \pi_{PT} + \lambda_{EOS}$  Failures/10<sup>6</sup> Hours

| Feature Size                                                                                                                                                                                                                                                  |                                           |           | Die Area (cm <sup>2</sup> ) |               |               |  |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|-----------|-----------------------------|---------------|---------------|--|--|
| (Microns)                                                                                                                                                                                                                                                     | A ≤ .4                                    | .4 < A≤.7 | .7 < A ≤ 1.0                | 1.0 < A ≤ 2.0 | 2.0 < A ≤ 3.0 |  |  |
| .80                                                                                                                                                                                                                                                           | 8.0                                       | 14        | 19                          | 38            | 58            |  |  |
| 1.00                                                                                                                                                                                                                                                          | 5.2                                       | 8.9       | 13                          | 25            | 37            |  |  |
| 1.25                                                                                                                                                                                                                                                          | 3.5                                       | 5.8       | 8.2                         | 16            | 24            |  |  |
| $\pi_{CD} = \left( \begin{pmatrix} \frac{A}{21} \end{pmatrix} \begin{pmatrix} \frac{2}{X_s} \end{pmatrix}^2 \begin{pmatrix} .64 \end{pmatrix} \right) + .36 \qquad A = \text{Total Scribed Chip Die Area in cm}^2 \qquad X_s = \text{Feature Size (microns)}$ |                                           |           |                             |               |               |  |  |
| Die Area Conve                                                                                                                                                                                                                                                | rsion: cm <sup>2</sup> = MIL <sup>2</sup> | + 155,000 |                             |               |               |  |  |

Die Complexity Correction Factor - #CD

 $\lambda p > 10^3 FITs$ 

Using such formula would lead to

## Reliability assessment – (too) complex EEE

Use of field reliability data for die (Hi-Rel components manufacturer release them) Use of MIL-HDBK for package Based on extended lifetime testing data performed on any component batch

$$\lambda = \frac{\chi^2}{2.Af.Device\,Hours}$$

Chi-square at 60% level confidence level 2f+2 degrees of freedom (f number of failure) Device hours : number of tested devices tested x duration of lifetime testing AF : acceleration factor (Arrhenius with Ea=0.7, 125° C test)

$$\lambda_{die} = 7 FITs$$