## Using FPGAs in a Radiation Environment

ATLAS CMS Electronics Workshop for LHC upgrades (ACES2014) 18-20 March, 2014, CERN

Michael Wirthlin Brigham Young University, CHREC Provo, Utah, USA





## Workshop on FPGAs for High-Energy Physics

21 March 2014 CERN Europe/Zurich timezone

FPGAs in High Energy Physics is a workshop to discuss the use of commercial off-the-shelf field-programmable gate arrays (FPGAs) for high-energy physics experiments and accelerator instrumentation. FPGAs used to process information from LHC experiments are exposed to radiation and will be subjected to ionizing dose and single-event effects. To use them effectively, redundant architectures need to be implemented to circumvent single-event effects.

The first workshop will take place at CERN, on Friday, March 21st, 2014. The workshop will be held the day after the Fourth Common ATLAS CMS Electronics Workshop for LHC Upgrades (ACES 2014).

The workshop will include presentations on the intended use of FPGAs in each experiment, irradiation test results, and possible mitigation strategies. The goals of the workshop are to share information on:

- FPGA Radiation Test Results
- FPGA SEU Mitigation Methods
- FPGA Upset Prediction Approaches
- FPGA Tools, Methods, and Design Approaches
- Results and Experiences from existing experiments
- New FPGA architectures
- Tutorials and Demonstrations

Starts: 21 Mar 2014 08:00. Ends: 21 Mar 2014 18:00 (Europe/Zurich). Location: CERN, Filtration Plant, Building 222, R-001.










NSF Center for High-Performance Reconfigurable Computing

# Modern FPGA Architectures

- Exploit advantages of programmable logic
  - In-system programmable
  - Low non-recurring engineering (NRE) costs
- High Logic Density and Serial I/O Bandwidth



- Up to 2M logic cells
- Up to 2.8 Tb/s serial I/O
- 68 Mb internal BRAM

## Integrated Processors, Memory, and I/O







# CHREC Space Processor (CSP)

- CubeSat Processing Board (10cm x 10cm)
  - Command & data handling, experiment & instrument control, data compression, sensor processing, attitude control, etc.
- Integrate COTS processing w/RadHard support
  - Zynq-7020: Dual-core ARM (A9) + Artix-7 FPGA fabric
  - Radiation hardened NAND Flash, watchdog, and power supply






## Xilinx Kintex7

- Commercially available FPGA
  - 28 nm, low power programmable logic
  - High-speed serial transceivers (MGT)
  - High density (logic and memory)
- Built-In Configuration Scrubbing
  - Support for Configuration Readback and Self-Repair
  - Auto detect and repair single-bit upsets within a frame
  - SEU Mitigation IP for correcting multiple-bit upsets
- Proven mitigation techniques
  - Single-Event Upset Mitigation (SEM) IP
  - Configuration scrubbing
  - Triple Modular Redundancy (TMR)
  - Fault tolerant Serial I/O State machines
  - BRAM ECC Protection
- Demonstrated success with previous FPGA generations in space
  - Virtex, Virtex-II, Virtex-4, Virtex-5QV



### Kintex7 325T

- 407,600 User FFs
- 326,080 logic cells
- 840 DSP Slices
- 445 Block RAM Memory
  - 16.4 Mb
- 16 12.5 Gb/s Transceivers





## **Kintex-7 Radiation Testing**



LANSCE, Los Alamos, NM, Oct. 2012

- White spectrum neutrons (5.7E10)
- CRAM/BRAM cross section test



CERN, Geneva, Switzerland, Nov. 2012

- White spectrum hadrons (1.8E9)
- CRAM/BRAM cross section test







### 2013

TSL, Uppsala, Sweden, May 2013

- High Energy Protons (180 MeV), White Spectrum Neutrons
- Estimate proton cross section
- Validate scrubber and TMR

Texas A&M, College Station, Sept. 2013

- Heavy Ion Testing (N, Xe, Ar)
- 16 hours of testing (6 MeV-49 MeV)
- Single Event Latchup (SEL) Testing
- Wide range LET testing
- Space Rate Upset estimation

LANSCE, Los Alamos, Sept. 2013

- Mitigation Validation
- Enhanced scrubber testing
- Multi-Gigabit Transceiver Testing

- TMR validation
- Preliminary ZYNQ test



 "Soft error rate estimations of the Kintex-7 FPGA within the ATLAS Liquid Argon (LAr) Calorimeter", M J Wirthlin, H Takai and A Harding, Journal of Instrumentation, Volume 9, January 2014

Two papers submitted to 2014 Nuclear and Space Radiation Effects Conference (NSREC)

## **Kintex-7 Radiation Testing**



Lawrence Berkeley National Laboratory, Berkeley, CA, Feb 24, 2014

- Single-Event Latchup (SEL)
- Multi-Bit Upset (MBU)








## LAr Upset Rate Estimation

| Memory (upsets bit<sup>-1</sup> fb<sup>-1</sup>) | Timepix | V-4VQ (1) | V-4VQ (2) | Simple |
|-------------------------------------------------|---------|-----------|-----------------------|--------|
| CRAM                                            |         |           | $1.82 \times 10^{-6}$ |        |
| BRAM                                            |         |           | $1.63 \times 10^{-6}$ |        |

$^1$ Obtained by multiplying the measured cross section by the fluence of particles above 20 MeV ($2.84 \times 10^{8}\ \mathrm{cm^{-2}\,fb^{-1}}$).

- Phase 2 will integrate 2 fb<sup>-1</sup> in 10 h (5.56E-5 fb<sup>-1</sup>/s), and 3000 fb<sup>-1</sup> over the integrated run
  - CRAM: 1.01E-10 upsets/bit/s
  - BRAM: 9.06E-11 upsets/bit/s
- Estimate accuracy: ± 50%
- Overall upset rate will depend on device
  - Larger devices have more CRAM and BRAM bits
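The per-second rates above follow from the per-fb<sup>-1</sup> rates and the Phase 2 luminosity assumption. The sketch below reproduces that arithmetic; it assumes that 1.82E-6 is the CRAM value and 1.63E-6 the BRAM value (an inference from the quoted per-second rates), and all names are illustrative.

```python
# Per-bit upset rates quoted on the slide, in upsets per bit per fb^-1.
# Assumption: the larger value is CRAM and the smaller is BRAM.
RATE_PER_FB = {"CRAM": 1.82e-6, "BRAM": 1.63e-6}

# Phase 2 luminosity figure from the slide: 2 fb^-1 integrated in 10 hours.
FB_PER_RUN = 2.0
RUN_SECONDS = 10 * 3600


def upsets_per_bit_per_second(rate_per_fb: float) -> float:
    """Convert a per-fb^-1 upset rate into a per-second upset rate."""
    fb_per_second = FB_PER_RUN / RUN_SECONDS  # about 5.56e-5 fb^-1/s
    return rate_per_fb * fb_per_second


per_second = {name: upsets_per_bit_per_second(r) for name, r in RATE_PER_FB.items()}
for name, rate in per_second.items():
    print(f"{name}: {rate:.3g} upsets/bit/s")
```

Multiplying either per-second rate by the number of CRAM or BRAM bits in a given device gives the device-level upset rate, which is why larger devices see more upsets.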





# **Implications of Upset Estimations**

- Configuration RAM (CRAM) : 1 upset/150 s
  - Continuous configuration scrubbing is required
    - Prevent build-up of configuration errors
    - Scrub rate > 10x upset rate (at least one full scrub every 15 s)
  - Active hardware redundancy required
    - Mitigate effects of single configuration upset
    - Example: Triple-Modular Redundancy (TMR)
- BRAM : 1 upset/670 s
  - Exploit BRAM ECC (SEC/DED)
  - Employ BRAM scrubbing
    - Prevent build-up of errors to "break" SEC/DED code
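A rough sketch of the reasoning behind these numbers is given below. It assumes upsets arrive as a Poisson process (an assumption, not a statement from the slides) and takes the 16.4 Mb BRAM of the Kintex7 325T as 16.4e6 bits; function and variable names are illustrative.

```python
import math

# Figures quoted on the slides.
CRAM_UPSET_INTERVAL_S = 150.0   # 1 CRAM upset per 150 s
SCRUB_MARGIN = 10.0             # scrub rate > 10x upset rate
BRAM_RATE_PER_BIT_S = 9.06e-11  # BRAM upsets/bit/s
BRAM_BITS = 16.4e6              # assumption: 16.4 Mb taken as 16.4e6 bits


def max_scrub_period(upset_interval_s: float, margin: float = SCRUB_MARGIN) -> float:
    """Longest scrub period that still scrubs `margin` times per expected upset."""
    return upset_interval_s / margin


def prob_two_or_more(upset_interval_s: float, window_s: float) -> float:
    """Poisson probability of at least two upsets inside one window."""
    lam = window_s / upset_interval_s
    return 1.0 - math.exp(-lam) * (1.0 + lam)


cram_scrub_period = max_scrub_period(CRAM_UPSET_INTERVAL_S)
p_accumulate = prob_two_or_more(CRAM_UPSET_INTERVAL_S, cram_scrub_period)
bram_upset_interval = 1.0 / (BRAM_RATE_PER_BIT_S * BRAM_BITS)

print(f"CRAM scrub period: at most {cram_scrub_period:.0f} s")
print(f"P(>=2 CRAM upsets in one scrub period): {p_accumulate:.4f}")
print(f"BRAM upset interval: about {bram_upset_interval:.0f} s")
```

Under these assumptions, a 15 s scrub period leaves roughly a 0.5% chance that two or more CRAM upsets accumulate before the next scrub pass, and the BRAM estimate comes out at roughly 670 s, in line with the slide.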







## **TMR & Scrubbing Example**







## **CRAM MBU Testing Results**

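| Upsets/event | Intra-Frame MBUs (frequency) | Inter-Frame MBUs (frequency) |
|--------------|------------------------------|------------------------------|
| 1            | 90.1%                        | 65.0%                        |
| 2            | 7.5%                         | 26.8%                        |
| 3            | 1.4%                         | 2.9%                         |
| 4            | 0.60%                        | 3.5%                         |
| 5            | 0.26%                        | 0.61%                        |
| 6+           | 0.16%                        | 1.3%                         |

\*Results based on 2012 LANSCE neutron test.

Figure: two configuration frames (Frame #0, Frame #1), each with its ECC. An intra-frame MBU (several upsets within one frame) is not protected by ECC; an inter-frame MBU spans both frames.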





# **10 Hour CRAM Upset Estimates**







## **Configuration Scrubbing**

- Configuration Scrubbing Constraints
  - Must repair single and multiple-bit upsets quickly
    - Accumulation of upsets will break mitigation (such as TMR)
    - Accumulation of upsets will increase static power
  - Minimize external circuitry (avoid radiation hardened scrubbing HW)
- Kintex7 FPGA contains internal "Frame" Scrubber
  - Continuously monitors state of configuration memory (FrameECC)
  - Automatically repairs single-bit errors within a frame
  - Identifies multi-bit errors and configuration CRC failures
- Additional scrubber support needed to repair MBUs
  - JTAG connection to host controller (slow, limited hardware)
  - Configuration controller and on-board memory (fast, complex hardware)
- Several Configuration Scrubbing approaches currently being validated







## **Configuration Scrubbing Approach**

- Configuration Scrubbing Constraints
  - Must repair single and multiple-bit upsets quickly
  - Minimize external circuitry (avoid radiation hardened scrubbing HW)
- Multi-level Scrubbing Architecture

### Inner Scrubber

- Uses internal Kintex7 Post CRC scrubber
- Scans full bitstream
  - Repairs single-bit upsets
  - Detects multi-bit upsets
- Full bitstream CRC check
- Repairs 91% of upsets



### Outer Scrubber

- JTAG Configuration Port
- Monitors state of inner scrubber
- Repairs multi-bit upsets
- Logs upset activity
- Repairs 9% of upsets (slower)

Multi-level scrubber validated at the September 2013 LANSCE test
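To give a feel for the workload of each scrubber level, the sketch below combines the 91%/9% split with the earlier estimate of one CRAM upset per 150 s over a 10-hour run. The dispatch rule is a simplified reading of the slide (inner scrubber repairs single-bit upsets, outer scrubber repairs multi-bit upsets); names are illustrative and this is not the scrubber implementation itself.

```python
RUN_SECONDS = 10 * 3600          # one 10-hour run
CRAM_UPSET_INTERVAL_S = 150.0    # from the upset estimate: 1 CRAM upset per 150 s
INNER_FRACTION = 0.91            # share of upsets repaired by the inner scrubber
OUTER_FRACTION = 0.09            # share of upsets repaired by the outer scrubber


def choose_scrubber(upsets_in_frame: int) -> str:
    """Simplified dispatch: single-bit upsets go to the inner scrubber,
    multi-bit upsets are detected internally and repaired by the outer one."""
    if upsets_in_frame < 1:
        raise ValueError("an upset event contains at least one upset bit")
    return "inner" if upsets_in_frame == 1 else "outer"


expected_upsets = RUN_SECONDS / CRAM_UPSET_INTERVAL_S
inner_repairs = expected_upsets * INNER_FRACTION
outer_repairs = expected_upsets * OUTER_FRACTION

print(f"Expected CRAM upsets in 10 h: {expected_upsets:.0f}")
print(f"Repaired by inner scrubber:   {inner_repairs:.1f}")
print(f"Repaired by outer scrubber:   {outer_repairs:.1f}")
```

With these figures, roughly 240 CRAM upsets are expected per 10-hour run, of which only about 22 need the slower JTAG path.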





# **Triple Modular Redundancy**

Voter after FF

**Feedback Voters** 
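The idea of placing voters after the flip-flops and in feedback paths can be illustrated with a small software model. The sketch below is a toy 8-bit counter with triplicated state and one voter per replica; it is a hypothetical illustration, not output of any TMR tool, and all names are made up for the example.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


class TmrCounter:
    """Toy 8-bit counter with triplicated registers and feedback voters."""

    WIDTH_MASK = 0xFF

    def __init__(self) -> None:
        self.regs = [0, 0, 0]  # one state register per replica

    def upset(self, replica: int, bit: int) -> None:
        """Flip one bit in one replica to model a single-event upset."""
        self.regs[replica] ^= 1 << bit

    def voted(self) -> int:
        """Output seen by downstream logic."""
        return majority(*self.regs)

    def clock(self) -> None:
        """Advance one cycle. Each replica votes on all three registers
        before computing its next state, so a corrupted register is
        overwritten with a value derived from the majority."""
        voted_inputs = [majority(*self.regs) for _ in range(3)]
        self.regs = [(v + 1) & self.WIDTH_MASK for v in voted_inputs]


counter = TmrCounter()
for _ in range(3):
    counter.clock()
counter.upset(replica=1, bit=6)      # corrupt one replica
masked_output = counter.voted()      # voter masks the error immediately
counter.clock()                      # feedback voter resynchronizes the replica
print(masked_output, counter.regs)
```

Without the voters in the feedback path, the upset replica would keep counting from its corrupted value and a second upset in another replica could defeat the vote.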













## **BL-TMR**

- BYU-LANL TMR Tool
  - BYU-LANL Triple Modular Redundancy
  - Developed at BYU under the support of Los Alamos National Laboratory (Cibola Flight Experiment)
  - Used to test TMR on many designs
    - Fault injection, Radiation testing, in Orbit
  - Testbed for experimenting with various TMR application techniques













# **BL-TMR Design Flow**



**BL-TMR Design Steps** 

1. Component Merging
2. Design Flattening
3. Graph Creation and Analysis
4. IOB Analysis
5. Clock Domain Analysis
6. Instance Removal
7. Feedback Analysis
8. Illegal Crossing Identification
9. TMR Prioritization & Selection
10. Voter Selection
11. Instance Triplication
12. Voter Insertion
13. Netlist Generation



## **BL-TMR Validation**







# Summary

- Extensive testing of Kintex-7 FPGA
  - Static Cross Section Estimations
    - CRAM, BRAM, Flip-Flops
    - Multi-Bit Upsets (MBU)
  - Single-Event Latchup (SEL) Testing
- Mitigation Strategy Identified
  - Kintex-7 Scrubber developed and validated
  - BL-TMR for logic mitigation
- Future Work
  - Validation of BL-TMR mitigation approach
  - Testing of Multi-Gigabit Transceivers (MGT)



