



### Intermodular Configuration Scrubbing of On-detector FPGAs for the ARICH at Belle II

R. Giordano<sup>1</sup>, Y. Lai<sup>2</sup>, S. Korpar<sup>3,4</sup>, R. Pestotnik<sup>4</sup>, A. Lozar<sup>4</sup>, L. Šantelj<sup>4,5</sup>, M. Shoji<sup>2</sup>, and S. Nishida<sup>2</sup>

<sup>1</sup>Università degli Studi di Napoli "Federico II" and INFN Sezione di Napoli, 80126, Italy <sup>2</sup>High Energy Accelerator Research Organization, 305-0801, Japan <sup>3</sup>University of Maribor, 2000, Slovenia <sup>4</sup>Jozef Stefan Institute, 1000, Slovenia <sup>5</sup>University of Ljubljana, 1000, Slovenia

Presenter email: rgiordano@na.infn.it

#### Outline



- SuperKEKB, Belle II and ARICH
- Single event upsets in ARICH front-end boards
- The Configuration Consistency Corrector
- Irradiation test results
- Conclusions

#### SuperKEKB and Belle II





Figure: SuperKEKB ring parameters (2013/July/29). LER/HER beam energy 4.000/7.007 GeV, beam current 3.6/2.6 A, bunch current 1.44/1.04 mA, 2,500 bunches, circumference 3,016.315 m.



#### SuperKEKB e<sup>+</sup>e<sup>-</sup> B factory @ KEK (Tsukuba, Japan): design parameters

- Target luminosity L = $8 \times 10^{35}$ cm<sup>-2</sup> s<sup>-1</sup>
- LER 4 GeV (e<sup>+</sup>), HER 7 GeV (e<sup>-</sup>)
- Belle II detector at the beam collision point
- Physics beyond the Standard Model at the intensity frontier
  - CKM matrix elements, CPV studies, rare B, D, $\tau$ decays and more

## Aerogel Ring Imaging CHerenkov Counter





- Particle Identification in end cap
- K/π separation > 4σ in momentum range 1-3.5 GeV/c
- Requirements
  - operation in 1.5 T magnetic field
  - limited available space ~280 mm
  - radiation hardness
    - 1 MeV equivalent neutron fluence: 10<sup>12</sup> n/cm<sup>2</sup>
    - Total ionizing dose: 1 kGy





- Proximity focusing aerogel RICH
  - $\langle n \rangle \approx 1.05$
  - $\theta_c(\pi) \approx 307 \text{ mrad}$ @ 3.5 GeV/c
  - $\theta_c(\pi) - \theta_c(K) = 30 \text{ mrad}$ @ 3.5 GeV/c
  - pion threshold 0.44 GeV/c, kaon threshold 1.54 GeV/c
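The angles and thresholds listed above follow from the Cherenkov relation cos θc = 1/(nβ). The short sketch below recomputes them for a single average index n = 1.05; the function names and the rounded particle masses are assumptions made for illustration and are not taken from the slides.

```python
import math

# Charged pion and kaon masses in GeV/c^2 (rounded PDG values, assumed here).
M_PION = 0.13957
M_KAON = 0.49368
N_AEROGEL = 1.05  # average refractive index quoted on the slide


def cherenkov_angle(p, mass, n):
    """Cherenkov angle in rad for momentum p (GeV/c); None below threshold."""
    beta = p / math.hypot(p, mass)   # beta = p / E with E = sqrt(p^2 + m^2)
    cos_theta = 1.0 / (n * beta)
    if cos_theta > 1.0:              # particle too slow to radiate
        return None
    return math.acos(cos_theta)


def threshold_momentum(mass, n):
    """Momentum (GeV/c) at which beta = 1/n, i.e. emission starts."""
    return mass / math.sqrt(n * n - 1.0)


theta_pi = cherenkov_angle(3.5, M_PION, N_AEROGEL)
theta_k = cherenkov_angle(3.5, M_KAON, N_AEROGEL)
print(f"theta_c(pi) at 3.5 GeV/c: {1e3 * theta_pi:.0f} mrad")
print(f"theta_c(pi) - theta_c(K): {1e3 * (theta_pi - theta_k):.0f} mrad")
print(f"pion threshold: {threshold_momentum(M_PION, N_AEROGEL):.2f} GeV/c")
print(f"kaon threshold: {threshold_momentum(M_KAON, N_AEROGEL):.2f} GeV/c")
```

With these inputs the script should reproduce the quoted 307 mrad, 30 mrad, 0.44 GeV/c and 1.54 GeV/c to the precision shown on the slide.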



#### Photon Detector & Readout Electronics





#### Configuration SEUs in FEB FPGAs

- Spartan-6 devices use boron as a p-type dopant
- <sup>10</sup>B (about 20% of natural boron) has a high cross section σ for thermal neutron capture => single event upsets (SEUs) in configuration SRAM

 $\left. \begin{array}{ll} {}^{10}\text{B} + n \rightarrow {}^{7}\text{Li} + \alpha + \gamma & (94\%) \\ \phantom{{}^{10}\text{B} + n} \rightarrow {}^{7}\text{Li} + \alpha & (6\%) \end{array} \right\} \ \sigma = 3.5 \text{ kbarns}$ 

- Previous irradiation tests at the TRIGA reactor of Jožef Stefan Institute (Ljubljana, Slovenia)
  - 250 kW research reactor from General Atomics
  - 10<sup>7</sup> n/(cm<sup>2</sup>·s) in dry room
  - neutron spectrum similar to that of the Belle II spectrometer
  - extrapolation to Belle II: 8 SEU/h per board, or 3.3 kSEU/h overall
- In the October 2019 runs, nearly 5% of front-end FPGAs were affected by configuration SEUs within 24 hours

Dry room for irradiation

**TRIGA** layout



![](_page_5_Picture_14.jpeg)

![](_page_5_Figure_15.jpeg)

#### Repairing FEB Configuration On-the-fly

- Star read-out topology

Idea

- FEB FPGAs are programmed with the same bitstream => redundancy at system-level
  - Parallel readback of FEB (Spartan-6) configuration from Merger (Virtex-5)
  - Real-time 4-out-of-6 bitwise majority voting on JTAG streams (TDOs) for error detection
  - Quick single frame reconfiguration for error correction
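The voting step can be illustrated in software. The sketch below is only a model of the idea: in the C<sup>3</sup> the vote runs in Merger logic on the JTAG TDO streams, whereas here `vote_frames`, its arguments and the LSB-first bit numbering are hypothetical choices made for the example. A bit is accepted when at least 4 of the 6 copies agree; a 3-3 split is reported as undecided.

```python
def vote_frames(frames, threshold=4):
    """Bitwise majority vote over equal-length frames (one per FPGA).

    Returns (voted_frame, upsets, undecided) where upsets is a list of
    (fpga_index, bit_offset, observed_bit) and undecided lists the bit
    offsets where fewer than `threshold` copies agree.
    """
    length = len(frames[0])
    voted = bytearray(length)
    upsets = []
    undecided = []
    for bit in range(length * 8):
        byte, shift = divmod(bit, 8)          # LSB-first numbering (assumed)
        bits = [(f[byte] >> shift) & 1 for f in frames]
        ones = sum(bits)
        if ones >= threshold:
            majority = 1
        elif len(frames) - ones >= threshold:
            majority = 0
        else:
            undecided.append(bit)             # no 4-out-of-6 agreement
            continue
        voted[byte] |= majority << shift
        # Every copy that disagrees with the majority is flagged as an upset.
        upsets.extend((i, bit, b) for i, b in enumerate(bits) if b != majority)
    return bytes(voted), upsets, undecided


# Toy data: six identical two-byte "frames", with one bit flipped in FPGA 2.
golden = bytes([0xA5, 0x3C])
frames = [golden] * 6
frames[2] = bytes([0xA5 ^ 0x01, 0x3C])
voted, upsets, undecided = vote_frames(frames)
print(voted == golden, upsets, undecided)
```

Because the reference is rebuilt from the copies themselves, no golden bitstream has to be stored, and the flagged (FPGA, bit offset) pairs indicate which frame of which device needs to be rewritten.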

![](_page_6_Figure_8.jpeg)

![](_page_6_Picture_9.jpeg)

### The Configuration Consistency Corrector - C<sup>3</sup>

![](_page_7_Figure_1.jpeg)

- No memory needed for a golden bitstream and no a priori limit on the number of bitflips per frame that can be repaired
- Xilinx Soft Error Mitigation (SEM) controller in Spartan-6 is limited to 1 bitflip per frame

- Features
  - Majority voting on the configuration streams of up to 6 FPGAs
  - built around the Xilinx PicoBlaze6 processor
  - runs at 127 MHz (Belle2Link clock in Merger)
  - 3.3 s scrubbing period
  - ~1 ms single-frame repair time
- 6 JTAG ports, two IO modes
  1. Single-port Read/Write (used for configuration repair)
  2. All-ports Voted Read / Broadcast Write (used for readback)
- BRAMs store
  - Frame buffers (260x8b)
  - Target FPGA frame addressing device-specific information (1252x8b)
  - uP Program (4096x18b)
- 16-bit upset counter for each target FPGA
- UART or JTAG IO for debug/control

Architecture derived from

R. Giordano et al., "Configuration Self-Repair in Xilinx FPGAs," doi: 10.1109/TNS.2018.2868992

R. Giordano et al., "Custom Scrubbing for Robust Configuration Hardening in Xilinx FPGAs," doi: 10.3390/instruments3040056

## The Configuration Consistency Corrector – C<sup>3</sup> (2)

![](_page_8_Figure_1.jpeg)

- Triple Modular Redundancy for logic and scrubbing for BRAMs and scratchpad
- Periodic reset of uP for internal registers cleanup
- Runs in background, no disruption of user design implemented in FPGA
- UART for scrubber control and logging of upsets details

#### Scrubbing Logs

![](_page_9_Picture_1.jpeg)

![](_page_9_Figure_2.jpeg)

detection time stamp (unix time hex)

- For each upset, the C<sup>3</sup> sends a text line on UART with
  - unix time stamp, FPGA no., frame address, bit offset(s), polarities
- Very useful for testing and debugging; the same information could also be used to study correlations of upsets with the radiation environment, or to reset FEBs only when essential bits are hit

#### C<sup>3</sup> firmware standalone

![](_page_10_Picture_1.jpeg)

#### Implementation

![](_page_10_Picture_3.jpeg)

- C<sup>3</sup> has a small logic footprint
- In V5LX50T just 828 slices (11%) and 9 BRAMs (15%)

| Logic Resources | Used  | Available | %  |
|-----------------|-------|-----------|----|
| Slices: FFs     | 1,068 | 28,800    | 3  |
| Slices: LUTs    | 2,005 | 28,800    | 6  |
| Slices: overall | 828   | 7,200     | 11 |
| BUFGs           | 3     | 32        | 9  |
| BRAM 36k        | 9     | 60        | 15 |
| BSCAN           | 1     | 4         | 25 |

#### Implementation (2)

![](_page_11_Picture_1.jpeg)

#### Readout + C<sup>3</sup> firmware

![](_page_11_Picture_3.jpeg)

| Logic Resources | Used   | Available | %  |
|-----------------|--------|-----------|----|
| Slices: FFs     | 14,932 | 28,800    | 51 |
| Slices: LUTs    | 16,159 | 28,800    | 56 |
| Slices: overall | 5,977  | 7,200     | 83 |
| BUFGs           | 10     | 32        | 31 |
| BRAM 36k        | 35     | 60        | 58 |
| BSCAN           | 1      | 4         | 25 |

- Implementation of Merger firmware with C<sup>3</sup>
- Fits V5LX50T resource availability
  - Slices at 83%, BRAMs at 58%

![](_page_12_Figure_0.jpeg)

### Dry Chamber

![](_page_13_Picture_1.jpeg)

![](_page_13_Figure_2.jpeg)

![](_page_13_Figure_3.jpeg)

![](_page_13_Picture_4.jpeg)

6 FEBs in 2 layers: 4 bottom, 2 top

Merger

![](_page_13_Picture_7.jpeg)

- Prepared two chained trays: one with the Merger and one with 6 FEBs
- Sledge for sliding DUTs in and out of the irradiation channel for quick irradiation start/stop
- Reactor always on during the test

Sledge

![](_page_14_Picture_0.jpeg)

#### **Test Results: Cross Sections**

- 29 runs, total irradiation time 14 hours, on average 29 minutes per run

![](_page_14_Figure_3.jpeg)

![](_page_15_Picture_0.jpeg)

#### Impact on Readout: C<sup>3</sup> Vs SEM

- Failure defined as readout interrupted or data corrupted
- Two sets of runs
  - A single C<sup>3</sup> implemented in the Merger
  - A SEM implemented in each FEB (total of 6 SEMs)

| Setup                | FEBs #0 to #5               | Merger                              |
|----------------------|-----------------------------|-------------------------------------|
| Single C<sup>3</sup> | Spartan-6                   | Virtex-5 hosting the C<sup>3</sup>  |
| 6 SEMs               | Spartan-6, one SEM in each  | Virtex-5                            |

| Test summary                     | C<sup>3</sup> in Merger | SEMs in FEBs       |
|----------------------------------|-------------------------|--------------------|
| # of runs w/ readout testing     | 13                      | 11                 |
| Test time (h)                    | 8.0                     | 4.8                |
| # of readout failures            | 13                      | 10                 |
| Average upset rate per FEB (1/s) | 1.26                    | 1.26               |
| Readout MTBF (s)                 | $2.2 \cdot 10^{3}$      | $1.7 \cdot 10^{3}$ |
| Readout MTBF (upsets)            | $2.8 \cdot 10^{3}$      | $2.2 \cdot 10^{3}$ |

![](_page_15_Picture_8.jpeg)

30% improvement moving from SEM to C<sup>3</sup>
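The MTBF figures in the test summary can be cross-checked from the test time, the number of readout failures and the average upset rate. The sketch below assumes the simplest estimator, MTBF = test time / number of failures, which is consistent with the tabulated values; the helper name `mtbf_seconds` is introduced here for illustration.

```python
def mtbf_seconds(test_hours, failures):
    """Mean time between failures, estimated as exposure time / failures."""
    return test_hours * 3600.0 / failures


UPSET_RATE = 1.26  # average upsets per FEB per second, from the test summary

mtbf_c3 = mtbf_seconds(8.0, 13)    # single C3 in the Merger
mtbf_sem = mtbf_seconds(4.8, 10)   # one SEM in each FEB

# MTBF expressed as the number of upsets accumulated between failures.
upsets_c3 = mtbf_c3 * UPSET_RATE
upsets_sem = mtbf_sem * UPSET_RATE

gain = mtbf_c3 / mtbf_sem - 1.0
print(f"C3:  {mtbf_c3:.0f} s, {upsets_c3:.0f} upsets")
print(f"SEM: {mtbf_sem:.0f} s, {upsets_sem:.0f} upsets")
print(f"relative improvement: {100 * gain:.0f}%")
```

Run on the tabulated inputs, the ratio comes out slightly below the quoted 30%, which matches the slide within the rounding of the MTBF values to two significant figures.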

### Upset Correction Capability: C<sup>3</sup> vs SEM

![](_page_16_Figure_2.jpeg)

- Residual upsets in FEBs at the end of the run
- SEM lets upsets accumulate over time
- C<sup>3</sup> avoids accumulation
  - The small residual is due to the C<sup>3</sup> stopping (or failing) at the end of the run, before the final verify

![](_page_16_Figure_7.jpeg)

![](_page_16_Figure_8.jpeg)

![](_page_17_Picture_0.jpeg)

#### Number of Upsets per Frame

![](_page_17_Figure_2.jpeg)

- Distribution of the number of bitflips per frame (multiplicity) in each SEU event detected by the C<sup>3</sup>
- Average multiplicity: 2.24 upsets per SEU event
- 65% of events have multiplicity > 1 (not correctable by SEM)
- Total events: 165k
- Also includes a few tens of events with up to 256 flips, likely configuration SEFIs
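As a reading aid for the multiplicity statistics, the sketch below shows how the average multiplicity and the fraction of events beyond a one-bitflip-per-frame corrector would be computed from a list of per-event bitflip counts. The sample list is made-up toy data, not the measured distribution, and the function name is hypothetical; only the last line uses numbers quoted on the slide.

```python
def multiplicity_summary(flips_per_event):
    """Return (average multiplicity, fraction of events with > 1 bitflip)."""
    n_events = len(flips_per_event)
    average = sum(flips_per_event) / n_events
    multi_fraction = sum(1 for m in flips_per_event if m > 1) / n_events
    return average, multi_fraction


# Toy sample (NOT measured data): bitflips per frame for eight SEU events.
toy_events = [1, 2, 1, 3, 2, 1, 4, 2]
average, multi_fraction = multiplicity_summary(toy_events)
print(f"average multiplicity: {average:.2f}")
print(f"events with multiplicity > 1: {100 * multi_fraction:.1f}%")

# From the quoted figures: about 165k events at 2.24 bitflips per event.
total_bitflips = 2.24 * 165_000
print(f"implied total bitflips: {total_bitflips:.0f}")
```

The last line is plain arithmetic on the quoted averages and implies a total of roughly 370k individual bitflips over the test.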

#### Integration in Belle II

![](_page_18_Figure_2.jpeg)

- C<sup>3</sup> fully integrated and running in Belle II TDAQ since the middle of the 2020 spring run
- SEUs monitored via the EPICS slow control system
  - Detected SEU map and SEU trends for the last two weeks of the 2020 spring run
  - Up to 20 SEUs per FEB group
- FEB firmware is now robust against SEUs, in view of the SEU rate increase expected with the foreseen SuperKEKB luminosity increase (2·10<sup>34</sup> -> 8·10<sup>35</sup> cm<sup>-2</sup> s<sup>-1</sup>)

![](_page_18_Figure_8.jpeg)

![](_page_19_Picture_0.jpeg)

#### Conclusions and...

- Developed a scrubber (C<sup>3</sup>) to majority vote configuration across FPGAs connected in a star topology
- Fast detection by means of parallel readback and correction by partial reconfiguration
- Completed a radiation test at a nuclear reactor
- Results show
  - $\sigma$ of upsets in Merger almost two orders of magnitude lower than in FEBs
  - $\sigma$ of failures in C<sup>3</sup> almost four orders of magnitude lower than upset $\sigma$ in FEBs
  - C<sup>3</sup> limits accumulation of upsets in configuration memory and improves MTBF of data readout w.r.t. Xilinx SEM by 30%
  - No hard failures of Merger (Virtex-5) or FEBs (Spartan-6)
- System installed and fully operational in Belle II

![](_page_20_Picture_0.jpeg)

#### ...Acknowledgments

- We wish to thank
  - A. Boiano, A. Vanzanella, A. Pandalone, E. Masone from SER (Electronics and Detectors Service) of INFN Napoli for their technical support to this activity

![](_page_20_Picture_4.jpeg)

  - JSI TRIGA staff for their technical support during the irradiation test

![](_page_20_Picture_6.jpeg)