1–6 Oct 2023
Geremeas, Sardinia, Italy
Europe/Zurich timezone

Fault Tolerance Evaluation Study of a RISC-V Microprocessor for HEP Applications

3 Oct 2023, 15:40
20m
Sirocco Room

Oral: Programmable Logic, Design and Verification Tools and Methods

Speaker

Alexander Walsemann

Description

The use of a radiation-hard microprocessor, or more generally the adoption of a System-on-Chip (SoC) design methodology, would have a considerable beneficial impact on the future design of ASICs within the HEP community. The STRV (SEU-tolerant RISC-V) is a RISC-V microprocessor protected by Triple Modular Redundancy (TMR), designed to withstand Single Event Effects (SEEs) and to operate close to a beamline or interaction point.
This contribution presents the results of evaluation studies on the impact of SEEs on the reliability of a RISC-V microprocessor-based system. These studies draw on extensive fault injection simulations and on heavy ion testing of STRV-R1 samples.

Summary (500 words)

The STRV-R1 is a RISC-V microprocessor designed for the harsh radiation environment found close to the beamline or interaction point, for example at the locations of the ATLAS and CMS detectors installed at the Large Hadron Collider (LHC). In such locations the microprocessor must tolerate a total ionizing dose (TID) of 1 Grad and a high single event effect (SEE) rate caused by a particle flux of up to 1.5 GHz/cm², which current unprotected RISC-V implementations cannot withstand.
Using a microprocessor instead of the custom digital logic found in current ASICs is intended to allow the implementation of reprogrammable functions and algorithms, enabling flexible and reconfigurable embedded systems. Applications of a RISC-V microprocessor could range from low-performance control and monitoring of on-detector electronics to high-performance variants intended for processing physics event data. Integrating a pre-existing, verified RISC-V core would enable an SoC design flow that reduces the required design and verification effort: compared with a fully custom ASIC, only a small portion of the design code is customized for the target application, while the rest is handled by the RISC-V core and other pre-existing IP blocks connected to an interconnect bus. The result would be faster development turnaround and better reusability across multiple designs and applications.
The STRV-R1 contains a RISC-V core with a three-stage pipeline that implements the RV32I ISA variant, complemented by an ISA extension for accelerated multiplication and division. The system is designed for a base clock frequency of 50 MHz and provides 32 kB of SRAM for instructions and data. Although the STRV-R1 is intended as a reference platform for evaluation, its specifications are chosen to meet the requirements of control and monitoring systems for detector electronics. The STRV-R1 uses Triple Modular Redundancy (TMR) to mitigate errors caused by SEEs, and it incorporates a self-refresh algorithm that periodically refreshes the instructions and data stored in the SRAM, independently of the RISC-V core, to prevent a critical accumulation of SEUs. TID effects are addressed by using a 65 nm process technology in combination with thin gate oxide transistors.
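
Why TMR alone is not enough for memories, and why a periodic refresh helps, can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the STRV-R1 design: the class `TmrMemory`, the bit-wise 2-out-of-3 majority vote, and the "vote and write back" refresh loop are illustrative choices, since the voter and refresh algorithm are not specified here.

```python
class TmrMemory:
    """Hypothetical toy model: every word is stored in three replicas."""

    def __init__(self, size_words: int, width: int = 32) -> None:
        self.mask = (1 << width) - 1
        self.copies = [[0] * size_words for _ in range(3)]

    @staticmethod
    def vote(a: int, b: int, c: int) -> int:
        # Bit-wise majority: a bit is 1 if at least two replicas hold a 1.
        return (a & b) | (a & c) | (b & c)

    def write(self, addr: int, value: int) -> None:
        for copy in self.copies:
            copy[addr] = value & self.mask

    def read(self, addr: int) -> int:
        a, b, c = (copy[addr] for copy in self.copies)
        return self.vote(a, b, c)

    def flip_bit(self, replica: int, addr: int, bit: int) -> None:
        # Model a single event upset as one bit flip in one replica.
        self.copies[replica][addr] ^= 1 << bit

    def refresh(self) -> int:
        """Scrub every word (independently of any reader): vote, then write
        the voted value back to all replicas. Returns the number of words repaired."""
        repaired = 0
        for addr in range(len(self.copies[0])):
            voted = self.read(addr)
            if any(copy[addr] != voted for copy in self.copies):
                self.write(addr, voted)
                repaired += 1
        return repaired


mem = TmrMemory(size_words=8)
mem.write(3, 0xDEADBEEF)
mem.flip_bit(replica=0, addr=3, bit=5)   # first upset is masked by the vote
print(hex(mem.read(3)))                  # 0xdeadbeef
print(mem.refresh())                     # 1 word repaired
mem.flip_bit(replica=1, addr=3, bit=5)   # a later upset in the same bit is still masked
print(hex(mem.read(3)))                  # 0xdeadbeef
```

Without the `refresh()` call, the second upset would hit the same bit in a different replica, two of the three copies would agree on the wrong value, and the vote would return corrupted data. This is the accumulation that a periodic refresh is meant to prevent.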

This contribution will present the results of evaluation studies on the fault tolerance against SEEs that a TMR-based protection scheme can achieve. The studies use a simulation-based fault injection framework that replicates the behavior of different types of single event effects in order to provoke functional interrupts in the RISC-V core. They will provide insight into the major contributors to functional interrupts and into the failure susceptibility of the primary components of a microprocessor system, such as the pipeline and the memories.
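
The principle of a fault injection campaign can be reduced to a toy Monte Carlo model that compares an unprotected register with a triplicated one. This is a hypothetical sketch, not the authors' framework, whose internals are not described here: the function `inject_campaign`, the uniform random single-bit-flip fault model, and the absence of any refresh between upsets are all assumptions, and the model does not simulate a pipeline or functional interrupts.

```python
import random


def vote(a: int, b: int, c: int) -> int:
    # Bit-wise 2-out-of-3 majority vote.
    return (a & b) | (a & c) | (b & c)


def inject_campaign(n_runs: int, flips_per_run: int, width: int = 32, seed: int = 0) -> dict:
    """Inject `flips_per_run` random bit flips per run (no refresh in between)
    and count how often the read-back value differs from the golden value."""
    rng = random.Random(seed)  # fixed seed for a reproducible campaign
    failures = {"unprotected": 0, "tmr": 0}
    for _ in range(n_runs):
        golden = rng.getrandbits(width)
        plain = golden
        replicas = [golden, golden, golden]
        for _ in range(flips_per_run):
            mask = 1 << rng.randrange(width)   # bit position of this upset
            plain ^= mask                      # upset in the unprotected register
            replicas[rng.randrange(3)] ^= mask # same bit position, random replica
        failures["unprotected"] += int(plain != golden)
        failures["tmr"] += int(vote(*replicas) != golden)
    return failures


print(inject_campaign(n_runs=1000, flips_per_run=1))
print(inject_campaign(n_runs=10000, flips_per_run=2))
```

With one upset per run the triplicated register never fails while the unprotected one always does. With two upsets between refreshes, the triplicated register fails only when both upsets land on the same bit in different replicas, which is the kind of accumulation effect such a campaign is meant to quantify.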
Samples of the STRV-R1 were irradiated with heavy ions. The results of the heavy ion tests will be correlated with, and compared to, the information derived from the simulation-based evaluation of the impact of single event effects.
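
Heavy ion results are commonly expressed as an event cross section, which gives a quantity that can be set against simulated fault sensitivity. The definitions below are the standard ones, not results of this study; the symbols are introduced here for illustration: $\phi(t)$ is the ion flux, $T$ the exposure time, $\Phi$ the fluence, $N_{\mathrm{events}}$ the number of observed events at a given LET, and $N_{\mathrm{bits}}$ the number of sensitive bits if a per-bit figure is wanted.

```latex
\Phi = \int_{0}^{T} \phi(t)\,\mathrm{d}t \quad [\mathrm{cm^{-2}}], \qquad
\sigma_{\mathrm{SEE}}(\mathrm{LET}) = \frac{N_{\mathrm{events}}}{\Phi} \quad [\mathrm{cm^{2}}], \qquad
\sigma_{\mathrm{bit}} = \frac{\sigma_{\mathrm{SEE}}}{N_{\mathrm{bits}}}
```

Measuring $\sigma_{\mathrm{SEE}}$ at several LET values yields a cross-section curve, which is one possible basis for comparison with the simulation-based evaluation.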

Author

Co-authors

Michael Karagounis (Fachhochschule Dortmund Univ. of Applied Sciences and Arts (DE)) Mr Alexander Stanitzki (Fraunhofer IMS) Mr Dietmar Tutsch (University of Wuppertal)
