Speaker
Description
The LHC has planned two phases of upgrades to increase its instantaneous luminosity. An accompanying upgrade of the readout electronics for the ATLAS detectors is planned to handle the increased trigger rates and readout data bandwidth. Owing to their high flexibility and short development cycles, FPGA-based systems are increasingly popular within the high-energy physics community. We present here a multi-layer single-event upset (SEU) mitigation scheme to harden such systems against the harsh radiation environment experienced by the electronics within ATLAS. The scheme is quite general and may therefore benefit experiments and applications beyond ATLAS and high-energy physics. Both the design and the experimental results will be discussed.
Summary
Several upgrade stages are planned for the Large Hadron Collider (LHC) within the next decade to increase the luminosity delivered to the associated LHC experiments, such as ATLAS. Coping with the increased luminosity is a great challenge for the ATLAS electronics system and will require substantial upgrades. Compared with ASIC-based systems, FPGA-based systems offer reduced cost, shorter development cycles, and the flexibility to update application circuits at any time through reconfiguration. Several FPGA-based systems have been implemented successfully within detector systems such as the ALICE TPC and the ATLAS MDT. Unfortunately, SRAM-based FPGAs are highly sensitive to radiation: single-event effects (SEE) occur when charge builds up through interaction with the radiation field. To develop a reliable FPGA design, these effects must be properly addressed so that the system link failure rate is reduced to a level acceptable for the experiment.
In this paper, we concentrate on a multi-layer SEU mitigation strategy that supports the functionality of two Xilinx-based boards. These boards use low-cost FPGAs that have been qualified for total ionizing dose (TID) resistance according to the ATLAS Radiation Test Criteria (RTC). The strategy consists of three layers of recovery that protect the FPGA design at different levels of granularity. For the first layer (Layer 1, the logic cell level), we applied triple modular redundancy (TMR) and a soft error mitigation (SEM) controller. Together these handle single-bit upsets easily. However, multi-bit upsets (MBUs), which were observed in previous testing, may defeat the FPGA's embedded error-correcting codes and TMR. A second layer (Layer 2) of protection was therefore introduced at the device level: it uses multi-boot auto-reconfiguration from on-board flash memory to scrub the entire firmware. In addition, there is a non-zero possibility that radiation-induced errors occur in the FPGA device control elements. A third layer (Layer 3) therefore provides board-wide re-initialization. On our customized prototype board, a power-disable function is controlled by a configuration board that simulates the ATLAS downlink from the back-end control, which allows the prototype board to be power cycled. Finally, a remote off-detector JTAG scrubbing channel is also available to re-initialize the on-board flash memory when needed.
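The layered idea can be illustrated with a minimal software sketch. It is not the firmware described here: the function and enum names (`tmr_vote`, `Recovery`, `choose_recovery`) are hypothetical, and the escalation rule is a simplifying assumption that sends any multi-bit upset to Layer 2 and any control-element fault to Layer 3.

```python
from enum import Enum


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


class Recovery(Enum):
    """Hypothetical labels for the three recovery layers."""
    NONE = 0
    LAYER1_TMR_SEM = 1             # logic-cell level: TMR + SEM controller
    LAYER2_MULTIBOOT_RECONFIG = 2  # device level: reload firmware from flash
    LAYER3_POWER_CYCLE = 3         # board level: power cycle / re-initialize


def choose_recovery(upset_bits: int, control_ok: bool) -> Recovery:
    """Assumed escalation policy, for illustration only."""
    if not control_ok:
        # Faults in the device control elements need a board-wide reset.
        return Recovery.LAYER3_POWER_CYCLE
    if upset_bits == 0:
        return Recovery.NONE
    if upset_bits == 1:
        # A single-bit upset is within reach of TMR and the SEM controller.
        return Recovery.LAYER1_TMR_SEM
    # Multi-bit upsets may defeat ECC and TMR, so reconfigure the device.
    return Recovery.LAYER2_MULTIBOOT_RECONFIG


if __name__ == "__main__":
    golden = 0b1011
    # One corrupted copy: the vote still returns the correct word.
    print(tmr_vote(golden, golden, 0b1111) == golden)
    # The same bit flipped in two copies: the vote is outvoted.
    print(tmr_vote(golden, 0b1111, 0b1111) == golden)
    print(choose_recovery(upset_bits=2, control_ok=True))
```

The second vote shows why a single layer is not enough: once two of the three copies agree on a wrong bit, majority voting returns the wrong value, which is the case the device-level reconfiguration is meant to cover.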
We evaluated this scheme on an Artix-7 FPGA on our customized prototype at two different neutron facilities, LANSCE and NCSR “Demokritos”, with beam energies of 800 MeV and 25 MeV, respectively. The test environment and setup are very similar to those of the final detector application. The results are analyzed to predict the data loss rate after 10 years of detector operation. A relationship between the SEU cross-section and the beam energy is also observed. No permanent damage was observed on any of the prototype DUT boards at either facility.
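As a rough sketch of how beam-test counts feed such a prediction, the standard definition of the cross-section (observed upsets divided by particle fluence) can be combined with an expected flux. This is not the analysis used in this work; the function names are hypothetical and every number in the example is a placeholder, not a measured value.

```python
import math


def seu_cross_section(n_upsets: int, fluence_per_cm2: float) -> float:
    """Cross-section in cm^2: observed upsets divided by beam fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return n_upsets / fluence_per_cm2


def expected_upsets(sigma_cm2: float, flux_per_cm2_s: float, seconds: float) -> float:
    """Expected upset count for a constant flux over an exposure time."""
    return sigma_cm2 * flux_per_cm2_s * seconds


if __name__ == "__main__":
    # Placeholder inputs chosen only to make the arithmetic easy to follow.
    sigma = seu_cross_section(n_upsets=100, fluence_per_cm2=1e10)
    n = expected_upsets(sigma, flux_per_cm2_s=1e3, seconds=1e5)
    print(math.isclose(sigma, 1e-8), math.isclose(n, 1.0))
```

A real extrapolation would also need the energy dependence of the cross-section and the fraction of upsets that actually cause data loss, both of which this sketch leaves out.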