20–24 Sept 2010
Aachen, Germany
Europe/Zurich timezone

Use of Triple Modular Redundancy (TMR) technology in FPGAs for the reduction of faults due to radiation in the readout of the ATLAS Monitored Drift Tube (MDT) chambers

23 Sept 2010, 16:00
2h
Aula

Poster Programmable Logic, design tools and methods POSTERS Session

Speaker

Mr Markus Fras (Max-Planck-Institut fuer Physik)

Description

Triple Modular Redundancy (TMR) protects the functionality of FPGAs against single event upsets (SEUs). Each logic block is implemented three times, with a 2-out-of-3 voter at the output, so the correct logical value is available even if a bit is upset in one of the copies. We applied TMR to the configuration code of a Virtex-II-2000 FPGA, which serves as the on-chamber readout processor of the ATLAS MDTs. We describe the code implementation, report the results of performance measurements, and discuss several limitations of the method. Finally, we present a supplementary technology called “scrubbing”, which continuously re-writes the configuration memory while the FPGA is operating, correcting upset configuration bits.

Summary

The readout of the ATLAS MDT chambers uses a Xilinx Virtex-II-2000 FPGA as the main processor for data transmission, channeling data from up to 432 drift tubes via optical fibers to the Readout Drivers (RODs) in the experimental hall. The FPGA sits on the Chamber Service Module (CSM), which in turn is mounted on the MDT chambers in the ATLAS cavern. It is thus exposed to a considerable rate of strongly ionizing tracks, in particular in the highest-eta region of the end-caps of the ATLAS muon spectrometer. The resulting SEUs will not only corrupt user data but may also change the firmware code running on the FPGA, which may lead to malfunction of the chip.
A common mitigation scheme for SEUs is Triple Modular Redundancy (TMR). Functional blocks are implemented three times in parallel and fed by the same inputs. A majority voter uses the three independent outputs to determine the correct logic value, even if one of the blocks suffers an upset. This provides good protection against an SEU in the user data and some protection against an upset in the configuration memory. The drawbacks of TMR are increased usage of logic and routing resources, increased power dissipation, and harder timing closure.
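The 2-out-of-3 voting described above can be illustrated with a minimal software sketch. It is not the firmware used on the CSM: the names `majority`, `tmr` and `increment`, and the modelling of an SEU as an XOR mask on one copy's output, are assumptions made purely for illustration.

```python
from typing import Callable, Sequence


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(block: Callable[[int], int], value: int,
        upsets: Sequence[int] = (0, 0, 0)) -> int:
    """Run three copies of `block` on the same input and vote on the results.

    `upsets` is a hypothetical fault model: one XOR mask per copy,
    flipping the chosen bits of that copy's output.
    """
    outputs = [block(value) ^ mask for mask in upsets]
    return majority(outputs[0], outputs[1], outputs[2])


def increment(x: int) -> int:
    """Stand-in for a triplicated functional block (8-bit increment)."""
    return (x + 1) & 0xFF


# One upset copy is outvoted by the two healthy copies.
single = tmr(increment, 0x41, upsets=(0x08, 0, 0))

# The same bit upset in two copies defeats the voter.
double = tmr(increment, 0x41, upsets=(0x08, 0x08, 0))
```

With these inputs `single` equals the fault-free result `0x42`, while `double` is wrong, which is why an accumulation of upsets has to be prevented by other means.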
We used the TMRTool software package supplied by Xilinx to apply TMR to the most critical parts of the design. Individual modules can be configured to be triplicated or to be left untouched by the TMR process. In addition, the software takes care of some FPGA-specific elements for which SEUs are particularly problematic. In our case the code generated with TMRTool uses about 92% of the FPGA logic resources, whereas the design without TMR uses about 41%. The TMR’ed firmware was tested in the laboratory and at a cosmic ray test facility. It worked well in both cases.
A supplementary technology, called “scrubbing”, consists of continuously re-writing the configuration memory from a source that is highly immune to SEUs. Upset configuration bits are thus continually overwritten with the correct values. While TMR is a good mitigation for a few SEUs, scrubbing continuously corrects wrong bits, preventing an accumulation of SEUs. In Xilinx Virtex-II devices, this can be done while the FPGA is running normally. For the CSM we use a self-hosted scrubbing unit, which resides in the same FPGA that it is re-configuring. Benefits and disadvantages of this scheme will be presented.
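The scrubbing idea can be sketched in a few lines, under simplifying assumptions: the configuration memory is modelled as a list of integer frames, the SEU-immune source as a second list (`golden`), and `scrub` is a hypothetical helper, not the interface of the CSM scrubbing unit. The comparison is there only to count repairs for the illustration; the overwrite itself is unconditional.

```python
from typing import List


def scrub(config: List[int], golden: List[int]) -> int:
    """Overwrite every configuration frame with its golden value.

    Returns the number of frames that differed before being overwritten.
    """
    repaired = 0
    for address, good_frame in enumerate(golden):
        if config[address] != good_frame:
            repaired += 1
        # Unconditional re-write, whether or not the frame was upset.
        config[address] = good_frame
    return repaired


# Hypothetical three-frame configuration memory and its golden copy.
golden = [0b1010, 0b0110, 0b1111]
config = list(golden)

# Model one SEU: flip a single bit in frame 1.
config[1] ^= 0b0100

first_pass = scrub(config, golden)   # repairs the upset frame
second_pass = scrub(config, golden)  # finds nothing left to repair
```

Because each pass restores the full configuration, an upset survives for at most one scrub period, so single upsets are unlikely to accumulate into the double faults that TMR cannot mask.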

Primary author

Mr Markus Fras (Max-Planck-Institut fuer Physik)

Co-authors

Mr Bradley Weber (Max-Planck-Institut fuer Physik)
Mr Hubert Kroha (Max-Planck-Institut fuer Physik)
Mr Olaf Reimann (Max-Planck-Institut fuer Physik)
Mr Robert Richter (Max-Planck-Institut fuer Physik)
