SEE Tolerant Standard Cell Based Design While Guaranteeing Specific Distance Between Memory Elements

12 Sept 2017, 08:30
20m
Earth and Marine Sciences (E&MS) building (UCSC)

Oral Programmable Logic, Design Tools and Methods

Speaker

Sandeep Miryala (Fermi National Accelerator Lab. (US))

Description

Single Event Effects (SEEs), comprising Single Event Upsets (SEUs) and Single Event Transients (SETs), corrupt the data in storage nodes and registers. Triple Modular Redundancy (TMR) with clock delay insertion is a system-level technique that counters SEEs in storage nodes. However, such an implementation is not straightforward in standard cell based digital design, which uses CAD tools like Genus/RC Compiler and Innovus for synthesis and physical design. This paper presents a successful automation methodology that maps the intended registers in the Verilog RTL to a triplicated cell during synthesis and guarantees a minimum distance between memory elements during placement and routing, leading to SEE tolerant standard cell based digital design.

Summary

Single Event Effects (SEEs) are very common in ASICs used for detector electronics because of the ionizing particles produced in particle collisions. SEEs comprise Single Event Upsets (SEUs) and Single Event Transients (SETs), which manifest themselves as bit flips in sequential elements and glitches in combinational gates, respectively. An SEU in a data path register results in incorrect data packets from the serial links, whereas an SEU in a global configuration register can make the chip non-functional. For the RD53A pixel chip, a joint effort between the ATLAS and CMS groups, the estimated rate of SEU-induced bit flips in the global configuration registers is one every ~20 s per chip, whereas in the pixel registers it is ~60 bit flips per second per pixel per chip. Hence SEE tolerant design is unavoidable for RD53A in the pixel configuration registers, the global configuration registers and the data path registers in the digital chip bottom.
Traditionally, TMR is implemented by the RTL designers during hardware description in Verilog/VHDL. The drawback of this approach is that, during synthesis and optimization, the tools might remove the intended triplicated redundancy logic. The top-level designer must therefore carefully verify that the TMR is still present in the synthesized gate-level netlist, which is cumbersome and time-consuming in large designs. The problem is aggravated when the RTL is written by one designer and the digital design flow is carried out by another.
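To make concrete what the triplicated logic has to preserve through synthesis, the following is a minimal Python sketch of 2-of-3 majority voting over three register copies. It is a behavioural illustration only, not the RD53A implementation; the function name `majority_vote` and the example values are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority across three copies of a register value."""
    return (a & b) | (b & c) | (a & c)


# Three copies of the same 4-bit register value (hypothetical example data).
copies = [0b1011, 0b1011, 0b1011]

# Model a single event upset: bit 2 flips in exactly one copy.
copies[1] ^= 0b0100

# The voter still returns the original value because two copies agree.
voted = majority_vote(*copies)
print(f"copies={[bin(c) for c in copies]} voted={bin(voted)}")
```

If an optimizer merges the three copies into one register, the voter sees three identical inputs that flip together and the protection is lost, which is why the presence of TMR has to be checked in the netlist.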
Another approach is to synthesize the digital logic and then replace the intended register cells with a custom triplicated standard cell. This approach works, but it can cause routing congestion in large chips that contain many TMR cells.
In this article, we introduce an automation methodology for implementing SEU tolerant digital design with conventional semi-custom design tools, as adopted in the RD53A pixel chip for the ATLAS/CMS experiments. The methodology is based on the Cadence Genus tool for logic synthesis and Innovus for physical design, both of which are commonly used for digital design in the high energy physics community.
We demonstrate the methodology by introducing additional steps during synthesis and physical design. The registers in the Verilog RTL are mapped to a triplicated cell during synthesis, and additional constraints during place and route guarantee a minimum distance between memory elements. The methodology has been verified on a simple design, but it can easily be scaled to large designs consisting of multiple millions of standard cells. The TMR mapping is demonstrated in the following cases, which are relevant in digital design:
1. Triplicating a register that has tmr as its instance name in the RTL
2. Triplicating all the registers in the RTL
3. Triplicating the registers in one of the modules in the RTL
4. Triplicating a register along with a correction mechanism
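The first three cases differ only in which registers are selected for triplication. The selection logic can be sketched in Python as below. This is an illustration under stated assumptions, not the Genus script used in the flow: the function `select_registers`, the mode names, the `/` hierarchy separator, the substring match on "tmr", and the example register paths are all hypothetical. Case 4 concerns the triplicated cell itself rather than the selection, so it is not modelled here.

```python
from typing import List


def select_registers(registers: List[str], mode: str, module: str = "") -> List[str]:
    """Return the hierarchical register names chosen for triplication."""
    if mode == "by_name":
        # Case 1 (assumed matching rule): leaf instance name contains "tmr".
        return [r for r in registers if "tmr" in r.split("/")[-1].lower()]
    if mode == "all":
        # Case 2: every register in the design.
        return list(registers)
    if mode == "by_module":
        # Case 3: registers located under one module's hierarchy prefix.
        prefix = module.rstrip("/") + "/"
        return [r for r in registers if r.startswith(prefix)]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical register list, as it might be extracted from a design.
registers = [
    "top/cfg/tmr_reg_0",
    "top/cfg/plain_reg",
    "top/data/fifo_reg",
    "top/data/tmr_ptr",
]

print(select_registers(registers, "by_name"))
print(select_registers(registers, "by_module", module="top/cfg"))
```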
With this approach, TMR can be implemented in any register of the design. Routing congestion is avoided because the triplicated flops are spread across the entire core area of the chip, and the guaranteed minimum distance between memory elements counters multiple simultaneous bit flips within a TMR group.
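The minimum-distance guarantee can be checked after placement with a simple pairwise test over the three copies of each register. The Python sketch below assumes placements are available as (x, y) coordinates in micrometres and uses Euclidean distance; the function `spacing_violations`, the 15 µm threshold and the example coordinates are hypothetical and are not taken from the RD53A flow.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def spacing_violations(
    triplets: Dict[str, List[Point]], min_distance: float
) -> List[Tuple[str, int, int, float]]:
    """List (register, copy_i, copy_j, distance) for pairs placed too close."""
    violations = []
    for name, copies in triplets.items():
        # Compare every pair among the copies of one triplicated register.
        for (i, p), (j, q) in combinations(enumerate(copies), 2):
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if d < min_distance:
                violations.append((name, i, j, d))
    return violations


# Hypothetical placements in micrometres for two triplicated registers.
placements = {
    "cfg_reg_0": [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)],
    "cfg_reg_1": [(0.0, 0.0), (5.0, 0.0), (40.0, 0.0)],
}

violations = spacing_violations(placements, min_distance=15.0)
print(violations)
```

An empty result means every pair of copies respects the chosen spacing; any entry identifies the register and the two copies that would need to be moved apart.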

Primary author

Sandeep Miryala (Fermi National Accelerator Lab. (US))
