Speaker
Description
The Low-power Gigabit Transceiver (lpGBT) is a radiation-tolerant ASIC used in high-energy physics experiments for multipurpose high-speed bidirectional serial links. In 2023, almost 200,000 lpGBT V1 chips were tested with a production test system that exercises the entire ASIC functionality to ensure its correct operation. Furthermore, qualification tests (Total Ionising Dose, Single-Event Upsets…) were carried out on a dozen lpGBTs. Despite the thorough production and qualification tests, a design issue named “stuck at power-up” was discovered, affecting at most 0.9 % of delivered devices. The tools developed to characterise this behaviour and the results obtained are presented.
Summary (500 words)
The Low-power Gigabit Transceiver (lpGBT) is a radiation-tolerant ASIC used to implement multipurpose high-speed bidirectional optical links for high-energy physics experiments, in particular for the HL-LHC upgrades of ATLAS and CMS. It provides a single bidirectional link that is used simultaneously for data readout, trigger data, timing, and experiment control and monitoring.
Almost 200,000 chips (version 1) have been tested, which represents an unprecedented production volume of ASICs for the HEP community. During production testing, all I/O connectivity and internal functions of each chip were tested both at ambient temperature and cold (-30 °C). Furthermore, qualification tests were carried out on a dozen chips to study the behaviour of the lpGBT under radiation effects such as Total Ionising Dose (TID) and Single-Event Upsets (SEU), as well as in other specific use cases.
Despite all these testing campaigns, an issue affecting a minority of the devices was found. The problem was reported by the ATLAS ITk team during the testing campaign of the EoS board, where one of the hundred devices already tested occasionally did not respond after power-on at low temperature. The failure was not systematic, but once the chip was unresponsive, only a power cycle could recover it, and even that did not always succeed. This issue was referred to as “stuck at power-up”.
After an in-depth study of the troublesome lpGBT, first housed in its original EoS board and then in a VLDB+, it was discovered that the flip-flops of the register controlling one of the lpGBT features could inappropriately disable clock signal propagation in the core logic at power-on. The occurrence rate among all lpGBTs was statistically estimated to be at most 0.9 %, and fixing the issue required the submission of a new version of the chip (version 2).
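The abstract does not state how the 0.9 % bound was derived. As a sketch of one common way to put an upper limit on a rare failure rate, the snippet below computes a one-sided Clopper-Pearson (exact binomial) upper bound from a count of failing devices in a tested sample. The function names, the 95 % confidence level, and the sample sizes used to call it are illustrative assumptions, not values taken from the lpGBT study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Intended for small k."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def rate_upper_limit(k, n, cl=0.95):
    """One-sided Clopper-Pearson upper limit on a failure rate.

    k  : number of failing devices observed (hypothetical input)
    n  : number of devices tested (hypothetical input)
    cl : confidence level (assumed; not stated in the abstract)
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if k == n:
        return 1.0
    alpha = 1.0 - cl
    lo, hi = 0.0, 1.0
    # binom_cdf decreases monotonically with p, so bisect for cdf == alpha.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For zero observed failures the bound reduces to the closed form 1 - (1 - cl)^(1/n), which gives a quick sanity check of the bisection.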
To assess the real impact of this problem, a characterisation campaign was organised. On the one hand, flip-flop matrices in the same technology as the lpGBT (65 nm) were used to assess their variability as a function of environmental conditions such as temperature, rise time, TID and ageing. On the other hand, a PCB housing 64 lpGBTs was designed to characterise real chips at many different rise times and temperatures while detecting which lpGBTs became stuck. This presentation focusses on the latter characterisation strategy.
The “lpGBT 8x8 PCB” hosts 64 lpGBTs with an independent I2C communication line for each lpGBT and four power sources, each feeding a block of 16 chips. The lpGBTs are controlled by two Ethernet-to-I2C boxes and fed by a power supply with rise time control, which can achieve a minimum rise time of 40 μs. Moreover, each lpGBT on this board can be individually disconnected from the power input in case of shorts introduced during assembly. In total, 640 lpGBTs V1 and 640 lpGBTs V2 were characterised.
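The scan that such a board makes possible can be sketched as a loop over temperatures and rise times, with a power cycle followed by an I2C probe of every chip. This is a minimal sketch, not the actual test software: `set_temperature`, `power_cycle` and `chip_responds` are hypothetical callbacks standing in for the climate control, the rise-time-controlled power supply and the Ethernet-to-I2C boxes, whose interfaces the abstract does not describe. The contiguous chip-to-block numbering is also an assumption; only the 64-chip count, the blocks of 16 and the 40 μs minimum rise time come from the text.

```python
N_CHIPS = 64             # lpGBTs per board (from the text)
CHIPS_PER_BLOCK = 16     # chips fed by each power source (from the text)
MIN_RISE_TIME_US = 40    # minimum achievable rise time (from the text)


def power_block(chip):
    """Power source index (0..3) for a chip, assuming contiguous numbering."""
    return chip // CHIPS_PER_BLOCK


def scan(rise_times_us, temperatures_c,
         set_temperature, power_cycle, chip_responds, repeats=1):
    """Count unresponsive chips per (chip, rise time, temperature).

    All three callbacks are hypothetical placeholders:
      set_temperature(temp_c)  -- bring the board to the target temperature
      power_cycle(rise_us)     -- power off, then on with the given rise time
      chip_responds(chip)      -- True if the chip answers on its I2C line
    """
    stuck = {}
    for temp in temperatures_c:
        set_temperature(temp)
        for rise in rise_times_us:
            if rise < MIN_RISE_TIME_US:
                raise ValueError(f"rise time {rise} us is below the supply minimum")
            for _ in range(repeats):
                power_cycle(rise)
                for chip in range(N_CHIPS):
                    if not chip_responds(chip):
                        key = (chip, rise, temp)
                        stuck[key] = stuck.get(key, 0) + 1
    return stuck
```

Keeping the hardware access behind callbacks lets the loop be exercised with a simulated backend before it is connected to real instruments.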
The results and conclusions of this large-scale test programme provided additional information to the different users in the HL-LHC community and gave confidence that the issue was fixed in the lpGBT V2.