Speakers
Description
The ATLAS experiment requires a high-precision bunch clock distribution for the High-Luminosity upgrade of the Large Hadron Collider. A new trigger and timing distribution system based on FPGA transceivers and high-speed serial links will replace the existing one. In preparation for this upgrade, we characterized the clock phase uncertainty of AMD UltraScale+ transceivers after reset. We found the performance of the GTH type to be adequate, and we implemented a workaround to correct the behavior of the GTY-type receiver. In addition, we extensively studied the effect of silicon temperature variations on the clock phase and implemented a compensation algorithm.
Summary (500 words)
The bunch-clock distribution scheme of the ATLAS experiment will be upgraded for the High-Luminosity runs of the Large Hadron Collider (HL-LHC). The clock, embedded into high-speed serial streams, is recovered at each receiving node, re-embedded, and transmitted to the next node, until it reaches the Low-power GigaBit Transceiver (lpGBT) front-end ASIC. To cope with the increased pile-up of the HL-LHC runs, certain timing detectors in the LHC experiments require a particularly precise clock, with a phase uncertainty significantly smaller than the 100 ps unit interval (UI) of the data stream.
In the context of the new hardware development for the Central Trigger and timing distribution system (L0CT), we studied the dependence of the phase on temperature fluctuations and the phase uncertainty after link reset. Our initial tests showed that the phase stability of the FPGA-to-FPGA links targeted for our back-end does not meet our requirements. We therefore launched an extensive program of performance tests and studied methods to improve the stability. Using evaluation kits, custom mezzanine cards and climate chambers, we characterized both types of multi-gigabit transceiver (MGT), GTH and GTY, available in our target FPGA family (AMD UltraScale+).
Temperature fluctuations. The Timing-Compensated Link (TCLink) team has previously demonstrated compensation of the phase drifts caused by temperature variations of the optical fibers, which are detected by a Digital Dual-Mixer Time Difference (DDMTD) module on the transmitting side. We show that the typical silicon temperature fluctuations of the transmitting and receiving FPGAs are two further significant contributors, and that the DDMTD in TCLink cannot detect phase shifts caused by variations of its own temperature. We characterized the phase-drift coefficient of the MGTs by controlling the chamber temperature. We implemented linear algorithms which, fed with the FPGA die temperatures and DDMTD phase readouts, effectively compensate the phase shifts by controlling the interpolator inside the transmitting MGT.
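As an illustration only, a linear compensation of this kind could be sketched as follows. This is not the firmware or software used in the study: the class name, the coefficient names and values, the interpolator step size, and the assumption that the three contributions (DDMTD readout, transmitter die temperature, receiver die temperature) simply add are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearPhaseCompensator:
    """Hypothetical linear compensator; every parameter is an assumed placeholder."""

    tx_coeff_ps_per_degc: float  # assumed phase-drift coefficient of the transmitting FPGA
    rx_coeff_ps_per_degc: float  # assumed phase-drift coefficient of the receiving FPGA
    pi_step_ps: float            # assumed phase shift per interpolator step
    tx_ref_degc: float           # die temperature at which the link was calibrated (TX)
    rx_ref_degc: float           # die temperature at which the link was calibrated (RX)

    def predicted_drift_ps(self, tx_degc, rx_degc, ddmtd_error_ps):
        """Estimate the total phase drift as a linear sum of the three inputs."""
        tx_term = self.tx_coeff_ps_per_degc * (tx_degc - self.tx_ref_degc)
        rx_term = self.rx_coeff_ps_per_degc * (rx_degc - self.rx_ref_degc)
        return ddmtd_error_ps + tx_term + rx_term

    def correction_steps(self, tx_degc, rx_degc, ddmtd_error_ps):
        """Number of interpolator steps (signed) that cancels the estimated drift."""
        drift = self.predicted_drift_ps(tx_degc, rx_degc, ddmtd_error_ps)
        return -round(drift / self.pi_step_ps)


# Example with made-up numbers: 5 degC rise on TX, 2 degC on RX, 1 ps seen by the DDMTD.
comp = LinearPhaseCompensator(
    tx_coeff_ps_per_degc=2.0,
    rx_coeff_ps_per_degc=1.0,
    pi_step_ps=0.5,
    tx_ref_degc=40.0,
    rx_ref_degc=40.0,
)
steps = comp.correction_steps(tx_degc=45.0, rx_degc=42.0, ddmtd_error_ps=1.0)
```

In a real system the coefficients would come from climate-chamber characterization such as the one described above, and the correction would be applied gradually so that the recovered clock is not disturbed.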
Post-reset uncertainty. While the GTH receiver has an acceptable phase determinism after reset, we observed that the GTY type performs significantly worse and may affect certain detectors with tight requirements. The challenge is to detect the sub-UI phase jumps on the receiver without having a stable reference clock to compare against. To solve this problem, we devised a workaround based on a specific feature of the UltraScale+ MGT receivers, the “Run-Length” (i.e. the maximum number of consecutive identical symbols that cause a previously locked link to unlock). We implemented the workaround, which requires simple hardware, firmware and software modifications, and we successfully demonstrated that it fixes the behavior of the GTY MGT, effectively matching the good behavior of the GTH type.
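To make the quantity concrete, the sketch below counts the longest run of consecutive identical symbols in a sequence and compares it with a limit. It only illustrates the definition given above; it does not reproduce the workaround itself, and the function names and the `run_length_limit` parameter are hypothetical.

```python
def longest_run(symbols):
    """Return the length of the longest run of consecutive identical symbols."""
    longest = 0
    current = 0
    previous = None
    for symbol in symbols:
        # Extend the current run if the symbol repeats, otherwise start a new run.
        current = current + 1 if symbol == previous else 1
        longest = max(longest, current)
        previous = symbol
    return longest


def exceeds_run_length(symbols, run_length_limit):
    """True if the sequence contains a run at least as long as the assumed limit."""
    return longest_run(symbols) >= run_length_limit
```

For example, the sequence 0, 1, 1, 1, 0, 0 has a longest run of three identical symbols, so it would reach a hypothetical limit of three but not a limit of four.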
In conclusion, as the evolution of MGTs targets throughput and latency, the stability of the phase of the recovered clock is a growing concern for new timing detectors. We addressed it by developing workable solutions, which we successfully tested on evaluation hardware. We also implemented them on the first prototype of the L0CT’s Local Trigger Interface board, which we expect to receive in mid-2024 and use to select the best options for the production hardware.