Description
The FELIX system, first deployed for ATLAS in LHC Run 3, will evolve for Run 4 to serve all subdetectors. The system will consist of about 350 servers with new custom PCIe FELIX cards and 200 GbE interfaces, handling data at a 1 MHz readout rate for a total throughput of 4.6 TB/s. The new PCIe cards, built around an AMD Versal Premium FPGA/SoC with advanced connectivity, will run upgraded firmware that decodes front-end data and distributes timing, trigger, and control information. Preliminary design reviews of the firmware (2022) and of the card hardware (2024) confirmed the validity of the designs. Server and software upgrades will also allow the system to cope with increased data and trigger rates.
Summary
The FELIX system, introduced for ATLAS in LHC Run 3, is set to undergo significant evolution for Run 4. This evolution, the FELIX Phase-II upgrade, will extend the system to serve all ATLAS subdetectors and will comprise about 350 servers equipped with custom PCIe FELIX cards and dual-port 200 GbE network interfaces. The upgraded system will handle detector data at a readout rate of 1 MHz, resulting in a total throughput of 4.6 TB/s on the high-performance network.
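The quoted figures can be cross-checked with simple arithmetic. The sketch below derives the average data volume per 1 MHz readout and the average network load per server. It assumes (these are not stated in the abstract) that 1 TB = 10^12 bytes, that the load is spread evenly over exactly 350 servers, and that both 200 GbE ports are used for output; all constant and function names are illustrative.

```python
# Back-of-envelope check of the Phase-II FELIX throughput figures.
# Assumptions (not from the source): 1 TB = 1e12 bytes, load spread evenly
# over exactly 350 servers, both 200 GbE ports usable for output.

TOTAL_THROUGHPUT_BYTES_PER_S = 4.6e12  # 4.6 TB/s on the network
READOUT_RATE_HZ = 1.0e6                # 1 MHz readout rate
N_SERVERS = 350
PORTS_PER_SERVER = 2                   # dual-port NIC
PORT_SPEED_BITS_PER_S = 200e9          # 200 GbE per port


def event_size_bytes() -> float:
    """Average data volume per readout at 1 MHz."""
    return TOTAL_THROUGHPUT_BYTES_PER_S / READOUT_RATE_HZ


def per_server_gbps() -> float:
    """Average output per server in Gb/s, assuming an even load split."""
    return TOTAL_THROUGHPUT_BYTES_PER_S * 8 / N_SERVERS / 1e9


def nic_utilisation() -> float:
    """Average fraction of the dual-port 200 GbE capacity in use."""
    return per_server_gbps() * 1e9 / (PORTS_PER_SERVER * PORT_SPEED_BITS_PER_S)


if __name__ == "__main__":
    print(f"data per readout : {event_size_bytes() / 1e6:.1f} MB")
    print(f"per-server output: {per_server_gbps():.1f} Gb/s")
    print(f"NIC utilisation  : {nic_utilisation():.0%}")
```

Under these assumptions each readout carries about 4.6 MB on average, and each server sends roughly 105 Gb/s, about a quarter of its 400 Gb/s network capacity. Real per-server load will depend on how detector links are distributed across servers, which this average ignores.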
The latest design of the new FELIX PCIe card for Run 4, named FLX-155, is built around an AMD Versal Premium VP1552 FPGA/SoC and comes with improved specifications: a PCIe Gen5 x16 interface, four optical links for timing, trigger, and control (TTC), 48 optical links operating at up to 25 Gb/s to interface with the front-end electronics, and an additional 100 Gigabit Ethernet connection. The FLX-155 is a follow-up to the FLX-182 card, which is equipped with an AMD Versal Prime VM1802 FPGA/SoC, a PCIe Gen4 x16 interface, and 24 optical links operating at up to 25 Gb/s. Both the FLX-155 and the FLX-182 underwent a preliminary hardware design review in January 2024 and are foreseen to be used in the FELIX Phase-II upgrade. The first FLX-182 prototype was delivered in January 2023; the first FLX-155 prototype is expected in Q3 2024.
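The two card specifications can be compared by their nominal bandwidths. The sketch below computes, for each card, the upper bound on aggregate front-end line rate (all links at the maximum 25 Gb/s) and the PCIe bandwidth per direction. The PCIe figures use the per-lane rates of 16 GT/s (Gen4) and 32 GT/s (Gen5) with 128b/130b line coding, and ignore packet-level protocol overhead. The `FelixCard` class and its fields are hypothetical names chosen for illustration; actual payload per link depends on the front-end protocol and occupancy, which the abstract does not specify.

```python
from dataclasses import dataclass

# PCIe per-lane transfer rates in GT/s; Gen3 and later use 128b/130b coding.
PCIE_GT_PER_LANE = {4: 16.0, 5: 32.0}
PCIE_ENCODING = 128 / 130


@dataclass(frozen=True)
class FelixCard:
    """Hypothetical summary of the headline specs of a FELIX card."""
    name: str
    pcie_gen: int
    pcie_lanes: int
    fe_links: int
    fe_link_gbps: float  # maximum line rate per front-end link

    def pcie_gbps(self) -> float:
        """PCIe bandwidth per direction after line coding, ignoring TLP/DLLP overhead."""
        return PCIE_GT_PER_LANE[self.pcie_gen] * self.pcie_lanes * PCIE_ENCODING

    def fe_aggregate_gbps(self) -> float:
        """Upper bound on aggregate front-end line rate (all links at max speed)."""
        return self.fe_links * self.fe_link_gbps


CARDS = [
    FelixCard("FLX-182", pcie_gen=4, pcie_lanes=16, fe_links=24, fe_link_gbps=25.0),
    FelixCard("FLX-155", pcie_gen=5, pcie_lanes=16, fe_links=48, fe_link_gbps=25.0),
]

if __name__ == "__main__":
    for card in CARDS:
        print(
            f"{card.name}: front-end <= {card.fe_aggregate_gbps():.0f} Gb/s, "
            f"PCIe ~{card.pcie_gbps():.0f} Gb/s (~{card.pcie_gbps() / 8:.1f} GB/s)"
        )
```

Moving from Gen4 to Gen5 doubles the nominal host bandwidth (about 31.5 GB/s to about 63 GB/s per direction), matching the doubling of front-end links from 24 to 48.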
The FLX-155 and the FLX-182 will run redesigned and upgraded FELIX firmware, which underwent a preliminary design review in January 2022. The firmware's primary objectives are to decode data from the front-ends and transfer it to server memory, and to distribute precise timing, trigger, and control information.
New timing-distribution requirements at the picosecond level introduce new challenges. Most notably, accurately re-determining the phase of the clock recovered by the high-speed GTYe5 deserializers of the new AMD Versal technology after each reset is very challenging. Methods have been designed in both hardware and firmware that allow FELIX to guarantee a clock distribution with a reproducible phase, within the requirements.
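The abstract does not describe the FELIX methods themselves. To illustrate why the phase must be re-determined after every reset, the sketch below models one generic approach, not necessarily the one used in FELIX: a deserializer whose clock divider may start at any bit position after a reset, so the recovered parallel clock lands on one of several phases spaced by one unit interval (UI). A controller measures the phase against a reference and resets again until it matches the target. The divide ratio of 40, the 10.24 Gb/s line rate, the uniform random start position, and all function names are assumptions for illustration only; the phase measurement is simulated.

```python
import random

# Hypothetical model parameters (assumptions, not FELIX/AMD values).
DIVIDE_RATIO = 40        # serial-to-parallel width: possible bit offsets after reset
LINE_RATE_GBPS = 10.24   # line rate, used only to convert a UI to picoseconds
UI_PS = 1e3 / LINE_RATE_GBPS  # one unit interval in ps


def reset_deserializer(rng: random.Random) -> int:
    """Simulate a reset: the recovered clock starts at a random bit offset."""
    return rng.randrange(DIVIDE_RATIO)


def measure_phase_ps(bit_offset: int) -> float:
    """Stand-in for measuring the recovered-clock phase against a reference."""
    return bit_offset * UI_PS


def align(target_offset: int, tolerance_ps: float, rng: random.Random,
          max_resets: int = 10_000) -> tuple[int, int]:
    """Reset until the measured phase matches the target within tolerance.

    Returns (final_bit_offset, number_of_resets).
    """
    target_ps = target_offset * UI_PS
    for attempt in range(1, max_resets + 1):
        offset = reset_deserializer(rng)
        if abs(measure_phase_ps(offset) - target_ps) <= tolerance_ps:
            return offset, attempt
    raise RuntimeError("phase alignment did not converge")


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed for reproducible output
    for run in range(3):
        offset, resets = align(target_offset=0, tolerance_ps=5.0, rng=rng)
        print(f"run {run}: offset {offset} UI after {resets} reset(s), "
              f"phase {measure_phase_ps(offset):.1f} ps")
```

In this model each reset hits the target with probability 1/40, so alignment takes 40 resets on average and a variable amount of time. This is one reason a practical scheme may combine such a check with hardware or firmware support rather than relying on repeated resets alone.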
To complement the PCIe card and server hardware upgrades, the software running on the FELIX servers will also be enhanced to cope with the increased data and trigger rates, ensuring smooth operation during Run 4 of the ATLAS experiment.