Description
The FELIX system is used as an interface between front-end electronics and commodity hardware in the server farm. FELIX uses RDMA over Converged Ethernet (RoCE) to transmit data from its host servers to the Software Readout Driver using off-the-shelf networking equipment. RDMA communication is implemented in software on both ends of the links. As part of the exploration of opportunities to improve data throughput for the High Luminosity LHC upgrade, RDMA support is being implemented directly in the front-end FELIX FPGA. We present a proof-of-concept RDMA FPGA implementation, which will help inform the design of the FELIX platform for the High Luminosity LHC.
Summary (500 words)
The FELIX (Front-End Link eXchange) system interfaces the front-end detector electronics to the readout system and the high-level trigger farm. The system is based on a custom FPGA board which receives data from the front-end detector electronics via optical links and outputs data via a PCIe interface to a host computer, which processes the data and relays it further to the readout system. The host computer uses the RDMA (Remote Direct Memory Access) support offered by network interface cards with RoCE (RDMA over Converged Ethernet) capability to transmit the data onward to the readout systems over an Ethernet network.
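For illustration, the host-side software path amounts to posting work requests to the RoCE-capable NIC through the verbs API; the minimal sketch below shows a one-sided RDMA WRITE on an already-connected queue pair. The function name and the qp, mr, remote_addr and rkey parameters are placeholders, not identifiers from the FELIX software, and queue pair setup and error handling are omitted.

// Minimal sketch of the software RDMA path on a FELIX-like host:
// posting a one-sided RDMA WRITE on an already-connected RoCE queue pair.
// Queue pair creation, transition to RTS and the out-of-band exchange of
// buffer addresses are omitted; all parameters are placeholders.
#include <infiniband/verbs.h>
#include <cstdint>
#include <stdexcept>

void post_rdma_write(ibv_qp *qp, ibv_mr *mr, void *local_buf, uint32_t len,
                     uint64_t remote_addr, uint32_t rkey)
{
    ibv_sge sge{};
    sge.addr   = reinterpret_cast<uint64_t>(local_buf); // registered local buffer
    sge.length = len;
    sge.lkey   = mr->lkey;                              // local key from ibv_reg_mr

    ibv_send_wr wr{};
    ibv_send_wr *bad_wr = nullptr;
    wr.wr_id      = 1;
    wr.sg_list    = &sge;
    wr.num_sge    = 1;
    wr.opcode     = IBV_WR_RDMA_WRITE;       // one-sided write, no remote CPU involvement
    wr.send_flags = IBV_SEND_SIGNALED;       // request a completion entry
    wr.wr.rdma.remote_addr = remote_addr;    // target address advertised by the receiver
    wr.wr.rdma.rkey        = rkey;           // remote key advertised by the receiver

    if (ibv_post_send(qp, &wr, &bad_wr))
        throw std::runtime_error("ibv_post_send failed");
}

The completion for the signalled write would subsequently be reaped from the associated completion queue with ibv_poll_cq.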
In the context of the High Luminosity LHC upgrade, the FELIX board needs to handle a data throughput of 200 Gbps, while the FELIX host, integrating two FELIX boards, will have to handle 400 Gbps. At these data rates, the transfer operations over PCIe and the required local host processing may become a serious bottleneck: the current FELIX board can handle a maximum theoretical throughput of 128 Gbps, the limit of a PCIe 3 x16 interface. A possible solution is to implement RDMA support in the FELIX FPGA itself. The PCIe interface and the host computer would then no longer be needed, simplifying the data path from the front-end detector electronics to the readout system. Besides avoiding the limitations of the PCIe 3 bus, removing the host computer from the data path would improve throughput and latency, because data would no longer be moved from the FELIX board to the host but processed directly on the FPGA before being sent downstream. Moreover, this drastically reduces the performance requirements of the host system, potentially reducing cost.
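As a rough check of that figure: a PCIe 3 x16 link runs 16 lanes at 8 GT/s, i.e. 16 × 8 = 128 Gbps of raw bandwidth, of which about 128 × 128/130 ≈ 126 Gbps remains after 128b/130b line encoding, before any protocol overheads.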
The challenge is in exploring the available options and finding the most suitable way of implementing a complex protocol such as RDMA inside the FELIX FPGA. An open-source RDMA HLS core has been used as a starting point; it has been modified and extended to make it compatible with existing off-the-shelf networking equipment and to integrate our application-specific functionality.
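As a rough illustration of what running the RDMA transport in the FPGA fabric involves, the Vitis HLS-style sketch below packs the 12-byte InfiniBand Base Transport Header that RoCEv2 carries inside UDP (destination port 4791) and prepends it to a payload stream; the RETH with remote address and rkey, the IP/UDP encapsulation and the trailing ICRC are left out. All names and bus widths are illustrative and are not taken from the open-source core or the FELIX firmware.

// Illustrative HLS sketch (not the actual FELIX firmware): prepend an
// InfiniBand Base Transport Header (BTH) to an outgoing payload stream.
#include <ap_int.h>
#include <hls_stream.h>

typedef ap_uint<512> bus_t;   // 512-bit streaming datapath word (assumption)

// Pack the 12-byte BTH for an RC "RDMA WRITE Only" packet (opcode 0x0A).
static ap_uint<96> make_bth(ap_uint<24> dest_qp, ap_uint<24> psn)
{
    ap_uint<96> bth = 0;
    bth.range(95, 88) = 0x0A;     // OpCode: RC RDMA WRITE Only
    bth.range(87, 80) = 0x00;     // SE, M, PadCnt, TVer (all zero in this sketch)
    bth.range(79, 64) = 0xFFFF;   // P_Key: default partition
    bth.range(63, 56) = 0x00;     // reserved
    bth.range(55, 32) = dest_qp;  // destination queue pair number
    bth.range(31, 24) = 0x00;     // AckReq, reserved
    bth.range(23,  0) = psn;      // packet sequence number
    return bth;
}

// Emit the header word, then forward the payload unchanged.  Alignment with
// the Ethernet/IP/UDP headers and the ICRC are left to surrounding stages.
void roce_tx(hls::stream<bus_t> &payload_in, hls::stream<bus_t> &packet_out,
             ap_uint<24> dest_qp, ap_uint<24> psn, unsigned payload_words)
{
    bus_t hdr = 0;
    hdr.range(511, 416) = make_bth(dest_qp, psn);  // BTH in the top 12 bytes
    packet_out.write(hdr);

copy_payload:
    for (unsigned i = 0; i < payload_words; ++i) {
#pragma HLS PIPELINE II=1
        packet_out.write(payload_in.read());
    }
}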
Additionally, software had to be developed to set up connections and exchange data between the RDMA FPGA implementation and an RDMA receiver/server, as well as to run performance tests.
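On the receiver/server side, the sketch below shows the kind of preparation such software performs with the verbs API: registering a receive buffer with remote-write permission and exporting its address and rkey so the FPGA sender can target it. Queue pair creation, connection setup and completion handling are omitted, and all names are illustrative rather than taken from the actual test software.

// Illustrative receiver-side sketch: register a buffer with the RoCE NIC and
// report the address and rkey that the FPGA sender needs for RDMA WRITEs.
#include <infiniband/verbs.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

struct RemoteBufferInfo {      // what the sender needs to issue RDMA WRITEs
    uint64_t addr;
    uint32_t rkey;
};

RemoteBufferInfo register_receive_buffer(ibv_pd *pd, size_t size)
{
    void *buf  = aligned_alloc(4096, size);              // page-aligned data buffer
    ibv_mr *mr = ibv_reg_mr(pd, buf, size,
                            IBV_ACCESS_LOCAL_WRITE |
                            IBV_ACCESS_REMOTE_WRITE);     // allow remote RDMA WRITE
    if (!mr) { std::perror("ibv_reg_mr"); std::exit(1); }
    return RemoteBufferInfo{ reinterpret_cast<uint64_t>(buf), mr->rkey };
}

int main()
{
    ibv_device **devs = ibv_get_device_list(nullptr);
    if (!devs || !devs[0]) { std::fprintf(stderr, "no RDMA device\n"); return 1; }
    ibv_context *ctx = ibv_open_device(devs[0]);          // first RoCE-capable NIC
    ibv_pd *pd = ibv_alloc_pd(ctx);

    RemoteBufferInfo info = register_receive_buffer(pd, 1 << 20);
    // The address/rkey pair would be handed to the FPGA sender over an
    // out-of-band channel before data taking starts.
    std::printf("advertise addr=0x%llx rkey=0x%x\n",
                static_cast<unsigned long long>(info.addr), info.rkey);
    // ... create a QP, bring it to RTR/RTS and poll for completions (omitted)
    return 0;
}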
Development of this RDMA FPGA implementation is carried out on a Xilinx VCU128 evaluation kit (Virtex UltraScale+), which provides both on-board DRAM and integrated HBM memory. Separately, a Xilinx Alveo system is used as an alternative development platform, taking advantage in particular of the HLS acceleration tools provided by Xilinx. Two PC hosts with Mellanox ConnectX-4 and ConnectX-5 cards, which support InfiniBand and RoCEv2, are used together with an RDMA-enabled 100 Gb Ethernet switch. The main goal is to achieve a 100 Gbps data rate per port, the maximum bandwidth supported by the available Mellanox cards. In parallel, a Xilinx VCU129 development board with integrated 56 Gbps PAM-4 serializers is used to explore the possibility of implementing 400 Gbps Ethernet links directly out of the FPGA.