Description
In the High-Performance Computing (HPC) field, fast and reliable interconnects remain pivotal in delivering efficient data access and analytics.
In recent years, several interconnect implementations have been proposed, targeting optimization, reprogrammability and other critical aspects. Custom Network Interface Cards (NICs) have emerged as viable alternatives to commercially available products, which often come with high price tags and offer limited or no customization options.
In this field, the APEnet project has long been engaged in developing custom FPGA-based NICs tailored for toroidal interconnection systems dedicated to scientific computing and simulations. Because it relies on a custom network protocol and is easily portable and reconfigurable, the design adapts to scientific domains ranging from High Energy Physics to Brain Simulation. It implements a 3D direct torus interconnect which, nested in a multi-tier topology, enables high path diversity, short cabling at low dimension and high efficiency.
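As an illustration of the 3D direct torus mentioned above, the following minimal Python sketch computes the direct neighbors of a node and the minimal hop count between two nodes when wraparound links are available. The function names `torus_neighbors` and `torus_hops` and the 4x4x4 size are hypothetical choices made for this example; they do not describe the APEnet routing logic or its API.

```python
def torus_neighbors(node, dims):
    """Return the six direct neighbors of `node` in a 3D torus of size `dims`.

    Illustrative helper only (not part of APEnet): each axis contributes
    one neighbor in the negative and one in the positive direction, with
    coordinates wrapping around at the edges.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(src, dst, dims):
    """Minimal hop count between two nodes, taking the shorter way
    around the ring on each axis."""
    return sum(
        min((d - s) % n, (s - d) % n)
        for s, d, n in zip(src, dst, dims)
    )


if __name__ == "__main__":
    dims = (4, 4, 4)  # assumed torus size, for illustration
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (3, 2, 1), dims))
```

In this sketch the node at (3, 2, 1) is four hops away from the origin rather than six, because the wraparound link on the first axis shortens that leg from three hops to one.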
In this work, we present the latest advancements for the APEnet NIC, APEnetX, which integrates cutting-edge Xilinx UltraScale+ technologies with custom hardware and software components to enable Remote Direct Memory Access (RDMA) functionalities targeting both remote hosts and accelerators such as GPUs. A custom network protocol, accompanied by Quality-of-Service (QoS) functionalities, ensures efficient data transfers between nodes even under critical congestion. Finally, we developed the libraries needed to replicate APEnetX in a simulated environment (OMNeT++): emulating the network at large scale enables us to tailor the architecture to specific scientific applications.