Description
Authors:
Gustavo Alonso, Maximilian Jakob Heer, Benjamin Ramhorst
As Moore’s Law and Dennard scaling reach their limits, computing is shifting toward heterogeneous hardware for large-scale data processing. Cloud vendors are deploying accelerators such as GPUs, DPUs, and FPGAs to meet the growing computational demands of ML and big data.
While FPGAs offer great flexibility and performance, integrating them into larger systems remains challenging in practice due to the long development cycles and expertise required. To address this, we introduce Coyote v2, an open-source FPGA shell with high-level, OS-like abstractions. Broadly speaking, Coyote v2 strives to simplify application deployment and let developers focus solely on their application logic and its performance, rather than on infrastructure development. By providing clear, simple-to-use interfaces in both hardware and software, Coyote v2 allows anyone to leverage these abstractions for custom acceleration offloads and to build distributed, heterogeneous computer systems consisting of many FPGAs, GPUs, and CPUs. Coyote v2 has been re-engineered for flexibility of use as a base platform for multi-tenant accelerators, SmartNICs, and near-memory accelerators.
This tutorial will cover Coyote v2's vFPGAs, which let users seamlessly deploy arbitrary applications on FPGAs; the built-in networking stacks for distributed applications; and the shared virtual memory model, which enables the FPGA to interact with other hardware (CPU, GPU, storage). Additionally, we will demonstrate Coyote's high-level software API, which enables easy yet high-performance interaction with the FPGA from C++. Finally, we will showcase Coyote's integration with hls4ml, performing inference on a PCIe-attached FPGA from just a few lines of Python.