Speaker
Description
Modern DAQ systems typically use FPGA-based PCIe cards to concentrate the data and deliver them to a computer serving as an entry node of the data-processing network.
This paper presents a QEMU-based methodology for the co-development of the FPGA-based hardware part, the Linux kernel driver, and the data-receiving application. That approach enables quick verification of the FPGA firmware architecture, the organization of control registers, and the functionality of both the driver and the user-space application.
The developed design may be tested on different emulated architectures with a configurable CPU type, IOMMU, memory size, and number of DAQ cards.
Summary (500 words)
The data acquisition chain for the BM@N experiment is planned to use a standard commercial PCIe board equipped with a Virtex-7 FPGA.
Such a flexible solution allows the development of a highly optimized FPGA-based data concentrator and DMA engine delivering the data to a computer working as an entry node of the data acquisition and processing network. However, the joint development of the FPGA firmware, the associated Linux kernel driver, and the user-space application responsible for data reception and processing is usually an iterative process with a long modification-and-testing cycle.
Modifying and resynthesizing the FPGA firmware requires significant effort and time. Bugs in the bus-mastering DMA engine or the kernel driver may crash the system or even corrupt the filesystem.
Additionally, testing the driver and the whole system with multiple boards may be limited by hardware availability.
The QEMU emulator offers efficient emulation of PCIe-capable computers and may easily be extended with models of user-defined hardware written in C.
The developed methodology makes it possible to simulate the data concentrator working either with the data generator included in the model or with data delivered to QEMU from external applications (e.g., a database of archived signals or a detector simulator) using the ZeroMQ protocol.
With that approach, multiple developers may simultaneously work on the device model, the driver, and the application, testing them on their own computers. Development may start before the hardware is available and may even help in selecting the FPGA platform.
The organization of the delivered data in the host memory may be quickly tested and modified.
The proposed methodology was successfully used to develop a complete system consisting of the simple bus-mastering DMA engine, kernel driver, and data receiving application.
The system uses hugepages-backed buffers allocated by the target data-processing application, allowing zero-copy implementation of data delivery.
Due to the use of hugepages, no boot-time memory reservation or CMA-enabled kernel is required. The emulated machine uses a standard Debian/testing Linux system. The boot time of the emulated system was below 20 seconds (on a host with an Intel i7-4790 CPU), enabling very quick crash recovery.
Recompilation of each component (the QEMU model of the device, the driver, and the application) took less than one minute, enabling a quick development cycle.
The created DAQ system was successfully tested on a virtual machine with 16 GB of RAM (the host had 32 GB of RAM) with eight simulated DAQ boards. Each board used a 1 GB data buffer consisting of 512 hugepages of 2 MB each.
The data were delivered with ZeroMQ via a local TCP/IP socket from local data-generating applications.
The results suggest that the proposed methodology may be a valuable tool in developing the new FPGA-based DAQ firmware.
The C-implemented model must still be translated into an HDL implementation suitable for synthesis. Further research is needed to investigate whether that process may be automated using HLS technology.