The FPGA Developers’ Forum (FDF) is a platform to discuss and exchange information, experiences, implementation ideas, tips, and tricks, as well as challenges faced with design tools and specific FPGA technologies.
The 1st FDF meeting will take place at CERN on 11-13 June 2024.
We will discuss several FPGA-related topics; see the scientific programme for details.
We are open to new ideas and the list will be adapted depending on your contributions. Have you got something to share with the community?
Submit an abstract for your presentation now.
There is no registration fee, and you are very welcome to participate in the FDF meeting even if you are not giving a talk. Remember that FDF is open to anyone, not only CERN users.
The FDF aims to form a topical community of digital designers — especially on FPGAs — working in physics and beyond, and to discuss details that very rarely see the light of day in typical workshops in our field.
We will focus on how digital designs are implemented rather than on their scientific end goal, and novelty is not the only criterion. Tips on how to avoid pitfalls, and other ideas and recommendations that could save your colleagues precious time, are considered equally important.
A CERN account is needed for submission and registration. If you do not have one, you can create one.
See you at CERN!
What to do next? Where to go?
Open source has revolutionised software development by promoting collaboration, innovation, and openness. As FPGA designers, you can apply the same approach and share your HDL designs with the world.
In this presentation, we will discuss a step-by-step process to help you open-source your HDL designs effectively. We will begin by addressing the key question of where to host your HDL. We will then provide guidance on selecting an appropriate licence. Essential elements for a smooth collaboration experience include version control, issue management, documentation, and verification and testing procedures. After a brief description of these best practices, we will show how adopting them can encourage contributions and help maintain high quality in your project. Lastly, we will examine the role of your employer in open-sourcing FPGA designs, using CERN as an example and describing how its newly established Open Source Program Office (OSPO) can help in this process.
Although clock domain crossings (CDCs) are used regularly by all FPGA designers, very few people know how to constrain them properly and reliably. Timing constraints are indeed one of the hardest parts of FPGA design: an elusive art that is impossible to google and impossible to verify.
In this session we will look at a few common CDC topologies, analyze them, discuss some common mistakes, and uncover their failure modes. Most importantly, we will discuss how to make them robust using timing constraints. We will also discuss how scoped constraint files enable reusable CDC blocks, so that the user never has to write a single line of Tcl/XDC.
We will introduce the library of generic and reusable CDC blocks available in the open-source hdl-modules project. The blocks are peer-reviewed, proven in use, and were designed after thorough discussion and analysis to operate as reliably as possible.
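One common CDC topology for multi-bit counters, such as the read and write pointers of an asynchronous FIFO, converts the value to Gray code before it crosses domains, so that consecutive values differ in exactly one bit and a sample taken mid-transition resolves to either the old or the new value. The Python model below only illustrates that encoding property; it is a sketch with hypothetical function names, not code from hdl-modules, and in practice the Gray-coded bus still needs a timing constraint bounding the skew between its bits.

```python
def bin_to_gray(value: int) -> int:
    """Binary-reflected Gray code: adjacent integers differ in one bit."""
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Invert the Gray encoding by XOR-folding all right shifts of the value."""
    value = gray
    shift = gray >> 1
    while shift:
        value ^= shift
        shift >>= 1
    return value


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions that toggle between two register values."""
    return bin(a ^ b).count("1")


WIDTH = 4
for i in range(2**WIDTH):
    nxt = (i + 1) % 2**WIDTH  # the counter wraps, as a FIFO pointer does
    # Gray code never toggles more than one bit per increment, including the wrap.
    assert bits_changed(bin_to_gray(i), bin_to_gray(nxt)) == 1
    assert gray_to_bin(bin_to_gray(i)) == i

# Binary 7 -> 8 toggles four bits at once; the Gray-coded values toggle one.
print(bits_changed(7, 8), bits_changed(bin_to_gray(7), bin_to_gray(8)))  # 4 1
```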
In modern Data Acquisition (DAQ) gateware, developers build custom features from many basic components. These components come from vendors or are written by the developers themselves, which often leads to a fragmented codebase that is difficult to test, integrate, and use with different tools. To address this, a new open-source core library has been developed.
This library is a collection of commonly used cores and blocks developed in pure VHDL, which ensures vendor-independence. The functionality of these blocks is verified through a set of self-checking testbenches and, wherever possible, formal verification tests.
This library not only addresses these challenges but also improves code maintainability, reduces development time, and enhances interoperability across vendor environments.
In conclusion, the proposed open-source core library stands as a robust solution to the challenges of gateware development, offering a pathway to more efficient and reliable system implementation.
The CERN control group (in particular the BE-CEM-EDL section, previously BE-CO-HT) is at the origin of the White Rabbit technology. In addition to this well-known project, the section has developed a set of generic cores (general-cores), a tool that automatically builds projects for simulators and synthesizers from a Python description (hdlmake), and a tool that generates HDL, header files and documentation from a register map (Cheby).
I will present how we develop designs using these tools and this library.
The development, testing and operation of FPGA algorithms require flexible and
efficient real-time monitoring. This can be achieved by inserting dedicated
buffers between the logical blocks of the FPGA firmware. These buffers
(SpyBuffers) are implemented in the firmware to spy on the dataflow between the
internal blocks. They must provide a configurable size and are equipped with a
playback feature that allows simulated data to be injected into the firmware
path. Dedicated control software sets the SpyBuffer mode (monitoring or
playback), performs the memory readout, and analyses the results.
In this talk we discuss the SpyBuffer design for monitoring and playback operations,
the interface of the SpyBuffer with the AXI Chip-to-Chip interface, and the
software layer that controls the SpyBuffers and their operating modes.
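The two operating modes can be illustrated with a minimal behavioural model of a circular spy buffer: in monitoring mode it records the most recent words passing between two blocks and forwards them unchanged; in playback mode it replaces the upstream data with previously loaded words. This is a sketch under simplifying assumptions; the class and method names are hypothetical, and the model ignores the AXI Chip-to-Chip interface and all timing.

```python
from collections import deque
from typing import Deque, Iterable, List


class SpyBufferModel:
    """Hypothetical behavioural model of a monitoring/playback spy buffer."""

    def __init__(self, depth: int) -> None:
        self.depth = depth                                # configurable size
        self.memory: Deque[int] = deque(maxlen=depth)     # oldest words drop out
        self.mode = "monitor"
        self._read_index = 0

    def load_playback(self, words: Iterable[int]) -> None:
        """Control-software side: fill the memory with simulated data."""
        data = list(words)[-self.depth:]
        if not data:
            raise ValueError("playback needs at least one word")
        self.memory = deque(data, maxlen=self.depth)
        self.mode = "playback"
        self._read_index = 0

    def clock(self, upstream_word: int) -> int:
        """One data beat: return the word that the downstream block receives."""
        if self.mode == "monitor":
            self.memory.append(upstream_word)  # spy without altering the data path
            return upstream_word
        word = self.memory[self._read_index]   # inject stored data, cyclically
        self._read_index = (self._read_index + 1) % len(self.memory)
        return word

    def readout(self) -> List[int]:
        """Control-software side: read the captured words, oldest first."""
        return list(self.memory)


spy = SpyBufferModel(depth=4)
passed = [spy.clock(w) for w in range(10)]  # monitoring is transparent
print(spy.readout())                        # [6, 7, 8, 9]
spy.load_playback([0xA, 0xB])
print([spy.clock(0) for _ in range(3)])     # [10, 11, 10]
```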
Since 2017 we have been carrying out R&D on a framework for co-designing (HW/SW) computational systems, mainly targeting FPGAs. The main innovation of the project, named BondMachine (BM), is a new type of architecture that is dynamically adapted to the specific problem. The framework contains a set of tools to manipulate these architectures, from their creation to simulation and implementation as HDL code. We have also developed support for creating BMs starting from high-level languages: a compiler builds the BM while compiling the code, and an assembler transforms fragments of assembly code into BMs and uses them as building blocks for more complex systems.
This talk will provide an overview of the framework, also detailing how it can be used to put Neural Networks and Quantum Computing simulators on FPGAs.
Timing closure is possibly the most challenging task in FPGA algorithm design, with the placer quickly becoming the limiting factor at higher frequencies. AMD recommends hierarchical placement, with gate-level placement as a last resort. I would like to discuss a methodology for fine-grained hierarchical placement, based on Python generation of constraint files, that allows the layout to be replicated in different areas of the FPGA. The script takes into account the target FPGA architecture and the resource utilization of each design block, and lets the user easily place the design to optimize the data flow, with arbitrarily fine-grained detail on the challenging paths, while keeping the design maintainable.
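The idea of generating placement constraints from a script, and replicating one layout in several areas, can be sketched in a few lines of Python that emit Vivado pblock commands (`create_pblock`, `add_cells_to_pblock`, `resize_pblock`). The cell paths, slice ranges, and the simple translate-by-offset replication rule are illustrative assumptions; a real script would also account for the device architecture (clock regions, column types) and the resource utilization of each block, as described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SliceRect:
    """Rectangle of SLICE sites, inclusive, in Vivado X/Y coordinates."""
    x0: int
    y0: int
    x1: int
    y1: int

    def shifted(self, dx: int, dy: int) -> "SliceRect":
        return SliceRect(self.x0 + dx, self.y0 + dy, self.x1 + dx, self.y1 + dy)

    def to_range(self) -> str:
        return f"SLICE_X{self.x0}Y{self.y0}:SLICE_X{self.x1}Y{self.y1}"


def pblock_tcl(name: str, cell: str, rect: SliceRect) -> List[str]:
    """Emit the XDC/Tcl lines that confine one cell to one rectangle."""
    return [
        f"create_pblock {name}",
        f"add_cells_to_pblock [get_pblocks {name}] [get_cells {{{cell}}}]",
        f"resize_pblock [get_pblocks {name}] -add {{{rect.to_range()}}}",
    ]


def replicate_layout(template: Dict[str, SliceRect], instances: List[str], dy: int) -> List[str]:
    """Place the same sub-block layout in every instance, stacked vertically."""
    lines: List[str] = []
    for i, inst in enumerate(instances):
        for sub_cell, rect in template.items():
            name = f"pb_{inst}_{sub_cell}".replace("/", "_")
            lines += pblock_tcl(name, f"{inst}/{sub_cell}", rect.shifted(0, i * dy))
    return lines


# Hypothetical layout of one processing channel: filter on the left, accumulator on the right.
channel_layout = {
    "filter": SliceRect(0, 0, 7, 59),
    "accum": SliceRect(8, 0, 11, 59),
}
xdc = replicate_layout(channel_layout, ["chan_0", "chan_1"], dy=60)
print("\n".join(xdc))
```

Keeping the layout as data (the `channel_layout` dictionary) rather than as hand-written XDC is what makes replication and later adjustments cheap.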
Other solutions to common development problems will be presented, such as a methodology to implement record-to-vector and vector-to-record converters for data storage in RAM, and a means to help with the arbitration of data delivery between related clocks.
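The record-to-vector idea is that each field of a record occupies a fixed bit slice of a flat vector, so the record can be stored in a RAM of the summed width and recovered on readout. A Python model of that bookkeeping, with a hypothetical three-field record, is shown below; in VHDL the same offsets would drive the slice assignments in the two converter functions, and the methodology presented in the talk may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (field name, width in bits), first field in the LSBs.
FIELDS: List[Tuple[str, int]] = [("valid", 1), ("channel", 5), ("energy", 18)]
TOTAL_WIDTH = sum(width for _, width in FIELDS)  # RAM word width: 24 bits


def record_to_vector(record: Dict[str, int]) -> int:
    """Concatenate the fields into one integer, first field in the lowest bits."""
    vector, offset = 0, 0
    for name, width in FIELDS:
        value = record[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        vector |= value << offset
        offset += width
    return vector


def vector_to_record(vector: int) -> Dict[str, int]:
    """Slice the vector back into fields using the same offsets."""
    record: Dict[str, int] = {}
    offset = 0
    for name, width in FIELDS:
        record[name] = (vector >> offset) & ((1 << width) - 1)
        offset += width
    return record


sample = {"valid": 1, "channel": 17, "energy": 123456}
word = record_to_vector(sample)
assert vector_to_record(word) == sample  # round trip through the "RAM"
print(TOTAL_WIDTH, hex(word))            # 24 0x789023
```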
Advancements in design automation technologies, such as high-level synthesis (HLS), have raised the input abstraction level and made the design entry process for FPGAs more friendly to software programmers.
In contrast, the backend compilation process for implementing designs on FPGAs takes considerably longer than software compilation.
While software code compilation may take just a few seconds, FPGA compilation times can often span from several minutes to hours due to the complexity of the underlying toolchain and the ever-growing device capacity.
In this presentation, we provide an overview of the current advancements in fast compilation techniques for FPGAs.
Furthermore, we present a very fast compilation methodology that generates placed-and-routed kernel designs for AMD FPGAs in a matter of seconds.
This approach accelerates the C-to-FPGA implementation process by up to 33x, while reaching 0.9x the Fmax of a conventional implementation flow.
Phase determinism in timing distribution systems is often a requirement for detectors in High Energy Physics. Because of the high-luminosity goals, the rate of particle collisions is increasing. To distinguish almost superposed collisions, a very accurate timing signal is required, with a precision of the order of a few picoseconds. Commercial components do not meet this stringent requirement by default. However, solutions achieving cutting-edge phase determinism can be found. This presentation focuses on the transmitter of the AMD transceiver. Since the transmission is not frame-aligned to its reference clock, the data stream has a random phase delay at each startup. The proposed solution consists of configuring a particular clocking architecture in the transceiver IP core, which allows the phase of interest to be monitored and a correction to be implemented, all within the FPGA. The result is a data stream with a fixed phase relation to its reference clock, with picosecond-grade precision.
The Radio Frequency System-on-Chip (RFSoC) is a new type of device produced by AMD (formerly Xilinx) that combines an SoC (Programmable Logic + Processing System) with wideband, high-speed, high-resolution ADCs and DACs. This makes it a strong candidate for data-acquisition systems as well as calibration units for various astroparticle experiments, in particular those detecting radio frequency signals. This talk will discuss the prospects of using RFSoC devices as both receivers and transmitters, along with their specific configuration strategies.
Firmware design is a major challenge in LHC experiment upgrades, often leading to significant project delays. While non-configurable systems were immediately operational, recent experience shows that firmware and hardware readiness can take years. This underscores the need for innovative methods to speed up firmware design and deployment. This study applies advanced firmware design techniques, such as High-Level Synthesis (HLS), to the ATLAS Liquid Argon Calorimeter trigger processor. HLS simplifies the design process by focusing on essential functions rather than intricate hardware details such as clock networks or signal interfaces. This method allows easy trade-offs between latency and area, which are essential for optimizing firmware performance. It improves firmware maintainability, latency, logic area usage, and timing accuracy. Applying HLS has the potential to streamline firmware design, reduce project delays, and increase efficiency in large-scale experiments such as the LHC upgrades.
Ultra-high-energy (UHE) neutrinos can be detected via radio antennas installed in polar ice sheets. In this work, we present a trigger system that uses a convolutional neural network to process the antenna signals. This system can increase the neutrino detection rate by up to a factor of two at negligible additional cost, which would substantially advance UHE neutrino science. The trigger algorithm, written in pure VHDL, will be implemented in existing digitizer hardware that uses a 4-channel 500 MSPS flash ADC and an Intel (Altera) Cyclone V FPGA. Incoming data are processed on the fly by 45 DSP blocks, delivering a trigger decision with a latency of a few clock cycles and thus meeting the main design requirement of low latency. We also present the relation between clock speed and power consumption, another critical factor. Finally, we give an outlook on new hardware developments and the performance gains expected from the increased computing resources of more powerful FPGAs.
Artificial intelligence (AI) is everywhere. Automated image analysis, autonomous driving, industrial inspection: there are many applications today that could benefit from AI. Deep Learning is the most successful solution for image-based object classification, and for most practical applications it requires high-performance platforms such as FPGAs and SoCs.
Designing AI for embedded devices such as FPGAs and SoCs is challenging because of resource constraints, the complexity of programming in Verilog or VHDL, and the hardware expertise needed for prototyping on an FPGA or SoC.
In this presentation I will explain how to:
- Prototype and deploy Deep Learning-based vision applications using a Deep Learning Processor (DLP).
- Analyze profiling metrics and use compression methods like quantization and pruning to improve performance.
- Optimize the Deep Learning Processor configuration for the chosen AI models.
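The two compression methods named above can be illustrated on a plain list of weights: magnitude pruning zeroes the smallest weights, and symmetric int8 quantization maps the remaining ones onto integers with a single scale factor. This is a generic Python sketch under those assumptions, not the workflow or API of any particular Deep Learning Processor toolchain.

```python
from typing import List, Tuple


def prune_by_magnitude(weights: List[float], sparsity: float) -> List[float]:
    """Zero the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w ~= q * scale, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


weights = [0.02, -0.5, 0.31, -0.01, 0.9, -0.07]
sparse = prune_by_magnitude(weights, sparsity=0.5)  # drops 0.02, -0.01, -0.07
q, scale = quantize_int8(sparse)
dequant = [v * scale for v in q]
max_err = max(abs(a - b) for a, b in zip(sparse, dequant))
print(q, round(scale, 5), max_err <= scale / 2)
```

Pruning before quantizing lets the scale be set by the weights that survive; zeroed weights stay exactly zero after quantization, which is what makes them cheap in hardware.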
Tree Tensor Networks (TTNs) are hierarchical tensor structures commonly used to represent many-body quantum systems, but they can also be applied to ML tasks such as classification or optimization. The algorithmic nature of TTNs makes them easy to deploy on FPGAs, which are naturally suited to concurrent tasks such as matrix multiplications. Moreover, hardware resource usage can be optimally tuned by exploiting the intrinsic properties of these networks. We study the deployment of TTNs in high-frequency real-time applications, showing different classifier implementations on FPGA and performing inference on synthetic ML datasets for benchmarking. We also provide a projection of the resources needed for the hardware implementation of a classifier, comparing how different degrees of parallelism affect physical resources and latency. The full firmware has been developed in VHDL, using Xilinx IPs for explicit DSP declaration and for the AXI Stream and AXI Lite communication protocols.
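To see why TTN inference maps onto DSP-based multiply-accumulate logic, the sketch below contracts a tiny binary tree in pure Python: each input feature is embedded as a 2-vector, and each node tensor merges two child vectors into one parent vector through sums of products. The cos/sin feature map (commonly used in tensor-network ML), the bond dimension, and the random weights are illustrative assumptions, not the classifiers benchmarked in the talk.

```python
import math
import random
from typing import List

Vector = List[float]
Tensor3 = List[List[List[float]]]  # T[i][j][k]: child index i, child index j -> parent k


def embed(x: float) -> Vector:
    """Feature map for x in [0, 1]: [cos(pi x / 2), sin(pi x / 2)]."""
    return [math.cos(math.pi * x / 2), math.sin(math.pi * x / 2)]


def contract_node(t: Tensor3, a: Vector, b: Vector) -> Vector:
    """parent[k] = sum_ij T[i][j][k] * a[i] * b[j]  (multiply-accumulate only)."""
    chi = len(t[0][0])
    return [
        sum(t[i][j][k] * a[i] * b[j] for i in range(len(a)) for j in range(len(b)))
        for k in range(chi)
    ]


def random_tensor(d_a: int, d_b: int, chi: int, rng: random.Random) -> Tensor3:
    return [[[rng.uniform(-1, 1) for _ in range(chi)] for _ in range(d_b)] for _ in range(d_a)]


rng = random.Random(0)
features = [0.1, 0.8, 0.4, 0.6]                           # four inputs -> two tree layers
layer0 = [random_tensor(2, 2, 3, rng) for _ in range(2)]  # bond dimension 3
top = random_tensor(3, 3, 2, rng)                         # two output classes

v = [embed(x) for x in features]
# Nodes in the same layer are independent, so hardware can evaluate them in parallel.
hidden = [contract_node(layer0[n], v[2 * n], v[2 * n + 1]) for n in range(2)]
scores = contract_node(top, hidden[0], hidden[1])
print("predicted class:", max(range(len(scores)), key=lambda c: scores[c]))
```

How many node contractions run concurrently is exactly the parallelism versus resources and latency trade-off the abstract refers to.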
The escalating demand for data processing in particle physics research has spurred the exploration of novel technologies to improve the efficiency and speed of calculations. This study presents the development of a port of MADGRAPH, a widely used tool for particle collision simulations, to FPGA using High-Level Synthesis (HLS).
Experimental evaluation is ongoing, but preliminary assessments suggest a promising enhancement in calculation speed compared to traditional CPU implementations. This potential improvement could enable the execution of more complex simulations within shorter timeframes.
This study describes the complex process of adapting MADGRAPH to FPGA using HLS, focusing on optimizing algorithms for parallel processing. These advancements could enable faster execution of complex simulations, highlighting the role FPGAs can play in advancing particle physics research.
An ultra-low-latency boosted decision tree (BDT), fully evaluated on an FPGA, was introduced for 2024 data taking in the ATLAS experiment, with an inference latency of 60 ns at 200 MHz and full pipelining. The BDT model is synthesized using Conifer and integrated into existing firmware written in VHDL. I will discuss some technical details of how I transferred HLS-generated code into the existing firmware. For example, because the BDT is auto-generated, the signal delays must be adjusted every time the model is retrained, which is not practical to do manually. I will show how I tackled this issue by implementing a Python package that automatically computes the sums and delays of signals.
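The arithmetic behind the quoted figures is simple: at 200 MHz the clock period is 5 ns, so a 60 ns latency corresponds to 12 clock cycles. The delay-balancing problem can be sketched in the same spirit: if the per-tree scores are summed in a registered binary adder tree, a signal that must stay aligned with the sum needs as many extra register stages as the evaluation path is deep. The function names and the one-register-per-adder-level assumption are hypothetical, not the API of the package mentioned in the abstract.

```python
import math


def latency_cycles(latency_ns: float, clock_mhz: float) -> int:
    """Convert a latency in ns into clock cycles at the given frequency."""
    period_ns = 1000.0 / clock_mhz
    return math.ceil(latency_ns / period_ns - 1e-9)  # tolerate float rounding


def adder_tree_depth(n_inputs: int) -> int:
    """Registered levels of a binary adder tree summing n_inputs values."""
    return math.ceil(math.log2(n_inputs)) if n_inputs > 1 else 0


def alignment_delay(tree_latency: int, n_trees: int, side_latency: int = 0) -> int:
    """Extra registers a side signal needs to arrive together with the BDT score.

    tree_latency: cycles spent evaluating each tree (hypothetical input).
    side_latency: cycles the side signal already spends elsewhere.
    """
    total = tree_latency + adder_tree_depth(n_trees)
    if side_latency > total:
        raise ValueError("side signal is already later than the score")
    return total - side_latency


print(latency_cycles(60, 200))                       # 12
print(adder_tree_depth(50), alignment_delay(4, 50))  # 6 10
```

Because the delays are derived from the number of trees, rerunning such a script after each retraining replaces the manual adjustment.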
Magnetic Resonance Fingerprinting (MRF) is a fast quantitative MR imaging technique that can obtain multi-parametric maps in a single acquisition, but data processing is limited by escalating memory and computation needs. Neural Networks (NNs) accelerate reconstruction, but training still requires significant resources. We propose an FPGA-based NN for real-time brain parameter reconstruction from MRF data. After a traditional software validation, the NN is reduced through Quantization Aware Training to fit the available resources of the FPGA hardware accelerator, creating a quantized model that uses lower precision without degrading the NN performance. Training the NN is estimated to take 1000 to 10000 seconds, a significant improvement over standard CPU-based training, which can be up to 36 times slower. This approach could enable real-time brain analysis on mobile devices, potentially revolutionizing clinical decision-making and telemedicine.
Modern experiments in particle physics and astrophysics rely on quantum detectors for superior energy resolution. These detectors require specialized readout electronics employing frequency-division multiplexing. Operational challenges include managing a high number of tones in the transmission lines, which further complicates the FPGA firmware. For instance, the ECHo experiment plans to operate ~12,000 metallic magnetic calorimeters (MMCs) to study the upper limit on the electron neutrino mass. Similarly, BULLKID-DM will employ ~3,000 kinetic inductance detectors (KIDs) to search for dark matter. Room-temperature electronics handle the digital synthesis of microwave tones and real-time data processing. A polyphase channelizer (PPC) and digital downconversion (DDC) provide sub-band separation and variable tone filtering. This FPGA-based channelization stage is adaptable to various experiments. Methods for modifying the PPC and DDC for different detector parameters are also discussed, along with characterization techniques for assessing their performance.
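The digital downconversion step can be illustrated on a single tone: multiplying the sampled signal by a complex exponential at the tone frequency moves that tone to 0 Hz, and a low-pass filter (here just an average over the record) keeps it while rejecting the image at twice the frequency and any other tones. The sample rate, tone frequencies, and boxcar filter are illustrative assumptions; the firmware described above combines DDC with a polyphase channelizer and proper filters.

```python
import cmath
import math
from typing import List


def downconvert(samples: List[float], f_tone: float, f_s: float) -> complex:
    """Mix to baseband at f_tone, then average (a crude low-pass filter)."""
    acc = 0j
    for n, x in enumerate(samples):
        # Local oscillator sample: e^{-j 2 pi f_tone n / f_s}
        acc += x * cmath.exp(-2j * math.pi * f_tone * n / f_s)
    return acc / len(samples)


f_s = 1000.0      # sample rate in Hz (illustrative)
n_samples = 100   # spans an integer number of periods of every product term below
amplitude, phase = 0.8, 0.3
tone = [amplitude * math.cos(2 * math.pi * 100.0 * n / f_s + phase) for n in range(n_samples)]
other = [0.5 * math.cos(2 * math.pi * 250.0 * n / f_s) for n in range(n_samples)]
signal = [a + b for a, b in zip(tone, other)]

iq = downconvert(signal, f_tone=100.0, f_s=f_s)
# A real cosine of amplitude A mixes to (A / 2) * e^{j phase}; the 250 Hz tone averages out.
print(round(2 * abs(iq), 6), round(cmath.phase(iq), 6))  # 0.8 0.3
```

One such mixer per tone recovers that tone's amplitude and phase, which is why the number of multiplexed tones drives the firmware complexity.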
The control of superconducting qubits, central to quantum computing,
demands precise manipulation of fast microwave pulses. FPGAs offer ideal
versatility for this task. However, due to FPGA complexity, institutions
often opt for costly, pre-made solutions, limiting customization.
We therefore developed Qibosoq, an open-source software package designed
for radio frequency system on chip (RFSoC) platforms. Qibosoq bridges
the RFSoC firmware provided by QICK, a Quantum Instrumentation Control
Kit, with Qibo, an open-source quantum computing framework.
Using RFSoC boards can significantly lower the cost of
qubit-control hardware while maintaining high development
flexibility.
We present the Qibosoq software package, as well as the results of
testing it with multiple qubits both as a characterization platform and
as an algorithm executor, where we demonstrated the capability of the
RFSoC board to accurately perform Quantum Machine Learning applications.
hls4ml (High Level Synthesis for Machine Learning) is a tool for translating Neural Networks to synthesizable gateware for FPGAs. The tool is Python software that presents a user-friendly interface to achieve efficient Machine Learning inference in hardware. hls4ml interfaces to the main ML training libraries, as well as their extensions targeting quantized NNs. At the backend of hls4ml are HLS implementations of Neural Network operations targeting high performance for latency, throughput, and power usage. HLS workflows of multiple vendors are supported. In this talk we present the status of the project, recent developments, and future plans.
The conifer library is a tool for translating Decision Forests (ensembles of Decision Trees) for latency-optimised inference on FPGAs. Developments to use conifer for trigger selections at the LHC experiments in 2024 are reaching maturity. The tool supports a variety of frontends for the most popular DF training libraries, such as xgboost, scikit-learn, and yggdrasil. Multiple FPGA inference implementations are provided: VHDL, Xilinx HLS, and the Forest Processing Unit (FPU). The VHDL and HLS implementations map a given DF directly onto FPGA logic, while the FPU is a reconfigurable design (implemented with HLS) that supports loading and reloading different DFs with a single implementation. After introducing the tool and some applications, this talk will go “under the canopy” to discuss implementation aspects of wider interest, with perspectives on programming FPGAs using HDL vs HLS, implementing branching algorithms for FPGAs, and implementing configurable designs with HLS.
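The difference between mapping a forest directly onto logic and a reloadable design can be illustrated by storing a decision tree as flat arrays (feature index, threshold, left and right child, leaf value) and evaluating it with a generic walker. The Python sketch below follows the array layout of scikit-learn's `tree_` attribute (children set to -1 at leaves, left branch taken when `x[feature] <= threshold`); treating it as conifer's internal format or as the FPU's memory layout would be an assumption, and the tree values are hypothetical.

```python
from typing import Dict, List

# Hypothetical depth-2 tree in flat-array form; node 0 is the root.
TREE: Dict[str, List] = {
    "feature":        [0,   1,   -2,  -2,   -2],
    "threshold":      [0.5, 2.0, 0.0, 0.0,  0.0],
    "children_left":  [1,   3,   -1,  -1,   -1],
    "children_right": [2,   4,   -1,  -1,   -1],
    "value":          [0.0, 0.0, 0.7, -0.4, 0.2],
}


def evaluate_tree(tree: Dict[str, List], x: List[float]) -> float:
    """Walk from the root until a leaf (children == -1) is reached."""
    node = 0
    while tree["children_left"][node] != -1:
        go_left = x[tree["feature"][node]] <= tree["threshold"][node]
        node = tree["children_left"][node] if go_left else tree["children_right"][node]
    return tree["value"][node]


def evaluate_forest(trees: List[Dict[str, List]], x: List[float]) -> float:
    """Boosted ensemble output: sum of tree scores (base score and link function omitted)."""
    return sum(evaluate_tree(t, x) for t in trees)


# Swapping TREE for another set of arrays changes the model, not the walker:
# the property that lets one fixed design serve many models.
print(evaluate_tree(TREE, [0.3, 1.0]), evaluate_tree(TREE, [0.3, 5.0]), evaluate_tree(TREE, [0.9, 0.0]))
```

A direct mapping instead evaluates every comparison of the tree in parallel logic, trading flexibility for latency.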
UVVM is the fastest-growing FPGA verification methodology, independent of language. This is due to the improvement UVVM yields in both FPGA quality and development time. This open-source library and methodology has the most extensive VHDL verification support available and lets you verify complex DUTs very efficiently, with a great testbench overview. And if you have a really simple DUT, you can just use the simple parts of UVVM. UVVM has been significantly updated through several ESA (European Space Agency) UVVM extension projects over the last few years.
UVVM provides a testbench kick-start with open-source BFMs and verification components for UART, SPI, AXI, AXI-lite, AXI stream, Avalon MM + Stream, I2C, GPIO, SBI, GMII, RGMII, Ethernet, Wishbone, Clock generator, and Error injector.
This presentation will give you a brief introduction to UVVM, show its most important features, and explain how they will help you build a better testbench and develop it much faster.
LoCod (French acronym for “codesign software”) is an open-source hardware/software codesign tool that targets Zynq UltraScale+ and NanoXplore NG-Ultra systems-on-chip and could be extended to any heterogeneous target combining an FPGA and a processor.
Starting from C source code, developers can choose, with basic code decoration, which functions of the algorithm should be implemented on the FPGA and which should run on the CPU. LoCod then automatically performs the code conversion and hardware implementation, and generates the interfaces that transmit data between CPU functions and FPGA functions. Different implementation architectures are easy to explore by moving a function from the CPU to the FPGA (or from the FPGA to the CPU). The presentation will provide technical insights into these steps.
LoCod has been developed by CNES and Viveris Technologies with a mix of in-house developments and existing open-source tools like PandA/Bambu HLS framework.
As technology advances, FPGA devices become more powerful and enable more complex projects. As a result, developers with diverse backgrounds, including different hardware description languages, must work together. This is increasingly challenging because current implementation tools impose constraints on mixed-language designs. One key hindrance is that custom type libraries are not shared between languages, which leads to error-prone practices. Another is that only basic signal types can be used between modules written in different languages, which rules out elaborate custom types. This contribution describes YML2HDL, a tool that overcomes these issues by allowing custom types to be described in a series of centralized YAML files. These files are then used to generate libraries for each language, which also contain resources to easily convert signals between custom and basic types. The tool is already used by multiple upgrade projects of the ATLAS Experiment at CERN.
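The central idea, one language-neutral description of a custom type from which per-language libraries are generated, can be sketched in Python. To stay within the standard library, the description is a dictionary rather than a parsed YAML file, and both the description schema and the generated VHDL and Verilog lines are hypothetical illustrations, not YML2HDL's actual input format or output.

```python
from typing import Dict, List

# Hypothetical type description, as it might look after loading a YAML file.
TYPES: Dict[str, Dict[str, int]] = {
    "hit_t": {"bcid": 12, "eta": 8, "phi": 6, "pt": 10},
}


def vhdl_record(name: str, fields: Dict[str, int]) -> str:
    """Generate a VHDL record declaration plus its flattened-width constant."""
    lines: List[str] = [f"type {name} is record"]
    for field, width in fields.items():
        lines.append(f"  {field} : std_logic_vector({width - 1} downto 0);")
    lines.append("end record;")
    lines.append(f"constant {name.upper()}_WIDTH : natural := {sum(fields.values())};")
    return "\n".join(lines)


def verilog_width(name: str, fields: Dict[str, int]) -> str:
    """The same description drives a matching constant on the Verilog side."""
    return f"localparam {name.upper()}_WIDTH = {sum(fields.values())};"


for type_name, type_fields in TYPES.items():
    print(vhdl_record(type_name, type_fields))  # would be placed in a generated VHDL package
    print(verilog_width(type_name, type_fields))
```

Because both outputs come from the same dictionary, the flattened width used at a VHDL/Verilog boundary cannot drift between the two languages.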
Large FPGA firmware designs, such as the ones used in the trigger systems of
HEP experiments, typically contain many hundreds of configuration/status
registers and memories. Managing the required HDL code and software for these
can become challenging. We therefore developed a dedicated tool, called
HardwareCompiler, which parses an XML description of the registers and memories
and generates the required HDL code as well as C++ access functions used to
configure and monitor the modules. The tool has been successfully applied in
several generations of FPGA-based modules developed for the ATLAS Central
Trigger system, greatly simplifying their development and testing. We present
the capabilities of the HardwareCompiler with examples of generated VHDL
register packages and address decoders as well as low-level C++ software. The
latter is also used to generate wrappers for Python, which simplifies the
development of scripts for configuring and testing the hardware.
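The flow of an XML register description in, and address definitions for both HDL and C++ out, can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names, the 4-byte address stride, and the generated VHDL and C++ lines are all hypothetical; the abstract does not describe HardwareCompiler's actual schema or output. A generated address decoder would compare incoming bus addresses against constants such as these.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Item = Tuple[str, int, str]  # (name, byte address, access mode)

# Hypothetical XML register description.
REGMAP_XML = """
<module name="ctp_core" base="0x1000">
  <register name="control" access="rw"/>
  <register name="status" access="r"/>
  <memory name="lut" size="4" access="rw"/>
</module>
"""


def assign_addresses(xml_text: str) -> Tuple[str, List[Item]]:
    """Give each register one 32-bit word and each memory `size` words."""
    root = ET.fromstring(xml_text)
    address = int(root.get("base"), 16)
    items: List[Item] = []
    for elem in root:
        items.append((elem.get("name"), address, elem.get("access")))
        address += 4 * int(elem.get("size", "1"))  # byte addressing, 4 bytes per word
    return root.get("name"), items


def vhdl_constants(module: str, items: List[Item]) -> List[str]:
    return [f"constant {module.upper()}_{n.upper()}_ADDR : natural := 16#{a:04X}#;"
            for n, a, _ in items]


def cpp_constants(module: str, items: List[Item]) -> List[str]:
    return [f"constexpr uint32_t {module}_{n}_addr = 0x{a:04X};  // {acc}"
            for n, a, acc in items]


module, items = assign_addresses(REGMAP_XML)
print("\n".join(vhdl_constants(module, items)))
print("\n".join(cpp_constants(module, items)))
```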
Traditionally, assertion-based formal verification is performed after RTL development is complete, by a separate team of verification engineers, to comprehensively prove conformance of a design. While this provides the highest safety guarantees, it is also a lengthy endeavour. But it is not necessary to aim to fully prove everything about a design to take advantage of property checkers' abilities. Instead, they can also be used as "simulation on steroids".
Using the SBY property checker from YosysHQ, this talk will demonstrate some approaches for incorporating formal tools as a debugging aid into the process of RTL development, complementary to simulation. This includes:
- using cover statements to create testbenches
- using properties to confirm invariants that the design relies on
- validating subsystem interactions
- bug hunting with assertions
If you've ever stared at an ILA trace and thought "But how can it get into this state? That's not possible!", this talk is for you.
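What a cover statement asks of the tool ("find an input sequence that reaches this state") and what a failing assertion returns (a counterexample trace) can be imitated for a toy design by exhaustively exploring every input sequence up to a bound. The Python sketch below does that breadth-first for a hypothetical request/acknowledge FSM; SBY and its solvers work symbolically and at far larger scale, so this only illustrates the idea, not how the tools work internally.

```python
from collections import deque
from itertools import product
from typing import List, Optional, Tuple

Inputs = Tuple[int, int]  # (start, ack), sampled once per clock cycle


def next_state(state: str, inputs: Inputs) -> str:
    """Hypothetical request/acknowledge FSM (the design under test)."""
    start, ack = inputs
    if state == "IDLE":
        return "REQ" if start else "IDLE"
    if state == "REQ":
        return "DONE" if ack else "WAIT"
    if state == "WAIT":
        # The designer assumed `start` is never asserted while waiting.
        if start and ack:
            return "OVERLAP"
        return "DONE" if ack else "WAIT"
    if state in ("DONE", "OVERLAP"):
        return "IDLE"
    raise ValueError(f"unknown state {state}")


def find_trace(target: str, depth: int, init: str = "IDLE") -> Optional[List[Inputs]]:
    """Exhaustive breadth-first search over all input sequences up to `depth` cycles.

    Returns the shortest input trace reaching `target` (a cover trace, or a
    counterexample if an assertion forbids `target`), or None within the bound.
    """
    all_inputs = list(product((0, 1), repeat=2))
    queue = deque([(init, [])])
    visited = {init}  # the FSM has no other memory, so pruning by state is safe
    while queue:
        state, trace = queue.popleft()
        if state == target:
            return trace
        if len(trace) == depth:
            continue
        for inp in all_inputs:
            nxt = next_state(state, inp)
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, trace + [inp]))
    return None


# "assert never OVERLAP" fails: the search returns the counterexample trace.
print(find_trace("OVERLAP", depth=5))  # [(1, 0), (0, 0), (1, 1)]
print(find_trace("OVERLAP", depth=2))  # None: not reachable in 2 cycles
```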
Verification of digital systems is an art, and is often carried out through testbenches and functional verification.
Formal verification is an alternative approach in which we describe properties representing the expected behaviour of the system and use assertions to prove that these properties hold. It complements traditional behavioural simulations and can detect issues that are very hard to find with traditional methods, as well as exhaustively validate properties that would otherwise require prohibitively long simulations.
In this presentation we will show how we used SymbiYosys, an open-source tool, to formally prove the correctness of parts of a space probe FPGA design using assertions written in PSL.
Through some practical examples, easily understandable for FPGA developers, we will illustrate the power of formal verification as well as its complementarity with respect to traditional testbenches.
The mid-range FPGA market is currently seeing the introduction of new FPGA(-SoC) devices with attractive specs. This presentation highlights three key areas in which vendor lock-in can be avoided by leveraging OSS model-based source code generation. The use case is a tabletop 3D laser scanner, implemented on FPGA(-SoC) devices from all major vendors.
Firstly, in FPGA designs requiring (CPU) host-based control, design flexibility is increased by abstracting away the underlying UART/SPI/AXI-Lite connection at the CPU-FPGA interface. This is achieved by co-generating C++ and VHDL source code for command and data passing. Secondly, IP blocks for SDRAM/MIPI/PCIe/HDMI are vendor-specific; simple parametrized wrappers help achieve vendor independence for the most commonly used interface features. Finally, live probing of FPGA designs via non-OSS protocols is tightly bound to vendor IDEs. Leveraging the aforementioned CPU-FPGA interface, integrated RAM, and case-specific RTL code generation helps bypass this dependency.
The coordination of firmware development among numerous developers is a major issue in any collaboration.
This requires standardised tools for ensuring binary file traceability and firmware synthesis with Place and Route repeatability.
To address these problems, we present Hog, a free and open-source tool for maintaining HDL on git.
Hog integrates with HDL IDEs (Intel Quartus, Microsemi Libero, AMD Vivado and ISE) on both Windows and Linux platforms, minimizing overhead labour and easing the use of advanced git features.
Hog is a set of Tcl/Shell scripts with an appropriate workflow for managing HDL designs in a git repository.
Hog is included as a git submodule, a simple way of maintaining HDL code on git that requires no further installation.
This method allows any change in the source code to be detected automatically, and embeds the git tag and commit SHA in the bitstream.
Hog uses git CI to automatically compile and simulate the project, generating tags and releases.