As Moore’s Law comes to an end, domain-specific architectures (DSAs) are considered the next source of performance improvements in computing. Unfortunately, the development environment for DSAs falls short of that for general-purpose architectures (e.g., CPUs). The shift from general-purpose architectures to DSAs is hindered because software engineers often lack hardware design expertise. Even when they have it, they face a highly fragmented tool landscape, with multiple competing technologies for each task, such as simulation, verification, and synthesis.
At CERN, hardware design interests not only software engineers but also physicists, who must develop latency-critical triggers that select interesting particle interactions or collisions. Such triggers can be implemented as machine learning (ML) models performing classification. To aid this endeavour, hls4ml, an open-source project, unifies multiple high-level synthesis (HLS) toolkits and ML libraries. Internally, a model from one of the supported ML libraries (the frontend) is translated into the hls4ml intermediate representation (IR), which is then written out as input for an HLS backend; the backend synthesizes it and generates the corresponding hardware description language (HDL) code.
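The three-stage flow (frontend, IR, backend writer) can be illustrated with a toy model. Everything below is hypothetical: `Layer`, `ModelIR`, `frontend_from_dicts`, `write_xls`, `write_vitis` and `convert` are illustrative stand-ins, not hls4ml's actual classes or API, and the emitted lines are schematic placeholders rather than valid DSLX or HLS C++. The sketch only shows why a shared IR makes the backend swappable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Layer:
    """One node of a hypothetical, simplified IR (not hls4ml's real IR classes)."""
    name: str
    kind: str       # e.g. "dense", "relu"
    n_in: int
    n_out: int
    precision: str  # fixed-point type as text, e.g. "fixed<16,6>"


@dataclass
class ModelIR:
    """Hypothetical IR: an ordered list of layers, independent of any ML library or HLS tool."""
    layers: List[Layer] = field(default_factory=list)


def frontend_from_dicts(spec: List[dict]) -> ModelIR:
    """Toy frontend: translate a library-specific model description into the shared IR."""
    return ModelIR([Layer(**entry) for entry in spec])


def write_xls(ir: ModelIR) -> str:
    """Toy XLS writer: emits schematic placeholder lines, not valid DSLX."""
    return "\n".join(
        f"// DSLX placeholder: {layer.kind} '{layer.name}' "
        f"{layer.n_in}->{layer.n_out} @ {layer.precision}"
        for layer in ir.layers
    )


def write_vitis(ir: ModelIR) -> str:
    """Toy Vitis HLS writer: emits schematic placeholder lines, not valid HLS C++."""
    return "\n".join(
        f"// C++ placeholder: {layer.kind} '{layer.name}' "
        f"{layer.n_in}->{layer.n_out} @ {layer.precision}"
        for layer in ir.layers
    )


# Swapping the backend only changes the final writer; the frontend and IR are shared.
BACKENDS: Dict[str, Callable[[ModelIR], str]] = {"xls": write_xls, "vitis": write_vitis}


def convert(spec: List[dict], backend: str) -> str:
    ir = frontend_from_dicts(spec)  # frontend -> IR
    return BACKENDS[backend](ir)    # IR -> input for the chosen HLS backend


if __name__ == "__main__":
    model = [
        {"name": "fc1", "kind": "dense", "n_in": 16, "n_out": 8, "precision": "fixed<16,6>"},
        {"name": "act1", "kind": "relu", "n_in": 8, "n_out": 8, "precision": "fixed<16,6>"},
    ]
    print(convert(model, "xls"))
    print(convert(model, "vitis"))
```

In this toy, adding a new backend means registering one more writer in `BACKENDS`; the frontend translation and the IR are untouched.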
Externally, it lets physicists and software engineers translate their ML models directly into field-programmable gate array (FPGA) designs. Supporting multiple HLS toolkits as backends broadens vendor compatibility and simplifies experimenting to find the most suitable backend. With hls4ml, most of the complexity of hardware design is abstracted away.
In this poster, we present the implementation of an XLS backend for hls4ml. XLS is an open-source HLS toolkit from Google that fits the hls4ml flow and imposes no licensing requirements. XLS distinguishes itself by providing its own domain-specific language (DSL), DSLX. Most HLS tools use an extension of C/C++ as their frontend, which has been criticised as ill-suited to circuit design because of its roots in the von Neumann execution model. DSLX, by contrast, is presented as a dataflow DSL inspired by Rust. Because it was built for hardware design, it is well-suited to both hardware and ML designs. This shows in the language itself, which enforces many good practices already integrated in the hls4ml IR, such as per-layer bit-precision inference. DSLX also eases the implementation of more advanced features: using procs, DSLX's stateful processes that communicate over channels, streaming convolutional layers and stateful LSTM cells can be implemented with less development time.
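To make per-layer bit-precision inference concrete, the sketch below applies the standard fixed-point growth rules to one output neuron of a dense layer. Multiplying a signed value with W1 total bits and I1 integer bits by one with W2 and I2 bits yields a full-precision product with W1+W2 bits and I1+I2 integer bits; summing N such products needs ceil(log2 N) extra integer bits to avoid overflow. This is a generic illustration under those assumptions, not the exact rule set of hls4ml or XLS (which may treat bias terms, signedness or rounding differently); `FixedType`, `product_type`, `accumulator_type` and `dense_accumulator_type` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedType:
    """Hypothetical signed fixed-point type in the ap_fixed<W, I> convention:
    `width` total bits, `integer` integer bits (sign bit included)."""
    width: int
    integer: int

    @property
    def fractional(self) -> int:
        return self.width - self.integer


def product_type(a: FixedType, b: FixedType) -> FixedType:
    """Full-precision product: total and integer bit counts add, so no bits are lost."""
    return FixedType(a.width + b.width, a.integer + b.integer)


def accumulator_type(term: FixedType, n_terms: int) -> FixedType:
    """Summing n_terms values of type `term` can grow the magnitude by up to a
    factor of n_terms, which needs ceil(log2(n_terms)) extra integer bits.
    (n_terms - 1).bit_length() equals ceil(log2(n_terms)) for n_terms >= 1."""
    growth = (n_terms - 1).bit_length()
    return FixedType(term.width + growth, term.integer + growth)


def dense_accumulator_type(x: FixedType, w: FixedType, n_in: int) -> FixedType:
    """Accumulator type for one output neuron of a dense layer with n_in inputs
    (bias ignored for simplicity)."""
    return accumulator_type(product_type(x, w), n_in)


if __name__ == "__main__":
    x = FixedType(16, 6)  # layer input, e.g. fixed<16,6>
    w = FixedType(8, 2)   # weights, e.g. fixed<8,2>
    print(dense_accumulator_type(x, w, n_in=16))  # FixedType(width=28, integer=12)
```

Here the 24-bit product of a 16-bit input and an 8-bit weight gains 4 more bits when 16 products are summed, giving a 28-bit accumulator with 12 integer and 16 fractional bits. Deriving such widths per layer, rather than using one global type, is what keeps the generated datapaths no wider than necessary.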
In preliminary results for the fully parallel setting with an initiation interval of one, we obtain half as many look-up tables (LUTs) and a threefold latency improvement compared with the Vitis HLS backend. Beyond the development benefits, these results point to potential hardware-efficiency gains from the XLS backend.
We highlight the main design decisions, advantages, and difficulties encountered when writing the XLS backend, and compare the synthesized results with those of existing backends.