



# Faster FPGA firmware synthesis with hls4ml

FastML group

Sarai Sokolovsky, supervised by Vladimir Loncar

#### THE LHC BIG DATA PROBLEM



Deploy ML algorithms very early

Challenge: strict latency constraints!



- Fast processing of raw data
- Flexibility and modularity



## Field-programmable gate arrays (FPGAs)



- ✓ Reprogrammable integrated circuits
- ✓ Massively parallel = low latency
- ✓ Low power





#### **CURRENT ARCHITECTURE**










#### SOLUTION



Figure: output-valid signals `layer8_out_0_V_ap_vld` through `layer8_out_5_V_ap_vld`, together with `ap_done`.

Divide the neural network into blocks that can be synthesized in parallel.

Goal: reduce both the latency and the synthesis time.
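The idea of splitting a network into independently synthesized blocks can be sketched in plain Python. This is only an illustration under stated assumptions: `split_into_blocks`, `synthesize_block`, and `synthesize_in_parallel` are hypothetical names, not part of hls4ml, and `synthesize_block` is a stand-in for whatever per-block synthesis run the real flow launches.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(layers, n_blocks):
    """Split a list of layers into contiguous blocks of near-equal size."""
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    # Never create more blocks than there are layers (and at least one block).
    n_blocks = min(n_blocks, len(layers)) or 1
    base, extra = divmod(len(layers), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        size = base + (1 if i < extra else 0)  # first `extra` blocks get one more layer
        blocks.append(layers[start:start + size])
        start += size
    return blocks


def synthesize_block(block):
    """Hypothetical placeholder for one per-block synthesis run.

    A real flow would invoke the synthesis tool here; this stand-in only
    returns a label so the sketch stays runnable.
    """
    return "+".join(block)


def synthesize_in_parallel(layers, n_blocks):
    """Run the placeholder synthesis for every block concurrently."""
    blocks = split_into_blocks(layers, n_blocks)
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        # pool.map preserves block order, so results line up with the network.
        return list(pool.map(synthesize_block, blocks))


if __name__ == "__main__":
    layers = ["dense1", "relu1", "dense2", "relu2", "dense3", "softmax"]
    print(synthesize_in_parallel(layers, n_blocks=3))
```

Because the blocks are independent jobs, the wall-clock time of this step is bounded by the slowest block rather than by the sum of all blocks, which is where a synthesis-time reduction would come from.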



Fewer gates -> lower output capacitance -> less time for the signal to propagate
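The link between load and delay can be illustrated with the textbook first-order RC model, where the 50% propagation delay is roughly ln(2) · R · C. The sketch below is an assumption-laden illustration, not a timing model of any particular FPGA: the function name and all component values are hypothetical.

```python
import math


def rc_delay_ns(r_driver_ohm, c_wire_ff, c_gate_ff, fanout):
    """50% propagation delay of a first-order RC stage, in nanoseconds.

    The driver sees the wire capacitance plus one gate input capacitance
    per driven gate (fanout). Capacitances are given in femtofarads.
    """
    c_total_f = (c_wire_ff + fanout * c_gate_ff) * 1e-15  # fF -> F
    return math.log(2) * r_driver_ohm * c_total_f * 1e9   # s -> ns


if __name__ == "__main__":
    # Illustrative values only: 1 kOhm driver, 10 fF wire, 2 fF per gate input.
    for fanout in (8, 4, 1):
        print(fanout, rc_delay_ns(1000, 10, 2, fanout))
```

In this model, driving fewer gates lowers the total capacitance and the delay falls proportionally.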



#### **IMPROVEMENTS**



Synthesis time: reduced by 2x-20x!

Latency: reduced by 7%!



| Design | Synthesis time (s) | Latency (ns) | Power (W) | Total BRAM | Total DSP | Total LUT | Total FF |
|--------|--------------------|--------------|-----------|------------|-----------|-----------|----------|
| old    | 50                 | 145          | 464       | 3          | 408       | 16149     | 11656    |
| new    | 20-30              | 135          | 459       | 3          | 408       | 16149     | 11656    |
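The relative changes for the configuration in the table can be checked with a few lines of arithmetic. The helper names below are illustrative; the numbers are copied from the table as printed, including the power column in its stated unit.

```python
def percent_reduction(old, new):
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old


def speedup(old, new):
    """How many times faster the new value is than the old one."""
    return old / new


if __name__ == "__main__":
    # Latency: 145 ns -> 135 ns
    print(round(percent_reduction(145, 135), 1))  # 6.9
    # Power: 464 -> 459 (unit as given in the table)
    print(round(percent_reduction(464, 459), 1))  # 1.1
    # Synthesis time: 50 s -> 20-30 s
    print(round(speedup(50, 30), 2), speedup(50, 20))  # 1.67 2.5
```

For this row, the latency reduction rounds to the quoted 7%, and the synthesis time improves by roughly 1.7x to 2.5x.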



Resource usage (BRAM, DSP, LUT, FF): same amount!



### THANK YOU!

