Explored and/or implemented:
- FIFO depth optimisation for Vitis HLS => improves performance over the Vitis HLS code without FIFO depth optimisation
- Layer latency matching => improves performance over the FIFO-optimised-only solution (WIP?)
- SepConv resource strategy => implemented; improves performance compared with the SepConv latency strategy
- Vitis accelerator backend => performance of the AXI master solution could be improved by using AXI stream (is this implemented/used, or still WIP?)
- QONNX ingestion => all quantisation is handled and propagated, including to accumulators (WIP, but working)
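A minimal Python sketch of the idea behind FIFO depth optimisation, assuming a profile-then-resize approach: simulate with oversized FIFOs, record each FIFO's peak occupancy, then shrink each FIFO to that peak. The FIFO names and occupancy traces below are hypothetical; in a real flow they would come from RTL co-simulation, not from this function.

```python
from typing import Dict, List


def size_fifos(occupancy: Dict[str, List[int]], min_depth: int = 1) -> Dict[str, int]:
    """Return a depth per FIFO equal to its peak observed occupancy.

    `occupancy` maps a (hypothetical) FIFO name to the number of elements it
    held at each sampled cycle of a profiling run with oversized FIFOs.
    """
    depths = {}
    for name, trace in occupancy.items():
        peak = max(trace, default=0)
        # Never go below min_depth: a zero-depth stream is not usable.
        depths[name] = max(min_depth, peak)
    return depths


if __name__ == "__main__":
    # Hypothetical occupancy traces from a profiling run.
    traces = {
        "layer2_out": [0, 3, 7, 5, 2, 0],
        "layer4_out": [0, 1, 1, 0],
        "layer6_out": [],
    }
    print(size_fifos(traces))  # {'layer2_out': 7, 'layer4_out': 1, 'layer6_out': 1}
```

Sizing to the observed peak only guarantees no stalls for the input pattern that was simulated; a different input stream could need deeper FIFOs.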
Next possible paths:
- DSP packing
- Pruning => we can see what Vladimir's group has ready at their NGT meeting next Wednesday (he invited us)
- KD applied to single layers or sets of layers, substituting them with SR (Maurizio)
- Splitting the IP => would certainly reduce synthesis time; possibly also useful for layer latency matching
- Others?
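As a rough illustration of DSP packing, one common form fits two narrow multiplications that share an operand into a single wide multiplier. The Python sketch below shows only the arithmetic, assuming unsigned operands and an 18-bit field offset (both illustrative choices, not taken from any existing implementation); signed operands need an extra correction of the upper result, which is omitted here.

```python
from typing import Tuple


def packed_multiply(a: int, b: int, c: int, shift: int = 18) -> Tuple[int, int]:
    """Compute (a * b, a * c) using a single multiplication.

    Unsigned operands only. The low field must be wide enough to hold a * c,
    which is guaranteed when bit_length(a) + bit_length(c) <= shift.
    """
    if min(a, b, c) < 0:
        raise ValueError("this sketch handles unsigned operands only")
    if a.bit_length() + c.bit_length() > shift:
        raise ValueError("a * c may overflow the low field; increase shift")
    # One wide multiply: a * (b * 2**shift + c) = (a * b) * 2**shift + a * c
    wide = a * ((b << shift) + c)
    low = wide & ((1 << shift) - 1)  # recovers a * c
    high = wide >> shift             # recovers a * b
    return high, low


if __name__ == "__main__":
    print(packed_multiply(200, 150, 99))  # (30000, 19800)
```

In hardware the packed operand would drive the wide input port of the DSP (for example the 27-bit port of a DSP48E2), so two 8-bit products come out of one DSP slice.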