Explored and/or implemented:

FIFO depth optimisation for Vitis HLS => performance improved over the non-FIFO-optimised Vitis HLS code
Layer latency matching => performance improved over the FIFO-optimised-only solution (WIP?)
SepConv resource strategy => implementing it improved performance over the SepConv latency strategy
Vitis accelerator backend => performance of the AXI master solution can be improved by using AXI stream (is it implemented/used, or WIP?)
QONNX ingestion => all quantisation is handled and propagated, even for accumulators (WIP, but works)
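
Rough illustration of the idea behind the FIFO depth optimisation item, assuming depths are picked from the peak occupancy seen in a simulation run. The trace format and the function name are made up for this sketch; this is not the Vitis HLS or hls4ml API.

```python
def required_fifo_depth(events):
    """Smallest depth that never overflows for the given trace.

    `events` is a time-ordered sequence of "push"/"pop" strings
    (hypothetical trace format, for illustration only).
    """
    occupancy = 0
    peak = 0
    for ev in events:
        if ev == "push":
            occupancy += 1
            peak = max(peak, occupancy)  # remember the fullest the FIFO ever got
        elif ev == "pop":
            if occupancy == 0:
                raise ValueError("pop from empty FIFO in trace")
            occupancy -= 1
        else:
            raise ValueError(f"unknown event: {ev!r}")
    return peak


# Producer gets ahead of the consumer by at most three elements.
trace = ["push", "push", "pop", "push", "push", "pop", "pop", "pop"]
print(required_fifo_depth(trace))  # 3
```

Any depth above the peak is wasted memory, and anything below it stalls the producer, which is presumably where the gain over the non-optimised code comes from.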

Next possible paths:

DSP packing
Pruning => we can see what Vladimir's group has ready next Wed at their NGT meeting (he invited us)
KD applied to a layer or a set of layers, substituting them with SR (Maurizio)
Splitting the IP => will certainly reduce synthesis time; may also be useful for layer latency matching
Others?
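
For the DSP packing item, a minimal sketch of the arithmetic trick, assuming unsigned operands and a shared multiplicand (signed values need an extra correction step that is not shown). The function name and the default 8-bit width are illustrative choices, not taken from any existing implementation.

```python
def packed_multiply(a, b, c, bits=8):
    """Compute (a*c, b*c) with a single wide multiplication.

    All operands must be unsigned and fit in `bits` bits.
    """
    limit = 1 << bits
    if not all(0 <= x < limit for x in (a, b, c)):
        raise ValueError(f"operands must be unsigned {bits}-bit values")

    shift = 2 * bits                  # each product fits in 2*bits bits
    packed = (a << shift) | b         # place a and b in separate fields
    wide = packed * c                 # the one multiplication that is shared
    low = wide & ((1 << shift) - 1)   # lower field holds b*c
    high = wide >> shift              # upper field holds a*c
    return high, low


print(packed_multiply(3, 5, 7))  # (21, 35)
```

Because b*c is always below 2^(2*bits), it never carries into the upper field, so the two products can be read back independently. With 8-bit operands the packed word is 24 bits wide.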