Edge SpAIce CERN Technical Meeting

Europe/Zurich
40/5-A01 (CERN)
Zoom Meeting ID: 63468087547
Host: Sioni Paris Summers

Attending: Nicolò, Stelios, Noemi, Sioni, Maurizio (Zoom)

  • Currently 1 order of magnitude smaller than target on parameter size, and 2 orders of magnitude better than target on pixels / W / s
  • Currently resource limited by LUTs
    • Stelios to look at main consumer of LUTs
  • Can latency matching be automated?
  • Vladimir’s fellow will work on layer IP stitching
    • Not yet shareable - keep an eye on it
  • Pruning from Vladimir’s NGT group
    • Trying 4 different pruning methods
    • Nicolò to follow meeting
    • Potential to feed back to Agenium on methods
  • Maurizio to provide EPFL paper on structured pruning
  • Proposal: look at knowledge distillation (KD) on intermediate layers
  • Proposals for next steps:
    • Study dataflow in more detail
      • Where are bottlenecks?
    • Small reusable blocks?
      • Need some reconfigurability in layer code (e.g. non-constant image dimensions)
      • Save resources at the cost of throughput
    • Different clock frequencies for different layers
      • Yes, it’s possible, but needs split layer IPs
    • Partial reconfiguration
      • Look into how fast that is
  • Nicolò to deliver the CI / reproducibility pipeline
  • Short term plan:
    • Look into some of the ideas in a sandbox environment
    • Meet again in two weeks
    • Bring material (slides, diagrams, anything)
      • Ideally aim to quantify the potential of each idea
      • How much will it realistically impact the resources, latency, throughput, etc.?
      • How feasible is the implementation?
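
One way to start on the dataflow study is a back-of-the-envelope model: in a streaming pipeline, the layer that takes longest per image sets the throughput. The sketch below is a toy model, not the project's tooling. The layer names, cycle counts, and clock frequencies are hypothetical placeholders, and it assumes each layer needs `cycles / clock` seconds per image, with per-layer clocks allowed to differ.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str          # hypothetical layer label
    cycles: int        # cycles between two consecutive images (assumed known, e.g. from HLS reports)
    clock_mhz: float   # clock frequency driving this layer


def time_per_image_us(layer: Layer) -> float:
    """Time one layer needs per image, in microseconds (cycles / MHz)."""
    return layer.cycles / layer.clock_mhz


def find_bottleneck(layers: list[Layer]) -> Layer:
    """The slowest layer limits the throughput of the whole pipeline."""
    return max(layers, key=time_per_image_us)


def throughput_images_per_s(layers: list[Layer]) -> float:
    """Pipeline throughput, set by the bottleneck layer."""
    return 1e6 / time_per_image_us(find_bottleneck(layers))


# Hypothetical numbers, for illustration only.
pipeline = [
    Layer("conv1", cycles=40_000, clock_mhz=200.0),
    Layer("sepconv2", cycles=100_000, clock_mhz=200.0),
    Layer("dense3", cycles=10_000, clock_mhz=100.0),
]
bottleneck = find_bottleneck(pipeline)
print(bottleneck.name, throughput_images_per_s(pipeline))
```

Feeding in measured numbers for each candidate idea would give a first estimate of how much it moves the bottleneck, which is the kind of quantification asked for in the short term plan.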
    • 10:00 - 10:15: Current status overview (15m)
      Speakers: Nicolo Ghielmetti (CERN), Stylianos Tzelepis (National Technical Univ. of Athens (GR))

      Explored and/or implemented:

      • FIFO depth optimisation for Vitis HLS => performance improved over Vitis HLS code without FIFO optimisation
      • Layer latency matching => performance improved over the FIFO-optimised-only solution (WIP?)
      • SepConv resource strategy => improves on the performance of the SepConv latency strategy
      • Vitis accelerator backend => performance of the AXI master solution can be improved by using AXI stream (is it implemented/used or WIP?)
      • QONNX ingestion => all the quantisation is handled and propagated even for accumulators (WIP but works)
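
The layer latency matching item can be pictured with a toy calculation: once the slowest layer is known, every faster layer can be slowed down (sharing hardware across more cycles) until it roughly matches, which frees resources without lowering throughput. The sketch below assumes, as a simplification, that a layer's latency scales linearly with an integer slow-down factor. The layer names and cycle counts are hypothetical, and this is not necessarily the procedure used in the project.

```python
def match_latencies(latency_cycles: dict[str, int]) -> dict[str, int]:
    """For each layer, return the largest integer factor by which it can be
    slowed down without becoming slower than the current bottleneck.

    Assumes latency grows linearly with the factor (a simplification).
    """
    target = max(latency_cycles.values())  # the bottleneck sets the target
    return {name: target // cycles for name, cycles in latency_cycles.items()}


# Hypothetical per-layer latencies in clock cycles.
latencies = {"conv1": 2_000, "sepconv2": 9_000, "dense3": 500}
factors = match_latencies(latencies)
matched = {name: latencies[name] * factors[name] for name in latencies}
print(factors)   # slow-down factor per layer
print(matched)   # resulting latencies, none above the bottleneck
```

A rule this simple is easy to automate; whether it holds up depends on how closely real layers follow the linear-scaling assumption.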

      Next possible paths:

      • DSP packing
      • Pruning => we can see what Vladimir's group has ready next Wed at their NGT meeting (he invited us)
      • Knowledge distillation (KD) applied to a layer or set of layers, which are then substituted with SR (Maurizio)
      • Splitting the IP => sure to reduce the time needed for synthesis, and may be useful for layer latency matching
      • Others?
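
As a reminder of what DSP packing buys: two narrow multiplications that share an operand can be computed with one wide multiplier by placing the other two operands in disjoint bit fields. The sketch below only checks the arithmetic in plain Python for unsigned 8-bit values. The 16-bit field offset and the function name are choices made for this example, and signed operands would need an extra correction step that is not shown.

```python
def packed_multiply(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (a*b, a*c) using a single multiplication.

    All inputs must be unsigned 8-bit values, so each product fits in
    16 bits and the two results cannot overlap.
    """
    for value in (a, b, c):
        if not 0 <= value <= 255:
            raise ValueError("operands must be unsigned 8-bit values")
    packed = (b << 16) | c          # b in the upper field, c in the lower field
    product = a * packed            # the one wide multiplication
    return product >> 16, product & 0xFFFF


print(packed_multiply(7, 200, 13))  # (1400, 91)
```

Here the packed operand is 24 bits wide and the shared operand is 8 bits, so one multiplier does the work of two.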
    • 10:15 - 11:00: Brainstorming next Xilinx phase (45m)
      Speakers: Maurizio Pierini (CERN), Nicolo Ghielmetti (CERN), Noemi D'Abbondanza (Sapienza Universita e INFN, Roma I (IT)), Sioni Paris Summers (CERN), Stylianos Tzelepis (National Technical Univ. of Athens (GR))