Edge SpAIce CERN Technical Meeting

Name: Edge SpAIce CERN Technical Meeting
Start: 2024-11-05T10:00:00+01:00
End: 2024-11-05T11:15:00+01:00
Location: CERN

Tuesday 5 Nov 2024, 10:00 → 11:15 Europe/Zurich

40/5-A01 (CERN)

Sioni Paris Summers

Attending: Nicolò, Stelios, Noemi, Sioni, Maurizio (Zoom)

Currently one order of magnitude smaller than the target on parameter size, and two orders of magnitude better than the target on pixels / W / s
Currently resource limited by LUTs
- Stelios to look at main consumer of LUTs
Can latency matching be automated?
Vladimir’s fellow will work on layer IP stitching
- Not yet shareable - keep an eye on it
Pruning from Vladimir’s NGT group
- Trying 4 different pruning methods
- Nicolò to follow meeting
- Potential to feed back to Agenium on methods
Maurizio to provide EPFL paper on structured pruning
Proposal: look at KD on intermediate layers
Proposals for next steps:
Study dataflow in more detail
- Where are bottlenecks?
Small reusable blocks?
- Need some reconfigurability in layer code (e.g. non-constant image dimensions)
- Save resources at the cost of throughput
Different clock frequencies for different layers
- Yes it’s possible, but needs split layer IPs
Partial reconfiguration
- Look into how fast that is
Nicolò to deliver the CI / reproducibility pipeline
Short term plan:
- look into some of the ideas in sandbox environment
- meet again in two weeks
- bring material (slides, diagrams, anything)
  - ideally aim to quantify the potential of each idea
  - how much will it realistically impact resources, latency, throughput, etc.?
  - how feasible is the implementation?
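
One way to start quantifying the "where are bottlenecks?" question is a back-of-the-envelope model: in a streaming dataflow pipeline, steady-state throughput is bounded by the stage with the largest initiation interval (II). The sketch below is only an illustration under that assumption. The layer names, cycle counts and clock frequency are hypothetical placeholders, not measurements from the current design; real values would come from the synthesis or co-simulation reports.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    ii_cycles: int       # initiation interval: cycles between accepted inputs
    latency_cycles: int  # cycles from first input to first output


def bottleneck(layers):
    """Return the layer with the largest II, which bounds pipeline throughput."""
    return max(layers, key=lambda layer: layer.ii_cycles)


def throughput_per_s(layers, clock_hz):
    """Steady-state inferences per second for a pipeline limited by its slowest stage."""
    return clock_hz / bottleneck(layers).ii_cycles


# Hypothetical numbers, for illustration only.
layers = [
    Layer("conv1", ii_cycles=4096, latency_cycles=5000),
    Layer("sepconv2", ii_cycles=16384, latency_cycles=18000),
    Layer("dense3", ii_cycles=1024, latency_cycles=1200),
]
CLOCK_HZ = 200e6  # assumed clock frequency

slowest = bottleneck(layers)
print(f"bottleneck: {slowest.name} (II = {slowest.ii_cycles} cycles)")
print(f"throughput bound: {throughput_per_s(layers, CLOCK_HZ):.1f} inferences/s")
```

Replacing the placeholder entries with per-layer numbers from the reports would show which layer an idea needs to target, and re-running with a modified II gives a first estimate of the throughput gain an idea could deliver.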

- 10:00 → 10:15
  Current status overview 15m
  
  Speakers: Nicolo Ghielmetti (CERN), Stylianos Tzelepis (National Technical Univ. of Athens (GR))
  
  edge_space_v1.pdf
  
  Edge SpAIce - FastML.pptx.pdf
  
  poster_hls4ml_developments.pdf
  Explored and/or implemented:
  
  FIFO Depth optimisation for Vitis HLS => performance improved over non-FIFO optimised Vitis HLS code
  
  Layer latency matching => performance improved over FIFO optimised only solution (WIP?)
  
  SepConv resource strategy => improves on the performance of the SepConv latency strategy
  
  Vitis accelerator backend => performance of the AXI master solution can be improved by using AXI stream (is it implemented/used or WIP?)
  
  QONNX ingestion => all the quantisation is handled and propagated even for accumulators (WIP but works)
  
  Next possible paths:
  
  DSP packing
  
  Pruning => we can see what Vladimir's group has ready next Wed at their NGT meeting (he invited us)
  
  KD applied to a layer or set of layers, substituting them with SR (Maurizio)
  
  Splitting IP => should certainly reduce the time needed for synthesis; may also be useful for layer latency matching
  
  Others?
- 10:15 → 11:00
  Brainstorming next Xilinx phase 45m
  
  Speakers: Maurizio Pierini (CERN), Nicolo Ghielmetti (CERN), Noemi D'Abbondanza (Sapienza Universita e INFN, Roma I (IT)), Sioni Paris Summers (CERN), Stylianos Tzelepis (National Technical Univ. of Athens (GR))
