Compute Accelerator Forum - FPGAs

Europe/Zurich
Virtual (Zoom)

Description

 

To receive announcements and information about this forum, please subscribe to compute-accelerator-forum-announce@cern.ch

 

Introduction to FPGA Acceleration

  • On sparse matrices, are even the zeros stored, or is a specialized data structure used? [Vince]
    • Specialized data structures are needed to fit large sparse matrices in memory, and this can degrade performance on GPUs. It’s here that FPGAs help [Marco]
  • How well does compilation from ML-defined computational graphs (TensorFlow) to FPGA work? (asked below by Lukas)
  • Are there libraries available for common things, say random number generation, or is hand crafting needed in many cases? [Ben] (answered below in reply to Marcel)
  • In the MC/Simulation case,
  • The talk mentioned FPGAs working well for graph algorithms, but these are typically memory bound, so how well can FPGAs handle them? [Stephen]
    • Could partition graph into blocks with high locality, use double buffering and locality on chip to exploit parallelism. Won’t saturate memory bandwidth [Marco]
    • What happens when you go beyond the 54MB on chip? [Stephen]
    • This is where the double buffering comes in - only use part of the on chip memory to work on, use the rest for transfer [Marco]
    • What about random access outside the 54MB limit? [Stephen]
    • Algorithms can help here to partition data by locality. Also have to consider CPU latency [Marco]
  • Can you say anything about the FPGA backend to SYCL compared to an HLS interpretation [Charles, Attila]?
    • Not much experience with SYCL yet, though looking at it and oneAPI [Marco].
    • One thing to note on compilers is how they generate code: MaxCompiler gives greater control, which is valuable at this level.
    • Are there any metrics for this [Charles]?
    • Not directly, but did see some slowdown due to memory latency [Marco].
    • HEP codes may not be as affected by memory saturation [Attila].
    • Used Xilinx HLS and observed better results with it than with Verilog, but this depends on the algorithm [Bruno].
  • What’s the ecosystem for programming like, e.g. libraries [Marcel]?
    • Don’t expect the same experience as with other software, especially for graph algorithms. There is some research at Imperial, with others, towards an API for graph processing. For MaxCompiler there is nothing yet, but there are a few basic/elementary functions (trigonometry, sorting, lookup tables). So you do need to write a lot yourself [Marco].
  • On the recent projects, did you run parts of Geant4 on the FPGA [Witek]?
    • Yes, but further down the line. At the moment the work is with the DPM library, looking at single scattering in isolation [Marco].
  • Any experience with synthesizing from ML-defined graphs (e.g. TensorFlow) to FPGA [Lukas]?
    • In my experience, you don’t get the performance of a by-hand implementation, e.g. for matrix multiplication. This may change as toolchains improve [Marco].
    • What about TF as a programming model (not just pure ML), e.g. pre-design operations? [Lukas]
    • Again, difficult at the moment to organise/optimize memory transfers [Marco]
  • What’s the speed difference between single and double precision [Beomki]?
    • Slide 44: 2 DSPs are needed for single precision, and more for double. The number depends on the FP multiplier, e.g. Xilinx needs 8 [Marco].
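
On the sparse matrix question above: one widely used specialized structure is compressed sparse row (CSR), which stores only the non-zero entries plus index arrays. The sketch below is an illustration only; the minutes do not say which format was used in the talk, and the function names `to_csr` and `csr_matvec` are hypothetical.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays.

    Returns (values, col_idx, row_ptr). Zeros are not stored.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)   # non-zero value
                col_idx.append(j)  # its column index
        # row_ptr[i + 1] marks where row i ends in `values`
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect lookup x[col_idx[k]]: an irregular memory access
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


dense = [[1, 0, 0],
         [0, 0, 2],
         [0, 3, 0]]
values, col_idx, row_ptr = to_csr(dense)
result = csr_matvec(values, col_idx, row_ptr, [1, 2, 3])
```

The indirect lookup through `col_idx` is what makes the access pattern data dependent, which is plausibly the kind of cost the answer refers to.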
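
The double-buffering answers above can be illustrated in software: while one block is being processed, the next block is already being fetched. This is only a host-side analogy under stated assumptions, not FPGA code; `load_block` and `process_block` are hypothetical stand-ins for the off-chip transfer and the on-chip kernel, and the block size plays the role of the part of on-chip memory set aside for work.

```python
from concurrent.futures import ThreadPoolExecutor


def load_block(data, start, size):
    """Hypothetical stand-in for a transfer from off-chip memory."""
    return data[start:start + size]


def process_block(block):
    """Hypothetical stand-in for the compute kernel: sum of squares."""
    return sum(v * v for v in block)


def double_buffered_sum_of_squares(data, block_size):
    """Process data block by block, prefetching the next block meanwhile."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None  # future for the block currently held in the "work" buffer
        for start in range(0, len(data), block_size):
            # Start fetching the next block into the "transfer" buffer...
            next_block = pool.submit(load_block, data, start, block_size)
            # ...while the previously fetched block is processed.
            if pending is not None:
                total += process_block(pending.result())
            pending = next_block
        # Drain the last block.
        if pending is not None:
            total += process_block(pending.result())
    return total


total = double_buffered_sum_of_squares(list(range(10)), block_size=4)
```

Blocks are taken in simple index order here; partitioning by locality, as suggested in the answers, would decide which elements end up in the same block.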