Madgraph5 GPU development - !! ATTENTION THURSDAY !!

Name: Madgraph5 GPU development - !! ATTENTION THURSDAY !!
Start: 2022-06-30T15:00:00+02:00
End: 2022-06-30T16:00:00+02:00
Location: Virtual (Zoom)

Thursday 30 Jun 2022, 15:00 → 16:00 Europe/Zurich

Stefan Roiser

stefan.roiser@cern.ch

+41 75 4115334

# Dev meeting 30.06.22

Present: AV, SR, TC, NN, CV, OM, WH

## Rehearsal of ICHEP talk

OM: conclusions confusing
- do not repeat the numbers here. just say that we are faster because of SIMD
(do not mention helicity recycling here)
(single vs double remove as well?)
TC: not sw engineers, we can run on GPUs,
and give an x5-x8 on high-end CPUs
- PFs in red

Dropping slides?
- OM: slides 3 and 4 too detailed, put them in backup (maybe have a less detailed one)
- NN: maybe avoid outline for a short talk! (AV: maybe keep the three areas of development)

Three points?
AV: maybe remove slide 5? put it in backup
OM: otherwise just remove the previous three
Do keep the DOI however

References?
AV: do I need to keep Taylor's references?
TC: no not important

Slides on lockstep and MC dice?
TC: keep the roulette, a physicist understands that, remove the SIMD/SIMT

Slide on epochs: OM go to backup slides

Motivation: lighter

MG and madgraph4gpu: merge into a single slide

OM: On the results table, remove fortran?
Or redo it with

Slide madevent: too dense
WH: tables are difficult, should use a bar plot instead...
TC: point out Amdahl, we sped up the ME as much as we can
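
For reference, the Amdahl point can be sketched numerically. This is only an illustration: the function names and the example numbers (ME taking 90% of the runtime, ME sped up by x10) are assumptions, not figures from the slides.

```python
def amdahl_speedup(me_fraction: float, me_speedup: float) -> float:
    """Overall speedup when only the ME part of the runtime is accelerated.

    me_fraction: share of the original runtime spent in the ME (0 to 1).
    me_speedup:  factor by which the ME part alone gets faster.
    Both names are hypothetical, chosen for this sketch.
    """
    if not 0.0 <= me_fraction <= 1.0:
        raise ValueError("me_fraction must be between 0 and 1")
    if me_speedup <= 0.0:
        raise ValueError("me_speedup must be positive")
    # The non-ME part is unchanged, the ME part shrinks by me_speedup.
    return 1.0 / ((1.0 - me_fraction) + me_fraction / me_speedup)


def amdahl_limit(me_fraction: float) -> float:
    """Upper bound on the overall speedup, for an infinitely fast ME."""
    if not 0.0 <= me_fraction <= 1.0:
        raise ValueError("me_fraction must be between 0 and 1")
    if me_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - me_fraction)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(amdahl_speedup(0.9, 10.0))  # overall factor
    print(amdahl_limit(0.9))          # ceiling set by the non-ME part
```

With these made-up inputs the overall factor is roughly 5.3 even though the ME factor is 10, which is why quoting both factors for madevent makes sense.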

OM: same table (or plot) format for cudacpp and for madevent

Agreed!
- table, not bar plot
- BUT only the factors in the table
- and for madevent, put BOTH the ME factor and the overall factor

Some discussion on AVX512 or not - just give CERN table
AV: mention we reach theoretical limits
AV: and mention we reach theoretical limit somewhere

OM: people don't know what AVX512 is... AV: say 512 bits?
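
A possible way to explain "512 bits" on the slide: the register width divided by the element size gives the number of values processed per instruction, which is also the ideal SIMD speedup. The helper below is a hypothetical sketch for the arithmetic only, assuming perfect vectorization and no other bottleneck.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one SIMD register.

    Under ideal vectorization this is also the maximum speedup over
    scalar code. Function name and signature are assumptions of this sketch.
    """
    if register_bits <= 0 or element_bits <= 0:
        raise ValueError("bit widths must be positive")
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of element width")
    return register_bits // element_bits


if __name__ == "__main__":
    # 512-bit registers with 64-bit doubles and 32-bit floats.
    print("double lanes:", simd_lanes(512, 64))
    print("float lanes:", simd_lanes(512, 32))
    # 256-bit registers with doubles, for comparison.
    print("256-bit double lanes:", simd_lanes(256, 64))
```

So a 512-bit register holds 8 doubles or 16 floats, which is the ceiling any measured SIMD factor should be compared against.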

Slide on Portability Framework?
TC: move it to backup, just introduce them in a light way
OM: just a motivation slide with three logos of PFs and three logos of vendors

AV: mini title slides?

Results slides on PFs?
TC: you gave maybe a bit too little details compared to others?
TC: for first slide was ok, mention filling GPU and

NN: sycl being higher, that may be related to clang?

AV: how about we remove the eemumu plots? maybe it includes memory copies...
TC: we should check, I remember that initially we were handling the data copies differently
AV: yes exactly

AV: note that sycl is better than cuda in v100 for ggttgg! so that I would leave in

NN: would prefer to leave it in

agreed: keep eemumu in first slide (mention overheads), remove it in the second
(overheads of launching kernels etc too)

slide on cpu and pfs
AV: keep it or leave it?
TC: keep it, and mention doing apples to apples

PFs code runs out of the box with reasonable performance also on CPUs
The cudacpp implementation handles both vectorization and threading at a much lower level

WH/OM: slide 11 to backup?
TC/NN: ok, but add one sentence on previous slide "PFs also run out of the box on CPUs"
(performance under investigation)

Slide on outlook in MEs?
Agree to skip it and move it to backup
AV: Maybe extract two-three main outlook points from both here and madevent

Slide on outlook in madevent?
OM: more interesting, at least the top part, maybe in conclusions?

Slide on madevent reengineering
OM: remove bullet two?
In the end we agree to keep it

- 15:00 → 15:10
  
  News 10m
- 15:10 → 15:30
  
  Topical discussion 20m
  
  20220708-MG5aMConGPU-ICHEP-AV-v009b-rehearsal.pdf
  
  20220708-MG5aMConGPU-ICHEP-AV-v009b-rehearsal.pptx
  
  Taylor's GoogleDoc
- 15:30 → 15:50
  
  Round table 20m
- 15:50 → 16:00
  
  AoB 10m