# MGonGPU dev meeting Wed 08.06.2022
Present: SR, OM, AV (notes), NN, TC, CV, WH
## Taylor - skeleton of slides for ICHEP
Taylor shows a skeleton for the slides for ICHEP,
https://docs.google.com/presentation/d/1TDNqpHY5GBj1FLnYst0lwA9QfbmJ9JbmwF4yWn2ef4w/edit?usp=sharing
AV: would mention upfront that we have two lines of development, cudacpp and abstraction layers,
and would also mention that we plan to integrate with madevent
WH/NN: agree that we should compare with cuda
OM: agree that it is important to mention the madevent integration, just that we do not have enough time
TC: true we have little time but we can touch all of these subjects
AV: agree we can mention the main messages for all these things
OM: the two main messages are
- abstraction layers are performant
- we have a way to integrate
AV: I would also strongly mention that lockstep processing is fit for event generation
WH: have metrics for that?
AV: for lockstep on vectorization and GPUs yes
OM: true, but who do you want to convince?
AV: other generator teams, Sherpa, but also Whizard, KoralW etc...
TC: choice of processes? eemumu and ggtt?
AV: would show one simple process (eemumu? ggtt?) and one complex process (ggttgg? ggttggg?)
NN: for abstraction layers we have all 5 processes, and we see that for eemumu abstraction layers look better,
while for complex processes direct cpp seems better (OpenMP)
AV: would mention several dimensions of speedup, vectorization, multithreading, GPU and multi-architecture port
If we only show the 'maximum' speedup, it is difficult to describe the individual contributions.
More in detail: comparing the max for GPU is ok, but for CPU it is difficult to disentangle MT and vectorization.
TC: agree, but we should give a message to the HEP community about usability of the layers
AV: yes, but we can say we do not understand some things yet (e.g. MT and vectorization on CPU)
TC: ok agree
NN shows some plots for thread scaling on CPU
AV: do not understand the increase above 2^8 (also, check.exe CL arguments should have no impact?)...
better understood is the peak at 2^6 and the subsequent drop when overcommitting via OMP_NUM_THREADS
AV/TC long discussion about vectorization...
TC: do we have in the code a specific use of vectorization?
AV: yes, this is the neppV in the code, but also the -mhaswell build flags
Maybe you can use the build flags in sycl and see if it gives any benefit from autovectorization?
TC another option is that in the talk we only show the GPU results?
Discussion on GPU, we seem to agree, all good.
Discussion on CPU, a bit more complex to decide what to show.
TC: there are two good discussions, usability and ability to exploit fully the hardware.
WH: agree two different discussions, GPU and CPU
SR: we could give two messages here, knowing that WLCG resources are now mainly CPU
For CPU we can say we are able to leverage vectorization
For GPU we can use the abstraction layers
TC: we also have some mpirun plots
NN: ggtt does not run on alpaka
AV: let's use eemumu as the simple process and ggttgg as the complex process
AV: stress that I am not convinced so far that we have any SIMD vectorization in abstraction layers
(at least not in the code we have now)... if you could show that, it would be great
## Andrea
Presents some slides
OM: you can try to use eemumu which does not have color
OM: note that the move to 340 is done, but this will be within the "311 branch".
If you use the 311 branch of github, you see it says "madgraph 340"!
## Round table
OM: progressing on colors
TC/NN/WH: nothing to add
CV: trying to compile sycl code and having some issues
What is a sycl compiler, do we need to build it ourselves? Is clang++ ok?
NN: it must be the sycl build of clang, they have a branch
CV: also submitted a pull request for alpaka
## AOB
Next meeting: Mon 13 June agreed
TC: will try to get some plots in the slides by the end of the week
AV: can I just add some message to the slides?