Madgraph on GPU dev meeting Tue 17 Sep 2024
https://indico.cern.ch/event/1355161/
Present: OM, SR, ZW, AV (notes), AT, TC
## AT
AT: CV cannot join today, apologies
AT: very busy with university, had no time to make progress yet, will come back to you
OM: we can schedule a meeting also with SR and AV if you want
AT: that would be good
## ZW
(1) Fixed many things in reweighting, merged it recently
It is in my forks of madgraph4gpu
AV: can you make a presentation at some point?
ZW: yes absolutely
ZW does a short demo interactively
Generally tests this with l+l- to l+l- because there are multiple subprocesses
The cards need a 'change gpucpp True' to enable ZW's reweighting
Then in the script he adds several parameter changes, with a launch after each one
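For reference, a rough sketch of what such a run might look like, assuming the standard mg5amcnlo reweight_card.dat syntax: only the 'change gpucpp True' line is taken from the demo; the weight names and parameter values below are purely illustrative and may not match ZW's actual script.
```
# Hypothetical reweight_card.dat sketch (only 'change gpucpp True' is from the demo)
change gpucpp True            # enable ZW's cudacpp reweighting backend
launch --rwgt_name=mz_up      # first reweighting point (illustrative name)
  set mass 23 91.3            # illustrative parameter change (Z mass)
launch --rwgt_name=mz_down    # second reweighting point
  set mass 23 91.0
```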
OM: quite impressive! and so much faster than the original version!
AV: very nice! can you write down some doc?
ZW: yes will do, and in two weeks will show a couple of slides
SR: code generation and everything else is ready?
ZW: yes everything is ready
(2) Also discussing with OM and Marco Zaro about NLO
AV: is the vectorizingNLO branch in mg5amcnlo your work? this appeared recently
ZW: yes this is what we are doing with MZ, "simulating SIMD"
SR: still struggling with bugs here?
ZW: no, with MZ we fixed the bugs we had
## SR
SR: Stumbled across a build failure #1004 with vector type references
SR/AV: will look at it
SR: what I really wanted to do here was to look at going to many particles in the final state
This uses C++11 features for pre-instantiating templates into separate objects
AV: the end result may be similar to what I did with helinl=L,
but very nice to have a different approach with C++11 features for templates
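For illustration only, a minimal sketch of the general C++11 technique (not SR's actual code): pre-instantiating a template for a few fixed final-state multiplicities via extern/explicit template instantiation, so each instantiation is compiled once into its own object; all names are hypothetical.
```cpp
#include <array>
#include <cstddef>
#include <iostream>

// Toy stand-in for a matrix-element kernel parametrised on the number of
// external particles; the real code would of course be far more complex.
template <std::size_t NEXTERNAL>
struct MatrixElement {
  double evaluate( const std::array<double, 4 * NEXTERNAL>& momenta ) const {
    double sum = 0.;
    for ( double p : momenta ) sum += p; // placeholder computation
    return sum;
  }
};

// In a shared header: C++11 'extern template' declarations tell every other
// translation unit not to instantiate these specialisations implicitly...
extern template struct MatrixElement<4>;
extern template struct MatrixElement<6>;

// ...while a dedicated .cc file per multiplicity would hold the single
// explicit instantiation definition, so each one is compiled only once.
template struct MatrixElement<4>;
template struct MatrixElement<6>;

int main() {
  MatrixElement<4> me4;
  std::array<double, 16> momenta{}; // 4 particles x 4 momentum components
  std::cout << me4.evaluate( momenta ) << std::endl;
  return 0;
}
```
In a real split build, each explicit instantiation definition would live in its own .cc file, which is what keeps compilation manageable when going to many particles in the final state.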
## TC
TC: not much to report
Nathan started a staff position so has many other commitments
TC discussing with NN to get what he did into the repo
## AV
AV shows the slides attached.
Discussion about the options for packaging.
OM: option 3 is also interesting.
We could maintain a (new, renamed, cleaned-up) madgraph4gpu that contains mg5amcnlo.
Then mg5amcnlo users would download a ~tarball of cudacpp from the madgraph4gpu repo, just like other plugins/models do now.
AV: very interesting discussion, you are convincing me that option 3 may be the easiest,
with less work to be done compared to now, and it also does not preclude options 1 and 2.
OM: note, presently models/plugins just download the latest available, here we should be a bit more precise
AV: very nice, this means we can have a specific mg5amcnlo commit as a submodule in madgraph4gpu,
but then also a specific commit of madgraph4gpu identifying a ~tarball to download in mg5amcnlo,
which makes the bidirectional dependency better controlled.
Agreed: AV will look more at option 3 concrete scenarios, OM will look at creating a database of versions.
Discussion about the DY+4j preliminary results.
AV: first time we see DY+4j speedup from SIMD and it is quite nice
AV: This is why it would be useful to have the multi backend gridpacks, and also the profiling infrastructure.
Discussion: this profiling infrastructure is complementary to flamegraphs; both are very useful in different ways
(e.g. flamegraphs for a first detailed look with no pre-categorization, instrumentation for systematic cuda/simd/fortran comparisons).
## OM
OM Worked on the points described by AV.
OM Working with a new student on reducing matrix1.f files.
OM Also there is a meeting at CERN in Feb 2025, will forward the email.
OM Checked mismatch of xsec between sde=1 and sde=2, did not find any real bug
It is quite clear, however, that we should use sde=1