
Madgraph5 GPU development
Europe/Zurich
Room: 513/1-024 (CERN)
Videoconference: Zoom, Meeting ID 63816708295 (host: Stefan Roiser)
Madgraph on GPU dev meeting Tue 17 Sep 2024
https://indico.cern.ch/event/1355161/
Present: OM, SR, ZW, AV (notes), AT, TC

## AT

AT: CV cannot join today, apologies

AT: very busy with university, had no time to make progress yet, will come back to you
OM: we can schedule a meeting also with SR and AV if you want
AT: that would be good

## ZW

(1) Fixed many things in reweighting, merged it recently
It is in my forks of madgraph4gpu

AV: can you make a presentation at some point?
ZW: yes absolutely
ZW does a short demo interactively
ZW generally tests this with l+l- to l+l- because there are multiple subprocesses
The cards need a 'change gpucpp True' to enable ZW's reweighting
Then in the script he adds several parameter changes, with a launch after each one of them
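
As a rough illustration (not ZW's actual card), a reweighting setup along these lines might look as follows, assuming the standard mg5amcnlo reweight_card syntax; the 'change gpucpp True' line is taken from the demo, while the parameter name and values are purely illustrative:

```
# reweight_card.dat -- illustrative sketch only, not ZW's actual card
change gpucpp True       # enable the cudacpp reweighting path (as shown in the demo)
launch                   # first reweighting point
set aEWM1 130.0          # hypothetical parameter change
launch                   # second reweighting point
set aEWM1 125.0
```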

OM: quite impressive! and so much faster than the original version!

AV: very nice! can you write down some doc?
ZW: yes will do, and in two weeks will show a couple of slides

SR: code generation and everything else is ready?
ZW: yes everything is ready

(2) Also discussing with OM and Marco Zaro about NLO

AV: is the vectorizingNLO branch in mg5amcnlo your work? this appeared recently
ZW: yes this is what we are doing with MZ, "simulating SIMD"

SR: still struggling with bugs here?
ZW: no, with MZ we fixed the bugs we had

## SR

SR: Stumbled across a build failure #1004 with vector type references
SR/AV: will look at it

SR: what I really wanted to do here was to look at going to many particles in the final state
This uses C++11 features for pre-instantiating templates into separate object files
AV: the end result may be similar to what I did with helinl=l,
but it is very nice to have a different approach using C++11 template features
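
As a rough sketch of the C++11 mechanism presumably being referred to: explicit instantiation declarations ("extern template") allow a template specialization to be compiled once into a separate object file instead of being re-instantiated in every translation unit. All names below (HelAmps, matrixElement, NPAR) are illustrative assumptions, not actual madgraph4gpu code:

```cpp
#include <cstdio> // single-file sketch; in practice these pieces live in separate files

// --- HelAmps.h (shared header, illustrative) ---
template <typename FPType, int NPAR>
struct HelAmps
{
  // placeholder for a templated helicity-amplitude / matrix-element kernel
  static FPType matrixElement( const FPType* momenta )
  {
    FPType sum = 0;
    for( int i = 0; i < NPAR * 4; ++i ) sum += momenta[i]; // dummy computation
    return sum;
  }
};
// C++11 explicit instantiation *declaration*: translation units including this
// header link against a pre-built object instead of instantiating the template.
extern template struct HelAmps<double, 6>;

// --- HelAmps_d6.cc (separate translation unit, i.e. a separate object file) ---
// Explicit instantiation *definition*: this is where the code is actually emitted.
template struct HelAmps<double, 6>;

// --- main.cc (user code) ---
int main()
{
  double momenta[6 * 4] = { 0 };
  std::printf( "%f\n", HelAmps<double, 6>::matrixElement( momenta ) );
  return 0;
}
```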

## TC

TC: not much to report
Nathan started a staff position so has many other commitments
TC is discussing with NN how to get what he did into the repo

## AV

AV shows the slides attached.

Discussion about the options for packaging.
OM: option 3 is also interesting.
We could maintain a (new, renamed, cleaned-up) madgraph4gpu that contains mg5amcnlo.
Then mg5amcnlo users would download a ~tarball of cudacpp from the madgraph4gpu repo, just like other plugins/models do now.
AV: very interesting discussion, you are convincing me that option 3 may be the easiest,
as it requires less work with respect to now, and it also does not preclude options 1 and 2.
OM: note, presently models/plugins just download the latest available, here we should be a bit more precise
AV: very nice, this means we can have a specific mg5amcnlo commit as a submodule in madgraph4gpu,
but then also a specific commit of madgraph4gpu to identify a ~tarball to download in mg5amcnlo;
this makes the bidirectional dependency better controlled.
Agreed: AV will look more at option 3 concrete scenarios, OM will look at creating a database of versions.
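
As a rough sketch of what the commit pinning on the madgraph4gpu side of option 3 could look like, assuming git submodules are used; the repository URL and submodule path are illustrative assumptions:

```
# Hypothetical sketch: pin a specific mg5amcnlo commit as a submodule of madgraph4gpu
git submodule add https://github.com/mg5amcnlo/mg5amcnlo.git MG5aMC/mg5amcnlo
cd MG5aMC/mg5amcnlo
git checkout <mg5amcnlo-commit-sha>   # the specific commit that cudacpp depends on
cd -
git add MG5aMC/mg5amcnlo
git commit -m "Pin mg5amcnlo submodule to a specific commit"
```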

Discussion about the DY+4j preliminary results.
AV: this is the first time we see a DY+4j speedup from SIMD, and it is quite nice
AV: This is why it would be useful to have the multi backend gridpacks, and also the profiling infrastructure.
Discussion: this profiling infrastructure is complementary to flamegraphs; both are very useful in different ways
(e.g. flamegraphs for a first detailed look with no pre-categorization, instrumentation for systematic cuda/simd/fortran comparisons).

## OM

OM: worked on the points described by AV.

OM: working with a new student on reducing the matrix1.f files.

OM: also, there is a meeting at CERN in Feb 2025, will forward the email.

OM: checked the mismatch of xsec between sde=1 and sde=2, did not find any real bug.
It is quite clear however that we should use sde=1.
