Madgraph5 GPU development

Name: Madgraph5 GPU development
Start: 2022-12-05T15:00:00+01:00
End: 2022-12-05T16:30:00+01:00
Location: CERN

Monday 5 Dec 2022, 15:00 → 16:30 Europe/Zurich

513/1-024 (CERN)

Stefan Roiser

stefan.roiser@cern.ch

+41 75 4115334

# Madgraph meeting Mon 05.12.2022

Present: SR, ZW, AV, SH, JT, TC, NN, WH
Excused: OM

## Round table

### ZW

ZW: working on reweighting project.
Can already read in an LHE file and reproduce the cross sections.
Can do many things except change the parameters.
Code is on personal fork for the moment, will prepare a MR eventually.

### AV

AV1: gave a talk in Paris at QCD@LHC;
it was very nice to meet many people from the Madgraph/Lund and Sherpa teams.

AV2: merged the "cleanup MR", bringing epochX/cudacpp back to what
it was before the merging of kokkos/sycl.
AV: is epochX/fortran up to date or not? And are sycl/kokkos using bazaar?
NN: sycl/kokkos are using the upstream 3.1 branch on github, not bazaar;
alpaka is still using old code (it is no longer maintained)
TC: regarding fortran, will check whether it is just old documentation or old code;
in that case we can just delete it.

JT: can I get a mattermost message when you merge?
AV: you can get automatic emails
SH: you need to 'follow' the repo on github

AV3: then there is hack2/hack3 MR on the way, this is stuff from the Lugano hackathon,
for instance the mixed precision implementation that was used for the ACAT plots.
There is also more stuff done in Lugano that needs to be merged in master,
like OM/AV work on splitting kernels, and SH work on shared memory.
But will do this later, after the random color and helicity.
So essentially, plan to merge hack2/hack3 and then discuss random color/helicity with OM.

### JT

Good progress in making plots.
Cleaning up as contract is about to end.

### NN

Was working on a couple of branches.

One is to get better performance on sycl code.
Did some profiling of sycl code on V100 device.
Experimented with different complex class implementations.
Moved to thrust and can now get better performance in sycl than in cuda.
The sycl speedup is maybe 20% on average, will try to quantify that.
NN: now working on a custom complex class to experiment.

NN: also working on splitting up the kernels.
Got slightly worse performance in some cases.
TC: some of this work was inspired by OpenMP threading on CPU.

NN: looking at fptype vectors and what they do on CPU/GPU.
AV: note that AOSOA on CPU is essential for vectorization,
but on GPU it "only" gives a few percent, through coalesced memory access;
with huge kernels, memory access is not the bottleneck anyway.
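To make AV's point concrete, here is a minimal Python sketch of AOS versus AOSOA indexing for event momenta. It assumes a layout loosely modelled on the cudacpp idea of storing momenta as [page][particle][component][event-in-page]; the names `NPAR`, `NP4`, `NEPP` and the index functions are hypothetical and the values are illustrative only. It shows that in AOSOA the same quantity for consecutive events sits at stride 1, which is what lets one CPU SIMD load (or consecutive GPU threads) read several events at once.

```python
# Hypothetical layout parameters (illustrative, not the actual cudacpp values)
NPAR = 4  # particles per event (e.g. gg -> ttbar)
NP4 = 4   # four-momentum components (E, px, py, pz)
NEPP = 4  # events per page; chosen to match the SIMD vector width


def aos_index(ievt, ipar, ip4):
    # Array of structures: all momenta of one event are contiguous
    return (ievt * NPAR + ipar) * NP4 + ip4


def aosoa_index(ievt, ipar, ip4):
    # AOSOA: events are grouped into pages of NEPP; within a page, the same
    # (particle, component) for consecutive events is contiguous
    ipag, iepp = divmod(ievt, NEPP)
    return ((ipag * NPAR + ipar) * NP4 + ip4) * NEPP + iepp


def stride_between_events(index_fn, ipar=1, ip4=2):
    # Memory distance between the same quantity in event 0 and event 1
    return index_fn(1, ipar, ip4) - index_fn(0, ipar, ip4)


if __name__ == "__main__":
    print("AOS stride:  ", stride_between_events(aos_index))    # 16
    print("AOSOA stride:", stride_between_events(aosoa_index))  # 1
```

In AOS the stride between events is `NPAR * NP4`, so vectorizing over events would need gather operations; setting `NEPP` equal to the SIMD width is the design choice that lets each page fill one vector register.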

SR: note that JT did some tests where ggttgg has similar cuda/sycl performance,
but with ggttggg cuda seems to be consistently quite a bit better.
Did you have some checks for ggttggg?
NN: jobs failed, but will check again
NN: anyway, this may have been with the older complex class;
with thrust, sycl may be better on ggttggg too.
(TC: also had some issues building ggttggg
NN: it did work on nvidia and amd gpus)
NN: anyway will show some results for 16k page on ggttgg/ggttggg next time

NN: also added some timers in kokkos/sycl to count MEs/s
AV: anyway, for ggttgg/ggttggg the transpose should have no overhead.

### SH

Back from vacation, nothing to report.

### WH

Nice that bridge is working in cudacpp and PFs.
We have some resources at Argonne and we need to produce some MCs.
Maybe we can try to use those processes?
At LO we use MG only for signal samples, eg some SUSY.
It should work out of the box?
AV: not clear, maybe some dependencies between parameters
(eg dependency of susy parameters on alphas) may break,
but we should try it, just send some processes to generate
SR: yes absolutely send us the generation

### TC

Working with ESP project to put effort on madgraph
Starting to test how to run at scale on Aurora
For instance using random seeds with MPI to distribute at scale
between nodes and then communicate over the high speed network
This benefits from experience seven years ago with alpgen,
when some issues with queuing of requests were observed.

AV: but this is not really madgraph specific?
Essentially you send one random number seed as input to each node,
then get one LHE file as output, but you could be using another generator to do that?
TC: correct, this is how it goes, then there is a process merging the LHE files
but this is not madgraph specific
TC: note that in the past we did this for alpgen, but for instance sherpa
was not scaling well enough to do this sort of tests
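The scheme discussed above (one seed per node, one output per node, then a merge step) is generator-agnostic, as AV noted. Below is a minimal Python sketch, assuming a hypothetical `node_seed` rule of base seed plus rank, with a hypothetical `generate_events` function standing in for a full generator run that returns random numbers rather than physics events. The ranks run sequentially here, whereas on a machine like Aurora each would be a separate MPI process.

```python
import random


def node_seed(base_seed: int, rank: int) -> int:
    # Hypothetical rule: distinct, reproducible integer seed per node/rank
    return base_seed + rank


def generate_events(seed: int, nevents: int) -> list:
    # Stand-in for one generator run on one node (NOT physics events):
    # an independent RNG stream seeded only by this node's seed
    rng = random.Random(seed)
    return [rng.random() for _ in range(nevents)]


def run_all(base_seed: int, nranks: int, nevents: int):
    # One output per rank, then a merge step that concatenates them in rank order
    outputs = {
        rank: generate_events(node_seed(base_seed, rank), nevents)
        for rank in range(nranks)
    }
    merged = [ev for rank in sorted(outputs) for ev in outputs[rank]]
    return outputs, merged


if __name__ == "__main__":
    outputs, merged = run_all(base_seed=1234, nranks=4, nevents=3)
    print(len(merged), "events from", len(outputs), "ranks")
```

A real LHE merge must also combine the header and cross-section information of the individual files, which this sketch ignores.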

### SR

Submitted a CHEP 2023 abstract.

Also had discussions with both CMS and ATLAS.

Gave a talk at the WLCG Workshop.
Some of this will be propagated to the LHCC, which is good.

Contacted the ALLEN people in LHCb to look at their CUDA/HIP header.

No progress on cublas, still work in progress.

## Jorgen's plots

See https://madgraph4gpu-db.web.cern.ch/d/u30yDJDVk/main-dashboard?orgId=1
These are plots with master, cuda is around 1.5x faster than sycl for ggttggg
NN: it makes sense, the improvements shown today for sycl are not in master yet.

## AOB

SR: next meeting before Xmas or after Xmas?

WH: do we need to have a discussion on abstraction layers soon?
SR: no we can do it after Xmas
NN: ok to have the discussion after the break, anyway it seems we are converging on sycl

SR: how about January 9th?
AV: not sure
WH: should be ok for Jan 9
SR: fix now for Jan 9, else we'll do it later

Enjoy the end of season break!!!

- 15:00 → 15:10
  
  News 10m
- 15:10 → 15:30
  
  Topical discussion 20m
- 15:30 → 15:50
  
  Round table 20m
- 15:50 → 16:00
  
  AoB 10m