CMS Mg5amc@NLO integration

Europe/Zurich
513/1-024 (CERN)


Tue 27 Aug 2024 - Madgraph meeting with CMS
https://indico.cern.ch/event/1373475/

Present: SR, AV (took minutes), ZW, Jin, Robert, Sapta, OM (after 30 minutes)
Excused: OM (could join only later)

## Zenny / reweighting

ZW: have a framework working for cases with pdf but only a single ME amplitude
Now working on fixing the case with multiple ME amplitudes
Problem is in Zenny's reweighting code, not in cudacpp
For instance e+e- to e+e- or mu+mu- have different Feynman diagrams

## Jin

Jin shows his slides

Slide 6
- JC: only 16 cores with cuda, could we use 32?
  AV: we have seen issues with both CPU memory and GPU memory
  AV: but true if you were able to increase nb_core, then it would probably go faster
  because we know that the bottleneck with CUDA MEs is the CPU non-ME part
- JC: also have many nodes with multiple GPUs, can we use these?
  AV: two different problems, one is that cudacpp cannot use many GPUs in the same job, <=== todo
  [after the meeting: AV commented in https://github.com/madgraph5/madgraph4gpu/issues/836] 
  two is that many jobs using one GPU each should be able to choose. <=== todo
  [after the meeting: AV opened https://github.com/madgraph5/madgraph4gpu/issues/989] 
  AV: the second should be doable with an env variable
  SR: set the env variable in the shell before launching the job
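
A minimal sketch of the second point (each job picks its own GPU through the environment). The minutes do not name the variable; this assumes the standard CUDA runtime variable `CUDA_VISIBLE_DEVICES`. The helper name `env_for_gpu` and the child command are hypothetical stand-ins, not part of MG5aMC or cudacpp.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes a single GPU to a CUDA job."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the job honours CUDA_VISIBLE_DEVICES (standard CUDA runtime variable).
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


if __name__ == "__main__":
    # Stand-in for the real job: a child process that prints the variable it sees.
    child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    subprocess.run(child, env=env_for_gpu(1), check=True)
```

The same effect is obtained by exporting the variable in the shell before launching the job, as SR suggests; the copy of the environment here only avoids changing the parent process.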

Slide 9
- JC: would be nice if we could use many GPUs in a gridpack production, e.g. for tt+3jets
  AV: interesting, this is problem number three, change the python/bash of MG to send jobs to multiple GPUs <=== todo
  [after the meeting: AV opened https://github.com/madgraph5/madgraph4gpu/issues/990] 
  AV: in general, a lot of tuning has to be done by users, but MG must provide some tuning hooks,
  this is one example where a tuning hook (use many GPUs for gridpack production) is missing
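
A sketch of what such a tuning hook could do, assuming independent jobs that can each be pinned to one GPU: distribute them round-robin over the available GPUs. The function name `assign_jobs_to_gpus` and the job labels are hypothetical; this is not how MG5aMC currently schedules jobs.

```python
def assign_jobs_to_gpus(job_names, n_gpus):
    """Round-robin plan {gpu_index: [job, ...]} for independent jobs."""
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    plan = {gpu: [] for gpu in range(n_gpus)}
    for i, job in enumerate(job_names):
        # Job i goes to GPU (i mod n_gpus), so the load differs by at most one job.
        plan[i % n_gpus].append(job)
    return plan


if __name__ == "__main__":
    # Hypothetical job labels, for illustration only.
    print(assign_jobs_to_gpus(["G1", "G2", "G3", "G4", "G5"], 2))
```

Round-robin ignores that jobs may have very different run times; a real hook would probably need a queue that hands the next job to whichever GPU becomes free.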

Slide 10
- AV: question (due to my ignorance), are you using CMS specific settings that require high precision?
  In my tests I have the impression that producing a gridpack takes me 1h in Fortran, not 24h
  ZW: maybe check the cuts
  JC: may depend on pdf? AV using default, which should be faster than lhapdf
  SB/JC: maybe best compare card by card in runcards <=== todo

Slide 3
- SR: probably not possible to request specific AVX512 through condor
  JC: could go to low level condor and check if avx512
  AV: unfortunately, note that AVX512 per se is not enough, should check if it has one or two FMA units
  (typically Silver or Gold/Platinum Intel CPUs), and this is not even published in O/S variables,
  will follow this up with WLCG anyway <=== todo [after the meeting: AV following up]
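
A sketch of the low-level check JC mentions, assuming a Linux x86 node where `/proc/cpuinfo` lists the `avx512f` feature flag. The function names are hypothetical. As AV notes, this says nothing about the number of FMA units.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512(cpuinfo_text):
    # avx512f is the AVX-512 Foundation flag reported by Linux on x86 CPUs.
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX512 available:", has_avx512(f.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```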

Other points
- SR: tuning the number of events produced in a single job is important for GPU, must put it higher
  AV: thanks good point, is this hardcoded now?
  OM: no should be already available in a runcard, will check <=== todo

Slides 11-13
- AV: very nice, but it would be nice to get results for event generation from DY+3j and tt+3j
  JC: problem is that the gridpack production is slow
  SR: with NextGen trigger we will have very powerful machines with 200+ cores, but probably not for CHEP
  AV: one alternative (for tests! not physics production yet!) could be my suggested multi-backend gridpacks,
  you compile/build for all backends, then optimise vegas with cuda (fastest!), but can compare evgen with cpp/fortran
  OM: yes technically possible, gridpack production language independent from language where generated
  [after the meeting: this is https://github.com/madgraph5/madgraph4gpu/pull/948 but is very much in WIP, no progress]

Slide 14
- JC: it seems that multi-jet is not linearly scalable
  OM: very strange
  AV: could profile it... anyway seems to go in the right direction, throughput increases as you increase events

Discussion
- AV: so in general it looks like results are better than 2-3 weeks ago?
  JC: yes there seem to 
- SR: is CDR closed or not yet? deadline?
  SB: not closed, so we can try to incorporate some of the CHEP results
  SB: deadline was last week, but we can go on... we should keep in contact with Daniel
- SR to AV: you did a lot of improvements, would you plan to present them here in a next meeting?
  AV: thanks, yes I can if people find it useful, I sent the dev slides to Sapta/Jin already
  AV: anyway, we should discuss this afternoon first at the dev meeting and see what will be merged
