# CMS Mg5amc@NLO integration

Tue 27 Aug 2024 - Madgraph meeting with CMS
https://indico.cern.ch/event/1373475/

Present: SR, AV (took minutes), ZW, Jin, Robert, Sapta, OM (after 30 minutes)
Excused: OM (could join only later)

## Zenny / reweighting

ZW: have a framework working for cases with PDFs but only a single ME amplitude
Now working on fixing the case with multiple ME amplitudes
The problem is in Zenny's reweighting code, not in cudacpp
For instance, e+e- to e+e- and e+e- to mu+mu- have different Feynman diagrams
(the same-flavour final state adds a t-channel diagram on top of the s-channel one)

## Jin

Jin shows his slides

Slide 6
- JC: only 16 cores with CUDA, could we use 32?
  AV: we have seen issues with both CPU memory and GPU memory
  AV: but it is true that, if you were able to increase nb_core, it would probably go faster,
  because we know that the bottleneck with CUDA MEs is the CPU non-ME part
- JC: we also have many nodes with multiple GPUs, can we use these?
  AV: two different problems: one is that cudacpp cannot use many GPUs in the same job, <=== todo
  [after the meeting: AV commented in https://github.com/madgraph5/madgraph4gpu/issues/836]
  the other is that, when many jobs use one GPU each, each job should be able to choose which GPU it uses <=== todo
  [after the meeting: AV opened https://github.com/madgraph5/madgraph4gpu/issues/989]
  AV: the second should be doable with an env variable (see the sketch after this list)
  SR: set the env variable in the shell before launching the job
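
As a minimal sketch of the "env variable" approach above: CUDA_VISIBLE_DEVICES is the standard CUDA mechanism to restrict a process to one device (whether cudacpp grows a dedicated variable of its own is exactly the open point of issue 989); the job script name below is a hypothetical placeholder.

```python
# Minimal sketch: pin each job to a single GPU via CUDA_VISIBLE_DEVICES
# (standard CUDA mechanism; a cudacpp-specific variable is still a todo,
# see https://github.com/madgraph5/madgraph4gpu/issues/989).
import os
import subprocess

def launch_on_gpu(cmd, gpu_id):
    """Launch `cmd` so that the CUDA runtime sees only GPU `gpu_id`."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # this process sees one device
    return subprocess.Popen(cmd, env=env)

# e.g. two independent jobs, one per GPU ("./madevent_job.sh" is a
# placeholder for whatever actually launches the job)
jobs = [launch_on_gpu(["./madevent_job.sh"], gpu) for gpu in (0, 1)]
for j in jobs:
    j.wait()
```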

Slide 9
- JC: it would be nice if we could use many GPUs in a gridpack production, e.g. for tt+3jets
  AV: interesting, this is problem number three: change the python/bash of MG to send jobs to multiple GPUs <=== todo
  (a round-robin sketch follows below)
  [after the meeting: AV opened https://github.com/madgraph5/madgraph4gpu/issues/990]
  AV: in general, a lot of tuning has to be done by users, but MG must provide some tuning hooks;
  this is one example where a tuning hook (use many GPUs for gridpack production) is missing
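
A hedged sketch of what such a hook might do, assuming the usual gridpack entry point ./run.sh <nevents> <seed>: round-robin independent runs over the GPUs of one node. This only illustrates the idea behind issue 990, it is not MG's actual implementation.

```python
# Hypothetical round-robin of independent gridpack runs over the GPUs of
# one node (the missing tuning hook of issue #990), assuming the usual
# gridpack entry point ./run.sh <nevents> <seed>.
import os
import subprocess

NGPUS = 4        # GPUs on the node (assumption)
NJOBS = 8        # independent gridpack runs
NEVENTS = 10000  # events per run

procs = []
for i in range(NJOBS):
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(i % NGPUS)  # spread runs over the GPUs
    procs.append(subprocess.Popen(["./run.sh", str(NEVENTS), str(1000 + i)], env=env))
for p in procs:
    p.wait()  # wait for all runs to finish
```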

Slide 10
- AV: question (due to my ignorance), are you using CMS-specific settings that require high precision?
  In my tests I have the impression that producing a gridpack takes me 1h in Fortran, not 24h
  ZW: maybe check the cuts
  JC: may it depend on the pdf? AV: using the default, which should be faster than lhapdf
  SB/JC: maybe best to compare the run cards entry by entry <=== todo

Slide 3
- SR: probably not possible to request specific AVX512 nodes through condor
  JC: could go to low-level condor and check if the node has avx512
  AV: unfortunately, note that AVX512 per se is not enough, one should check whether the CPU has one or two FMA units
  (typically Silver vs Gold/Platinum Intel CPUs), and this is not even published in O/S variables
  (a heuristic check is sketched below); will follow this up with WLCG anyway <=== todo [after the meeting: AV following up]
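
For the "low-level condor" check JC mentions, a hedged sketch: /proc/cpuinfo does expose the avx512f flag, but (as AV notes) not the number of FMA units, so the model-name test below (Silver has one unit, Gold/Platinum often two) is only a crude approximation.

```python
# Rough node-side AVX-512 check on Linux x86 (assumption). The avx512f flag
# is exposed in /proc/cpuinfo, but the number of AVX-512 FMA units is NOT
# published by the O/S, so the model-name test is only a crude heuristic
# (Silver CPUs have one unit, Gold/Platinum often have two).
import re

text = open("/proc/cpuinfo").read()
flags_m = re.search(r"^flags\s*:\s*(.*)$", text, re.M)
model_m = re.search(r"^model name\s*:\s*(.*)$", text, re.M)
flags = flags_m.group(1).split() if flags_m else []
model = model_m.group(1) if model_m else "unknown"

has_avx512 = "avx512f" in flags
maybe_two_fma = any(s in model for s in ("Gold", "Platinum"))  # heuristic only
print(f"{model}: avx512f={has_avx512}, possibly dual-FMA={has_avx512 and maybe_two_fma}")
```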

Other points
- SR: tuning the number of events produced in a single job is important for GPU, it must be set higher
  AV: thanks, good point, is this hardcoded now?
  OM: no, it should already be available in a run card, will check <=== todo

Slides 11-13
- AV: very nice, but it would be good to get results for event generation from DY+3j and tt+3j
  JC: the problem is that the gridpack production is slow
  SR: with NextGen Triggers we will have very powerful machines with 200+ cores, but probably not for CHEP
  AV: one alternative (for tests! not physics production yet!) could be my suggested multi-backend gridpacks:
  you compile/build for all backends, then optimise vegas with cuda (fastest!), but can compare evgen with cpp/fortran
  OM: yes, technically possible, the gridpack run is independent of the language used when the gridpack was generated
  [after the meeting: this is https://github.com/madgraph5/madgraph4gpu/pull/948 but it is very much WIP, no progress]

Slide 14
- JC: it seems that multi-jet is not linearly scalable
  OM: very strange
  AV: could profile it... anyway it seems to go in the right direction, throughput increases as you increase the number of events

Discussion
- AV: so in general it looks like results are better than 2-3 weeks ago?
  JC: yes, they seem to be
- SR: is the CDR closed or not yet? deadline?
  SB: not closed, so we can try to incorporate some of the CHEP results
  SB: the deadline was last week, but we can go on... we should keep in contact with Daniel
- SR to AV: you did a lot of improvements, would you plan to present them here in a next meeting?
  AV: thanks, yes I can if people find it useful, I already sent the dev slides to Sapta/Jin
  AV: anyway, we should discuss this afternoon first at the dev meeting and see what will be merged

13:00-13:50 Discussion (50m)
Speakers: Jin Choi (Seoul National University (KR)), Saptaparna Bhattacharya (Wayne State University (US))