Madgraph5 GPU development - !! ATTENTION Wednesday 15:00 CEST !!

Name: Madgraph5 GPU development - !! ATTENTION Wednesday 15:00 CEST !!
Start: 2021-09-15T15:00:00+02:00
End: 2021-09-15T16:00:00+02:00
Location: Virtual (Zoom)

Wednesday 15 Sept 2021, 15:00 → 16:00 Europe/Zurich

Zoom Meeting ID: 63368133283
Host: Stefan Roiser

Stefan Roiser

stefan.roiser@cern.ch

+41 75 4115334

# Madgraph4GPU dev meeting Wed 15.09.2021
https://indico.cern.ch/event/1063932/

Present: SR, OM, AV, Laurence, David, Taylor, Walter, Josh
Excused: StephanH

## Round table

Olivier
- Had issues with rapidity cuts before the summer; these are now all fixed. Phase space generation with cuts should now work, and is to be interfaced to the new vectorized code. (This holds as long as we do LO and no MLM merging. Note that CKKW-L merging would not be a problem, as it is delegated to Pythia.)
- SR: reminder that we need a function signature for phase space integration, any progress? OM: not yet, will work on that.
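
As an illustration of the kind of rapidity cut discussed above, here is a minimal sketch. It assumes events are given as lists of (E, px, py, pz) tuples and applies a symmetric cut |y| < y_max; the function names, the event layout and the cut value are hypothetical and are not taken from the MadGraph code.

```python
import math


def rapidity(energy, pz):
    """Rapidity y = 0.5 * ln((E + pz) / (E - pz)).

    Undefined for E == |pz| (particle exactly along the beam axis).
    """
    return 0.5 * math.log((energy + pz) / (energy - pz))


def passes_rapidity_cut(momenta, y_max):
    """Return True if every particle in the event has |y| < y_max.

    momenta: list of (E, px, py, pz) tuples (hypothetical layout).
    """
    return all(abs(rapidity(e, pz)) < y_max for (e, _px, _py, pz) in momenta)


# Hypothetical massless particles with E = 50: one central, one forward.
central = [(50.0, 30.0, 40.0, 0.0)]
forward = [(50.0, 3.0, 4.0, math.sqrt(50.0**2 - 3.0**2 - 4.0**2))]

print(passes_rapidity_cut(central, 2.5))  # central particle has y = 0
print(passes_rapidity_cut(forward, 2.5))  # forward particle has y close to 3
```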

David
- Nothing to report, have not worked on Alpaka for months. But would like to resume to have the 'golden tag'.

Andrea
- Was on holiday for 5 weeks!
- Before the holidays, created a benchmarking container. Some issues (memory exceeded) when running multiple copies.
- Before the holidays, was making some tests on Skylake Gold with gcc10, but lost the results. Will repeat and also test Icelake if possible. Got some useful feedback from Intel (thanks also to Laurence).
- (Also busy with Josh and others on the LHCC review of generators)
- To do in random order: agree on golden tag, multithreading, inte

Taylor
- Debugged the 20% discrepancy between Kokkos and CUDA. Traced it back mainly to the Kokkos complex types. Was also running an older version of cal_wavefunction with a different calling order; has now moved to the version OM/AV are using. Did a lot of profiling along the way, which was very useful.
- Showed plots. Kokkos is now only ~10% slower than CUDA. In double precision, get roughly twice the throughput on a V100 as on an AMD MI100 (knowing that they are around 8 and 11 TFlops respectively, AMD should deliver much more. Kokkos issue on AMD?).
- Looking forward to tomorrow's meeting!
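
The V100 versus MI100 comparison above can be made explicit with a short calculation. This is only a sketch using the rounded numbers quoted in the minutes (throughput ratio of about 2, peaks of about 8 and 11 TFlops); the function name is hypothetical.

```python
def efficiency_ratio(throughput_ratio, peak_a_tflops, peak_b_tflops):
    """How much larger a fraction of its peak device A achieves than device B.

    throughput_ratio: measured throughput of A divided by that of B.
    """
    return throughput_ratio * peak_b_tflops / peak_a_tflops


# A = V100 (about 8 TFlops), B = MI100 (about 11 TFlops), A about twice as fast.
ratio = efficiency_ratio(2.0, 8.0, 11.0)
print(f"{ratio:.2f}")  # prints 2.75
```

With these rounded inputs, the V100 run reaches roughly 2.75 times the fraction of peak that the MI100 run does, which is why the AMD result looks suspicious.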

Josh
- Not much to report, still no time to get hands dirty

Laurence
- More contacts with Intel. They compared CUDA and SYCL and spotted that we were using fast math in CUDA but not in SYCL, and single precision in CUDA but double precision in SYCL; with the same settings, throughputs are much more similar. Separately, the issues with AVX512 are due to a bug in the hardware: they recommend Icelake, and we have one in openlab, or can use DevCloud. DevCloud is very interesting, with tutorials and a Jupyter notebook interface.
- TC: is your SYCL code in the repo? LF: yes, based on SYCL - previously had a oneAPI version from a fellow. Tyler left but we now have a new postdoc who could work on something related, e.g. work with LF if useful. SR: make sure you push your latest changes.
- AV: did you manage or do you plan to use sycl on an Intel GPU? LF: kind of, only on my integrated GPU. TC: could help with some Intel CPUs from HPC centers.
- AV: forgot to say that also tried the new clang-based intel compiler, and it looks quite promising.
- SR to LF: could you make a presentation on your findings, you or an Intel expert? LF: not much to show, there were a couple of bugs, now fixed, in the compiler options and precision. SR: just some slides? LF: maybe better to wait for the "golden tag".
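
The fast math and precision mismatch found by Intel suggests checking that two benchmark runs use the same numeric settings before comparing throughputs. A minimal sketch of such a check follows; the `BenchmarkConfig` class and its fields are hypothetical and do not correspond to any existing script in the repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Numeric settings of one benchmark run (hypothetical structure)."""
    backend: str      # e.g. "cuda" or "sycl"
    precision: str    # "single" or "double"
    fast_math: bool   # whether fast-math compiler options were enabled


def mismatched_settings(a, b):
    """Return the names of the numeric settings that differ between two runs."""
    return [name for name in ("precision", "fast_math")
            if getattr(a, name) != getattr(b, name)]


# The situation described in the minutes, before the fix.
cuda_run = BenchmarkConfig("cuda", "single", True)
sycl_run = BenchmarkConfig("sycl", "double", False)

print(mismatched_settings(cuda_run, sycl_run))
```

An empty list means the two throughputs are compared on an equal footing as far as these two settings are concerned.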

Stefan
- Andy has finished his thesis (not public yet, but it will eventually go to CDS at CERN).
- Have a school student looking for a project until December. He will contribute by taking Andy's work and putting it in a notebook.
- Have looked myself at maxregcount: there is a large difference, but only when not at the maximum grid size.
- Working on a PR for splitting the kernel into smaller pieces (eemumu). Compiles but does not run yet. Uses one kernel for ixxx, one for oxxx, one for FFV, etc. Intermediate results are kept in GPU global memory.
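
The structure of the kernel splitting can be sketched as follows. This is a schematic only: plain Python functions stand in for GPU kernels, Python lists stand in for buffers in GPU global memory, and the arithmetic in each stage is a placeholder, not the actual ixxx, oxxx or FFV calculations. All function names are hypothetical.

```python
def stage_incoming(momenta, w_in):
    """Stand-in for a first kernel: writes its result to a 'global' buffer."""
    for i, p in enumerate(momenta):
        w_in[i] = p + 1.0  # placeholder arithmetic


def stage_outgoing(momenta, w_out):
    """Stand-in for a second kernel, independent of the first."""
    for i, p in enumerate(momenta):
        w_out[i] = 2.0 * p  # placeholder arithmetic


def stage_vertex(w_in, w_out, amp):
    """Stand-in for a third kernel: reads both intermediate buffers."""
    for i in range(len(amp)):
        amp[i] = w_in[i] * w_out[i]  # placeholder arithmetic


def run_split(momenta):
    """Split version: one call per stage, intermediates kept in buffers."""
    n = len(momenta)
    w_in, w_out, amp = [0.0] * n, [0.0] * n, [0.0] * n
    stage_incoming(momenta, w_in)
    stage_outgoing(momenta, w_out)
    stage_vertex(w_in, w_out, amp)
    return amp


def run_fused(momenta):
    """Single-kernel version: intermediates never leave local variables."""
    return [(p + 1.0) * (2.0 * p) for p in momenta]


print(run_split([1.0, 2.0, 3.0]))  # prints [4.0, 12.0, 24.0]
```

The general trade-off is that each smaller kernel typically needs fewer registers per thread, at the price of extra reads and writes of the intermediate buffers in global memory.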

## AOB and next plans

SR: golden tag? Which process, eemumu or ggttgg? With vectorization or not?
Discussion: paper now, or later with a later implementation?
Also depends on how complicated the code-generating code is.

Next regular dev meeting: Tue 28 at 3pm (Monday 27 Olivier has holiday in Belgium, and Andrea has another meeting)

Next meeting: tomorrow Thu 16 at 3pm - Olivier's code generating workshop

- 15:00 → 15:10: News (10m)
- 15:10 → 15:30: Topical discussion (20m)
- 15:30 → 15:50: Round table (20m)
- 15:50 → 16:00: AoB (10m)
