# Discussion on SYCL Wed 21.09.2022
Present: SR, NN, TC, JT; AV (10 minutes late)

(AV missed the beginning)
SR explained that we (and JT) can do:
1. breaking down the helicity loop and splitting the kernel
2. also looking at vectorization

while NN is mainly looking at madevent integration now

SR: one thing we are looking into is using tensor cores for the color algebra
NN: discussed with Codeplay developers, this is only available in single precision for the moment
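
For reference, a minimal sketch of why the color algebra maps onto matrix multiplication (and hence could map onto tensor cores). It assumes the usual form of the color sum, |M|^2 = sum_ij conj(J_i) C_ij J_j / denom, with a real color matrix C and complex partial amplitudes J. The names `colorSum`, `colorSumBatch`, `cf`, `jamp` and the memory layout are illustrative assumptions, not the actual madgraph code, and this is plain host C++ rather than SYCL or CUDA.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: color-summed |M|^2 for a single event.
// jamp: ncolor complex partial amplitudes
// cf:   ncolor x ncolor real color matrix, row-major
double colorSum(const std::vector<std::complex<double>>& jamp,
                const std::vector<double>& cf,
                double denom)
{
  const std::size_t n = jamp.size();
  double me2 = 0.0;
  for (std::size_t i = 0; i < n; ++i)
  {
    std::complex<double> ztemp = 0.0;
    for (std::size_t j = 0; j < n; ++j) ztemp += cf[i * n + j] * jamp[j];
    me2 += (std::conj(jamp[i]) * ztemp).real();
  }
  return me2 / denom;
}

// Hypothetical batched version over nevt events.
// jre, jim: real and imaginary parts of J, laid out as [icolor * nevt + ievt].
// Because C is real, Re(J^dagger C J) = Jre^T C Jre + Jim^T C Jim, so the
// expensive step is two real (ncolor x ncolor) * (ncolor x nevt) products.
// That matrix-matrix product is the part a tensor core GEMM would replace.
std::vector<double> colorSumBatch(const std::vector<double>& jre,
                                  const std::vector<double>& jim,
                                  const std::vector<double>& cf,
                                  std::size_t ncolor,
                                  std::size_t nevt,
                                  double denom)
{
  std::vector<double> tre(ncolor * nevt, 0.0); // C * Jre
  std::vector<double> tim(ncolor * nevt, 0.0); // C * Jim
  for (std::size_t i = 0; i < ncolor; ++i)
    for (std::size_t j = 0; j < ncolor; ++j)
      for (std::size_t e = 0; e < nevt; ++e)
      {
        tre[i * nevt + e] += cf[i * ncolor + j] * jre[j * nevt + e];
        tim[i * nevt + e] += cf[i * ncolor + j] * jim[j * nevt + e];
      }
  // Per-event reduction over colors (cheap compared to the products above).
  std::vector<double> me2(nevt, 0.0);
  for (std::size_t i = 0; i < ncolor; ++i)
    for (std::size_t e = 0; e < nevt; ++e)
      me2[e] += jre[i * nevt + e] * tre[i * nevt + e]
              + jim[i * nevt + e] * tim[i * nevt + e];
  for (std::size_t e = 0; e < nevt; ++e) me2[e] /= denom;
  return me2;
}
```

With a made-up 2x2 matrix `cf = {3, 1, 1, 3}`, `jamp = {1+2i, 3-i}` and `denom = 1`, both functions give 47 for that event. Since the matrix product is where precision matters, the single precision limitation mentioned by NN would apply to exactly this step.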

SR: just wanted to check that we do not waste work by doing the same thing
NN: good idea! It seems this is not the case

NN: Jorgen had a question about SYCL when targeting Intel GPUs; there is a flag for JIT compilation
Note also that there is a --device-info (in the madgraph code?) that shows which devices are available

JT showed some graphs yesterday. NN: this does not look like what we observe
AV: specifically, JT sees CUDA more performant than SYCL for ggttggg
NN: no, we actually see SYCL more performant across all processes
AV: ok, so this is one thing to check... JT/NN agree
NN: also the absolute numbers seem much lower (one order of magnitude?) than what we see

SR: Jorgen is now also looking at other hardware, a different Intel GPU
NN: be sure to include the right drivers, e.g. through the public oneAPI

AV: are the hardware features, e.g. #threads, #blocks and warp size, the same?
NN: for AMD the warp size (called something different there) is 64, against 32 for NVIDIA;
for Intel it varies depending on the hardware (older maybe 8? settling to 32 now, more or less)

NN: one thing not available in SYCL is streams

AV: are tensor cores specific to NVIDIA?
NN: yes, not aware of anything similar in AMD or Intel
NN: heard that one cannot use tensor cores at the same time, in parallel?
AV/SR: yes, Paul confirmed that, but tensor cores are faster for matrix multiplication, so that would still be good



