Compute Accelerator Forum - Codeplay SYCL and CERN GPU Update

Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)

Description

 

To receive announcements and information about this forum, please subscribe to compute-accelerator-forum-announce@cern.ch

 

Videoconference: Compute Accelerator Forum
Zoom Meeting ID: 69560339820
Host: Graeme A Stewart
Alternative hosts: Thomas Nik Bazl Fard, Benjamin Morgan, Maria Girone, Stefan Roiser

CERN IT GPU Infrastructure News 

  • [Gordon] The A100s come with different amounts of memory; how much did CERN get?

    • A. 40 GB, as far as I can recall.

  • [Gordon] Will partitions be used or can you get a whole card?

    • A. It will be a mixed layout, depending on request. One limitation is that one cannot re-partition “live” (workloads have to be stopped first).

  • [Attila] How are the partitions visible to the user? Does this look like a single GPU?

    • A. A full GPU would be 1 PCI device; partitioned cards appear as additional PCI devices. It’s one PCI device per partition (even the fat ones). The limitation is that only certain splits are allowed (the horizontal lines in the diagram).

  • [Attila] The current drivers (470) are now a bit old. Is there a problem with having the latest versions all the time?

    • A. If you have a VM you can use your own drivers, at whichever version. The driver version on batch is managed by the batch system. There has not been a big push to update the drivers; the demand is more for newer CUDA versions.

  • [Antonio] Instabilities with vGPUs - what are they?

    • A. We started to see kernel freezes on PCI passthrough for some workloads. We have contacted Nvidia support, but not so predictive to date. A freeze kills all workloads on that physical machine, hence the reluctance to make this generally available.

  • [Antonio] ATS-IT indico is closed.

    • A. Ricardo will check this.

  • [Charles] Can you request systems with multiple instances? To test multi-GPU workloads?

    • A. Yes, subject to pre-partitioning, we can give multiple slots to a single “workload”. 
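
The "only certain splits are allowed" limitation from the partitioning answers above can be illustrated with a small slice-budget check. This is a simplified sketch, not CERN's or Nvidia's tooling: the profile names and slice counts are assumptions taken from Nvidia's public MIG documentation for the A100 40GB, and the function names are hypothetical. Passing the check is a necessary condition only, since the real driver also restricts where each instance may be placed.

```python
# Simplified sketch of a slice-budget check for MIG layouts on an A100 40GB.
# ASSUMPTION: profile names and slice counts come from Nvidia's public MIG
# documentation, not from the forum discussion.
A100_40GB_PROFILES = {
    # profile: (compute slices, memory slices)
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}
TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_SLICES = 8


def fits_slice_budget(layout):
    """Return True if the requested profiles fit the card's slice budget.

    Necessary condition only: placement rules enforced by the driver may
    still reject a layout that passes this check.
    """
    compute = sum(A100_40GB_PROFILES[p][0] for p in layout)
    memory = sum(A100_40GB_PROFILES[p][1] for p in layout)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_SLICES


def pci_device_count(layout):
    """One PCI device per partition, as stated in the answer above."""
    return len(layout)


if __name__ == "__main__":
    for layout in (["7g.40gb"], ["3g.20gb", "3g.20gb"], ["4g.20gb", "4g.20gb"]):
        print(layout, fits_slice_budget(layout), pci_device_count(layout))
```

Under these assumptions a whole card counts as a single partition, two 3g.20gb instances fit, and two 4g.20gb instances do not, because they would need 8 compute slices out of 7.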

An Introduction to using SYCL with Nvidia GPUs and beyond


  • [Attila] I took a look at the code - the scripts don’t specify any optimisation levels, so they use the defaults. The Intel compiler is much more aggressive ‘out of the box’ than other compilers. For host code some optimisations are enabled.

    • A. CUDA is maybe overeager with loop unrolling, and there is more branchy code. Yes, we will look at the compiler flags. However, our experience was that the default DPC++ flags were not that aggressive.

    • [Attila] managed to get very similar performance with SYCL, but never got code running faster in SYCL.

  • [Attila] DPCT is an Intel product. Did Codeplay help develop this tool?

    • A. No, we didn’t develop this. We worked on DPC++.

  • [Attila] With the Intel acquisition of Codeplay, is there a plan for ComputeCpp?

    • A. All very fresh, so can’t really say anything yet. ComputeCpp has a lot of good features.

  • [Attila] Would really like to be able to use DPC++ with plugins, so that we don’t have to recompile the compiler.

    • A. This is a tricky thing. One gotcha is the old CL headers in CUDA, which can be a real pain to get rid of.

  • [Charles] How does SYCL handle atomics when the backend is a multi-core system (like 70 CPU cores)?

    • A. Not completely sure - there are software fallbacks.

    • [Attila] This would be in the OpenCL CPU driver?

There are minutes attached to this event.