Compute Accelerator Forum - Codeplay SYCL and CERN GPU Update

Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)

Description

 

To receive announcements and information about this forum, please subscribe to compute-accelerator-forum-announce@cern.ch

 

Videoconference
Compute Accelerator Forum
Zoom Meeting ID
69560339820
Host
Graeme A Stewart
Alternative hosts
Thomas Nik Bazl Fard, Benjamin Morgan, Maria Girone, Stefan Roiser

CERN IT GPU Infrastructure News 

  • [Gordon] The A100s have different amounts of memory, how much did CERN get?    

    • A. 40GB AFAICR.

  • [Gordon] Will partitions be used or can you get a whole card?

    • A. It will be a mixed layout, depending on request. One limitation is that one cannot re-partition “live” (have to stop workloads).

  • [Attila] How are the partitions visible to the user? Does this look like a single GPU?

    • A. A full GPU would be 1 PCI device; partitioned cards appear as additional PCI devices. It’s one PCI device per partition (even the fat ones). Limitation is that only certain splits are allowed (the horizontal lines in the diagram).

  • [Attila] The current drivers (470) are now a bit old. Is there a problem with having the latest versions all the time?

    • A. If you have a VM you can use your own drivers, at whichever version. The batch version is managed by the batch system, etc. There was not a big push to update the drivers - more for the CUDA version.

  • [Antonio] Instabilities with vGPUs - what are they?

    • A. Started to see kernel freezes on PCI passthrough for some workloads. Have contacted Nvidia support, but not so predictive to date. A freeze kills all workloads on that physical machine. Hence reluctance to make this general.

  • [Antonio] ATS-IT indico is closed.

    • A. Ricardo will check this.

  • [Charles] Can you request systems with multiple instances? To test multi-GPU workloads?

    • A. Yes, subject to pre-partitioning, we can give multiple slots to a single “workload”. 
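
The remark above that only certain splits of a card are allowed can be illustrated with a small slice-budget check. This is a minimal sketch, not CERN's allocation logic: it assumes the partitions are Nvidia MIG instances on a 40GB A100 and uses the profile names and slice counts from Nvidia's public MIG documentation, none of which appear in the minutes. The names `A100_40GB_PROFILES` and `layout_fits` are hypothetical. The check is necessary but not sufficient, since the hardware also restricts where each instance may be placed.

```python
# Assumption: A100 40GB MIG profiles, as (compute slices, memory slices).
# Taken from Nvidia's public MIG documentation, not from the minutes.
A100_40GB_PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}

# Assumption: one card offers 7 compute slices and 8 memory slices.
MAX_COMPUTE_SLICES = 7
MAX_MEMORY_SLICES = 8


def layout_fits(profiles):
    """Return True if the requested profiles fit one card's slice budget.

    This is only a necessary condition: real hardware additionally
    restricts the positions at which each instance can be created.
    An unknown profile name raises KeyError.
    """
    compute = sum(A100_40GB_PROFILES[name][0] for name in profiles)
    memory = sum(A100_40GB_PROFILES[name][1] for name in profiles)
    return compute <= MAX_COMPUTE_SLICES and memory <= MAX_MEMORY_SLICES


if __name__ == "__main__":
    # Each accepted entry would show up as one device for the user.
    for request in (["7g.40gb"], ["4g.20gb", "3g.20gb"], ["4g.20gb", "4g.20gb"]):
        print(request, "fits" if layout_fits(request) else "does not fit")
```

A request for several slots for one workload, as in the last question, would correspond to a list with more than one entry that still passes this check.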

An Introduction to using SYCL with Nvidia GPUs and beyond


  • [Attila] I took a look at the code - the scripts don’t specify any optimisation levels. It uses default. The Intel compiler is much more aggressive ‘out of the box’ than other compilers. For host code some optimisations are enabled.

    • A. CUDA is maybe overeager with loop unrolling, there is more branchy code. Yes, will look at compiler flags. However, experience was the default DPC++ flags were not that aggressive. 

    • [Attila] managed to get very similar performance with SYCL, but never got code running faster in SYCL.

  • [Attila] DPCT is an Intel product. Did Codeplay help develop this tool?

    • A. No, we didn’t develop this. Worked on DPC++.

  • [Attila] With the Intel acquisition of Codeplay, is there a plan for ComputeCPP?

    • A. All very fresh, so can’t really say anything yet. ComputeCPP has a lot of good features.

  • [Attila] Would really like to be able to use DPC++ with plugins, so that we don’t have to recompile the compiler.

    • A. This is a tricky thing. Some gotchas are old CL headers in CUDA, can be a real pain to get rid of.

  • [Charles] How does SYCL handle atomics when the backend is a multi-core system (like 70 CPU cores)?

    • A. Not completely sure - there are software fallbacks.

    • [Attila] This would be in the OpenCL CPU driver?

    • 1
      News
      Speakers: Benjamin Morgan (University of Warwick (GB)), Graeme A Stewart (CERN), Dr Maria Girone (CERN), Michael Bussmann (Helmholtz-Zentrum Dresden - Rossendorf), Stefan Roiser (CERN)
    • 2
      CERN IT GPU Infrastructure News
      Speakers: Ricardo Brito Da Rocha (CERN), Ricardo Rocha (CERN)
    • 3
      An Introduction to using SYCL with Nvidia GPUs and beyond

      The National Laboratories in the United States, European groups including ENCCS, and the UK Exascale program are all adopting SYCL as a way to support existing and future supercomputers from different vendors including AMD, Intel, Nvidia and beyond. By using SYCL developers can widen their target architectures from a single code base. SYCL is an industry defined multiarchitecture programming interface that can be used to target multiple accelerator architectures. SYCL supports a wide range of targets using standard C++ syntax and semantics, with libraries developed using SYCL for math and neural network operations.

      This session will help you to understand how you can migrate your development environment from CUDA to SYCL whilst continuing to target Nvidia GPUs and retain performance. Beyond this, the same code can be run on other processors, including Intel. Using an n-body simulation project written in CUDA, we will show how the code is automatically translated to SYCL and then compiled using the DPC++ compiler. Furthermore, we will present some performance tips and tricks to ensure you can get the best performance from your SYCL code on Nvidia GPUs.

      Speaker: Joe Todd (Codeplay)