# COMPUTE & ACCELERATOR FORUM

STEFAN ROISER, CERN GDB, 10 JAN 2024

#### WHY, WHAT, WHEN, WHO

- Initially devised as a series of seminar-style presentations for in-depth discussions of the latest advances in compute acceleration and heterogeneous computing
- Second Wednesday of the month, 16:30, max. 1½ hours
  - Usually right after the GDB, in the same room (but with a different Zoom link)
  - Presentations also recorded and linked at the event pages
- Agenda category @ <a href="https://indico.cern.ch/category/12741/">https://indico.cern.ch/category/12741/</a>
- Co-organised by Maria Girone, Graeme Stewart, SR (all CERN), Ben Morgan (Univ Warwick), Michael Bussmann (Helmholtz)
  - Contact us at compute-accelerator-forum-organizers@cern.ch

### TOPICS SO FAR



To reflect the very diverse set of topics covered, the series is renamed "Compute & Accelerator Forum" as of this year

# FLASHING A FEW PRESENTATIONS WITH POSSIBLE CONNECTIONS / OVERLAP TO THE GDB

Please see the Indico links for details and video recordings

### DATA PROCESSING INFRASTRUCTURES



A Herten (Jülich), 9 Jun 2021, https://indico.cern.ch/event/975011/

Co-development and status


#### Code structure and technology choices

- Code organized in Python packages.
- User interactions via Python scripts.
- Data structures described, allocated and fully exposed (r/w) in Python including GPU (avoid unnecessary copy).
- Performance-critical code written in C, using a C API automatically generated from Python.
- Code compiled at run time and on demand:
  - Allows problem-specific optimizations, essential on GPUs
  - Shortens the write-test-execute cycle
- Dependencies:
  - numpy: allocates and exchanges memory
  - cffi (and a C compiler): generates binary Python modules that can be imported at run time and prepares arguments
  - cupy (and a CUDA driver): implements a rich numpy-like array on the device and compiles CUDA kernels
  - pyopencl (and OpenCL drivers): wraps the OpenCL API and implements a basic numpy-like array
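The slide describes data structures that are allocated and exposed read/write in Python on both CPU and GPU, with cupy providing a numpy-like array on the device. A minimal sketch of that idea, assuming a hypothetical `Particles` container and `drift` kernel (neither is taken from the actual code base): the same Python function updates the arrays in place whether they come from numpy or cupy. Unlike the real design, this sketch does not generate or compile any C or CUDA kernels; it only shows the shared numpy-like interface and the absence of copies.

```python
import numpy as np


def get_array_module(device="cpu"):
    """Pick the array library: numpy on the CPU, cupy on an NVIDIA GPU.

    Assumption: cupy and a CUDA driver are installed wherever device="cuda" is used.
    """
    if device == "cpu":
        return np
    if device == "cuda":
        import cupy  # imported lazily so CPU-only nodes do not need it
        return cupy
    raise ValueError(f"unknown device: {device!r}")


class Particles:
    """Hypothetical particle container whose arrays live on the chosen device."""

    def __init__(self, n, device="cpu"):
        self.xp = get_array_module(device)
        self.x = self.xp.zeros(n, dtype=self.xp.float64)
        self.px = self.xp.zeros(n, dtype=self.xp.float64)


def drift(particles, length):
    """Toy in-place kernel x += px * length; the same code runs on numpy or cupy arrays."""
    particles.x += particles.px * length


# Usage on the CPU: arrays are exposed read/write and updated without copies
p = Particles(4)
p.px[:] = [0.1, 0.2, 0.3, 0.4]
x_ref = p.x  # a reference to the same buffer, not a copy
drift(p, 2.0)
```

Passing `device="cuda"` would allocate the same fields with cupy, and `drift` would run unchanged on the GPU arrays.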

Latest updates on the CERN GPU infrastructure

What's New

Action items from the last update in this forum (Oct 2021)

https://indico.cern.ch/event/975015/

Upgrade Nvidia drivers (470.82.x) for CUDA 11.4 (done)

Support for GPU profiling in vGPU nodes (done)

Reminder: vGPU is not physical partitioning but time sharing up to 4x

Profiling possible with vGPUs - with new drivers (done)

General availability of vGPU setup (instabilities detected)
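The difference between time sharing and physical partitioning can be pictured with a toy model. The sketch below assumes a simple round-robin, best-effort policy among the VMs that currently have work; it is not NVIDIA's vGPU scheduler, and the function name `round_robin_share` is hypothetical. It shows that a VM alone on the card gets all of the GPU time, while with four busy VMs each gets a quarter, so the throughput seen inside one VM depends on the load of its neighbours.

```python
from collections import Counter

MAX_VMS_PER_GPU = 4  # from the slide: time sharing "up to 4x"


def round_robin_share(busy_vms, n_slices):
    """Toy model of a time-shared vGPU: each time slice gives the whole GPU to one busy VM.

    Assumption: plain round-robin among VMs that currently have work
    (not the real vGPU scheduler). Returns the fraction of GPU time per VM.
    """
    if not 1 <= len(busy_vms) <= MAX_VMS_PER_GPU:
        raise ValueError(f"expected 1..{MAX_VMS_PER_GPU} busy VMs, got {len(busy_vms)}")
    slices = Counter(busy_vms[t % len(busy_vms)] for t in range(n_slices))
    return {vm: slices[vm] / n_slices for vm in busy_vms}
```

With physical partitioning, by contrast, each partition would keep a fixed share of the compute resources even while the other partitions are idle.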

R Rocha (CERN), CERN GPU Infrastructure Updates, 8 June 2022, <a href="https://indico.cern.ch/event/1073643/">https://indico.cern.ch/event/1073643/</a>, and 13 Oct 2021, <a href="https://indico.cern.ch/event/975015/">https://indico.cern.ch/event/975015/</a>

20 Oct 2020, https://indico.cern.ch/event/950196/

R De Maria (CERN), 8 Dec 2021, <a href="https://indico.cern.ch/event/975017/">https://indico.cern.ch/event/975017/</a>



C Mayr (TU Dresden), 14 Sep 2022, <a href="https://indico.cern.ch/event/1073646/">https://indico.cern.ch/event/1073646/</a>

#### What is RISC-V? (on one slide)

- Open Standard Instruction Set Architecture (ISA)
  - Specifications are open source, no royalty fees
  - RISC-V cores can be open or proprietary
- Started at the University of California, Berkeley, in 2010
- Since 2020 published by RISC-V International, located in Switzerland
- Modular design: base ISA with very few (integer) instructions
  - Many standard extensions and possibility for custom instructions

Code using SSE instructions (omitting setup code)

```
vectorized:
        movups  xmm2, xmmword ptr [rsi + r8]     # load x
        mulps   xmm2, xmm1                       # multiply with a (in xmm1)
        movups  xmm3, xmmword ptr [rdx + r8]     # load y
        addps   xmm3, xmm2                       # ax + y
        movups  xmmword ptr [rdx + r8], xmm3     # store y
        add     r8, 16                           # compute next offset (+ 4 elements)
        cmp     rdi, r8                          # compare to final offset to process
                                                 # with vectorized loop (pre-computed)
        jne     vectorized
        cmp     rcx, rax                         # check if elements remain
        je      done                             # otherwise done
scalar:
        movss   xmm1, dword ptr [rsi + 4*rcx]    # load x
        mulss   xmm1, xmm0                       # multiply with a
        addss   xmm1, dword ptr [rdx + 4*rcx]    # load and add y
        movss   dword ptr [rdx + 4*rcx], xmm1    # store y
        inc     rcx
        cmp     rax, rcx
        jne     scalar
done:
        ret
```
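The SSE loop above computes y = a·x + y on single-precision floats (the classic SAXPY kernel): a vectorized loop processes four floats per 128-bit register, and a scalar loop handles the leftover elements. A minimal Python/numpy sketch that mirrors this structure; the function name `saxpy_blocked` is hypothetical, and numpy does the per-block arithmetic, so this illustrates the control flow, not the performance.

```python
import numpy as np


def saxpy_blocked(a, x, y, width=4):
    """Compute y = a*x + y in place with the same loop structure as the SSE code.

    x and y are float32 arrays of equal length. The first loop handles full
    blocks of `width` elements (4 floats fill one 128-bit xmm register); the
    second loop handles the remaining n % width elements one at a time.
    """
    if x.dtype != np.float32 or y.dtype != np.float32:
        raise TypeError("x and y must be float32 arrays")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = np.float32(a)
    n = len(x)
    n_vec = n - n % width  # final offset for the vectorized loop (pre-computed)
    for i in range(0, n_vec, width):  # 'vectorized': load x, multiply by a, add y, store y
        y[i:i + width] = a * x[i:i + width] + y[i:i + width]
    for i in range(n_vec, n):  # 'scalar': remaining elements one by one
        y[i] = a * x[i] + y[i]
    return y


x = np.arange(10, dtype=np.float32)
y = np.ones(10, dtype=np.float32)
saxpy_blocked(2.0, x, y)  # two blocks of 4 elements, then 2 scalar iterations
```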

J Hahnfeld (CERN), 13 Sep 2023, <a href="https://indico.cern.ch/event/1264300/">https://indico.cern.ch/event/1264300/</a>
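The modular RISC-V design means a core is described by its register width, a base integer ISA and a list of extensions, conventionally written as a string such as `RV64IMAFDC` (often abbreviated `RV64GC`). A small sketch that decodes such strings; it covers only a subset of the single-letter standard extensions, and the function name `parse_isa` is hypothetical.

```python
# Subset of single-letter standard extensions (descriptions are informal)
STANDARD_EXTENSIONS = {
    "M": "integer multiplication and division",
    "A": "atomic instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "C": "compressed 16-bit instructions",
    "V": "vector operations",
}


def parse_isa(isa):
    """Split an ISA string like 'RV64IMAFDC' into (register width, base, extensions).

    Only single-letter extensions are handled; multi-letter ones (e.g. 'Zicsr')
    and custom 'X' extensions are outside this sketch.
    """
    s = isa.upper()
    if not s.startswith("RV"):
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    i = 2
    while i < len(s) and s[i].isdigit():
        i += 1
    xlen = s[2:i]
    if xlen not in ("32", "64", "128"):
        raise ValueError(f"unsupported register width in {isa!r}")
    rest = s[i:]
    if rest.startswith("G"):  # 'G' abbreviates IMAFD (plus Zicsr and Zifencei)
        rest = "IMAFD" + rest[1:]
    if not rest or rest[0] not in ("I", "E"):
        raise ValueError(f"missing base integer ISA (I or E) in {isa!r}")
    base, extensions = rest[0], list(rest[1:])
    unknown = [e for e in extensions if e not in STANDARD_EXTENSIONS]
    if unknown:
        raise ValueError(f"extensions not covered by this sketch: {unknown}")
    return int(xlen), base, extensions
```

For example, `parse_isa("RV64GC")` yields a 64-bit core with the integer base `I` and the extensions `M`, `A`, `F`, `D` and `C`.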

#### GPU DEVELOPMENTS IN THE CONTEXT OF LHC EXPERIMENTS



J Niermann (Univ Goettingen), 14 Dec 2022, <a href="https://indico.cern.ch/event/1160623/">https://indico.cern.ch/event/1160623/</a>



D Rohr (CERN), 12 July 2023, https://indico.cern.ch/event/1264298/



A Bocci (CERN), 9 Mar 2022, https://indico.cern.ch/event/1073640/



A Valassi (CERN), 8 Feb 2023, <a href="https://indico.cern.ch/event/1207838/">https://indico.cern.ch/event/1207838/</a>

#### MORE HEP PROJECTS



C Leggett (LBNL), 14 June 2023, <a href="https://indico.cern.ch/event/1264297/">https://indico.cern.ch/event/1264297/</a>



D Giordano (CERN), 14 Feb 2024, https://indico.cern.ch/event/1329686/

#### **C&A FORUM AND GDB?**

- Last year we organised a combined event, "<u>CA Forum & HSF Reco WG</u>"; could we do the same with the GDB?
- Would you be interested in more topics with a GDB connection?
  - Individual sites adoption of GPUs, FPGAs
  - WLCG strategy towards hardware acceleration
  - Scheduling of hardware accelerated applications
  - ...
- Please contact any of us (\*) or write to <u>compute-accelerator-forum-organizers@cern.ch</u>

(\*) Stefan Roiser, Graeme Stewart, Maria Girone, Ben Morgan, Michael Bussmann

#### WHAT DID WE LEARN IN THE PAST 3 YEARS

- Many in-depth presentations on interesting tools, languages, etc., e.g. GPU abstraction layers, CADNA
- The computing world is getting ever more diverse with more architectures and chip types coming up
- No clear winner to herd the cats: oneAPI/SYCL, Alpaka/Llama, Kokkos, pragmas, ...
  - Changing the code conceptually is the major effort; how to abstract across platforms is less of an issue
  - The C++ standard also provides, and will increasingly provide, ways to cope
- New languages to cope with the new hardware landscape: D, Julia, ...

#### THANK YOU!!

Info on upcoming meetings: compute-accelerator-forum-announce@cern.ch

10 Jan 2024, J Pivarski (Princeton), "Garbage Collectors: Java, Python, Julia", <a href="https://indico.cern.ch/event/1329685/">https://indico.cern.ch/event/1329685/</a>

14 Feb 2024, D. Giordano, "HEPIX Benchmarking", <a href="https://indico.cern.ch/event/1329686/">https://indico.cern.ch/event/1329686/</a>


## **BACKUP**

#### TOPICS POSSIBLY OVERLAPPING / INTERESTING FOR THE GDB

- Data processing infrastructures, e.g. JUWELS, GPUs @ CERN, GPUs at CERN Beams
- Hardware platforms: SpiNNaker2, RISC-V
- Status of GPU developments at the LHC: ACTS, Patatrack, MadGraph, AdePT, Celeritas, ALICE O2, LHCb Allen, Belle II
- More HEP projects: HEP-CCE, HEPIX Benchmarking