Alice Weekly Meeting: Software for Hardware Accelerators
10:30 → 10:50 (Europe/Zurich)
Discussion (20m). Speaker: David Rohr (CERN)
Color code: (critical, news from this week: blue, news from last week: purple, no news: black)
Sync reconstruction
Async reconstruction
- Need to investigate short GPU stall problem.
- Limiting factor for the pp workflow is now the TPC time series, which is too slow and creates backpressure (costs ~20% performance on the EPNs). Enabled multi-threading as recommended by Matthias; need to check whether it works.
- Test with GPU GRID jobs at NERSC pending.
- Will tune the existing 16-core settings, and add a SITEARCH for 16-core CPU and for 16-core CPU + generic NVIDIA / AMD GPU, as was done for 8 cores.
- Will retune EPN async workflow for TPC + ITS on GPU on 2025 data.
GPU ROCm / compiler topics:
- Problem with building ONNXRuntime with MIGraphX support.
- Need to find a way to build ONNXRuntime with support for both CUDA and ROCm.
- Try to find a better solution for the problem of __device__ inline functions leaking symbols into the host code.
- Miscompilation / internal compiler error fixed in new clang for ROCm 7.x, SDMA engine synchronization bug still not fixed.
- Serialization bug pending.
- Miscompilation on MI100 leading to a memory error pending.
- New miscompilation on MI50 with ROCm 7.0 when RTC is disabled.
- New miscompilation on MI50 with ROCm 6.3 and 7.0 when RTC is enabled, with the latest software. Have a workaround for Pb-Pb data taking, but it is not compatible with the latest tracking developments.
- Waiting for ROCm 7.2, which could fix the MI100 serialization issue for good. Not clear yet with regard to the miscompilation problems.
- Dev2 server available with MI210 and ROCm 7.2.
- Runs stably on MI50 and MI210 GPUs; correctness not checked yet.
- Serialization issue on MI100 seems fixed, or at least no longer triggers immediately.
- Instead, now seeing other random crashes on MI100.
- Seeing a crash on RTX6000 NVIDIA GPU with some data sets, must be either a bug in our code or on the NVIDIA side.
TPC / GPU Processing
- WIP: Use alignas() or find a better solution to fix alignment of monte carlo labels: https://its.cern.ch/jira/browse/O2-5314
- Waiting for TPC to fix bogus TPC transformations for good, then we can revert the workaround.
- Final solution: merging transformation maps on the fly into a single flat object:
- Sergey is checking. Difference in the treatment outside of the measured region. New treatment implemented in the old maps, to check whether that causes the differences.
- Need to check the problem with ONNX external memory allocator.
- Next high priority topic: Improvements for cluster sharing and cluster attachment at lower TPC pad rows. PR: https://github.com/AliceO2Group/AliceO2/pull/14542
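The "merging transformation maps on the fly into a single flat object" idea above can be sketched conceptually as follows. This is only an illustration of the technique, not O2's actual implementation; the map functions, grid dimensions, and indexing scheme are all invented for the sketch:

```python
import numpy as np

# Hypothetical stand-ins for two TPC correction maps.
def map_a(row, pad):  # e.g. an average correction
    return 0.1 * row + 0.01 * pad

def map_b(row, pad):  # e.g. a time-dependent delta correction
    return 0.02 * row

def merge_to_flat(maps, n_rows, n_pads):
    """Sample the summed corrections on a regular grid and store them
    in a single flat buffer, so lookups need only one object."""
    flat = np.zeros((n_rows, n_pads), dtype=np.float32)
    for r in range(n_rows):
        for p in range(n_pads):
            flat[r, p] = sum(m(r, p) for m in maps)
    return flat.ravel()  # single flat array, index = row * n_pads + pad

flat = merge_to_flat([map_a, map_b], n_rows=4, n_pads=8)
# A lookup for (row, pad) becomes a single indexed read:
row, pad = 2, 3
value = flat[row * 8 + pad]
```

The point of the flat object is that the per-event cost of evaluating several maps is paid once at merge time; afterwards each query is a single array access.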
Other topics:
- Molex connectors and cables for GPU CI server arrived, need to assemble them.
- Removed the bogus setting of the ROCm path in LD_LIBRARY_PATH by the ONNXRuntime recipe. Seems to have no side effects.
- Build failures without a clear error when using old aliBuild versions. Added a feature to aliBuild to query its version; will disable all GPU builds for old versions in the future.
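A minimal sketch of the version gating mentioned above. The exact aliBuild invocation, its output format, and the cutoff version are assumptions; only the parse-and-compare logic is shown:

```python
import re

MIN_VERSION = (1, 17, 0)  # hypothetical minimum aliBuild version for GPU builds

def parse_version(text):
    """Extract a (major, minor, patch) tuple from a version string
    such as 'aliBuild version 1.17.5'."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        return None  # an old aliBuild that cannot report its version
    return tuple(int(x) for x in m.groups())

def gpu_builds_enabled(version_output):
    v = parse_version(version_output)
    # Disable all GPU builds when the version is unknown or too old.
    return v is not None and v >= MIN_VERSION

print(gpu_builds_enabled("aliBuild version 1.17.5"))  # True
print(gpu_builds_enabled("unknown"))                  # False
```

Comparing tuples rather than raw strings avoids the classic "1.9" > "1.17" string-ordering bug.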
EPN GPU Topics:
-
10:50 → 10:55
TPC ML Clustering (5m). Speaker: Christian Sonnabend (CERN, Heidelberg University (DE))
Cluster error parameterization
Current approach:
- O2/GPU/GPUTracking/Merger/GPUTPCGMTrackParam.cxx:

  prop.GetErr2(err2Y, err2Z, param, zz, cluster.row, clusterState, cluster.sector, time, invAvgCharge, invCharge);
  #ifndef GPUCA_GPUCODE
  fprintf(fpdumperr, "%d,%d,%f,%f,%d,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f\n", iTrk, cluster.num, err2Y, err2Z, clusterState, xx, yy, zz, mP[0], mP[1], mP[2], mP[3], mP[4], mC[0], mC[2], mC[5], mC[9], mC[14]);
  #endif

- For every dumped cluster: parse the cluster.num column and only keep the latest cluster
- Training the NN:
  - labels_x = ["clusterState", "xx", "yy", "zz", "mP[2]", "mP[3]", "mP[4]", "mC[0]", "mC[2]", "mC[5]", "mC[9]", "mC[14]"]
  - labels_y = ["yy", "zz", "mP[0]", "mP[1]"]
  - data_Y[0] = data_Y["yy"]**2 - data_Y["mP[0]"]**2
  - data_Y[1] = data_Y["zz"]**2 - data_Y["mP[1]"]**2
  - Normalized mC[0] and mC[2] using log10() because their values were O(10^4) to O(10^7)
  - Training pipeline is set up, but training doesn't converge well yet.
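The dump-parsing and preprocessing steps above could look like the following sketch. The pandas-based approach and the column names are assumptions (the dump is a plain fprintf CSV); the deduplication, target, and normalization formulas follow the notes:

```python
import numpy as np
import pandas as pd

cols = ["iTrk", "cluster_num", "err2Y", "err2Z", "clusterState",
        "xx", "yy", "zz", "mP[0]", "mP[1]", "mP[2]", "mP[3]", "mP[4]",
        "mC[0]", "mC[2]", "mC[5]", "mC[9]", "mC[14]"]

# Toy stand-in for the fprintf dump; in practice: pd.read_csv(path, names=cols)
data = pd.DataFrame([
    [0, 7, 0.1, 0.2, 0, 1.0, 2.0, 3.0, 1.9, 2.9, 0, 0, 0, 1e4, 1e5, 0, 0, 0],
    [0, 7, 0.1, 0.2, 0, 1.0, 2.1, 3.1, 2.0, 3.0, 0, 0, 0, 1e6, 1e7, 0, 0, 0],
], columns=cols)

# Keep only the latest dump of each cluster (last occurrence per cluster.num).
data = data.drop_duplicates(subset="cluster_num", keep="last")

# Regression targets as in the notes: squared measurement minus squared prediction.
target_y = data["yy"] ** 2 - data["mP[0]"] ** 2
target_z = data["zz"] ** 2 - data["mP[1]"] ** 2

# Normalize the large covariance entries with log10 (values span O(1e4) to O(1e7)).
for c in ("mC[0]", "mC[2]"):
    data[c] = np.log10(data[c])
```

drop_duplicates(keep="last") implements "only keep the latest cluster" in one pass, assuming the dump preserves processing order.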

Checking the distributions
10:55 → 11:00
11:00 → 11:05
Efficient Data Structures (5m). Speaker: Dr Oliver Gregor Rietmann (CERN)
NextGenTrigger Task 1.7
- Implementing our SoA code in MadGraph, together with
- Stefan Roiser
- Daniele Massaro
- No news
Implement NGT SoA Code in O2 standalone benchmark
- Working on this fork of the AliceO2 repo, with a CI pipeline:
- Running on NGT hardware with 4 different GPUs (Nvidia and AMD)
- Extended the CI pipeline to fail if GPU.out changes
- Implemented SoA in:
- SectorTracker
- GPUTPCBaseTrackParam
- GPUTPCTrackParam
- GPUTPCTracklet
- SectorTracker
- Maybe go back from CRTP to classical inheritance
- New way of providing iterators:
- Less code and simpler code: https://godbolt.org/z/jMaa3c48E
- Supports the following sort functions:
- std::sort
- thrust::sort
- our custom sort function
- Next Steps:
- Make better use of SoA to improve performance
- Try David's suggestion
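As a language-neutral illustration of the SoA sorting point above (this is not the O2/NGT C++ code; field names are invented for the sketch): sorting a structure-of-arrays typically means computing one permutation from the key field and applying it to every array, instead of moving whole structs around as in an AoS layout:

```python
import numpy as np

# Structure-of-arrays: one contiguous array per field (hypothetical track fields).
tracks = {
    "pt":  np.array([2.5, 0.7, 1.3]),
    "eta": np.array([0.1, -1.2, 0.8]),
    "row": np.array([10, 4, 7]),
}

def sort_soa(soa, key):
    """Sort every field of an SoA by one key field using a single
    permutation, keeping all fields aligned."""
    order = np.argsort(soa[key])
    return {name: arr[order] for name, arr in soa.items()}

sorted_tracks = sort_soa(tracks, "pt")
# pt is now ascending and the other fields stay aligned with it.
```

The same permutation-based idea is what makes std::sort, thrust::sort, or a custom sort interchangeable over an SoA once an iterator (or index) abstraction is in place.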
11:05 → 11:10
Following up GPU to-dos (5m). Speaker: Dr Vikas Singhal (Department of Atomic Energy (IN))
11:10 → 11:15
11:15 → 11:20
ITS Tracking (5m). Speakers: Felix Schlepper (CERN, Heidelberg University (DE)), Gabriele Cimador (CERN, Università and INFN Torino), Matteo Concas (CERN)
Felix: Last PR contained a bug; will probably revert for now, as tests crashed on all jobs on the EPNs. Hard to reproduce though: got a single sporadic invalid read when processing 300 TFs (with compute-sanitizer memcheck/racecheck). Have to think about a better strategy.
Gabriele: Fixed a bug in trackleting (or at least tracklets are now found); will resume work on ITS vertexing.
11:20 → 11:25
System Run Coordination Topics (5m). Speaker: Ernst Hellbar (CERN)