ALICE Weekly Meeting: Software for Hardware Accelerators / PDP-SRC

10:00 → 10:20
Discussion (20m). Speakers: David Rohr (CERN), Giulio Eulisse (CERN)
Color code: (critical, news during the meeting: green, news from this week: blue, news from last week: purple, no news: black)
EPN GPU Topics:
GPU Benchmarks in HS23 Contribution from ALICE
- Had a meeting last week; Gabriele will report on the status.
Sync reconstruction
Async reconstruction
- Need to investigate the short GPU stall problem.
- The limiting factor for the pp workflow is now the TPC time series, which is too slow and creates backpressure (costing ~20% performance on the EPNs). Enabled multi-threading as recommended by Matthias; need to check whether it works.
- We cannot set the GPU architectures to build for via the environment-variable field of Jenkins builds.
- Managed to run the o2-gpu-standalone-benchmark from an async build on CVMFS in the default GRID job container at the NERSC Perlmutter site, running on its A100 GPU.
GPU ROCm / compiler topics:
- Issues that disappeared but are not yet understood: random server reboots with Alma 9.4, miscompilation with ROCm 6.2, GPU getting stuck when the DMA engine is turned off, MI100 stalling with ROCm 5.5.
- Problem with building ONNXRuntime with MIGraphX support, to be checked.
- Need to find a way to build ONNXRuntime with support for both CUDA and ROCm.
- Try to find a better solution for the problem with __device__ inline functions leaking symbols in the host code.
- LLVM Bump to 20.1: status?
- ROCm 6.4.1 status:
- AMD is checking the reproducer. I have some ideas on how to narrow down where it miscompiles, using different compile flags in per-kernel mode.
- Improved the standalone benchmark CI: it can now run the RTC test for CUDA even with no GPU installed.
- Updating alidist/gpu-system to be build_requires only, and to generate a dummy modulefile (even if not used), as requested by Giulio.
TPC / GPU Processing
- WIP: Use alignas() or find a better solution to fix the alignment of Monte Carlo labels: https://its.cern.ch/jira/browse/O2-5314
- Waiting for TPC to fix bogus TPC transformations for good, then we can revert the workaround.
- Waiting for TPC to check the PR that uses full cluster errors, including average charge and occupancy map errors, during seeding.
- Final solution: merging transformation maps on the fly into a single flat object. A draft version by Sergey exists but is still WIP.
- Pending OpenCL2 issues:
- printf not working due to a confirmed bug in clang; a fix is being prepared. This prevents further debugging for now.
- Crash in the merger, which can be worked around by disabling the clang SPIR-V optimization. Probably a bug in clang, but printf must be fixed first to debug it.
- Even with optimization disabled, it crashes later in TPC merging; printf is needed to debug.
- Next high-priority topic: improvements to cluster sharing and cluster attachment at lower TPC pad rows.
- Need to check the problem with the ONNX external memory allocator.

10:20 → 10:25
TPC ML Clustering (5m). Speaker: Christian Sonnabend (CERN, Heidelberg University (DE))
I arranged a meeting with Alex Schmah at this time to discuss the right settings for the PID analysis and for fetching the SC distortion maps, so I can't join today, or only later (maybe continue with someone else first). The presentation for Silvia (only a first draft; more will be added) is attached to this meeting slot. Otherwise, some MC updates are below, and the update on the real-data analysis (full LHC24as period) is in the presentation. The GPU timing update will be ready by this afternoon.
Physics
Correctly attached clusters, without fake tracks

- Even with extremely tight thresholds on the network, the total number of correctly attached (non-fake) clusters shows an improvement over the GPU cluster finder.

- Very similar behaviour for the normalised non-fake clusters (almost quadratic, as shown last week).

- At the point where the correctly attached clusters cross the GPU CF curve, ~7% of tracks are lost while ~17% of the total clusters are saved.

All tracks (including fakes)
- Here, the number of correctly attached clusters drops monotonically with increasing threshold.

- Interestingly, the 2D networks keep the same number of correctly attached clusters for all thresholds, but they also keep many more fakes.

- This is also confirmed by the attachment efficiency vs. fake rate.

10:25 → 10:30
GPU Parameter Optimizations (5m). Speaker: Gabriele Cimador (Universita e INFN Torino (TO))

10:30 → 10:35
Efficient Data Structures (5m). Speaker: Dr Oliver Gregor Rietmann (CERN)

10:35 → 10:40
Following up GPU to-dos (5m). Speaker: Dr Vikas Singhal (Department of Atomic Energy (IN))

10:40 → 10:45
TPC Clusterization / OpenCL / Highly Ionizing Particles (5m). Speaker: Felix Weiglhofer (Goethe University Frankfurt (DE))

10:45 → 10:50
ITS Tracking (5m). Speaker: Matteo Concas (CERN)

10:50 → 10:55
Following up JIRA tickets (5m). Speaker: Ernst Hellbar (CERN)

10:00 → 10:20