Alice Weekly Meeting: Software for Hardware Accelerators / PDP-SRC

Name: Alice Weekly Meeting: Software for Hardware Accelerators / PDP-SRC
Start: 2025-07-11T10:00:00+02:00
End: 2025-07-11T11:00:00+02:00
Location: No location set

Friday 11 Jul 2025, 10:00 → 11:00 Europe/Zurich

61230224927

David Rohr

- 10:00 → 10:20
  Discussion 20m
  
  Speakers: David Rohr (CERN), Giulio Eulisse (CERN)
  Color code: (critical, news during the meeting: green, news from this week: blue, news from last week: purple, no news: black)
  
  EPN GPU Topics:
  
  GPU Benchmarks in HS23 Contribution from ALICE
  
  Had a meeting last week, Gabriele will report on the status
  
  Sync reconstruction
  
  Async reconstruction
  
  Need to investigate short GPU stall problem.
  
  Limiting factor for the pp workflow is now the TPC time series, which is too slow and creates backpressure (costs ~20% performance on EPNs). Enabled multi-threading as recommended by Matthias; still need to check whether it works.
  
  We cannot set the GPU architectures to build for in the environment variable field of Jenkins builds.
  
  Managed to run the o2-gpu-standalone-benchmark from an async build on CVMFS in the default GRID job container on the NERSC perlmutter site running on their A100 GPU.
  
  GPU ROCm / compiler topics:
  
  Issues that disappeared but are not yet understood: random server reboots with Alma 9.4, miscompilation with ROCm 6.2, GPU getting stuck when DMA engine turned off, MI100 stalling with ROCm 5.5.
  
  Problem with building ONNXRuntime with MigraphX support, to be checked.
  
  Need to find a way to build ONNXRuntime with support for CUDA and for ROCm.
  
  Try to find a better solution for the problem with __device__ inline functions leaking symbols in the host code.
  
  LLVM Bump to 20.1: status?
  
  ROCm 6.4.1 status:
  
  AMD is checking the reproducer. I have some idea how to narrow down where it miscompiles using different compile flags in per-kernel mode.
  
  Improved Standalone Benchmark CI, can now run RTC test for CUDA also with no GPU installed.
  
  Updating alidist/gpu-system to be build_requires only, and to generate a dummy modulefile (even if not used), as requested by Giulio.
  
  TPC / GPU Processing
  
  WIP: Use alignas() or find a better solution to fix the alignment of Monte Carlo labels: https://its.cern.ch/jira/browse/O2-5314
  
  Waiting for TPC to fix bogus TPC transformations for good, then we can revert the workaround.
  
  Waiting for TPC to check PR which uses full cluster errors including average charge and occupancy map errors during seeding.
  
  Final solution: merging transformation maps on the fly into a single flat object: Draft version by Sergey exists but still WIP.
  
  Pending OpenCL2 issues:
  
  printf not working due to confirmed bug in clang, fix is being prepared. Prevents further debugging for now.
  
  Crash in merger, which can be worked around by disabling clang SPIRV optimization. Probably bug in clang, but need to fix printf first to debug.
  
  Also with optimization disabled, crashing later in TPC merging, need printf to debug.
  
  Next high priority topic: Improvements for cluster sharing and cluster attachment at lower TPC pad rows.
  
  Need to check the problem with ONNX external memory allocator.
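
  For the alignas() item under TPC / GPU Processing above, a minimal sketch of what the specifier changes. The label type here is hypothetical (it is not the O2 label class, and the actual layout discussed in O2-5314 is not described in these minutes); the 8-byte target is likewise an assumption chosen for illustration.

  ```cpp
  #include <cstddef>
  #include <cstdint>

  // Hypothetical label with three 32-bit fields (illustration only).
  // Natural alignment is 4 bytes and the size is 12 bytes.
  struct PlainLabel {
    std::uint32_t trackId;
    std::uint32_t eventId;
    std::uint32_t sourceId;
  };

  // Same fields, but alignas(8) raises the alignment requirement to 8 bytes.
  // The compiler pads the size up to the next multiple of 8, i.e. 16 bytes.
  struct alignas(8) AlignedLabel {
    std::uint32_t trackId;
    std::uint32_t eventId;
    std::uint32_t sourceId;
  };

  static_assert(alignof(PlainLabel) == 4, "plain label is 4-byte aligned");
  static_assert(sizeof(PlainLabel) == 12, "plain label has no padding");
  static_assert(alignof(AlignedLabel) == 8, "alignas(8) raises the alignment");
  static_assert(sizeof(AlignedLabel) == 16, "size is padded to a multiple of 8");

  // Byte offset of element i in a tightly packed array of T.
  template <typename T>
  constexpr std::size_t offsetOfElement(std::size_t i)
  {
    return i * sizeof(T);
  }

  // True if element i of a packed array of T lands on an 8-byte boundary,
  // assuming the start of the array is itself 8-byte aligned.
  template <typename T>
  constexpr bool isEightByteAligned(std::size_t i)
  {
    return offsetOfElement<T>(i) % 8 == 0;
  }
  ```

  With the plain type, every second element of a packed array starts at an offset that is not a multiple of 8; with the aligned type every element does, at the cost of 4 padding bytes per label. Whether that trade-off is acceptable is exactly the open question in the ticket.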
- 10:20 → 10:25
  TPC ML Clustering 5m
  
  Speaker: Christian Sonnabend (CERN, Heidelberg University (DE))
  
  AW_072025_UPD.pdf
  Arranged a meeting with Alex Schmah at this time to discuss the right settings for PID analysis and SC distortion map fetching, so I can't join today, or only later (maybe continue with someone else first). The presentation for Silvia (only a first draft, will add more) is attached to this meeting slot. Otherwise, some updates on MC are below and the update on real data analysis is in the presentation (LHC24as full period). The update on GPU timing will be ready by this afternoon.
  
  Physics
  
  Correctly attached, without fake tracks
  
  Even with extremely tight thresholds on the network, the total number of correctly attached clusters (non-fake) shows an improvement over the GPU cluster finder
  
  Very similar behaviour for the non-fake normalised clusters (almost quadratic, as shown last week).
  
  At the point where the correctly attached clusters cross with the GPU CF, ~7% of tracks are lost while ~17% of total clusters are saved
  
  All tracks (including fakes)
  
  Now, the number of correctly attached clusters drops monotonically with increasing threshold
  
  Interestingly, the 2D networks keep the same number of correctly attached clusters for all thresholds, but they also keep many more fakes
  
  Also verified by the attachment efficiency vs. fake-rate
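
  As a reading aid for the attachment efficiency vs. fake-rate comparison above, a minimal sketch of how the two quantities could be computed from per-threshold counters. The definitions are assumptions (efficiency as correct over attachable clusters, fake rate as fake over all attached clusters) and the struct and function names are hypothetical; the exact definitions used in the analysis are not given in these minutes.

  ```cpp
  #include <cstddef>

  // Hypothetical counters for one network threshold (illustration only).
  struct AttachmentCounts {
    std::size_t correct;    // clusters attached to the right track
    std::size_t fake;       // clusters attached to a wrong track
    std::size_t attachable; // reference number of clusters that could be attached
  };

  // Assumed definition: fraction of attachable clusters that were attached correctly.
  constexpr double attachmentEfficiency(const AttachmentCounts& c)
  {
    return c.attachable == 0
             ? 0.0
             : static_cast<double>(c.correct) / static_cast<double>(c.attachable);
  }

  // Assumed definition: fraction of all attached clusters that are fake.
  constexpr double fakeRate(const AttachmentCounts& c)
  {
    return (c.correct + c.fake) == 0
             ? 0.0
             : static_cast<double>(c.fake) / static_cast<double>(c.correct + c.fake);
  }
  ```

  Evaluating both functions once per threshold gives one point on an efficiency vs. fake-rate curve; under these assumed definitions, a configuration that keeps the same number of correct clusters while keeping more fakes moves along the fake-rate axis only.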
- 10:25 → 10:30
  
  GPU Parameter Optimizations 5m
  
  Speaker: Gabriele Cimador (Universita e INFN Torino (TO))
- 10:30 → 10:35
  
  Efficient Data Structures 5m
  
  Speaker: Dr Oliver Gregor Rietmann (CERN)
- 10:35 → 10:40
  
  Following up GPU to-dos 5m
  
  Speaker: Dr Vikas Singhal (Department of Atomic Energy (IN))
- 10:40 → 10:45
  
  TPC Clusterization / OpenCL / Highly Ionizing Particles 5m
  
  Speaker: Felix Weiglhofer (Goethe University Frankfurt (DE))
- 10:45 → 10:50
  
  ITS Tracking 5m
  
  Speaker: Matteo Concas (CERN)
- 10:50 → 10:55
  
  Following up JIRA tickets 5m
  
  Speaker: Ernst Hellbar (CERN)
