Alice Weekly Meeting: Software for Hardware Accelerators

Europe/Zurich
Zoom Meeting ID
61230224927
Host
David Rohr
Useful links
Join via phone
Zoom URL
    • 10:00 10:20
      Discussion 20m
      Speaker: David Rohr (CERN)

      Color code: (critical news from this week: blue, news from last week: purple, no news: black)

      Sync reconstruction

       

      Async reconstruction

      • Need to investigate short GPU stall problem.
      • Limiting factor for the pp workflow is now the TPC time series, which is too slow and creates backpressure (costs ~20% performance on the EPNs). Enabled multi-threading as recommended by Matthias - need to check if it works.
      • Test with GPU GRID jobs at NERSC pending.
      • Will tune the existing 16-core settings, add a SITEARCH for 16-core CPU, and for 16-core CPU + generic NVIDIA / AMD GPU, as for 8-core.
      • Will retune EPN async workflow for TPC + ITS on GPU on 2025 data.

       

      GPU ROCm / compiler topics:

      • Problem with building ONNXRuntime with MigraphX support.
      • Need to find a way to build ONNXRuntime with support for CUDA and for ROCm.
      • Try to find a better solution for the problem with __device__ inline functions leaking symbols in the host code.
      • Need to check ROCm 7.2 correctness.
      • Need to understand and fix crash on RTX Pro 6000.

       

      TPC / GPU Processing 

      • WIP: Use alignas() or find a better solution to fix the alignment of Monte Carlo labels: https://its.cern.ch/jira/browse/O2-5314
      • Waiting for TPC to fix bogus TPC transformations for good, then we can revert the workaround.
      • Final solution: merging transformation maps on the fly into a single flat object:
        • Maps now yielding correct results, but 1.5x performance regression running on GPUs. Must be investigated.
      • Need to check the problem with ONNX external memory allocator.
      • Next high priority topic: Improvements for cluster sharing and cluster attachment at lower TPC pad rows. PR: https://github.com/AliceO2Group/AliceO2/pull/14542
      • TODO: workaround for the wrong field used for online encoding; make memory scaling factors configurable via ConfigurableParam

       

      Other topics:

      • Need to bump ONNXRuntime to 1.24 (needed for ROCm 7.2); Giulio is checking - status?
      • Status of bumping CMake and boost (https://github.com/alisw/alidist/pull/6135):
        • Remaining issues:
          • libwebsockets kernel headers
          • Ernst following up issues with ODC/DDS
          • aarch64 uses an old Python, which tries to compile an old xgboost that is incompatible.
          • One compilation problem on macOS with boost histogram.

       

      EPN GPU Topics:

       

    • 10:20 10:25
      TPC ML Clustering 5m
      Speaker: Christian Sonnabend (CERN, Heidelberg University (DE))

      Cluster error parametrization

      • Tried with cluster error dump within the track interpolation + further cleanup
      • 2D grid searches for best scaling parameters for scaleY*err2Y, scaleZ*err2Z
      • Stuck with learning the MSE, ((track_pos - cluster_pos)^2 - track_cov), where track_cov is now taken at the intermediate interpolation stage
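The 2D grid search described above can be sketched as follows (a minimal Python sketch; the function names, the exact loss shape, and the scale ranges are illustrative assumptions, not the actual analysis code):

```python
import numpy as np

def residual_loss(scale_y, scale_z, dy, dz, err2_y, err2_z, cov_y, cov_z):
    """Mean squared deviation between the observed residual target,
    (track_pos - cluster_pos)^2 - track_cov, and the scaled cluster
    error estimates scaleY*err2Y and scaleZ*err2Z."""
    target_y = dy ** 2 - cov_y
    target_z = dz ** 2 - cov_z
    return (np.mean((target_y - scale_y * err2_y) ** 2)
            + np.mean((target_z - scale_z * err2_z) ** 2))

def grid_search(dy, dz, err2_y, err2_z, cov_y, cov_z, scales_y, scales_z):
    """Exhaustive 2D scan over (scaleY, scaleZ); returns the pair with
    the lowest loss."""
    best = None
    for sy in scales_y:
        for sz in scales_z:
            loss = residual_loss(sy, sz, dy, dz, err2_y, err2_z, cov_y, cov_z)
            if best is None or loss < best[0]:
                best = (loss, sy, sz)
    return best[1], best[2]
```

An exhaustive scan like this is cheap on a coarse grid and can then be refined around the best point.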

       

      Combining several grid runs

      Cubic spline interpolation

      • This is not the full picture -> Very sensitive to long / short tracks

      • A "good" configuration (scaleY, scaleZ): (0.01, 0.1)

       

      • 6% track reduction (default reconstruction: 388k tracks on the given dataset)
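Combining several grid runs via cubic spline interpolation could, for a single scan dimension, look roughly like this (an illustrative sketch assuming SciPy; `losses` stands for the figure-of-merit values of the coarse scan, not any actual variable from the analysis):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def refine_minimum(scales, losses, n_fine=1000):
    """Cubic-spline interpolate a coarse 1D scan of loss vs. scale and
    return the scale at the interpolated minimum."""
    spline = CubicSpline(scales, losses)
    fine = np.linspace(scales[0], scales[-1], n_fine)
    return fine[np.argmin(spline(fine))]
```

This lets a handful of coarse grid points locate a minimum lying between them, at the cost of the sensitivity to the sample composition (long vs. short tracks) noted above.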

       

      Getting close but not there yet

       

    • 10:25 10:30
      GPU Parameter Optimizations 5m
      Speaker: Gabriele Cimador (CERN, Università and INFN Torino)

      GPU Parameters news

      • Used the MI50 RTC cache with the Radeon VII, still the same performance degradation
      • Will try to make the standalone benchmark run with MI210
    • 10:30 10:35
      Efficient Data Structures 5m
      Speaker: Dr Oliver Gregor Rietmann (CERN)
       

      NextGenTrigger Task 1.7

      • CHEP talk together with Jolly got accepted
      • Idea to increase memory bandwidth in the CLUE clustering algorithm of CMS

      Implement NGT SoA Code in O2 standalone benchmark

      • Working on this fork of the AliceO2 repo, with a CI pipeline:
        • Running on NGT hardware with 4 different GPUs (Nvidia and AMD)
        • Extended the CI pipeline to fail if GPU.out changes
      • Implemented SoA in:
        • SectorTracker
          • GPUTPCBaseTrackParam
          • GPUTPCTrackParam
          • GPUTPCTracklet
          • GPUTPCTrack
      • The SoA library changed a lot in the course of implementing it in O2; I have now adapted it to:
      • Generate CSV file of the standalone benchmark
      • Compute e.g. standard deviations of the benchmark and compare to baseline (dev branch)
      • Display the CSV file in the GitHub Pipelines
      • Next Steps:
        • Make better use of SoA to improve performance
        • Try David's suggestion
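As a rough illustration of the AoS-to-SoA change applied to the tracker classes (field names and layout here are hypothetical, not the actual GPUTPCTrackParam layout), in NumPy terms:

```python
import numpy as np

N = 1000

# AoS: one record per track; the fields of a track are adjacent in memory,
# so reading only "pt" strides over whole records.
aos = np.zeros(N, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4"), ("pt", "f4")])
aos["pt"][:] = 1.0

# SoA: one contiguous array per field; all "pt" values are adjacent,
# which is what coalesced GPU loads (and SIMD) want when a kernel
# touches only a few fields.
class TrackSoA:
    def __init__(self, n):
        self.x = np.zeros(n, dtype=np.float32)
        self.y = np.zeros(n, dtype=np.float32)
        self.z = np.zeros(n, dtype=np.float32)
        self.pt = np.zeros(n, dtype=np.float32)

soa = TrackSoA(N)
soa.pt[:] = 1.0

# A field-wise operation reads one contiguous buffer in the SoA case:
scaled = 2.0 * soa.pt
# ...but a strided view over 16-byte records in the AoS case:
scaled_aos = 2.0 * aos["pt"]
```

The results are identical; only the memory-access pattern differs, which is where the hoped-for bandwidth gain comes from.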

       

      Issues

      1. Even if I provide a .par file for AMD W7900, the outcome still depends on the settings in GPUParameters.csv
      2. Can't really make sense of the standalone benchmark's timers:

       

      GPUTPCCFDeconvolution 144541 141221 140670 0 0 0 0 0 0
      GPUTPCCFClusterizer 63644 59016.6 58720.8 0 0 0 0 0 0
      GPUTPCCFGather 0 0 0 0 0 0 0 22659.3 0
      GPUTPCCreateOccupancyMap_fill 0 0 0 0 0 0 0 78840 0
      GPUTPCCreateOccupancyMap_fold 0 0 0 0 0 0 0 850.289 0
      GPUTPCCreateTrackingData 138091 138115 137295 138089 109361 109173 108133 109082 0
      GPUTPCNeighboursFinder 47747.5 48140.9 45961.9 45711.5 38250.3 37788.4 37471.3 36531.6 0
      GPUTPCNeighboursCleaner 5622.41 5699.05 5537.97 5512.89 4524.68 4494.48 4463.64 4443.52 0

       

    • 10:35 10:40
      Following up GPU to-dos 5m
      Speaker: Dr Vikas Singhal (Department of Atomic Energy (IN))
    • 10:40 10:45
      TPC Clusterization / OpenCL / Highly Ionizing Particles 5m
      Speaker: Felix Weiglhofer (CERN)

      OpenCL

      No news.

      GPU Servers

      CI server installed and has IP.

      Both GPUs seem to be running, but the Nvidia card is very flaky. Riser cable? Will try setting the PCIe version in the BIOS.

      Handing node to Giulio and Sergio next for CI integration.

      Highly Ionizing Particles

      Had some issues with running simulations. Couldn't test on simdata yet.

    • 10:45 10:50
      ITS Tracking 5m
      Speakers: Felix Schlepper (CERN, Heidelberg University (DE)), Gabriele Cimador (CERN, Università and INFN Torino), Matteo Concas (CERN)

      Felix: playing around with ITS raw data to 'validate' the staggered readout & reconstruction.

      Gabriele:

      • Changed algorithm
      • Moved to algorithm inspired by Run2 implementation
      • Basically it is 3D histogramming, but "raytracing vertexer" sounds cooler
      • Define a 3D voxelisation around the beamline
      • Define tracklets as lines parametrized as p(t) = p0 + t*u,
        where p0 is the cluster on the innermost ITS layer and u is the direction of the line
      • For each tracklet:
        • Calculate t_enter and t_exit for the voxelisation
        • Sample the line at a uniform t_step within this range
        • Each time the line steps into a voxel, increment its count

      • Select the local maxima that surpass a threshold
      • Then fit the lines that pass through the vertex candidates
      • Below: video where low- and high-multiplicity vertices are detected, sliding through z
      • video

      • Result with only candidates finding, without fitting:
        Precision: 0.9924, Recall: 0.8667, F1: 0.9253
        Total true vertices: 150
        Matched vertices: 130
        Missed true vertices: 20
        Total fitted vertices: 131
        Fake vertices: 1
        Mean residuals (x,y,z): [0.00014556 0.00058302 0.00027437]
        Std  residuals (x,y,z): [0.00432429 0.00439    0.00784673]
      • Next steps:
        • Just select the highest-multiplicity vertex
        • do a histogram along that z to find lower-multiplicity vertices
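The voxel-counting steps of the vertexer described above can be sketched as follows (illustrative Python; the box bounds, grid shape, and step length are made-up parameters, not the actual O2 implementation):

```python
import numpy as np

def slab_t_range(p0, u, lo, hi):
    """Entry/exit parameters (t_enter, t_exit) of the line p(t) = p0 + t*u
    with the axis-aligned box [lo, hi] (slab method)."""
    with np.errstate(divide="ignore"):
        t1 = (lo - p0) / u
        t2 = (hi - p0) / u
    return np.max(np.minimum(t1, t2)), np.min(np.maximum(t1, t2))

def fill_voxels(tracklets, lo, hi, shape, t_step=0.05):
    """3D histogram of line passages: sample each line uniformly in t
    between its entry and exit point, incrementing a voxel once per entry."""
    counts = np.zeros(shape, dtype=np.int32)
    size = (hi - lo) / np.array(shape)
    for p0, u in tracklets:
        t_enter, t_exit = slab_t_range(p0, u, lo, hi)
        if t_exit <= t_enter:
            continue  # line misses the voxelised volume
        last = None
        for t in np.arange(t_enter, t_exit, t_step):
            idx = tuple(np.floor((p0 + t * u - lo) / size).astype(int))
            if all(0 <= i < s for i, s in zip(idx, shape)) and idx != last:
                counts[idx] += 1
                last = idx
    return counts

# Vertex candidates would then be the local maxima of `counts` above a
# threshold, e.g. starting from np.argwhere(counts >= threshold).
```

Cost is O(n_tracklets × n_steps); the real implementation would pick t_step relative to the voxel size so that no voxel on the path is skipped.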
    • 10:50 10:55
      System Run Coordination Topics 5m
      Speaker: Ernst Hellbar (CERN)