Alice Weekly Meeting: Software for Hardware Accelerators

Europe/Zurich
Zoom Meeting ID
61230224927
Host
David Rohr
Useful links
Join via phone
Zoom URL
    • 10:00 10:20
      Discussion 20m
      Speaker: David Rohr (CERN)

      Color code: (critical news from this week: blue, news from last week: purple, no news: black)

      Sync reconstruction

       

      Async reconstruction

      • Need to investigate short GPU stall problem.
      • Limiting factor for the pp workflow is now the TPC time series, which is too slow and creates backpressure (costs ~20% performance on the EPNs). Enabled multi-threading as recommended by Matthias - need to check if it works.
      • Test with GPU GRID jobs at NERSC pending.
      • Will tune the existing 16-core settings, add a SITEARCH for 16-core CPU, and for 16-core CPU + generic NVIDIA / AMD GPU, as for 8-core.
      • Will retune EPN async workflow for TPC + ITS on GPU on 2025 data.

       

      GPU ROCm / compiler topics:

      • Problem with building ONNXRuntime with MigraphX support.
      • Need to find a way to build ONNXRuntime with support for CUDA and for ROCm.
      • Try to find a better solution for the problem with __device__ inline functions leaking symbols in the host code.
      • Need to check ROCm 7.2 correctness.
      • Need to understand and fix crash on RTX Pro 6000.

       

      TPC / GPU Processing 

      • WIP: Use alignas() or find a better solution to fix the alignment of Monte Carlo labels: https://its.cern.ch/jira/browse/O2-5314
      • Waiting for TPC to fix bogus TPC transformations for good, then we can revert the workaround.
      • Final solution: merging transformation maps on the fly into a single flat object:
        • Maps now yielding correct results, but 1.5x performance regression running on GPUs. Must be investigated.
      • Need to check the problem with ONNX external memory allocator.
      • Next high priority topic: Improvements for cluster sharing and cluster attachment at lower TPC pad rows. PR: https://github.com/AliceO2Group/AliceO2/pull/14542
      • TODO: workaround for the wrong field used for online encoding; make memory scaling factors configurable via ConfigurableParam

       

      Other topics:

      • Need to bump ONNXRuntime to 1.24 (needed for ROCm 7.2); Giulio is checking - status?
      • Status of bumping CMake and boost (https://github.com/alisw/alidist/pull/6135):
        • Remaining issues:
          • libwebsockets kernel headers
          • Ernst following up issues with ODC/DDS
          • aarch64 uses an old Python, which tries to compile an old xgboost that is incompatible.
          • One compilation problem on macOS with boost histogram.

       

      EPN GPU Topics:

       

    • 10:20 10:25
      TPC ML Clustering 5m
      Speaker: Christian Sonnabend (CERN, Heidelberg University (DE))

      Cluster error parametrization

      • Tried with cluster error dump within the track interpolation + further cleanup
      • 2D grid searches for best scaling parameters for scaleY*err2Y, scaleZ*err2Z
      • Stuck with learning the MSE, ((track_pos - cluster_pos)^2 - track_cov), where track_cov is now taken at the intermediate interpolation stage
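The 2D grid search described above can be sketched as follows (a minimal Python sketch; the function names, the exact loss shape, and the scale ranges are illustrative assumptions, not the actual analysis code):

```python
import numpy as np

def residual_loss(scale_y, scale_z, dy, dz, err2_y, err2_z, cov_y, cov_z):
    """Mean squared deviation between the observed residual target,
    (track_pos - cluster_pos)^2 - track_cov, and the scaled cluster
    error estimates scaleY*err2Y and scaleZ*err2Z."""
    target_y = dy ** 2 - cov_y
    target_z = dz ** 2 - cov_z
    return (np.mean((target_y - scale_y * err2_y) ** 2)
            + np.mean((target_z - scale_z * err2_z) ** 2))

def grid_search(dy, dz, err2_y, err2_z, cov_y, cov_z, scales_y, scales_z):
    """Exhaustive 2D scan over (scaleY, scaleZ); returns the pair with
    the lowest loss."""
    best = None
    for sy in scales_y:
        for sz in scales_z:
            loss = residual_loss(sy, sz, dy, dz, err2_y, err2_z, cov_y, cov_z)
            if best is None or loss < best[0]:
                best = (loss, sy, sz)
    return best[1], best[2]
```

An exhaustive scan like this is cheap on a coarse grid and can then be refined around the best point.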

       

      Combining several grid runs

      Cubic spline interpolation

      • This is not the full picture -> Very sensitive to long / short tracks

      • A "good" configuration (scaleY, scaleZ): (0.01, 0.1)

       

      • 6% track reduction (default reconstruction: 388k tracks on the given dataset)
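Combining several grid runs via cubic spline interpolation could, for a single scan dimension, look roughly like this (an illustrative sketch assuming SciPy; `losses` stands for the figure-of-merit values of the coarse scan, not any actual variable from the analysis):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def refine_minimum(scales, losses, n_fine=1000):
    """Cubic-spline interpolate a coarse 1D scan of loss vs. scale and
    return the scale at the interpolated minimum."""
    spline = CubicSpline(scales, losses)
    fine = np.linspace(scales[0], scales[-1], n_fine)
    return fine[np.argmin(spline(fine))]
```

This lets a handful of coarse grid points locate a minimum lying between them, at the cost of the sensitivity to the sample composition (long vs. short tracks) noted above.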

       

      Getting close but not there yet

       

    • 10:25 10:30
      GPU Parameter Optimizations 5m
      Speaker: Gabriele Cimador (CERN, Università and INFN Torino)

      GPU Parameters news

      • Used the MI50 RTC cache with the Radeon VII, still the same performance degradation
      • Will try to make the standalone benchmark run with MI210
    • 10:30 10:35
      Efficient Data Structures 5m
      Speaker: Dr Oliver Gregor Rietmann (CERN)
       

      NextGenTrigger Task 1.7

      • CHEP talk together with Jolly got accepted
      • Idea to increase memory bandwidth in the CLUE clustering algorithm of CMS

      Implement NGT SoA Code in O2 standalone benchmark

      • Working on this fork of the AliceO2 repo, with a CI pipeline:
        • Running on NGT hardware with 4 different GPUs (Nvidia and AMD)
        • Extended the CI pipeline to fail if GPU.out changes
      • Implemented SoA in:
        • SectorTracker
          • GPUTPCBaseTrackParam
          • GPUTPCTrackParam
          • GPUTPCTracklet
          • GPUTPCTrack
      • The SoA library changed a lot in the course of implementing it in O2; I have now adapted it to:
      • Generate CSV file of the standalone benchmark
      • Compute e.g. standard deviations of the benchmark and compare to baseline (dev branch)
      • Display the CSV file in the GitHub Pipelines
      • Next Steps:
        • Make better use of SoA to improve performance
        • Try David's suggestion
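As a rough illustration of the AoS-to-SoA change applied to the tracker classes (field names and layout here are hypothetical, not the actual GPUTPCTrackParam layout), in NumPy terms:

```python
import numpy as np

N = 1000

# AoS: one record per track; the fields of a track are adjacent in memory,
# so reading only "pt" strides over whole records.
aos = np.zeros(N, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4"), ("pt", "f4")])
aos["pt"][:] = 1.0

# SoA: one contiguous array per field; all "pt" values are adjacent,
# which is what coalesced GPU loads (and SIMD) want when a kernel
# touches only a few fields.
class TrackSoA:
    def __init__(self, n):
        self.x = np.zeros(n, dtype=np.float32)
        self.y = np.zeros(n, dtype=np.float32)
        self.z = np.zeros(n, dtype=np.float32)
        self.pt = np.zeros(n, dtype=np.float32)

soa = TrackSoA(N)
soa.pt[:] = 1.0

# A field-wise operation reads one contiguous buffer in the SoA case:
scaled = 2.0 * soa.pt
# ...but a strided view over 16-byte records in the AoS case:
scaled_aos = 2.0 * aos["pt"]
```

The results are identical; only the memory-access pattern differs, which is where the hoped-for bandwidth gain comes from.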

       

      Issues

      1. Even if I provide a .par file for AMD W7900, the outcome still depends on the settings in GPUParameters.csv
      2. Can't really make sense of the standalone benchmark's timers:

       

      GPUTPCCFDeconvolution 144541 141221 140670 0 0 0 0 0 0
      GPUTPCCFClusterizer 63644 59016.6 58720.8 0 0 0 0 0 0
      GPUTPCCFGather 0 0 0 0 0 0 0 22659.3 0
      GPUTPCCreateOccupancyMap_fill 0 0 0 0 0 0 0 78840 0
      GPUTPCCreateOccupancyMap_fold 0 0 0 0 0 0 0 850.289 0
      GPUTPCCreateTrackingData 138091 138115 137295 138089 109361 109173 108133 109082 0
      GPUTPCNeighboursFinder 47747.5 48140.9 45961.9 45711.5 38250.3 37788.4 37471.3 36531.6 0
      GPUTPCNeighboursCleaner 5622.41 5699.05 5537.97 5512.89 4524.68 4494.48 4463.64 4443.52 0

       

    • 10:35 10:40
      Following up GPU to-dos 5m
      Speaker: Dr Vikas Singhal (Department of Atomic Energy (IN))
    • 10:40 10:45
      TPC Clusterization / OpenCL / Highly Ionizing Particles 5m
      Speaker: Felix Weiglhofer (CERN)

      OpenCL

      No news.

      GPU Servers

      CI server installed and has IP.

      Both GPUs seem to be running, but the Nvidia card is very flaky. Riser cable? Will try setting the PCIe version in the BIOS.

      Handing node to Giulio and Sergio next for CI integration.

      Highly Ionizing Particles

      Had some issues with running simulations. Couldn't test on simdata yet.

    • 10:45 10:50
      ITS Tracking 5m
      Speakers: Felix Schlepper (CERN, Heidelberg University (DE)), Gabriele Cimador (CERN, Università and INFN Torino), Matteo Concas (CERN)

      Felix: playing around with ITS raw data to 'validate' the staggered readout & reconstruction.

      Gabriele:

      • Changed algorithm
      • Moved to algorithm inspired by Run2 implementation
      • Basically it is 3D histogramming, but "raytracing vertexer" sounds cooler
      • Define a 3D voxelisation around the beamline
      • Define tracklets as lines parametrized as p(t) = p0 + t*u,
        where p0 is the cluster on the innermost ITS layer and u is the direction of the line
      • For each tracklet:
        • Calculate t_enter and t_exit for the voxelisation
        • Sample the line at a uniform t_step within this range
        • Each time the line steps into a voxel, increment its count

      • Select the local maxima that surpass a threshold
      • Then fit the lines that pass through the vertex candidates
      • Below: video where low- and high-multiplicity vertices are detected, sliding through z
      • video

      • Result with only candidates finding, without fitting:
        Precision: 0.9924, Recall: 0.8667, F1: 0.9253
        Total true vertices: 150
        Matched vertices: 130
        Missed true vertices: 20
        Total fitted vertices: 131
        Fake vertices: 1
        Mean residuals (x,y,z): [0.00014556 0.00058302 0.00027437]
        Std  residuals (x,y,z): [0.00432429 0.00439    0.00784673]
      • Next steps:
        • Just select the highest-multiplicity vertex
        • do a histogram along that z to find lower-multiplicity vertices
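The voxel-counting steps of the vertexer described above can be sketched as follows (illustrative Python; the box bounds, grid shape, and step length are made-up parameters, not the actual O2 implementation):

```python
import numpy as np

def slab_t_range(p0, u, lo, hi):
    """Entry/exit parameters (t_enter, t_exit) of the line p(t) = p0 + t*u
    with the axis-aligned box [lo, hi] (slab method)."""
    with np.errstate(divide="ignore"):
        t1 = (lo - p0) / u
        t2 = (hi - p0) / u
    return np.max(np.minimum(t1, t2)), np.min(np.maximum(t1, t2))

def fill_voxels(tracklets, lo, hi, shape, t_step=0.05):
    """3D histogram of line passages: sample each line uniformly in t
    between its entry and exit point, incrementing a voxel once per entry."""
    counts = np.zeros(shape, dtype=np.int32)
    size = (hi - lo) / np.array(shape)
    for p0, u in tracklets:
        t_enter, t_exit = slab_t_range(p0, u, lo, hi)
        if t_exit <= t_enter:
            continue  # line misses the voxelised volume
        last = None
        for t in np.arange(t_enter, t_exit, t_step):
            idx = tuple(np.floor((p0 + t * u - lo) / size).astype(int))
            if all(0 <= i < s for i, s in zip(idx, shape)) and idx != last:
                counts[idx] += 1
                last = idx
    return counts

# Vertex candidates would then be the local maxima of `counts` above a
# threshold, e.g. starting from np.argwhere(counts >= threshold).
```

Cost is O(n_tracklets × n_steps); the real implementation would pick t_step relative to the voxel size so that no voxel on the path is skipped.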
    • 10:50 10:55
      System Run Coordination Topics 5m
      Speaker: Ernst Hellbar (CERN)