Alice Weekly Meeting: Software for Hardware Accelerators / PDP-SRC

Europe/Zurich
Videoconference
ALICE GPU Meeting
Zoom Meeting ID
61230224927
Host
David Rohr
    • 11:00 AM 11:20 AM
      Discussion 20m
      Speakers: David Rohr (CERN), Ole Schmidt (CERN)

      Color code: (critical, news during the meeting: green, news from this week: blue, news from last week: purple, no news: black)

      High priority RC YETS issues:

      • Fix dropping lifetime::timeframe for good: No news
        • Still pending: problem with CCDB objects getting lost by DPL leading to "Dropping lifetime::timeframe", saw at least one occasion during SW validation.
        • Ruben reported a similar problem in async reco, but only during last TFs of a run, probably independent bug.
        • No other instances of Dropping lifetime::timeframe seen at P2.
      • Expendable tasks in QC. Everything merged on our side.
        • EPN fixed the XML merging. Not sure if everything is yet deployed on production and was tested again.
      • Start / Stop / Start:
        • Problems in readout and QC fixed. Now 3 new problems, at least 2 on our side: No news
          • GPU multi-thread pipeline gets stuck after restart. Should be trivial to fix. https://its.cern.ch/jira/browse/O2-4638
          • Some processes are crashing randomly (usually ~2 out of >10k) when restarting. Stack trace hints to FMQ. https://its.cern.ch/jira/browse/O2-4639
          • TPC ITS matching QC crashing accessing CCDB objects. Not clear if same problem as above, or a problem in the task itself.
      • Stabilize calibration / fix EoS: New scheme: https://its.cern.ch/jira/browse/O2-4308
        • Work in progress, partial PR open.
      • Problem with bogus oldestPossible messages coming from colliding QC timers:
        • Triggered by colliding QC task names, which they should fix in any case: https://its.cern.ch/jira/browse/O2-3664
        • We can circumvent that in DPL by not forwarding timers
        • But unfortunately there might be a general problem in DPL when the same task sends data triggered both by timers and by incoming data, since that will make the oldestPossibleTimeslice go out of sync by design.
      • Problem with FIT workflow and a single EPN causing backpressure hopefully fixed by improving metric-feedback mechanism. Validated that it doesn't break the time-frame-throttling. Still need to validate that FIT workflow is fixed.
      • Fix problem with ccdb-populator: no idea yet, no ETA.

       

      High priority framework topics:

      • See YETS issues

       

      Other framework tickets:

      • TOF problem with receiving condition in tof-compressor: https://alice.its.cern.ch/jira/browse/O2-3681
      • Grafana metrics: Might want to introduce additional rate metrics that subtract the header overhead to have the pure payload: low priority.
      • Backpressure reporting when there is only 1 input channel: no progress: https://alice.its.cern.ch/jira/browse/O2-4237
      • Stop entire workflow if one process segfaults / exits unexpectedly. Tested again in January, still not working despite some fixes. https://alice.its.cern.ch/jira/browse/O2-2710
      • https://alice.its.cern.ch/jira/browse/O2-1900 : FIX in PR, but has side effects which must also be fixed.
      • https://alice.its.cern.ch/jira/browse/O2-2213 : Cannot override debug severity for tpc-tracker
      • https://alice.its.cern.ch/jira/browse/O2-2209 : Improve DebugGUI information
      • https://alice.its.cern.ch/jira/browse/O2-2140 : Better error message (or a message at all) when input missing
      • https://alice.its.cern.ch/jira/browse/O2-2361 : Problem with 2 devices of the same name
      • https://alice.its.cern.ch/jira/browse/O2-2300 : Usage of valgrind in external terminal: The testcase is currently causing a segfault, which is an unrelated problem and must be fixed first. Reproduced and investigated by Giulio.
      • Found a reproducible crash (while fixing the memory leak) in the TOF compressed-decoder at workflow termination, if the wrong topology is running. Not critical, since it is only at the termination, and the fix of the topology avoids it in any case. But we should still understand and fix the crash itself. A reproducer is available.
      • Support in DPL GUI to send individual START and STOP commands.
      • Problem I mentioned last time with non-critical QC tasks and DPL CCDB fetcher is real. Will need some extra work to solve it. Otherwise non-critical QC tasks will stall the DPL chain when they fail.
      • DPL sending SHM metrics for all devices, not only input proxy: https://alice.its.cern.ch/jira/browse/O2-4234
      • Some improvements to ease debugging: https://alice.its.cern.ch/jira/browse/O2-4196 https://alice.its.cern.ch/jira/browse/O2-4195 https://alice.its.cern.ch/jira/browse/O2-4166
      • After Pb-Pb, we need to do a cleanup session and go through all these pending DPL tickets with a higher priority, and finally try to clean up the backlog.
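      On the Grafana rate metric idea above: a minimal sketch of how a payload-only rate could be derived. It assumes hypothetical per-interval counters (total bytes, number of messages, a constant header size per message); none of these names exist in the O2 monitoring code, and the real header stack size need not be constant.

```cpp
#include <cstdint>

// Hypothetical counters collected over one reporting interval (not an O2 type).
struct IntervalCounters {
  std::uint64_t totalBytes;            // headers + payload sent in the interval
  std::uint64_t nMessages;             // number of messages sent in the interval
  std::uint64_t headerBytesPerMessage; // assumed constant header size per message
  double seconds;                      // length of the interval
};

// Payload-only rate in bytes/s. Returns 0 for degenerate input
// (non-positive interval, or header estimate larger than the total).
double payloadRate(const IntervalCounters& c)
{
  const std::uint64_t headerBytes = c.nMessages * c.headerBytesPerMessage;
  if (c.seconds <= 0.0 || headerBytes > c.totalBytes) {
    return 0.0;
  }
  return static_cast<double>(c.totalBytes - headerBytes) / c.seconds;
}
```

      For example, 1000 bytes in 10 messages with 20-byte headers over 2 s corresponds to (1000 - 200) / 2 = 400 bytes/s of payload.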

      Global calibration topics:

      • TPC IDC and SAC workflow issues to be reevaluated with new O2 at restart of data taking. Cannot reproduce the problems any more.

       

      Sync processing

      • Proposal to parse InfoLogger message and alert automatically: https://alice.its.cern.ch/jira/browse/R3C-992
      • Seen crashes in pp replay: corrupt CCDB objects, but also general corruption in SHM.
        • Was due to ITS raw data corruption that was not handled correctly and led to memory corruption - fixed.

       

      Async reconstruction

      • Remaining oscillation problem: GPUs sometimes get stalled for a long time, up to 2 minutes.
        • Checking 2 things: does the situation get better without GPU monitoring? --> Inconclusive
        • We can use increased GPU process priority as a mitigation, but it doesn't fully fix the issue.
      • MI100 GPU stuck problem will only be addressed after AMD has fixed the operation with the latest official ROCm stack.
      • Network problems on EPN farm solved, back in operation.
      • Improvement by Giulio to reduce QC memory consumption in async reco by changing ROOT serialization. Status?
      • Merged Gabriele's PR for GPU TPC track model decoding.

       

      EPN major topics:

      • Fast movement of nodes between async / online without EPN expert intervention.
        • 2 goals I would like to set for the final solution:
          • It should not be needed to stop the SLURM schedulers when moving nodes, there should be no limitation for ongoing runs at P2 and ongoing async jobs.
          • We must not lose which nodes are marked as bad while moving.
      • Interface to change SHM memory sizes when no run is ongoing. Otherwise we cannot tune the workflow for both Pb-Pb and pp: https://alice.its.cern.ch/jira/browse/EPN-250
        • Lubos to provide interface to query current EPN SHM settings - ETA July 2023, Status?
      • Improve DataDistribution file replay performance, currently cannot do faster than 0.8 Hz, cannot test MI100 EPN in Pb-Pb at nominal rate, and cannot test pp workflow for 100 EPNs in FST since DD injects TFs too slowly. https://alice.its.cern.ch/jira/browse/EPN-244 NO ETA
      • DataDistribution distributes data round-robin in absence of backpressure, but it would be better to do it based on buffer utilization, and give more data to MI100 nodes. Now, we are driving the MI50 nodes at 100% capacity with backpressure, and then only backpressured TFs go on MI100 nodes. This increases the memory pressure on the MI50 nodes, which is anyway a critical point. https://alice.its.cern.ch/jira/browse/EPN-397
      • TfBuilders should stop in ERROR when they lose connection.
      • Need fix for XML merging for topologies with expendable tasks: Done
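      To illustrate the buffer-utilization-based distribution proposed above (EPN-397): a minimal sketch that picks the node with the lowest buffer fill level instead of going round-robin. The NodeState struct and the function name are hypothetical and not part of DataDistribution; how the fill level is actually reported is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical view of one EPN node as seen by the scheduler
// (not a DataDistribution type).
struct NodeState {
  std::size_t bufferUsed;     // bytes currently occupied in the node's buffer
  std::size_t bufferCapacity; // total buffer size in bytes, must be > 0
};

// Pick the node with the lowest buffer utilization (used / capacity).
// Returns the index of the chosen node, or nodes.size() if the list is empty.
// On ties, the first node with the lowest utilization wins.
std::size_t pickLeastUtilized(const std::vector<NodeState>& nodes)
{
  std::size_t best = nodes.size();
  double bestUtil = 2.0; // larger than any valid utilization in [0, 1]
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    const double util = static_cast<double>(nodes[i].bufferUsed) /
                        static_cast<double>(nodes[i].bufferCapacity);
    if (util < bestUtil) {
      bestUtil = util;
      best = i;
    }
  }
  return best;
}
```

      With such a policy, a node that drains its buffer faster would automatically receive more TFs, without waiting for backpressure to build up on the slower nodes.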

       

      Other EPN topics:

       

      Raw decoding checks:

      • Add additional check on DPL level, to make sure firstOrbit received from all detectors is identical, when creating the TimeFrame first orbit.
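      A minimal sketch of the firstOrbit check described above, assuming the firstOrbit values of all detector inputs of one TimeFrame have already been collected into a vector. The function name is hypothetical; where this would hook into DPL is not shown.

```cpp
#include <cstdint>
#include <vector>

// Returns true if all detectors report the same firstOrbit.
// An empty list is treated as consistent (nothing to compare).
bool firstOrbitsConsistent(const std::vector<std::uint32_t>& firstOrbits)
{
  for (const auto orbit : firstOrbits) {
    if (orbit != firstOrbits.front()) {
      return false; // at least one detector disagrees with the first one
    }
  }
  return true;
}
```

      The caller would only derive the TimeFrame first orbit when this returns true, and report an error otherwise.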

       

      Full system test issues:

      Topology generation:

      • Should test to deploy topology with DPL driver, to have the remote GUI available.
        • DPL driver needs to implement FMQ state machine. Postponed until YETS issues solved.

       

      QC / Monitoring / InfoLogger updates:

      • TPC has opened first PR for monitoring of cluster rejection in QC. Trending for TPC CTFs is work in progress. Ole will join from our side, and plan is to extend this to all detectors, and to include also trending for raw data sizes.

       

      AliECS related topics:

      • Extra env var field still not multi-line by default.

       

      GPU ROCm / compiler topics:

      • Found new HIP internal compiler error when compiling without optimization: -O0 makes the compilation fail with unsupported LLVM intrinsic. Reported to AMD.
      • Found a new miscompilation with -ffast-math enabled in looper following, for now disabled -ffast-math.
      • Must create new minimal reproducer for compile error when we enable LOG(...) functionality in the HIP code. Check whether this is a bug in our code or in ROCm. Lubos will work on this.
      • Ruben found another compiler problem with template treatment. Have a workaround for now. Need to create a minimal reproducer and file a bug report.
      • While debugging the calibration, debug output triggered another internal compiler error in the HIP compiler. No problem for now since it happened only with temporary debug code. But should still report it to AMD to fix it.
      • Next call with AMD today.
      • Now that we have the deterministic tracking, this can hopefully help AMD in debugging.
      • Checked deterministic mode on AMD GPUs, and we get the exact same results as with CPU / CUDA --> so the current version at least has no miscompilation that shows up only in rare cases and would previously have been hidden by concurrency.

       

      TPC GPU Processing

      • Bug in TPC QC with MC embedding: TPC QC does not respect the sourceID of MC labels, so it confuses tracks of signal and of background events.
      • New problem with bogus values in TPC fast transformation map still pending. Sergey is investigating, but waiting for input from Alex.
      • Implemented additional debug streamers and all cluster error parameterization features.
      • Implemented edge cluster rejection based on uncorrected track Y position. To be checked if this is enough, or if TPC wants some smooth cluster masking via errors.
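      To illustrate the MC embedding bug above: a minimal sketch of why a label comparison that ignores the sourceID mixes up signal and background tracks. SimpleMCLabel is a simplified, hypothetical stand-in and not the actual O2 label class; the meaning of the sourceID values is an assumed convention.

```cpp
#include <tuple>

// Simplified stand-in for an MC label (hypothetical type).
struct SimpleMCLabel {
  int trackID;
  int eventID;
  int sourceID; // assumed convention: 0 = background, 1 = embedded signal
};

// Comparison that ignores the sourceID: a signal and a background track
// with the same track and event IDs look like the same particle.
bool sameParticleIgnoringSource(const SimpleMCLabel& a, const SimpleMCLabel& b)
{
  return a.trackID == b.trackID && a.eventID == b.eventID;
}

// Comparison that respects the sourceID: all three IDs must match.
bool sameParticle(const SimpleMCLabel& a, const SimpleMCLabel& b)
{
  return std::tie(a.trackID, a.eventID, a.sourceID) ==
         std::tie(b.trackID, b.eventID, b.sourceID);
}
```

      A background label {5, 0, 0} and a signal label {5, 0, 1} compare equal with the first function but not with the second.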

       

      General GPU Processing

      • Consistency between CPU and GPU processing status:
        • Trying to get fully deterministic tracking with GPUCA_NO_FAST_MATH + additional debug options, which will introduce many intermediate sorting steps.
          • Fully deterministic GPU tracking now available. Did not find additional bugs compared to the ones previously reported, but remaining differences were due to sorting issues / real concurrency.
          • In order to use it, set CMake GPUCA_NO_FAST_MATH and configKeyValue deterministicGPUReconstruction=1
      • Started work to make O2 propagator easily usable in ITS tracking, which is not part of the GPU reconstruction library:
        • O2 propagator on GPU now available to external libraries - tested with ITS tracking. Only requirement is to link against a CMake object library, which will set up everything using static objects.
        • Problem at the moment is that we use the GPU polynomial field valid inside the TPC. Need a mechanism to automatically switch the parameterization. Currently checking Sergey's parameterizations and deciding on the best way to select one, and validating that they actually cover full ALICE central barrel.
      • TODO: All these new features were now implemented only for CUDA (except for deterministic tracking, which already works for HIP). Need to port run-time-compilation, per kernel compilation, and external O2 propagator on GPU, and some other things to HIP.
        • Ideally, we can use automatic translation of the CUDA host cxx file to HIP.
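      On the intermediate sorting steps used for the deterministic tracking mentioned above: a minimal, generic sketch of why sorting helps. Floating-point addition is not associative, so accumulating the same values in a concurrency-dependent order can give different results; sorting first fixes the order. This is an illustration only, not the code used in the GPU reconstruction.

```cpp
#include <algorithm>
#include <vector>

// Sum in the order given. The result can depend on the order of the inputs,
// because floating-point addition is not associative.
float sumInOrder(const std::vector<float>& values)
{
  float sum = 0.f;
  for (const float v : values) {
    sum += v;
  }
  return sum;
}

// Sort a copy first, so that every permutation of the same inputs
// is accumulated in the same order and gives the same result.
float sumDeterministic(std::vector<float> values)
{
  std::sort(values.begin(), values.end());
  return sumInOrder(values);
}
```

      For inputs such as {1e8, 1, -1e8}, sumInOrder can return different values for different permutations, while sumDeterministic returns the same value for all of them, at the cost of the extra sort.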
    • 11:20 AM 11:25 AM
      TRD Tracking 5m
      Speaker: Ole Schmidt (CERN)
    • 11:25 AM 11:30 AM
      TPC ML Clustering 5m
      Speaker: Christian Sonnabend (CERN, Heidelberg University (DE))

      PyTorch installation in O2

      • Started from the script for installation on EPNs: Timo managed to build in slc8-gpu container
      • Built an alidist recipe
      • Installation in O2 works, but only standalone
        • Linking against other libraries fails with undefined reference errors -> Could be ordering of libraries, missing cxx flags, etc.
        • Potentially an issue with pre_cxx11 binaries -> Might need to use the zipped ABI builds instead of the Python installation... (https://pytorch.org)
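      A small check that may help with the pre_cxx11 suspicion: the undefined references in the linker output in these notes are printed with `std::string` rather than `std::__cxx11::basic_string<...>`, which is what one would expect if the test object was compiled with the old libstdc++ string ABI while the O2 libraries use the new one. The sketch below reports which ABI a translation unit is built with, based on the libstdc++ macro _GLIBCXX_USE_CXX11_ABI. The function name is hypothetical; whether the PyTorch CMake configuration is what sets the macro here is an assumption that would still need to be verified.

```cpp
#include <string>

// Reports which libstdc++ std::string ABI this translation unit is compiled with.
// _GLIBCXX_USE_CXX11_ABI is defined by libstdc++ once one of its headers is included.
const char* stdStringAbi()
{
#if defined(_GLIBCXX_USE_CXX11_ABI)
#if _GLIBCXX_USE_CXX11_ABI
  return "cxx11";
#else
  return "pre-cxx11";
#endif
#else
  return "unknown"; // not libstdc++ (e.g. libc++)
#endif
}
```

      Compiling this once with the flags of the failing test target and once with the flags of a regular O2 library would show whether the two end up on different ABIs.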

       

      Other options

      • Checking if ONNX is applicable for our use case
      • For simple fully-connected NNs we could potentially use TMVA from ROOT to get us a model class (see https://root.cern/doc/master/TMVA__SOFIE__ONNX_8C.html)

      Full stack trace of pytorch linkage failure

      • Setup:
        o2_add_executable(torch
                          COMPONENT_NAME test
                          SOURCES test/test_torch_model_inference.cxx
                          PUBLIC_LINK_LIBRARIES O2::Torch O2::TPCWorkflow O2::SimulationDataFormat O2::TPCQC O2::DataFormatsTPC O2::TPCBase Boost::thread O2::GPUTracking)

      • [4888/5210] Building CXX object Detectors/TPC/workflow/CMakeFiles/O2exe-test-torch.dir/test/test_torch_model_inference.cxx.o
        ...
        [5138/5210] Linking CXX executable stage/bin/o2-test-torch
        FAILED: stage/bin/o2-test-torch
        : && /data.local1/csonnab/MyO2/sw/slc7_x86-64/GCC-Toolchain/v12.2.0-alice1-12/bin/c++ -fPIC -O2 -std=c++20 -O2 -g -DNDEBUG -Wno-unknown-warning-option Detectors/TPC/workflow/CMakeFiles/O2exe-test-torch.dir/test/test_torch_model_inference.cxx.o Detectors/TPC/workflow/CMakeFiles/O2exe-test-torch.dir/__/__/__/Common/Utils/src/fpu.cxx.o -o stage/bin/o2-test-torch -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -L/data.local1/csonnab/MyO2/sw/slc7_x86-64/FairRoot/v18.4.9-alice3-local6/lib -L/data.local1/csonnab/MyO2/sw/slc7_x86-64/pythia/v8304-alice1-16/lib -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/data.local1/csonnab/MyO2/sw/slc7_x86-64/FairRoot/v18.4.9-alice3-local6/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/pythia/v8304-alice1-16/lib:/data.local1/csonnab/MyO2/sw/BUILD/e3b6b020c5c970780d65ebda95d6a9a5b2fd9424/O2/stage/lib64:/data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/PyTorch/2.2.1-local8/lib/python/site-packages/torch/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/HepMC3/3.2.5-129/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/FFTW3/v3.3.9-50/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/Common-O2/v1.6.2-16/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/VMC/v2-0-124/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/Configuration/v2.8.0-1/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/Monitoring/v3.18.1-12/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/libInfoLogger/v2.6.0-8/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/arrow/v14.0.1-alice1-12/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/curl/7.70.0-83/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/libuv/v1.40.0-42/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/libjalienO2/0.1.4-23/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/FairMQ/v1.8.4-local2/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/TBB/v2021.5.0-48/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/FairLogger/v1.11.1-local1/lib:/data
.local1/csonnab/MyO2/sw/slc7_x86-64/fmt/10.2.1-local1/lib: stage/lib64/libO2Torch.so stage/lib64/libO2TPCWorkflow.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib/libboost_thread.so.1.83.0 /data.local1/csonnab/MyO2/sw/slc7_x86-64/PyTorch/2.2.1-local8/lib/python/site-packages/torch/lib/libtorch.so -Wl,--no-as-needed,"/data.local1/csonnab/MyO2/sw/slc7_x86-64/PyTorch/2.2.1-local8/lib/python/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed /data.local1/csonnab/MyO2/sw/slc7_x86-64/PyTorch/2.2.1-local8/lib/python/site-packages/torch/lib/libc10.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/PyTorch/2.2.1-local8/lib/python/site-packages/torch/lib/libc10.so -Wl,--no-as-needed,"/data.local1/csonnab/MyO2/sw/slc7_x86-64/PyTorch/2.2.1-local8/lib/python/site-packages/torch/lib/libtorch.so" -Wl,--as-needed stage/lib64/libO2CTPWorkflowIO.so stage/lib64/libO2GPUWorkflow.so stage/lib64/libO2TPCQC.so stage/lib64/libO2GlobalTracking.so stage/lib64/libO2FT0Reconstruction.so stage/lib64/libO2FT0Simulation.so stage/lib64/libO2HMPIDReconstruction.so stage/lib64/libO2HMPIDSimulation.so stage/lib64/libO2TOFWorkflowUtils.so stage/lib64/libO2TOFCalibration.so stage/lib64/libO2DetectorsDCS.so stage/lib64/libO2TOFWorkflowIO.so stage/lib64/libO2TOFReconstruction.so stage/lib64/libO2MFTTracking.so stage/lib64/libO2MCHTracking.so stage/lib64/libO2MCHBase.so stage/lib64/libO2TPCSimulation.so stage/lib64/libO2TPCCalibration.so stage/lib64/libO2SpacePoints.so stage/lib64/libO2TPCReconstruction.so stage/lib64/libO2GPUO2Interface.so stage/lib64/libO2GPUTracking.so stage/lib64/libO2GPUDataTypes.so stage/lib64/libO2TOFBase.so stage/lib64/libO2Steer.so stage/lib64/libO2Generators.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/pythia/v8304-alice1-16/lib/libpythia8.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/HepMC3/3.2.5-129/lib/libHepMC3.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/HepMC3/3.2.5-129/lib/libHepMC3search.so 
/data.local1/csonnab/MyO2/sw/slc7_x86-64/HepMC3/3.2.5-129/lib/libHepMC3rootIO.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/FairRoot/v18.4.9-alice3-local6/lib/libGen.so stage/lib64/libO2TRDBase.so stage/lib64/libO2ITStracking.so stage/lib64/libO2ITSReconstruction.so stage/lib64/libO2ITSBase.so stage/lib64/libO2ITS3Base.so stage/lib64/libO2TPCReaderWorkflow.so stage/lib64/libO2DataFormatsGlobalTracking.so stage/lib64/libO2DataFormatsTOF.so stage/lib64/libO2DataFormatsITS.so stage/lib64/libO2DataFormatsFT0.so stage/lib64/libO2FT0Base.so stage/lib64/libO2DataFormatsHMP.so stage/lib64/libO2HMPIDBase.so stage/lib64/libO2DataFormatsMFT.so stage/lib64/libO2MFTBase.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libXMLIO.so.6.30.01 stage/lib64/libO2ITSMFTSimulation.so stage/lib64/libO2ITSMFTReconstruction.so stage/lib64/libO2DataFormatsITSMFT.so stage/lib64/libO2ITSMFTBase.so stage/lib64/libO2DataFormatsMCH.so stage/lib64/libO2DataFormatsMID.so stage/lib64/libO2DataFormatsFV0.so stage/lib64/libO2FV0Base.so stage/lib64/libO2DataFormatsFDD.so stage/lib64/libO2DataFormatsFIT.so stage/lib64/libO2FDDBase.so stage/lib64/libO2DataFormatsZDC.so stage/lib64/libO2DetectorsCalibration.so stage/lib64/libO2ZDCBase.so stage/lib64/libO2DataFormatsEMCAL.so stage/lib64/libO2DataFormatsCPV.so stage/lib64/libO2CPVBase.so stage/lib64/libO2DataFormatsPHOS.so stage/lib64/libO2DetectorsBase.so stage/lib64/libO2GPUDataTypeHeaders.so stage/lib64/libO2TPCFastTransformation.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libMinuit.so.6.30.01 stage/lib64/libO2TPCSpaceCharge.so stage/lib64/libO2TPCBase.so stage/lib64/libO2DataFormatsTPC.so stage/lib64/libO2DetectorsRaw.so stage/lib64/libO2DPLUtils.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/Common-O2/v1.6.2-16/lib/libCommon.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libROOTDataFrame.so.6.30.01 
/data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libROOTVecOps.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib/libboost_serialization.so.1.83.0 stage/lib64/libO2PHOSBase.so stage/lib64/libO2SimConfig.so stage/lib64/libO2DataFormatsTRD.so stage/lib64/libO2SimulationDataFormat.so stage/lib64/libO2DataFormatsCalibration.so stage/lib64/libO2ReconstructionDataFormats.so stage/lib64/libO2Field.so stage/lib64/libO2DataFormatsCTP.so stage/lib64/libO2DataFormatsParameters.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/FairRoot/v18.4.9-alice3-local6/lib/libBase.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/FairRoot/v18.4.9-alice3-local6/lib/libFairTools.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/FairRoot/v18.4.9-alice3-local6/lib/libParBase.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/FairRoot/v18.4.9-alice3-local6/lib/libGeoBase.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/VMC/v2-0-124/lib/libVMCLibrary.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libEG.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libPhysics.so.6.30.01 stage/lib64/libO2DetectorsCommonDataFormats.so stage/lib64/libO2GPUUtils.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_cord.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_cordz_info.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_cord_internal.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_cordz_functions.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_cordz_handle.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_hash.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_city.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_bad_variant_access.a 
/data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_low_level_hash.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_raw_hash_set.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_bad_optional_access.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_hashtablez_sampler.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_exponential_biased.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_synchronization.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_stacktrace.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_graphcycles_internal.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_symbolize.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_debugging_internal.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_malloc_internal.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_demangle_internal.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_time.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_strings.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_strings_internal.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_base.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_spinlock_wait.a -lrt /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_int128.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_civil_time.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_time_zone.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_throw_delegate.a 
/data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_raw_logging_internal.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/abseil/20220623.1-11/lib64/libabsl_log_severity.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/GCC-Toolchain/v12.2.0-alice1-12/lib64/libgomp.so -lpthread stage/lib64/libO2CCDB.so stage/lib64/libO2Framework.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/Configuration/v2.8.0-1/lib/libConfiguration.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/Monitoring/v3.18.1-12/lib/libO2Monitoring.so stage/lib64/libO2FrameworkFoundation.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/arrow/v14.0.1-alice1-12/lib/libgandiva.so.1400.1.0 /data.local1/csonnab/MyO2/sw/slc7_x86-64/arrow/v14.0.1-alice1-12/lib/libarrow.so.1400.1.0 /data.local1/csonnab/MyO2/sw/slc7_x86-64/curl/7.70.0-83/lib/libcurl.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/libuv/v1.40.0-42/lib/libuv.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/libjalienO2/0.1.4-23/lib/libjalienO2.so stage/lib64/libO2CommonUtils.so stage/lib64/libO2Headers.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/FairMQ/v1.8.4-local2/lib/libfairmq.so.1.8.4 /data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib/libboost_container.so.1.83.0 -ldl -lrt /data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib/libboost_program_options.so.1.83.0 /data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib/libboost_filesystem.so.1.83.0 /data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib/libboost_atomic.so.1.83.0 /data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib/libboost_regex.so.1.83.0 /data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib/libboost_iostreams.so.1.83.0 /data.local1/csonnab/MyO2/sw/slc7_x86-64/TBB/v2021.5.0-48/lib/libtbb.so.12.5 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libTreePlayer.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libTree.so.6.30.01 
/data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libGraf3d.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libGpad.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libGraf.so.6.30.01 stage/lib64/libO2CommonDataFormat.so stage/lib64/libO2MathUtils.so stage/lib64/libO2GPUCommon.so /data.local1/csonnab/MyO2/sw/slc7_x86-64/Vc/1.4.1-110/lib/libVc.a /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libGeom.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libHist.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libMatrix.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/FairLogger/v1.11.1-local1/lib/libFairLogger.so.1.11.1 /data.local1/csonnab/MyO2/sw/slc7_x86-64/fmt/10.2.1-local1/lib/libfmt.so.10.2.1 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libGenVector.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libMathCore.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libImt.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libMultiProc.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libNet.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libRIO.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libThread.so.6.30.01 /data.local1/csonnab/MyO2/sw/slc7_x86-64/ROOT/v6-30-01-alice3-3/lib/libCore.so.6.30.01 -lpthread -Wl,-rpath-link,/data.local1/csonnab/MyO2/sw/slc7_x86-64/FFTW3/v3.3.9-50/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/boost/v1.83.0-alice1-13/lib:/data.local1/csonnab/MyO2/sw/slc7_x86-64/libInfoLogger/v2.6.0-8/lib && :
        /data.local1/csonnab/MyO2/sw/slc7_x86-64/GCC-Toolchain/v12.2.0-alice1-12/bin/../lib/gcc/x86_64-unknown-linux-gnu/12.2.0/../../../../x86_64-unknown-linux-gnu/bin/ld: Detectors/TPC/workflow/CMakeFiles/O2exe-test-torch.dir/test/test_torch_model_inference.cxx.o: in function `defaultConfiguration(std::vector<o2::framework::ServiceSpec, std::allocator<o2::framework::ServiceSpec> >&)':
        /data.local1/csonnab/MyO2/sw/SOURCES/O2/pytorch/0/Framework/Core/include/Framework/runDataProcessing.h:89: undefined reference to `o2::framework::CommonServices::defaultServices(std::string, int)'
        /data.local1/csonnab/MyO2/sw/slc7_x86-64/GCC-Toolchain/v12.2.0-alice1-12/bin/../lib/gcc/x86_64-unknown-linux-gnu/12.2.0/../../../../x86_64-unknown-linux-gnu/bin/ld: Detectors/TPC/workflow/CMakeFiles/O2exe-test-torch.dir/test/test_torch_model_inference.cxx.o: in function `testProcess(o2::framework::ConfigContext const&, std::vector<o2::framework::InputSpec, std::allocator<o2::framework::InputSpec> >&, std::vector<o2::framework::OutputSpec, std::allocator<o2::framework::OutputSpec> >&)':
        /data.local1/csonnab/MyO2/sw/SOURCES/O2/pytorch/0/Detectors/TPC/workflow/test/test_torch_model_inference.cxx:122: undefined reference to `o2::framework::CommonServices::defaultServices(std::string, int)'
        /data.local1/csonnab/MyO2/sw/slc7_x86-64/GCC-Toolchain/v12.2.0-alice1-12/bin/../lib/gcc/x86_64-unknown-linux-gnu/12.2.0/../../../../x86_64-unknown-linux-gnu/bin/ld: Detectors/TPC/workflow/CMakeFiles/O2exe-test-torch.dir/test/test_torch_model_inference.cxx.o: in function `fair::Logger::Logger(fair::Severity, std::string const&, std::string const&, std::string const&)':
        /data.local1/csonnab/MyO2/sw/slc7_x86-64/FairLogger/v1.11.1-local1/include/fairlogger/Logger.h:191: undefined reference to `fair::Logger::Logger(fair::Severity, fair::Verbosity, std::string const&, std::string const&, std::string const&)'
        /data.local1/csonnab/MyO2/sw/slc7_x86-64/GCC-Toolchain/v12.2.0-alice1-12/bin/../lib/gcc/x86_64-unknown-linux-gnu/12.2.0/../../../../x86_64-unknown-linux-gnu/bin/ld: Detectors/TPC/workflow/CMakeFiles/O2exe-test-torch.dir/test/test_torch_model_inference.cxx.o: in function `std::string fmt::v10::format<>(fmt::v10::basic_format_string<char>)':
        /data.local1/csonnab/MyO2/sw/slc7_x86-64/fmt/10.2.1-local1/include/fmt/core.h:2835: undefined reference to `fmt::v10::vformat(fmt::v10::basic_string_view<char>, fmt::v10::basic_format_args<fmt::v10::basic_format_context<fmt::v10::appender, char> >)'
        collect2: error: ld returned 1 exit status
    • 11:30 AM 11:35 AM
      ITS Tracking 5m
      Speaker: Matteo Concas (CERN)
      • ITS GPU tracking now follows per-vertex and per-ROFspans tracking logic
        • Currently not using particular configurations (whole TF, all vertices), focusing on the comparison with CPU counterpart.
      • Using/testing propagator in ITS code:
        • It runs out of the box. Results are not reliable yet.
        • The number of found tracks and the pt spectrum are similar within a few %.
        • Currently inspecting fitting output.
      • General performance
        • Kernels now require more registers -> reduce the number of threads (on my card) -> interestingly HIP (which uses fast correction) slows down with the same order of time.
        • Some spills in the compilation of fitTrack function, to be adjusted for performance
    • 11:35 AM 11:55 AM
      TPC Track Model Decoding on GPU 20m
      Speaker: Gabriele Cimador (Universita e INFN Trieste (IT))

      Problems:

      • When attempting to profile on Nvidia GPU: ==ERROR== ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM

      Should be solved soon

      Benchmarks:

      5.3*10^7 clusters TimeFrame

      Intel CPU 12 cores / Nvidia GPU

      Decoding total time comparison on GPU, on CPU, old implementation (CPU computing + decoding output transfer to GPU)

      Step0 and Step1 time comparison on GPU and CPU

      Total time for step0, step1, DMA to GPU, DMA to Host - comparison GPU vs CPU

      EPN - AMD CPU 128 cores / AMD GPU

      2.7*10^8 clusters TimeFrame

      EPN - AMD CPU 128 cores / AMD GPU