Color code: (critical, news during the meeting: green, news from this week: blue, news from last week: purple, no news: black)
High priority Framework issues:
Other framework tickets:
Global calibration topics:
Async reconstruction
EPN major topics:
Other EPN topics:
Raw decoding checks:
Full system test issues:
Topology generation:
QC / Monitoring / InfoLogger updates:
AliECS related topics:
GPU ROCm / compiler topics:
TPC GPU Processing
TPC processing performance regression:
General GPU Processing
Oncalls:
NN clusterization in O2
NN clusterization is fully implemented in the O2 clusterization code. Initial performance tests will be shown today.
The classification + regression networks make the processing on CPU about 3-4x slower in wall time, with much higher CPU utilization. My assumption: the ONNX model is currently loaded once but applied per peak, so the overhead of the model->Run() call is significant (I measured this some time ago). I also expect a much larger speed-up on GPUs and with float16. Work on the GPU framework: Lubos will make an old HLT node available with Docker for the build. If that succeeds, I can probably proceed with a push to alidist relatively quickly.
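To illustrate the per-call overhead assumption above, here is a minimal sketch of per-peak versus batched invocation. It does not use the ONNX Runtime API or the O2 code: `CountingModel`, `infer_per_peak`, `infer_batched` and the batch size of 256 are hypothetical stand-ins, and batching is shown only as one possible way to amortize a fixed cost per `run()` call, not as the planned implementation.

```python
from typing import List, Sequence

Peak = Sequence[float]  # hypothetical: flattened charge window around one peak


class CountingModel:
    """Stand-in for an inference session; counts how often run() is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def run(self, batch: List[Peak]) -> List[float]:
        # One invocation handles the whole batch. The toy "network" just sums
        # the inputs of each peak; a real model would return class/regression outputs.
        self.calls += 1
        return [sum(p) for p in batch]


def infer_per_peak(model: CountingModel, peaks: List[Peak]) -> List[float]:
    # Pattern described in the notes: one run() call per peak.
    return [model.run([p])[0] for p in peaks]


def infer_batched(model: CountingModel, peaks: List[Peak], batch_size: int) -> List[float]:
    # Alternative: group peaks so any fixed per-call cost is paid once per batch.
    out: List[float] = []
    for start in range(0, len(peaks), batch_size):
        out.extend(model.run(peaks[start:start + batch_size]))
    return out


if __name__ == "__main__":
    peaks = [[float(i), 1.0] for i in range(1000)]
    per_peak, batched = CountingModel(), CountingModel()
    assert infer_per_peak(per_peak, peaks) == infer_batched(batched, peaks, 256)
    print(per_peak.calls, batched.calls)
```

Both paths return the same outputs. Only the number of `run()` invocations differs, and that count is the quantity the overhead assumption is about.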
Comparison of the performance
Distributions & matching
Matching efficiency: 0.802463
Clone rate: 0.541587
Fake rate: 0.197537
Differential analysis works on files from the smaller dataset (1Ev 50kHz PbPb), but for larger files I get:
[INFO] reading 1 data branch(es) and 1 mc branch(es)
Error in <TBufferFile::CheckByteCount>: object of class vector<char> read too many bytes: 1349358918 instead of 275617094
Warning in <TBufferFile::CheckByteCount>: vector<char>::Streamer() not in sync with data on file /lustre/alice/users/csonnab/PhD/jobs/simulation/sim_data/validation/o2sim_150324_50Ev_10000QED_PbPb_13t7p/tracks_clusters_nn_withcuts/regular_reco/tpc-native-clusters.root, fix Streamer()
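A note on the two byte counts in the error above: they differ by exactly 2^30. This would be consistent with the stored byte count wrapping around in a 30-bit field, i.e. the serialized vector<char> being larger than 1 GiB. This is an interpretation of the log, not something established in the meeting. The arithmetic can be checked as follows (variable names are illustrative):

```python
reported_bytes = 1349358918  # "read too many bytes" value from the log
expected_bytes = 275617094   # "instead of" value from the log

# Difference between the bytes actually read and the byte count stored on file.
difference = reported_bytes - expected_bytes

# True if the stored count equals the real size modulo 2**30.
wraps_at_30_bits = (reported_bytes % 2**30) == expected_bytes

if __name__ == "__main__":
    print(difference, difference == 2**30, wraps_at_30_bits)
```

If this reading is right, the smaller dataset works simply because its cluster buffer stays below that size.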