Color code: (critical, news during the meeting: green, news from this week: blue, news from last week: purple, no news: black)
General:
- AMD identified the problem on the MI210. We have to disable, for the MI210, the workaround they asked us to put in place for the MI50, which increases the resource consumption.
- Locally tested OpenCL compilation with Clang 14, bumping -cl-std from clc++ (OpenCL 2.0) to CLC++2021 (OpenCL 3.0) and using the clang-internal SPIR-V backend. The Arrow bump to 8.0, which was a prerequisite, is done.
ROCm compilation issues:
- Create new minimal reproducer for compile error when we enable LOG(...) functionality in the HIP code. Check whether this is a bug in our code or in ROCm. Lubos will work on this.
- Another compiler problem with template treatment found by Ruben. Have a workaround for now. Need to create a minimal reproducer and file a bug report.
ITS GPU Tracking and Vertexing:
- Matteo will spend 1 week working on multi-threading of ITS vertexing, then go back to GPU ITS tracking.
TPC GPU Processing:
- Found bug in CPU version of GPU TPC ZS v4 (DLBZS) decoder, reported to Felix - Felix still checking.
- More investigation of random GPU crashes:
- Besides the crashes from broken GPUs, and from corrupt TPC raw data, there is definitely another type of crash, that affects all GPUs, and is not triggered by corrupt raw data.
- Quite rare, but it happens more often if the node is under heavy load (e.g. in a pp run with 100 EPNs it happens more often than in the same run with 200 MHz).
- Happens both with raw and with MC pp data, but seems to happen more often in real data than in MC.
- Have never seen it happen in Pb-Pb MC data, despite the largest statistics. Either this is a coincidence, or it can only happen when certain patterns / occupancies are present in the data.
- Identified one time frame which crashed 6 times in a few million runs, on 6 different EPNs which were otherwise stable.
- Need to do some special runs with extensive debug output to investigate this further.
- TPC CTF Skimming:
- Implemented in the TPC entropy encoder; fully working, but not yet the final version. It can already be used; future improvements will cut away somewhat more clusters.
- Not yet applying eta-check on unattached clusters, but storing all compatible drift-times.
- Need to take into account TPC distortions for z / eta check. Either with some margin, or assuming some average distortion corrections.
- While implementing the CTF skimming, found a problem with the decoding of some TPC track-model clusters. The track gets completely odd parameters (tgl > 100) and produces clusters everywhere in the time frame.
- First assumed this was a side effect of the rounding problem reported some weeks ago, but the rounding cannot have such large effects.
- Even more strange, I cannot reproduce this behavior with MC data yet (tried encoding MC CTF with CPU, NVIDIA and AMD GPU, always OK).
- Need to process some raw TFs and encode them to CTFs with EPN MI50, to try to reproduce it.
- Hopefully, there is no further (invisible) corruption where the track parameters do not go completely off.
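The z / eta check with a margin for TPC distortions, mentioned under TPC CTF Skimming, could look roughly like the sketch below. This is not the O2 implementation: the struct SkimParams, the functions isCompatible and keepCluster, and all numerical values (drift length, drift velocity per time bin, eta cut, margin) are hypothetical and chosen only for illustration. It assumes a straight line from the nominal vertex at z = 0 and keeps an unattached cluster if it is compatible with at least one candidate collision time.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative parameters only; none of these values are taken from O2.
struct SkimParams {
  float driftLengthCm = 250.f;        // assumed maximum drift length
  float driftVelocityCmPerBin = 0.5f; // assumed drift distance per time bin
  float maxAbsEta = 1.5f;             // assumed eta acceptance cut
  float zMarginCm = 5.f;              // margin covering uncorrected distortions
};

// Pseudorapidity of a point at radius r and |z|, seen from the nominal vertex.
inline float etaFromRZ(float radiusCm, float absZCm)
{
  return std::asinh(absZCm / radiusCm);
}

// Is a cluster at the given time bin and radius compatible with one collision time?
bool isCompatible(float clusterTime, float collisionTime, float radiusCm, const SkimParams& p)
{
  assert(radiusCm > 0.f);
  const float driftTime = clusterTime - collisionTime;
  const float absZ = p.driftLengthCm - driftTime * p.driftVelocityCmPerBin;
  // Reject clusters outside the drift volume, allowing for the margin.
  if (absZ < -p.zMarginCm || absZ > p.driftLengthCm + p.zMarginCm) {
    return false;
  }
  // Apply the margin in the direction that loosens the eta cut.
  const float absZMin = std::max(0.f, absZ - p.zMarginCm);
  return etaFromRZ(radiusCm, absZMin) <= p.maxAbsEta;
}

// Keep the cluster if any candidate collision time is compatible.
bool keepCluster(float clusterTime, float radiusCm, const std::vector<float>& collisionTimes, const SkimParams& p)
{
  for (float t0 : collisionTimes) {
    if (isCompatible(clusterTime, t0, radiusCm, p)) {
      return true;
    }
  }
  return false;
}
```

With these assumed numbers, a cluster at radius 100 cm and time bin 70 relative to its collision lies just outside the eta cut without a margin but is kept with the 5 cm margin. Assuming an average distortion correction instead would shift absZ before the cut rather than widening the acceptance.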
TRD Tracking:
ANS Encoding:
- Waiting for PR with AVX-accelerated ANS encoding