Color code: (critical, news during the meeting: green, news from this week: blue, news from last week: purple, no news: black)
General:
- AMD identified the problem on the MI210. On the MI210 we have to disable the workaround that they had asked us to put in place for the MI50; that workaround increases resource consumption.
- Locally tested OpenCL compilation with Clang 14, bumping -cl-std from clc++ (OpenCL 2.0) to CLC++2021 (OpenCL 3.0) and using the clang-internal SPIR-V backend. The Arrow bump to 8.0, which was a prerequisite, is done.
ROCm compilation issues:
- Create a new minimal reproducer for the compile error that occurs when we enable the LOG(...) functionality in the HIP code. Check whether this is a bug in our code or in ROCm. Lubos will work on this.
- Matteo implemented a workaround for the LOG(...) problem, so we can now at least use the LOG macro in the ROCm code. But the internal compiler error is not yet fixed, so it may come back.
- Ruben found another compiler problem, related to template handling. We have a workaround for now, but need to create a minimal reproducer and file a bug report.
ITS GPU Tracking and Vertexing:
- Matteo will spend 1 week working on multi-threading of ITS vertexing, then go back to GPU ITS tracking.
TPC GPU Processing:
- Felix fixed a problem in the clusterization that gave different results between the CPU and GPU versions.
- More investigation of random GPU crashes:
- Besides the crashes from broken GPUs and from corrupt TPC raw data, there is definitely another type of crash that affects all GPUs and is not triggered by corrupt raw data.
- Quite rare, but it happens more often if the node is under heavy load (e.g. in a pp run it happens more often with 100 EPNs than in the same run with 200 EPNs).
- Happens both with raw and with MC pp data, but seems to happen more often in real data than in MC.
- Have never seen it happen in Pb-Pb MC data, despite that sample having the largest statistics. Either this is a coincidence, or it can only happen when certain patterns / occupancies are present in the data.
- Identified one time frame which crashed 6 times in a few million runs, on 6 different EPNs which were otherwise stable.
- Need to do some special runs with extensive debug output to investigate this further.
- TPC CTF Skimming:
- Implemented in the TPC entropy encoder; fully working, but not yet the final version. It can already be used; future improvements will cut away somewhat more clusters.
- Not yet applying the eta check on unattached clusters, but storing all compatible drift times.
- Need to take into account TPC distortions for z / eta check. Either with some margin, or assuming some average distortion corrections.
- The problem with bogus tracks during CTF skimming was due to an incorrectly configured B field and TF length. Will have a fix today. We should also add the B field strength and TF length to the TPC CTF metadata, to check for consistency during decoding.
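A minimal sketch of what such a consistency check during decoding could look like. The struct, the function name, the units and the tolerance are all hypothetical and only illustrate the idea; they are not the existing CTF metadata format or decoder interface.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical metadata stored alongside a skimmed TPC CTF (not the actual format).
struct SkimMetadata {
  float bFieldKG;          // B field used during skimming, assumed here to be in kGauss
  uint32_t tfLengthOrbits; // TF length used during skimming, assumed here to be in orbits
};

// Compares the values stored at encoding time with the decoder configuration.
// Returns an empty string if they agree, otherwise a description of the mismatch.
std::string checkSkimConsistency(const SkimMetadata& stored,
                                 const SkimMetadata& decoder,
                                 float bToleranceKG = 1e-3f)
{
  if (std::fabs(stored.bFieldKG - decoder.bFieldKG) > bToleranceKG) {
    return "B field mismatch: stored " + std::to_string(stored.bFieldKG) +
           " kG, decoder " + std::to_string(decoder.bFieldKG) + " kG";
  }
  if (stored.tfLengthOrbits != decoder.tfLengthOrbits) {
    return "TF length mismatch: stored " + std::to_string(stored.tfLengthOrbits) +
           " orbits, decoder " + std::to_string(decoder.tfLengthOrbits) + " orbits";
  }
  return {};
}
```

The decoder would call this once per CTF and refuse to continue (or at least warn) when the returned string is non-empty, so that a misconfigured B field or TF length is caught before any bogus tracks are produced.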
TRD Tracking:
ANS Encoding:
- Waiting for PR with AVX-accelerated ANS encoding