Changed normalization of the NN pad input to: pad / max_pad[row]
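A minimal sketch of the per-row pad normalization, assuming max_pad is a lookup of the number of pads in each row (the values below are illustrative placeholders, not the real pad-plane geometry):

```python
import numpy as np

# Illustrative pads-per-row lookup; NOT the actual TPC geometry
max_pad = np.array([66, 70, 74, 78])

def normalize_pad(pad, row):
    """Scale the pad coordinate into [0, 1] using the row-dependent maximum."""
    return pad / max_pad[row]

print(normalize_pad(33, 0))  # 33 of 66 pads -> 0.5
```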
Checked NN performance differentially in pT and found that the network seems to learn the momentum Z-direction (not well yet, but at least sensible output) but does not learn the momentum vector estimate in X or Y correctly:
Suspicions
Training data is dominated by low-pT clusters (many more of those than high-momentum ones)
Momentum might need some normalization: typical NN outputs should be O(1) -> the loss function can be hugely steered by large pT values
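The second suspicion can be addressed by regressing a scaled target so the network output stays O(1); a minimal sketch, where the 10 GeV/c scale is an assumption (chosen to match the pT cut used in the downsampling test):

```python
PT_SCALE = 10.0  # GeV/c; assumed scale so the regression target is O(1)

def normalize_pt(pt):
    """Map pT into roughly [0, 1] so large values don't dominate the loss."""
    return pt / PT_SCALE

def denormalize_pt(y):
    """Invert the scaling on the network output to recover pT in GeV/c."""
    return y * PT_SCALE
```

The same inverse must of course be applied at inference time before comparing to the true momentum.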
Tried training the NN with data downsampled in pT (used a slightly modified Tsallis distribution) and cutting at pT = 10 GeV/c:
Test ongoing: first result did not seem promising, but not sure if everything was done correctly -> definitely needs (sector, row, pad) as input, otherwise the momentum in X or Y direction cannot be learned in principle...
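The downsampling step can be sketched as rejection sampling toward a target spectrum; the Tsallis-like shape, its parameters T and n, and the binned density estimate below are illustrative assumptions, not the actual modified distribution that was used:

```python
import numpy as np

rng = np.random.default_rng(0)

def target_shape(pt, T=1.0, n=7.0):
    """Illustrative Tsallis-like target shape; T and n are placeholder values."""
    return pt * (1.0 + pt / (n * T)) ** (-n)

def downsample_pt(pt, pt_max=10.0, bins=50):
    """Rejection-sample clusters so the kept pT spectrum follows target_shape,
    after cutting at pt_max = 10 GeV/c."""
    pt = pt[pt < pt_max]
    # Estimate the empirical density so the acceptance ratio target/empirical
    # suppresses the low-pT excess in the training data.
    hist, edges = np.histogram(pt, bins=bins, range=(0.0, pt_max), density=True)
    idx = np.clip(np.digitize(pt, edges) - 1, 0, bins - 1)
    ratio = target_shape(pt) / np.maximum(hist[idx], 1e-12)
    keep = rng.random(len(pt)) < ratio / ratio.max()
    return pt[keep]
```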
GPU installation
PyTorch installation works on all devices except for the EPNs: needed a workaround to compile with C++20, which compiles but does not run (results in std::runtime_error: Invalid ext op lib format)
Testing ONNX Runtime GPU installation -> Started yesterday night. Encountering:
fatal error: error in backend: cannot lower memory intrinsic in address space 5
Seems like a very recent issue (some GitHub threads are only 3 weeks old and it seems LLVM-related: https://github.com/llvm/llvm-project/issues/88497)
Clusterization
Understood the clusterization code now and know where to "hack" in. For now, trying to simply read out the 7x7x7 grid needed for the NN. Once that is achieved and a working installation of ONNX-GPU / PyTorch is available, try to implement a simple CPU version.
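The intended readout can be sketched as follows, assuming the digits are available as a dense (row, pad, time) charge array; the real clusterization code accesses the digits differently, so this only illustrates the windowing logic:

```python
import numpy as np

def read_grid(charges, row, pad, time, half=3):
    """Extract the 7x7x7 neighborhood around a cluster center from a dense
    (row, pad, time) charge map, zero-padding at the detector edges."""
    grid = np.zeros((2 * half + 1,) * 3, dtype=charges.dtype)
    for dr in range(-half, half + 1):
        for dp in range(-half, half + 1):
            for dt in range(-half, half + 1):
                r, p, t = row + dr, pad + dp, time + dt
                if (0 <= r < charges.shape[0] and
                        0 <= p < charges.shape[1] and
                        0 <= t < charges.shape[2]):
                    grid[dr + half, dp + half, dt + half] = charges[r, p, t]
    return grid
```

The zero-padding at the edges is a design assumption; whatever convention the clusterizer uses at sector and row boundaries should replace it.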