Cluster errors

General observations
- NLL (negative log-likelihood) loss converges better than MSE (mean squared error) loss
- Scaling is necessary and requires parameter tuning -> currently only one parameter, but ideally two separate ones (x and y)
- Training is now rather stable; testing different configurations:
  - Also tested the idea of sigma / sqrt(qTot): worked decently well and reaches a similar number of tracks, but efficiency is down by 10-15%
  - Feeding in both cluster and track position completely deteriorated the fit: no long tracks found
  - Getting rather good results now for long tracks, even though the total number of tracks is still not as high as with the default method -> need to adjust the scaling parameter
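
For reference, a minimal sketch of the two loss functions compared above and of the sigma / sqrt(qTot) variant. It assumes a Gaussian likelihood with one predicted error (sigma) per cluster; the function names and the residual definition (cluster position minus reference position) are illustrative assumptions, not the actual training code.

```python
import math

def mse_loss(residuals):
    """Mean squared error of the residuals; ignores the predicted errors."""
    return sum(r * r for r in residuals) / len(residuals)

def gaussian_nll_loss(residuals, sigmas):
    """Mean Gaussian negative log-likelihood.

    The log term penalises overestimated errors, the pull term (r / sigma)^2
    penalises underestimated ones, so the loss constrains sigma itself.
    """
    total = 0.0
    for r, s in zip(residuals, sigmas):
        total += 0.5 * math.log(2.0 * math.pi * s * s) + 0.5 * (r / s) ** 2
    return total / len(residuals)

def charge_weighted_sigma(sigma, q_tot):
    """Hypothetical helper for the tested idea: sigma / sqrt(qTot)."""
    if q_tot <= 0:
        raise ValueError("q_tot must be positive")
    return sigma / math.sqrt(q_tot)
```

For a single residual r, the NLL is minimal at sigma = |r|, which is one plausible reason it is better suited to learning cluster errors than MSE.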

Next try: retune the x and y scaling separately
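
A rough sketch of what separate x and y scaling could look like, assuming the scaling is a multiplicative factor applied to the predicted cluster errors. The function name, the parameter names (`scale_x`, `scale_y`) and the grid values are hypothetical placeholders, not tuned numbers.

```python
from itertools import product

def scale_cluster_errors(sigma_x, sigma_y, scale_x=1.0, scale_y=1.0):
    """Apply independent scale factors to the x and y cluster errors.

    Setting scale_x == scale_y reproduces the current single-parameter case.
    """
    return sigma_x * scale_x, sigma_y * scale_y

def scaling_grid(x_values, y_values):
    """All (scale_x, scale_y) combinations to try in a simple grid scan."""
    return list(product(x_values, y_values))

# Placeholder scan points; each pair would be evaluated by rerunning tracking
# and comparing the number of tracks and the efficiency.
candidates = scaling_grid([0.5, 1.0, 2.0], [0.5, 1.0, 2.0])
```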