Perlmutter: 3K GPU added, ~70/77% CPU/GPU allocation used
- Steady rate of job failures due to SLURM job timeouts
- (Xin) Parallel jobs in the same work trigger issues on the Rucio side: some jobs cannot get the service initialized, so they fail at stage-in/out.