SWT2_CPB:
Operations
No major site issues to report. The site has been running very smoothly.
We noticed a slight decrease in running job slots and an increase in wrapper faults on 3/30 (weekend), but this was brief and has since been resolved.
The number of running jobs also appears to have dropped briefly last night (4/1). We will look into this.
Routine data center maintenance as usual (replacing storage drives, addressing two problematic worker nodes, monitoring).
EL9 Migration Updates
Continuing to make slight improvements to our modules in both the test and production clusters.
Continuing to develop and test modules for XRootD proxy and storage in the test cluster.
New Storage
We physically installed the additional new storage in the racks.
We ran into a few issues while trying to deploy the new storage.
Tested new third-party rails, but are deciding against using them. We have contacted Dell for their suggested solutions while we continue to research and plan.
DHCP request issues were interfering with Rocks' ability to provision with EL7. This has been resolved. The iDRAC devices were set to DHCP. We temporarily removed the module that manages this so it can be fixed and tested in the test cluster before being added back into production, and manually set these devices to static addressing for now, which resolved the issue.
TFTP server issues: provisioning nodes with Rocks led to TFTP timeouts. We investigated and resolved this.
Configured and tested provisioning of the new storage in the test cluster. Rocks does not appear to support UEFI boot, but does work with BIOS boot. However, setting the new storage to BIOS boot mode causes the M.2 drives in the BOSS-N1 boot controller not to be detected. We are working on a solution and continuing to test for now.
OU: