Hope everyone is enjoying their summer! Eric is on vacation this week.
1) The gatekeeper that receives the HC jobs stopped working. We did not have monitoring for the condor-ce service, so we did not notice it right away.
2) A large portion of analysis jobs failed, and the site received a GGUS ticket. We found that a script we use to clean up zombie files left behind by jobs killed by HTCondor was also deleting the work directories of running jobs. The bug was introduced when the script was switched over to pilot2. We have fixed the bug in the script.
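The essential safeguard in a cleanup like this is to skip any directory that belongs to a job that is still running. A minimal sketch, assuming (hypothetically) that the script can obtain the set of work directories of currently running jobs from the batch system; the function name `clean_stale_workdirs` and the flat scratch layout are illustrative, not our actual script.

```python
import os
import shutil
import tempfile
from pathlib import Path


def clean_stale_workdirs(scratch_root, active_workdirs):
    """Remove top-level directories under scratch_root not owned by a running job.

    active_workdirs is a hypothetical input: the work directories of jobs
    that are still running, as reported by the batch system.
    Returns the list of directories that were removed.
    """
    root = Path(scratch_root).resolve()
    active = {Path(p).resolve() for p in active_workdirs}
    removed = []
    for entry in sorted(root.iterdir()):
        # Only consider real directories; never follow symlinks out of scratch.
        if entry.is_symlink() or not entry.is_dir():
            continue
        # Skip a directory that is, or contains, a running job's work dir.
        if any(entry == a or entry in a.parents for a in active):
            continue
        shutil.rmtree(entry)
        removed.append(entry)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("job_running", "job_killed"):
            os.mkdir(os.path.join(tmp, name))
        removed = clean_stale_workdirs(tmp, {os.path.join(tmp, "job_running")})
        print([p.name for p in removed])
```

Resolving both the scratch root and the active paths before comparing avoids false mismatches when the scratch area is reached through a symlink.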
Sorted out which storage servers we could retire from the Tier2, based on hardware age and the number of hardware failures. Figured out the items for the purchase.
Set up a new replication server for the dCache database.
Upgraded frontier-squid site-wide to 4.8-1.1.
Discussed MWT2 retirement and purchasing plans.
Still working on getting the dCache nodes dual-stacked. We needed UC ITS to set up external IPv6 PTR records; this was completed yesterday.
Setting up temporary IU and UIUC SLATE nodes to test the SLATE frontier-squid configuration (see Lincoln's talk).
We've reinstalled the NET2 squid with new software, for security and to resolve a low-level GGUS ticket about too many failovers. This seems to have worked.
Lots of work re: migrating to NESE. The GridFTP Docker container with Wei's GridFTP (with Adler callout) works. Much work and testing remain. ADC has been informed. We will have two DATADISK space tokens during the transition.
Otherwise, smooth operations and a full site.
We have two open GGUS tickets. Both can be closed.
- All working well
- In the process of reconfiguring OU xrootd storage for automatic space group assignment and HTTP-over-xrootd
1) Migration of UTA_SWT2 to CentOS7 completed
2) All equipment from recent hardware purchase received - planning deployment schedule
3) Systems running well after the CentOS7 upgrades