Site is now full of jobs and operating well
Minor problems/fixes during the last two weeks
- Site was running almost exclusively SCORE HIMEM jobs even though many MCORE jobs were activated
- Because we had run so many MCORE jobs for so long (with no SCORE), the weighting was skewed
- Changed the HTCondor knob "PRIORITY_HALFLIFE" from 1/2 day to 5 minutes so priorities rebalance faster
- The three MWT2 Squids were heavily loaded by Frontier requests
- This impacted access to CVMFS repositories, causing slow mounts and slow data access
- Created three CVMFS-only Squids on the local CVMFS Stratum-1 servers
- NSS bug with certificates
- Push out to all nodes on Tuesday
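To illustrate why shortening PRIORITY_HALFLIFE rebalances MCORE and SCORE faster, here is a minimal sketch of the user-priority decay described in the HTCondor manual: each update computes pi(t) = beta * pi(t - dt) + (1 - beta) * usage, with beta = 0.5 ** (dt / PRIORITY_HALFLIFE), and a lower priority value is better. It assumes MCORE and SCORE work is accounted under separate submitters or accounting groups (the notes don't say), and the slot counts and 60 s update interval are hypothetical illustration values, not MWT2 numbers.

```python
import math

# HTCondor config equivalent of the change (units are seconds):
#   PRIORITY_HALFLIFE = 300      # was 43200 (1/2 day)


def update_priority(prev: float, usage: float, dt: float, halflife: float) -> float:
    """One priority update: pi(t) = beta * pi(t - dt) + (1 - beta) * usage."""
    beta = 0.5 ** (dt / halflife)
    return beta * prev + (1.0 - beta) * usage


def time_to_decay(start: float, target: float, halflife: float, dt: float = 60.0) -> float:
    """Seconds until an idle submitter's priority drops from start to target.

    Usage is held at 0, so the accumulated priority only decays.
    dt is a hypothetical update interval used for the simulation.
    """
    prio, elapsed = start, 0.0
    while prio > target:
        prio = update_priority(prio, 0.0, dt, halflife)
        elapsed += dt
    return elapsed


def closed_form_decay_time(start: float, target: float, halflife: float) -> float:
    """Exact continuous answer: t = halflife * log2(start / target)."""
    return halflife * math.log2(start / target)


if __name__ == "__main__":
    # Hypothetical scenario: a group that had been using 2000 slots goes idle.
    for label, h in (("old: 1/2 day", 43200.0), ("new: 5 min", 300.0)):
        t = time_to_decay(2000.0, 20.0, h)
        print(f"{label:>12}: {t / 3600:.1f} h to decay to 1% of prior usage")
```

With these numbers, the accumulated priority falls to 1% in roughly half an hour with a 5 minute half-life, versus more than three days with a 1/2 day half-life. The trade-off is that a very short half-life makes the negotiator respond mostly to current usage and gives little memory of past usage.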
OSG 3.3.23-1 installed on all nodes
USERDISK and GROUPDISK decommissioning continuing
- Waiting on ADC to change the PanDA queue configuration so that ANALY queues write their output to SCRATCHDISK
- Reducing size of GROUPDISK and adding freed space to DATADISK
Storage decommissioning has begun
- In FY17 we are scheduled to retire over 1 PB of old storage
- First server retired, reducing available storage by 120 TB
- 6 more servers to retire
At UC, SciDMZ upgrade work continues. New fiber and an Arista switch are being deployed this week. This should improve WAN transfers, which are currently limited by the distribution switch between uct2 and the campus SciDMZ border router.
At UC, the CRAC2 unit compressor has been replaced and the system is fully back up and running well. The CRAC1 and CRAC3 units have been assessed by an outside vendor; a meeting tomorrow will discuss the additional repairs needed.
Greg is building a hot-aisle containment system to improve cooling efficiency.