<p>On OSG 3.6, for gatekeepers and worker nodes.</p>

<p>We broke the Frontier squids while trying to fix Gratia probe problems.<br>
Our first fix attempt inadvertently re-enabled a local setup script that overrides the squid location variables.<br>
The Gratia issues are now solved: the cause was a directory owned by root instead of condor.</p>
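<p>A minimal sketch of how such an ownership problem could be spotted, assuming Python 3 on the affected host. The helper names <code>owner_name</code> and <code>find_wrong_owner</code> and the directory path are hypothetical placeholders, not the actual Gratia layout or the check we ran.</p>

```python
import os
import pwd


def owner_name(uid):
    """Return the user name for a uid, or the uid as a string if it has no passwd entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def find_wrong_owner(root, expected_user):
    """Return (path, owner) pairs for entries under root not owned by expected_user."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself and every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            # lstat inspects the entry itself and does not follow symlinks.
            owner = owner_name(os.lstat(path).st_uid)
            if owner != expected_user:
                mismatches.append((path, owner))
    return mismatches


if __name__ == "__main__":
    # Hypothetical path: replace with the real probe data directory.
    for path, owner in find_wrong_owner("/path/to/gratia/data", "condor"):
        print(f"{path}: owned by {owner}, expected condor")
```

<p>The sketch only reports mismatches, so the output can be reviewed before any ownership is changed.</p>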

<p>&nbsp;</p>

<p>2 tickets:</p>

<p>156868&nbsp; 15-Apr-2022&nbsp;&nbsp; AGLT2: Failing jobs in panda with "Unable to identify specific exception"<br>
156873&nbsp; 17-Apr-2022&nbsp;&nbsp; US AGLT2: High Transfer failures as source</p>

<p>The job problems were traced to timeouts during stage-out.<br>
There was no clear cause, but the likely suspect was dCache and Java running out of memory.<br>
We increased the memory for webdav on the doors and for dCacheDomain on the headnodes.<br>
We also added CPUs and memory to the VM doors.&nbsp; That all helped.<br>
We also upgraded dCache from 6.2.35 to 7.2.15 (since we had to restart to load new CA certs anyway).<br>
The issues from both tickets disappeared after that.</p>

<p>&nbsp;</p>

<p>Maintenance:</p>

<p>We are mostly through updating all worker nodes with the new kernel, Dell firmware updates, and OSG updates (cvmfs).</p>

<p>&nbsp;</p>

<p>Network upgrades completed and tested:</p>

<p>All new multi-path and multi-100G connections to ESnet and between MSU and UM are now fully deployed<br>
and have been tested for proper failover in case of a backhoe-vs-fiber incident.</p>