Little to report. We are running quite smoothly.
We have seen problems here with madevent: two issues that seem to trace back to a gridpack of some kind.
1. Child processes were not properly cleaned up, so pstree showed 2 processes per core on the machine even though those processes were defunct (and why was it doing this in the first place?).
2. References to an inaccessible, private cvmfs repo, cp3.uclouvain.be, were saturating the /var/log/messages file and its partition.
The latter was limited by adding a --negative-timeout=600 parameter in auto.master for cvmfs (the default is 60 seconds, meaning as much as one such set of messages logged every 60 seconds).
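For illustration, the auto.master entry might look roughly like the line below. This is a sketch only: the map file path /etc/auto.cvmfs is the usual cvmfs default and is an assumption here, not necessarily the exact entry in use at AGLT2.

```
# /etc/auto.master (sketch; map path assumed)
# Cache failed lookups for 600 seconds instead of the 60-second default
/cvmfs /etc/auto.cvmfs --negative-timeout=600
```

autofs has to reload its maps before a changed entry takes effect.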
These, combined with grid jobs running 4 copies of root with large memory footprints, were crashing AGLT2 worker nodes (WN). Much of that was "bullet-proofed" by updating software to the current kernel, etc.
Work continues on a Condor configuration that will better address this.
See this ticket for more info on the madevent issues: https://its.cern.ch/jira/browse/AGENE-1134