ROC manager update
*************************
ROC manager meeting cancelled.
Ops meeting update
*************************
Ops meeting cancelled due to the service reliability workshop at CERN.
UKI site issues
****************
UCL-HEP:
Discovered that the machine running the CE and BDII_site had a load greater than 20. This caused several BDII drop-outs. Identified the gridice daemon as the culprit, with one process using over 50% CPU at times. Had to turn that process off to re-establish stable functionality.
Despite the ATLAS queue being full, we still receive a steady submission of ATLAS jobs. The number of queued jobs is now close to 1000, with a steadily increasing waiting time (currently at 326.6 Ms). This will inevitably lead to a large number of failures due to proxies expiring. Not sure if we should cap the number of queued jobs per queue.
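As a rough aid for the queue-cap question, the sketch below estimates how many jobs can sit in a queue before the last one would start too late to finish inside its proxy lifetime. It assumes the queue drains at a steady rate of job slots divided by mean job runtime. The function name, its parameters and all numbers in the example are hypothetical and are not measurements from UCL-HEP.

```python
def max_queued_jobs(proxy_lifetime_s, job_slots, mean_job_runtime_s,
                    safety_margin_s=0):
    """Estimate the queue depth beyond which jobs outlive their proxy.

    Assumes (hypothetically) that every job is submitted with a fresh
    proxy of proxy_lifetime_s seconds and that the queue drains at a
    steady job_slots / mean_job_runtime_s jobs per second.
    """
    if job_slots <= 0 or mean_job_runtime_s <= 0:
        raise ValueError("job_slots and mean_job_runtime_s must be positive")

    # Time a job may spend waiting: it must still run to completion
    # (plus a safety margin) before the proxy expires.
    max_wait_s = proxy_lifetime_s - safety_margin_s - mean_job_runtime_s

    # Jobs that drain from the queue during that waiting time.
    cap = (max_wait_s * job_slots) // mean_job_runtime_s
    return max(0, cap)


# Illustrative values only: 12 h proxy, 50 slots, 2 h jobs, 1 h margin.
print(max_queued_jobs(43200, 50, 7200, safety_margin_s=3600))  # 225
```

With these illustrative inputs, a queue of around 1000 jobs would be well above the estimated cap, which is consistent with the expectation of proxy-expiry failures. Real values for slots, runtimes and proxy lifetimes would need to come from the site's batch system.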
Monitoring & accounting questions
**************************************
Recent SAM problems: Imperial HEP and Cambridge
ATLAS (SL tests) problems: Many sites affected, but Durham, Glasgow and Manchester stand out
APEL: Problems still seen at:
RAL-LCG2
QMUL
RHUL
UCL-CENTRAL
Manchester
Durham
Bristol
RALPP
UKI tickets
***************
See attached update