https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
New pledge values from April 1st
Xrootd RPM deployment:
- Dev Ceph cluster is down
- VMs are prevented from accessing the production cluster
Echo Downtime:
- Batch farm to stop accepting new submissions from tonight
- Want to take the opportunity to switch more jobs to Harvester and multi-job pilots
- Tape access is expected to multihop via CERN for the duration of the downtime.
Antares:
- Delaying T0 export retest until MGM 'fix' is confirmed
- Many (~25%) 'Operation Expired' errors due to the antares-tpc01 xrootd service, affecting writes to Antares
- This might not explain the recall errors (which show the same error message).
SAM tests are failing; the failures are in the webdav tests.
Tape Challenge - some fraction of the data chosen to be recalled for the tape challenge may be on broken/stuck tapes; the affected tapes are due to be fixed by external engineers this afternoon (30 March).
Recalls in the tape challenge are probably also affected by other factors:
1. An EOS upgrade is required to fix a problem where, if one file in an FTS batch of requests is missing, every request in the batch fails with a 'this file doesn't exist' type error.
2. An upgrade of Rucio to the forthcoming 1.28 release is required to fix another problem: when resubmissions are triggered, Rucio is no longer aware that multihop jobs consist of two coupled jobs.
3. To be confirmed - a possible problem with a server certificate in CMS-Rucio, which may have expired; this might explain Rucio's inability to cancel FTS requests (I have 2 examples of how I think this was broken). A sketch of a certificate check follows this list.
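As a minimal sketch (not the procedure actually used, and with a placeholder hostname rather than the real CMS-Rucio endpoint), one way to check remotely whether a service certificate has expired is to attempt a verified TLS handshake and report the result, e.g. in Python:

    import socket
    import ssl
    from datetime import datetime, timezone

    def check_cert(host: str, port: int = 443) -> None:
        """Report whether the TLS certificate presented by host:port verifies,
        and print its expiry date when the handshake succeeds."""
        ctx = ssl.create_default_context()
        try:
            with socket.create_connection((host, port), timeout=10) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    cert = tls.getpeercert()
            # 'notAfter' looks like 'Apr  1 12:00:00 2022 GMT'
            expiry = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
            print(f"{host}: certificate OK, expires {expiry.replace(tzinfo=timezone.utc)}")
        except ssl.SSLCertVerificationError as err:
            # An expired certificate is reported here as 'certificate has expired'
            print(f"{host}: verification failed: {err.verify_message}")

    check_cert("rucio.example.org")  # placeholder hostname, not the real endpoint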
Job efficiencies are OK, though a bit below average.