Outstanding tickets
-
148474 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-09-01 09:24:00 UKI-NORTHGRID-LANCS-HEP: Low deletion efficiency
- Similar status to last week: a combination of aging servers, some of them full, and empty ones that become overloaded.
- On-site access yesterday; some older hardware will need OS upgrades.
-
148401 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-09-02 15:37:00 UKI-NORTHGRID-LANCS-HEP: globus_ftp_client failures
-
148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-09-02 15:53:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
- Consistency check returns files that have zero replicas in DPM. AF to check for any scripts that might help.
- SS to check the database for the zero-replica entries.
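The zero-replica check discussed above could be sketched roughly as follows. This is only an illustration: it assumes the namespace and replica information has already been dumped to plain lists, and the names `find_zero_replica_files`, `namespace` and `replicas` are hypothetical, not part of DPM or of any site script.

```python
from collections import Counter


def find_zero_replica_files(namespace_entries, replica_entries):
    """Return the paths of namespace entries that have no replica recorded.

    namespace_entries: iterable of (file_id, path) tuples (hypothetical dump format)
    replica_entries: iterable of file_id values, one per replica (hypothetical)
    """
    # Count how many replicas are recorded for each file id.
    replica_counts = Counter(replica_entries)
    # A missing key in a Counter counts as zero, which is exactly the case we want.
    return sorted(
        path for file_id, path in namespace_entries
        if replica_counts[file_id] == 0
    )


# Illustrative data only; not taken from any real site.
namespace = [(1, "/dpm/example/a"), (2, "/dpm/example/b"), (3, "/dpm/example/c")]
replicas = [1, 1, 3]
orphans = find_zero_replica_files(namespace, replicas)
```

The resulting list could then be compared with the output of the consistency check before anything is declared lost.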
-
146771 UKI-SCOTGRID-ECDF less urgent in progress 2020-08-20 14:44:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”
-
146651 RAL-LCG2 urgent on hold 2020-08-10 10:59:00 singularity and user NS setup at RAL
-
146374 UKI-NORTHGRID-SHEF-HEP urgent in progress 2020-07-22 14:53:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
-
144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-08-10 09:54:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
-
142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
CPU
-
RAL
- Stable; below pledge in Monit, but consistent with the internal numbers (and with the pledge) once scaled by the correct corepower.
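As a rough illustration of the corepower scaling mentioned above: if the monitoring converts core usage to HS06 with one corepower value while the site's real value is different, the delivered figure can be rescaled by the ratio of the two. This assumes a simple linear conversion; the function name and all numbers are made up for the example and are not RAL's actual values.

```python
def rescale_delivered(reported_hs06, assumed_corepower, correct_corepower):
    """Rescale a delivered-CPU figure that was computed with the wrong HS06-per-core value."""
    if assumed_corepower <= 0:
        raise ValueError("assumed_corepower must be positive")
    return reported_hs06 * correct_corepower / assumed_corepower


# Hypothetical numbers, chosen only to show the effect of the correction.
reported = 80_000.0   # figure shown by the monitoring
pledge = 100_000.0    # pledged capacity in the same units
rescaled = rescale_delivered(reported, assumed_corepower=10.0, correct_corepower=12.5)
meets_pledge = rescaled >= pledge
```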
-
Northgrid
- LANCS: disk problems (described above); in test. May also have some pilot issues.
-
London
- QMUL: From the 3rd. Switched to running prod-only jobs (to stop jobs using scratchdisk).
-
SouthGrid
- OX: Recovered from power issues; 3 WNs (older tranche) not recoverable.
- RALPP: Some reduction due to dCache upgrades.
-
Scotgrid
- GLA: Running below full capacity; some from DPM, awaiting decommissioning of DPM and relocation, others in new DC.
Other new issues
- QMUL upgrade
- JW to confirm that other sites dependent on QMUL storage are also in downtime.
Ongoing issues
- Sussex
- Grand Unified queues
News round-table
(NTR)
-
Vip
- Data center power / air-conditioning issues now recovered. Lost 3 old WNs (approx. 190 cores).
-
Dan
- Check that dependent sites (e.g. Cambridge) will transition correctly
-
Matt
- It appears that some pilots are dying at LANCS; this is lower priority than the disk failures at the moment.
-
Alessandra
- JW to add to the agenda page the TPC items that need to be done.
-
Gareth
-
Tim
- Lost files at MAN; AF to redeclare the files as lost.
-
JW
AOB