Outstanding tickets
- 149842 UKI-SCOTGRID-ECDF less urgent in progress 2021-01-03 13:08:00 UKI-SCOTGRID-ECDF: Low transfer efficiency due to TRANSFER ERROR: Copy failed with mode 3rd pull, wi…
- Manual blacklisting of WAN transfers was set over the New Year period.
- A whitelisting test shows no improvement since the New Year.
- Needs input from site to understand situation.
- 149362 UKI-SOUTHGRID-RALPP urgent in progress 2021-01-05 12:35:00 ATLAS CE failures on UKI-SOUTHGRID-RALPP-heplnx207
- Ongoing; Peter to investigate from the apfmon side.
- 148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2021-01-04 12:01:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
- DPM transfer failures; Sam to check whether they come from old files that should be cleared from the namespace.
- Ceph still running well, but the current job mix is testing the caching.
- On Ceph, the internal xrootd cache is filling up due to intensive sets of jobs.
- Purging old files in the xrootd cache to understand this better; it may be that all files in the cache are in active use?
- Still need to move the final compute capacity; this requires on-site work.
- 146651 RAL-LCG2 urgent on hold 2020-12-18 10:48:00 singularity and user NS setup at RAL
- Still awaiting updates to the underlying software stack; no date has been given by Grid Services.
- 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-12-18 09:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- Matt: some discussions on certs were held with the site; will try to get a recent update.
CPU
- RAL
- Failure of one CE over Christmas resulted in drained slots. Other VOs were running fine, so it took ~2 days to fully reclaim the slots.
- Gareth mentioned that one CE might be set as primary; if it fails, all jobs are dropped.
- Not obvious in AGIS/CRIC if one is set to primary
- Northgrid
- LANCS: Fairshare issues; other VOs taking a number of slots.
- Might need to reduce the time window SGE uses when allocating slots; also a dropoff parameter to change the weighting of older jobs.
- London
- QMUL: Reinstalled batch system; crashing every 10 mins; related to (new) ARC accounting?
- Legacy accounting had issues with processing to the sqlite database.
- Major upgrade to DC planned for early in the year, but timing still to be finalised.
- SouthGrid
- Scotgrid
- A 10Gb to 1Gb negotiation issue in the network caused a drop in jobs over the New Year.
Other new issues
Ongoing issues
News round-table
- Dan
- Matt
- Peter
- Sam
- Gareth
- Q/R’s needed by month-end
- JW
- Duncan
- It was confirmed that Sheffield has no storage set up.
- Discussion on IO demands for future UK and Glasgow requirements, e.g. 5k cores (Glasgow) * 0.5 MB/s/core.
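The IO estimate quoted in the discussion can be worked through as a short sketch. It takes the two figures from the minutes (5k cores, 0.5 MB/s/core) and assumes that every core sustains that rate at the same time and that units are decimal (1 MB = 10^6 bytes). The function names are illustrative only and do not come from any site tooling.

```python
def aggregate_io_mb_per_s(cores: int, mb_per_s_per_core: float) -> float:
    """Total sustained IO rate in MB/s if every core reads at the per-core rate."""
    return cores * mb_per_s_per_core


def mb_per_s_to_gbit_per_s(mb_per_s: float) -> float:
    """Convert MB/s to Gb/s, assuming decimal units and 8 bits per byte."""
    return mb_per_s * 8 / 1000


# Figures from the minutes: 5k cores (Glasgow) at 0.5 MB/s per core.
glasgow_mb_s = aggregate_io_mb_per_s(5000, 0.5)
glasgow_gbit_s = mb_per_s_to_gbit_per_s(glasgow_mb_s)

print(f"{glasgow_mb_s:.0f} MB/s = {glasgow_mb_s / 1000:.1f} GB/s = {glasgow_gbit_s:.0f} Gb/s")
# prints: 2500 MB/s = 2.5 GB/s = 20 Gb/s
```

Under these assumptions the Glasgow figure comes to about 2.5 GB/s of sustained storage throughput, or roughly 20 Gb/s on the network; a different job mix or cache hit rate would change the per-core number.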
AOB