ATLAS UK Cloud Support
Europe/London
Zoom
Description
Meeting to be held via Zoom (https://ukri.zoom.us/j/97404730356)
Password protected (same as OPs Mtg)
Outstanding tickets
- 149752 UKI-NORTHGRID-LANCS-HEP less urgent assigned 2020-12-02 16:07:00 Failovers from University of Lancaster to CERN backup proxy
  - A number of stale cvmfs instances observed (also at Glasgow)
  - geoip issues; might be related to Stratum 1 updates?
  - Refreshing the cache may be the best option.
- 149750 UKI-SOUTHGRID-RALPP less urgent in progress 2020-12-02 10:16:00 UKI-SOUTHGRID-RALPP: unable to connect to host
  - Problems in FTS transfers for ATLAS (not other VOs). CLI TPC transfers appear OK.
- 149738 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-12-02 15:55:00 UKI-NORTHGRID-LANCS-HEP: deletion errors
  - Poor RAID card shows issues under many simultaneous interactions (deletions), causing crashes.
  - Down to the last 25% of data from the draining of the server.
  - Stop draining for today, but some file losses should be expected.
- 149705 UKI-SCOTGRID-ECDF less urgent in progress 2020-11-30 11:52:00 UKI-SCOTGRID-ECDF: Low transfer efficiency due to TRANSFER [70] TRANSFER an end-of-file was reached …
  - Load on headnode from httpd processes
  - From Matt: a method to mitigate high memory usage from http at Lancaster has been implemented. The issues might be related.
- 149362 UKI-SOUTHGRID-RALPP urgent in progress 2020-11-19 10:11:00 ATLAS CE failures on UKI-SOUTHGRID-RALPP-heplnx207
  - heplnx207 still in downtime (ended post-meeting)
- 148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-11-27 10:00:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
  - Disk 40 is being drained within decommissioning. RAID set says OK, file system does not.
  - AC / cooling issues in DPM server room
- 146651 RAL-LCG2 urgent on hold 2020-10-16 11:56:00 singularity and user NS setup at RAL
  - On hold; working on underlying issues
- 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-11-05 10:52:00 CentOS7 migration UKI-SOUTHGRID-SUSX
  - ARC-CE issues; not reporting back to the monitoring sites
  - Communication issue? GridFTP looks to be working.
  - Can the BDII / LDAP be queried (from offsite)?
  - Status information usually comes through the BDII.
  - Contact the ARC developers?
  - Try an LDAP search against the BDII.
  - Patrick to report back to TB support.
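
For the Sussex action to try an LDAP search against the BDII, a minimal sketch of how such a query could be put together is given here. The host name `bdii.example.org` is a placeholder, and the port (2170), search base (`o=grid`) and filter are assumed defaults that may well differ for the Sussex ARC-CE; only the standard OpenLDAP `ldapsearch` options `-x`, `-LLL`, `-H` and `-b` are used.

```python
import shlex


def bdii_ldapsearch_command(host, port=2170, base="o=grid",
                            ldap_filter="(objectClass=*)"):
    """Build an ldapsearch command line for an anonymous BDII query.

    host is a placeholder supplied by the caller; port, base and
    ldap_filter are assumed defaults, not values taken from the ticket.
    """
    return [
        "ldapsearch",
        "-x",                             # simple (anonymous) bind
        "-LLL",                           # plain LDIF output, no comments
        "-H", f"ldap://{host}:{port}",    # LDAP URI of the BDII
        "-b", base,                       # search base
        ldap_filter,
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a shell from an offsite host.
    print(shlex.join(bdii_ldapsearch_command("bdii.example.org")))
```

Running the printed command from an offsite machine would show whether the endpoint answers at all, which separates a publishing problem from a network or firewall problem.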
CPU
- RAL
- Northgrid
- London
- SouthGrid
- Scotgrid
  - Downtime for DPM; problems with chillers and AC. Effectively shut down for the moment.
    - Some replacements needed.
  - Prod is in DC, which is fine.
Other new issues
Ongoing issues
- CentOS7 - Sussex
- TPC http
  - RAL TPC-http FTS tests working by converting // to / in path.
- Oxford Storageless tests
  - 10GB link working
  - Arc config needed; Sam to send to Vip
- ECDF unreliable storage
  - Rob to update ticket
- Glasgow LOCALGROUPDISK
  - Sam to aim to create Ceph pool.
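
The RAL TPC-http workaround noted above (converting // to / in the path) can be illustrated with a short sketch. This is not the change actually applied at RAL or in FTS; the function name and the example URL are hypothetical. It collapses repeated slashes in the path component only, so the `//` after the scheme is left alone.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url):
    """Return url with runs of '/' in the path reduced to a single '/'.

    Scheme, host, port, query and fragment are left unchanged.
    """
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=clean_path))


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(collapse_path_slashes(
        "https://storage.example.org:443//atlas/datadisk//file.root"))
```

Restricting the substitution to the parsed path avoids damaging the `https://` prefix, which a plain string replace over the whole URL would do.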
News round-table
- Vip
  - Production squid server failover yesterday.
  - CPU efficiency looks a bit lower?
  - prmon to be added to monitoring for storageless tests: https://github.com/HSF/prmon
- Dan
  - Possible 1-week downtime on the 14th.
  - StoRM moving ahead to CentOS7.
  - Disruption expected in DC next year; dates to be determined.
- Matt
  - NTR; prepare for lost files.
- Peter
  - Considering options for CRC shifter
  - Soliciting for CRC shifts.
- Sam
  - NTR
- Gareth
  - Continue to work on cooling issues
- JW
  - NTR
- Patrick
  - NTR
AOB