ATLAS UK Cloud Support
Europe/London
Zoom
Description
Meeting to be held via Zoom (https://ukri.zoom.us/j/97404730356)
Password protected (same as OPs Mtg)
Outstanding tickets
- 149362 UKI-SOUTHGRID-RALPP urgent in progress 2020-11-13 08:45:00 ATLAS CE failures on UKI-SOUTHGRID-RALPP-heplnx207
- Issue adding the new CE to AGIS/CRIC (see below)
- Site to take the CE into downtime on Monday for general cleanup
- 148968 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-11-19 06:53:00 UKI-NORTHGRID-LANCS-HEP: deletion and transfer failures
- gridFTP restarted; looking better, but will keep an eye on it
- Other, non-Lancs issues with Italian sites add a bit of confusion
- Napoli: https is available only on LHCONE (via certain IPvX?), whereas gridFTP is available on non-LHCONE
- 148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-11-12 17:24:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
- Sam to take a look at problem files
- 146651 RAL-LCG2 urgent on hold 2020-10-16 11:56:00 singularity and user NS setup at RAL
- no update
- 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-11-05 10:52:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- no update
CPU
- RAL
- Northgrid
- London
- Server with occasional bad memory issue
- To discuss with the manufacturer to attempt a proper fix (not just a BIOS update)
- SouthGrid
- Scotgrid
- Durham: a priority user takes all the priority, causing loss of ATLAS jobs.
- Some job loss also from HC test failures caused by missing files
- GLA: CVMFS update for CMS; some unintended consequences caused problems
- GLA: Bringing additional capacity online slowly; aiming for full capacity before Christmas.
- Probable identification of high IOPS in the Ceph cluster from offsite xroot direct-access reads
- User from CERN, accessing scratchdisk
- Switch off to see if this solves the issue.
Other new issues
- Recent Switcher problem with AGIS/CRIC sending many emails
- Concerns raised on appropriate use of mailing lists.
- ATLAS UK has three lists: cloud-support, uk comp operations, and uk comp users.
- The comp users list has been unused for 5 years, and it was agreed to remove it.
- Cloud support remains the most active discussion list and will be unchanged.
- The comp operations list carries the daily summary and Switcher notifications. Non-automated traffic is on the order of 1 email per year, and may have been intended for cloud support.
- It was decided to keep the Switcher and daily summary in this list. A simple filter can remove any unwanted emails.
- Queues:
- Long-term queues that are not disabled, but not running production:
- UK ANALY_MANC_TEST_SL7: Still needed
- UK ANALY_QMUL_GPU_TEST: -> could be renamed to non test
- RAL-LCG2_TEST: -> not actively used (see comment from Peter)
- RAL-LCG2_UCORE: Can be disabled
- UKI-NORTHGRID-LANCS-HEP_TEST (see comment from Peter)
- UKI-NORTHGRID-MAN-HEP_TEST; testbed -> keep
- UKI-SCOTGRID-GLASGOW_CEPH_TEST: keep
- UKI-SOUTHGRID-OX-HEP_TEST: (see comment from Peter)
- UKI-SOUTHGRID-SUSX_UCORE: not test, should become production, might want renaming
- Peter uses TEST queues for dev test work monitoring
- QM test queue might be useful
Ongoing issues
- CentOS7 - Sussex
- no update
- Datadisk; watermark reduced.
- LOCALGROUP disk
- New pool to be created shortly
- TPC:
- Naples -> moved to DPM 1.14.2; networking blocked port 443 on IPv6, IPv4 open on general network
- Affects whole of UK (e.g. Lancs, MAN)
- Retry transfer failures
- Vulnerability in DPM and dCache
- Believe all UK DPM sites are up to date (or not affected)
- dCache issue announced in appropriate channels
News round-table
- Vip
- Had to leave before end; NTR
- Dan
- NTR
- Matt
- NTR; away for next week’s meeting
- Peter
- NTR
- CRIC status of queues: http://atlas-cric.cern.ch/api/atlas/pandaqueue/query/?json
- Alessandra
- NTR
- Gareth
- The two CEs recently added to Glasgow will stay in downtime for the time being.
- JW to check they are included correctly in CRIC / AGIS.
- JW
- NTR
- Sam
- Final talk available for Workshop.
- Positive comments on updates to talk draft
- Tables are now much better
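For the queue review, the CRIC endpoint Peter mentioned could be queried with a short script along these lines. This is a minimal sketch, not an agreed tool: it assumes the endpoint returns a JSON dictionary keyed by queue name, and that each entry carries "cloud" and "status" fields. Those field names and the function names are assumptions and should be checked against the actual output.

```python
import json
from urllib.request import urlopen

# URL as given in the minutes.
CRIC_URL = "http://atlas-cric.cern.ch/api/atlas/pandaqueue/query/?json"


def fetch_queues(url=CRIC_URL, timeout=60):
    """Download the queue dictionary from CRIC (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def queues_not_online(queues, cloud="UK"):
    """Return {queue name: status} for queues in `cloud` that are not 'online'.

    Assumes each entry has "cloud" and "status" keys (unverified field names);
    entries lacking them are handled via dict.get and never raise.
    """
    return {
        name: info.get("status")
        for name, info in sorted(queues.items())
        if info.get("cloud") == cloud and info.get("status") != "online"
    }
```

Calling queues_not_online(fetch_queues()) would then list the UK queues that are in any state other than online, which could be compared against the long-term queue list above.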
AOB
NTR