ATLAS UK Cloud Support
Vidyo
Outstanding tickets
-
147698 UKI-SCOTGRID-DURHAM less urgent assigned 2020-07-01 15:32:00 UKI-SCOTGRID-DURHAM squid down
- Assigned; VM to be rebooted
-
146771 UKI-SCOTGRID-ECDF less urgent reopened 2020-07-01 22:18:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”
- Reopened; it was hoped that the update to CentOS7 would have resolved most issues
-
146651 RAL-LCG2 urgent in progress 2020-05-27 10:43:00 singularity and user NS setup at RAL
- Timescale and planning underway with Grid service
-
146374 UKI-NORTHGRID-SHEF-HEP urgent on hold 2020-06-24 16:18:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
- On Hold
-
145688 UKI-NORTHGRID-MAN-HEP less urgent in progress 2020-06-30 06:45:00 Very old version of squids at UKI-NORTHGRID-MAN-HEP
- Almost complete; test squid is online. Will try the new version for a few days on one production squid, then roll out.
-
145510 RAL-LCG2 urgent in progress 2020-06-29 07:33:00 RAL-LCG2: timeouts on stage-in/outs
- Pilot update seems to have improved the situation; however, there was a spike in timeout activity. Will try to close.
-
144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-06-09 07:59:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
- On Hold; access may become increasingly restricted
-
142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- On Hold
CPU
-
RAL
- A problem in aCT appeared. Fixed by 2200, but it is taking time to reclaim the lost slots
-
Northgrid
- Durham: aircon is on but below full efficiency; it may take time to get jobs back in the queue, given the backlog of other jobs.
-
London
- QMUL: To investigate memory issues from jobs
-
SouthGrid
-
Scotgrid
Other new issues
Ongoing issues
-
CentOS7 - Sussex
- On Hold
-
Grand Unified queues
- On Hold
News round-table
-
Vip
- Downtime for ARC; next Wednesday, 1 day
-
Dan
- JW: To look at the memory failures at QMUL
- https://bigpanda.cern.ch/wns/UKI-LT2-QMUL/?hours=12
-
Matt
- (via email) Our new CE is coming along slowly, but we are managing with the current version. Details for our new ARC CE will follow as soon as I get it accepting jobs.
-
Peter
- New CE in progress
-
Alessandra
- NTR
-
Sam
- Can run the current number of jobs on CEPH OK
- To bring all CPU online, would want to add more redirectors
-
Gareth
- CEPH site; still listed as test in some ATLAS pages
- AF to ask what the situation is with CEPH in ATLAS and how to properly put it into production
- Sam to update the JIRA.
- DPM can drain once ATLAS is happy with CEPH.
- Increased access restrictions to the server room with recirculated air are under discussion
- http://adc-ddm-mon.cern.ch/ddmusr01/plots/plots.php?endpoint=UKI-SCOTGRID-GLASGOW-CEPH_DATADISK
- https://fts3-pilot.cern.ch:8449/fts3/ftsmon/#/?vo=&source_se=gsiftp:%2F%2Fcephc04.gla.scotgrid.ac.uk&dest_se=&time_window=1
- http://adc-ddm-mon.cern.ch/ddmusr01/plots/plots.php?endpoint=UKI-SCOTGRID-GLASGOW_DATADISK
-
JW
- TPC; xrootd back for RAL-LCG2 and RAL-CEPH; working on http with minor successes