ATLAS UK Cloud Support
Vidyo
Outstanding tickets
-
147553 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-06-20 09:08:00 UK UKI-NORTHGRID-LANCS-HEP_DATADISK deletion failures
- Closed
-
147390 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-06-25 07:19:00 Failovers from jobs running at UKI-SCOTGRID-GLASGOW_CEPH to CERN backup proxy
- Static route now rolled out to all nodes.
-
147361 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-06-18 08:22:00 Deletion errors at UKI-SCOTGRID-GLASGOW
- Specific files done. Will close the ticket once remaining files in namespace have been processed.
-
146771 UKI-SCOTGRID-ECDF less urgent on hold 2020-06-16 15:41:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”
- DPM CentOS 7 migration done, but this has not completely removed the issue. Some differences between ECDF and other DPM configs.
- Under investigation; will talk to dpm-devs
-
146651 RAL-LCG2 urgent in progress 2020-05-27 10:43:00 singularity and user NS setup at RAL
- If moved to unprivileged, we use our own; else RAL needs to support Singularity
- Docker makes it look like user namespaces are enabled. Singularity must be able to mount /proc
- JW to follow up with JA
-
146374 UKI-NORTHGRID-SHEF-HEP urgent on hold 2020-06-24 16:18:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
- Work on CE in progress
-
145688 UKI-NORTHGRID-MAN-HEP less urgent waiting for reply 2020-06-24 16:43:00 Very old version of squids at UKI-NORTHGRID-MAN-HEP
- Upgrade underway; need to make Frontier squid work with the puppet modules
-
145510 RAL-LCG2 urgent in progress 2020-06-18 05:50:00 RAL-LCG2: timeouts on stage-in/outs
- Problems at RAL preventing looking into and closing the ticket
-
144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-06-09 07:59:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
- Needs Access
-
142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
- Needs Access
CPU
Pledge line back; move to CRIC DB in ATLAS monit
-
RAL
- Power cut; broken software (Singularity) update
-
Northgrid
- LANCS; migration done; some residual problems
- using old CE; but upgrade needed
- DIRAC working; ATLAS needs some work.
-
London
- RHUL: In test; HC ‘stuck’; action being followed-up
-
SouthGrid
-
Scotgrid
- Durham: Cooling failed; off until Monday
Other new issues
-
RAL-FTS
- ATLAS moved sites from RAL to CERN’s FTS instance
-
Cern DB downtime
- Major DB intervention 27 June; affects many services
- CERN Frontier switched off from the afternoon of the 26th
- Job submission to be halted later in the day
-
Downtimes:
- Durham: 24-28 Aircon failure, (24) Storage maintenance
- LANCS: 23 Upgrade SEs
- MAN: 22 Arc-ce6
- RAL: 22/23 Power cut
Ongoing issues
-
CentOS7 DPM Lancs
- LANCS; migration done; some residual problems
- using old CE; but upgrade needed
- DIRAC working; ATLAS needs some work.
-
CentOS7 - Sussex
- Needs Access
-
Glasgow Ceph storage
- Various improvements planned; stable running
- Will remove from ‘ongoing’ issues
-
Grand Unified queues
- Awaiting SHEF
News round-table
-
Vip
- NTR
-
Dan
- Panda failing; out-of-memory error
- JW to investigate
-
Matt
- NTR
-
Peter
- NTR
-
Alessandra
- NTR
-
Sam
- NTR
-
Tim
- Needs Echo access from James to make progress on http
-
JW
- NTR