RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
Please attend via the following Zoom meeting:
https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09
-
-
12:38
→
12:39
Major Incidents Changes 1m
-
12:39
→
12:40
Summary of Operational Status and Issues 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
-
12:40
→
12:41
GGUS /RT Tickets 1m
https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
-
12:41
→
12:42
Site Availability 1m
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
-
12:42
→
12:43
Experiment Operational Issues 1m
-
12:44
→
12:45
VO-Liaison ATLAS 1m
Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))
* ATLAS needs to run more single-core analysis jobs
- https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397775
* ATLAS hostname env for WN containers
- https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=398494
* Oxford Xcache; done on RAL side
- https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397191
Discrepancy between Vande 100% CPU (for ATLAS) and ATLAS Monitoring (cf. Vande * 11.7/10).
- to be understood
ATLAS slowly increasing single-core running jobs (to ~3k).
Vector Reads:
CMS SAM test code can run on gw683 and gw691:
- See at what frequency the problem can be triggered;
- In parallel, try some lower-level tests
-
12:46
→
12:47
VO Liaison CMS 1m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
SAM tests are ok, just occasional failures. Transfers seem fine.
However, real jobs are failing at a very high rate and efficiency is extremely low. The CMS L1s have asked me to organise stopping Processing-type jobs running at RAL, as these are the culprits. Failure rates are 60-80% and efficiencies are <1% for many jobs. These jobs mostly fail with FileOpen or FileRead errors.
I changed the redirector fallback from the UK alias to the European alias. This seemed to reduce the number of FileOpen errors, but the total number of failures remained high: the FileOpen errors were replaced by FileRead errors.
-
12:48
→
12:49
VO Liaison LHCb 1m
Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
-
12:52
→
12:53
VO Liaison Others 1m
-
12:53
→
12:54
Experiment Planning 1m
-
12:54
→
12:55
Dune/protoDune 1m
-
12:55
→
12:56
Euclid 1m
-
12:56
→
12:57
SKA 1m
-
12:57
→
12:58
AOB 1m
-
12:58
→
12:59
Any other Business 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))