RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
Please attend via the following Zoom meeting:
https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09
-
-
12:38
→
12:39
Major Incidents Changes 1m
-
12:39
→
12:40
Summary of Operational Status and Issues 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
-
12:40
→
12:41
GGUS/RT Tickets 1m
https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
-
12:41
→
12:42
Site Availability 1m
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
-
12:42
→
12:43
Experiment Operational Issues 1m
-
12:44
→
12:45
VO Liaison ATLAS 1m
Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))
-
12:46
→
12:47
VO Liaison CMS 1m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
AAA-related SAM tests were completely broken for a couple of days. The cause turned out to be that the manager came back from a reboot without an IPv6 address. I will go back to watching for intermittent failures related to Echo access (including via AAA), and I will try running SAM tests by hand. Ian J also asked me for ways to test the Vector Read, which I feel may be related to these SAM test failures (and likely to many other failures and inefficiencies in CMS jobs at RAL). I hope we can try this on a test machine in the next 1-2 weeks.
I also need to keep following up on the repeated queries appearing in the redirector logs. They are probably causing a problem, and they are definitely filling up the machine with logs within ~20 days.
The IPv6 side of the firewall change was done yesterday. I did a quick test of transfers from Nebraska and Florida before and after the change. The rate after the change was better.
270 MB file from Florida: before 42 s, after 12 s.
4 GB file from Nebraska: before 201 s, after 141 s.
The proof of the pudding will be the IPv4 change (Monday 8th March).
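For comparison, the before/after timings quoted above can be converted into average transfer rates. The following is a minimal sketch, assuming decimal units (4 GB = 4000 MB) and that the quoted times are wall-clock durations for the whole file; the names `rate_mb_per_s` and `summarise` are illustrative only.

```python
# Convert the quoted transfer times to average rates and speed-up factors.
# Assumptions: decimal units (1 GB = 1000 MB); times are whole-file seconds.

def rate_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s."""
    return size_mb / seconds

# (source site, file size in MB, seconds before change, seconds after change)
transfers = [
    ("Florida", 270.0, 42.0, 12.0),
    ("Nebraska", 4000.0, 201.0, 141.0),
]

def summarise(rows):
    """Return (site, rate before, rate after, speed-up) for each transfer."""
    out = []
    for site, size_mb, before_s, after_s in rows:
        before = rate_mb_per_s(size_mb, before_s)
        after = rate_mb_per_s(size_mb, after_s)
        out.append((site, before, after, after / before))
    return out

if __name__ == "__main__":
    for site, before, after, speedup in summarise(transfers):
        print(f"{site}: {before:.1f} MB/s -> {after:.1f} MB/s ({speedup:.2f}x)")
```

With these inputs, Florida goes from about 6.4 MB/s to 22.5 MB/s (3.5x) and Nebraska from about 19.9 MB/s to 28.4 MB/s (about 1.4x).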
With Darren, I have moved the AAA Vande dashboard into the new area. I am in touch with Christos and am planning to add a new plot there to monitor requests to the redirector machine based at RAL. This will be an incomplete picture, but better than nothing.
-
12:48
→
12:49
VO Liaison LHCb 1m
Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
LHCb
- Streaming from ECHO issue
  - https://ggus.eu/?mode=ticket_info&ticket_id=142350
  - Waiting for information on development of fix to vector reads
  - Other tests of alleviation measures
- Issues with job statuses in DIRAC - being investigated
- Low number of running jobs
  - https://ggus.eu/?mode=ticket_info&ticket_id=150679
  - Under investigation
  - Number of running jobs went down from 9.5K to ~3K over the last few weeks
  - No known obvious reason
- FTS library missing
  - https://ggus.eu/?mode=ticket_info&ticket_id=150653
  - RAL FTS does not support macaroons
  - RAL FTS to be upgraded at some point
DUNE
- Nothing special operationally to report
- From meeting on Monday
  - Confirmed that jobs that run in the UK (not too many so far) do read input data from UK storage if available
  - Planning to change the data composition in UK storage following this
- Streaming from ECHO issue
-
12:52
→
12:53
VO Liaison Others 1m
-
12:53
→
12:54
Experiment Planning 1m
-
12:54
→
12:55
DUNE/protoDUNE 1m
-
12:55
→
12:56
Euclid 1m
-
12:56
→
12:57
SKA 1m
-
12:57
→
12:58
AOB 1m
-
12:58
→
12:59
Any Other Business 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
-