RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
Please attend via the following Zoom meeting:
https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09
-
-
12:38
Major Incidents Changes
-
1
Summary of Operational Status and Issues
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
-
2
GGUS/RT Tickets
https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
-
3
Site Availability
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
-
12:42
Experiment Operational Issues
-
4
VO Liaison ATLAS
Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))
-
5
VO Liaison CMS
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
AAA-related SAM tests were completely broken for a couple of days. The cause turned out to be that the manager came back from a reboot without an IPv6 address. I will go back to watching for intermittent failures related to Echo access (including via AAA), and I will try running SAM tests by hand. Ian J also asked me for ways to test vector reads, which I feel may be related to these SAM test failures (and likely to many other failures and inefficiencies in CMS jobs at RAL). I'm hoping we can try this on a test machine in the next 1-2 weeks.
I also need to continue following up the multiple repeated queries appearing in the redirector logs (which are probably causing a problem, and are definitely filling up the machine with logs within ~20 days).
The IPv6 side of the firewall change was done yesterday. I did a quick test of transfers from Nebraska and Florida before and after the change. The rate after the change was better.
270MB file from Florida - before 42s, after 12s.
4GB file from Nebraska - before 201s, after 141s.
The proof of the pudding will be the IPv4 change (Monday 8th March).
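As a rough illustration of what the timings above imply, the following sketch converts them into approximate transfer rates. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and that each quoted time covers the whole file; the names `TRANSFERS`, `rate_mb_per_s` and `summarise` are illustrative only and do not come from any RAL or CMS tool.

```python
# Transfer timings quoted in the report: (size in bytes, seconds before, seconds after).
# Assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
TRANSFERS = {
    "Florida": (270 * 10**6, 42.0, 12.0),
    "Nebraska": (4 * 10**9, 201.0, 141.0),
}


def rate_mb_per_s(size_bytes, seconds):
    """Average rate in MB/s for a whole-file transfer."""
    return size_bytes / seconds / 1e6


def summarise(transfers):
    """Return per-site rates before and after the change, plus the speed-up factor."""
    summary = {}
    for site, (size, before, after) in transfers.items():
        summary[site] = {
            "before_MBps": rate_mb_per_s(size, before),
            "after_MBps": rate_mb_per_s(size, after),
            "speedup": before / after,
        }
    return summary


if __name__ == "__main__":
    for site, s in summarise(TRANSFERS).items():
        print(
            f"{site}: {s['before_MBps']:.1f} -> {s['after_MBps']:.1f} MB/s "
            f"(x{s['speedup']:.2f})"
        )
```

Under these assumptions the Florida transfer goes from roughly 6 MB/s to 22 MB/s and the Nebraska transfer from roughly 20 MB/s to 28 MB/s. These are single transfers, so the figures are indicative rather than statistically meaningful.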
With Darren I have moved the AAA Vande dashboard into the new area. I am in touch with Christos and planning to add a new plot here to monitor requests to the redirector machine based at RAL. This will be an incomplete picture, but better than nothing.
-
6
VO Liaison LHCb
Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
LHCb
- Streaming from ECHO issue
  - https://ggus.eu/?mode=ticket_info&ticket_id=142350
  - Waiting for information on development of fix to vector reads
  - Other tests of alleviation measures
- Issues with job statuses in DIRAC - being investigated
- Low number of running jobs
  - https://ggus.eu/?mode=ticket_info&ticket_id=150679
  - Under investigation
  - Number of running jobs went down from 9.5K to ~3K over last few weeks
  - No known obvious reason
- FTS library missing
  - https://ggus.eu/?mode=ticket_info&ticket_id=150653
  - RAL FTS does not support macaroons
  - RAL FTS to be upgraded at some point
DUNE
- Nothing special operationally to report
- From meeting on Monday
  - Confirmed that jobs that run in the UK (not too many so far) do read input data from UK storage if available
  - Planning for changing data composition in UK storages following this.
- Streaming from ECHO issue
-
7
VO Liaison Others
-
12:53
Experiment Planning
-
8
DUNE/ProtoDUNE
-
9
Euclid
-
10
SKA
-
12:57
AOB
-
11
Any Other Business
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
-
12:38