RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
13:00
→
13:01
Major Incidents Changes 1m
-
13:01
→
13:02
Summary of Operational Status and Issues 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB)), Kieran Howlett (STFC RAL)
-
13:02
→
13:03
GGUS/RT Tickets 1m
https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
-
13:04
→
13:05
Site Availability 1m
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
-
13:05
→
13:06
Experiment Operational Issues 1m
-
13:15
→
13:16
VO Liaison CMS 1m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
I should have mentioned a few weeks ago (perhaps it was missed because that was the resource review meeting?) that the token (SAM) tests are now green on the AAA machines. We can now pass token tests for both webdav and root, thanks to James W for testing and Jyothish for rolling out. The next step could be a change control in preparation for roll-out on all production gateways?
Red day for webdav tests on Friday.
CMS running 8k cores, performance is excellent. Possible to raise the cap on cores back to 10 or 12k?
DC24 pre-tests for RAL and other UK sites are in discussion. Possible 100TB test coming soon to T1 and IC, 50TB for other sites.
The RALPP to T1 link has been 'fixed' by Chris Brew; the problem was with the firewall. It could become an issue again in the future, though (legacy network).
On tape - spotted a consistent ~1.8% failure rate on transfers between Echo and Antares. The likely cause was tpc03 (1 of 4 machines) being out of action. If all 3 attempts at a transfer landed on tpc03, the transfer failed; the chance of that is 0.25 x 0.25 x 0.25 = 1.56%, close to the observed rate. George fixed tpc03 and it looks like the problem has gone.
George notified me of a very large file that could not be copied into Antares due to its size (146GB). The limit is 128.8GB, but he lifted it temporarily for this one file (created in 2018).
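The tpc03 estimate can be reproduced with a short sketch. It assumes that each attempt independently picks one of the four machines uniformly at random and that a transfer fails only if every attempt lands on the broken machine; neither assumption is confirmed in these notes, and the function name is hypothetical.

```python
def prob_all_attempts_hit_broken(hosts: int, broken: int, attempts: int) -> float:
    """Chance that every attempt lands on a broken host.

    Assumes independent, uniformly random host selection per attempt
    (an illustrative assumption, not a statement about the real scheduler).
    """
    return (broken / hosts) ** attempts


# 1 of 4 machines (tpc03) out of action, 3 attempts per transfer
p = prob_all_attempts_hit_broken(hosts=4, broken=1, attempts=3)
print(f"Expected failure rate: {p:.2%}")
```

The result is of the same order as the observed ~1.8%; the small gap could come from non-uniform selection or unrelated failures.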
-
13:16
→
13:17
VO-Liaison ATLAS 1m
Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
- RAL-LCG2 staging Issue “File not found” (https://ggus.eu/index.php?mode=ticket_info&ticket_id=162827):
- https://its.cern.ch/jira/browse/EOS-5771
- It's now appropriate for the Antares CTA team to take over this GGUS ticket and follow it up in the JIRA.
- A fix for this will be included in EOS 5.X, but a request to back-port it into CTA (EOS 4) needs to be pushed.
- ATLAS jobs failing with: "failed to close file descriptor: bad file descriptor": https://stfc.atlassian.net/browse/GS-131
- Propose to create a GGUS so that any progress can be more visibly tracked and discussed.
- atlas:test file dumps: https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=485639 [since 12 July 2023].
- Requires Aquilon changes, best done by the appropriate (storage) team.
-
13:20
→
13:21
VO Liaison LHCb 1m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
Tickets:
- RAL->SARA transfer issues
- Mitigation is in place (LHCONE higher priority), SARA is now accessible
- Waiting for CERN admins to set up a proper solution
- Redirector setup
- No updates
- Slow checksums
- No updates
Operational issues:
- Last week we encountered a lot of vector read failures
- See slides attached
- Upload problems are still happening periodically
- Gateway issues?
- Number of running LHCb jobs is still unstable
- LHCb DIRAC release on Monday, the other one should come today
- Not without issues
-
13:25
→
13:28
VO Liaison LSST 3m
Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
-
13:30
→
13:31
VO Liaison Others 1m
-
13:31
→
13:32
AOB 1m
-
13:32
→
13:33
Any other Business 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))