RAL Tier1 Experiments Liaison Meeting
Europe/London
Access Grid (RAL R89)
-
-
13:30
Experiment Operational Issues
-
1
ATLAS Operations Report
Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
2
CMS Operations Report
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
SAM test issues:
1. Timeout failures on svc20 (AAA server) on Friday - Jyothish removed it from the cluster. Telegraf and Icinga were also down. Jyothish has a ticket open with Fabric.
2. Network problems on Saturday
3. After 2., the other AAA servers and the manager have failed the 'federation' test fairly consistently. Restarts of the usual services by Katy and Jyothish have not fixed it.
4. ARC-CE xrootd-access test requires AAA. This has failed intermittently due to 3. Fortunately not every CE is failing the test simultaneously, so we do not get a red mark in the summary.
5. New tokens tests for CEs are generally working, but the 'basic' test is in warning because jobs are almost entirely landing on 2018/9 WNs, which do not have IPv6 (Tom Birkett might comment).
6. 'Connection' test for Antares endpoints is in warning due to no IPv6 - how are the tests for the new EOS nodes going?
Job efficiency dropped sharply during the network issue on Saturday.
Suspect CMS is running empty pilots again: I am seeing major monitoring discrepancies (Tuesday night). Have messaged the Submission Infrastructure team.
-
3
LHCb Operations Report
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
News:
- 2025 data distribution has started
Issues:
- Network outage last Saturday
  - Major network outage caused a lot of transfer/job failures
  - Although (it seems) only external connectivity was lost, ECHO redirectors died as well, causing local upload failures too
    - Is it due to the packet marking mechanism?
- IPv6 connectivity is still missing (GGUS:683377)
  - This causes some delays in production output validation, which basically executes stats (which are delayed due to xrootd IPv6 preference).
CVMFS:
- squid0[56] addition
  - These squid servers should be used in production; to do so they should be added to the (cma|atlas|cvmfs)-squid aliases
  - Previous attempt to add them caused problems (GGUS:683332)
    - PTR records for reverse zone were added, causing issues
    - The change was reverted
  - Last week the addresses were added again (only to forward zone, as it should be), but only partially
    - only to cvmfs-squid alias
    - only IPv4 addresses
    - only to internal DNS
  - Waiting for Fabric/DI to proceed on FAB-1101
- RAL as an official EESSI repository mirror
  - Any opinions? Technically seems to be possible.
-
4
ALICE Operations Report
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
2025/26 tape allocation was added to ALICE accounting.
-
5
LSST Operations Report
Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
- Moved 1.6 million files for DC2 since yesterday using a 60-core WN - data movement should be done tomorrow / Friday morning
- IngestD update - deploying that today / now
- LSST:UK meeting tomorrow for general updates
- Still awaiting fix for DC2 job failure
  - w14 worked; the fix for it was merged
  - w18 is failing; unsure whether the cause is code, data movement or something else
-
14:00
Tier-1 Projects
-
6
Antares Upgrade
New EOS nodes
Repack Progress
Speakers: George Patargias, Thomas Byrne
-
7
XRootD Development
Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
-
8
Utilizing GPUs
Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
-
14:45
AOB
-
9
Summary of Operational Status and Issues
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-
10
Any Other Business
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-