RAL Tier1 Experiments Liaison Meeting
Europe/London
Access Grid (RAL R89)
-
13:30
→
13:31
Experiment Operational Issues 1m
-
13:35
→
13:40
ATLAS Operations Report 5m
Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
13:40
→
13:45
CMS Operations Report 5m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
Update on last week's reported SAM test issues:
- Timeout failures on svc20 (AAA server) on Friday - Jyothish removed it from the cluster. Telegraf and Icinga were also down. Jyothish has a ticket with Fabric. - UPDATE: it was running out of RAM. Jyothish added the missing memory limits and reinstated it in the cluster just before the meeting.
- Network problems - continuing with several problem periods throughout the week.
- After item 2 (the network problems), the other AAA servers and the manager have failed the 'federation' test fairly consistently. Restarts of the usual services by Katy and Jyothish have not fixed it. - UPDATE: restarts on the UK redirector helped with this.
- ARC-CE xrootd-access test requires AAA. - Has not been a problem this week.
- New tokens tests for CEs are generally working, but the 'basic' test is in warning due to jobs almost entirely landing on 2018/9 WNs which do not have IPv6 (Tom Birkett might comment). UPDATE: CMS said they are ok with the test being yellow
- 'Connection' test for Antares endpoints in warning due to no IPv6 - how are the tests for the new EOS nodes going? UPDATE: perf tests ongoing but some improvement.
CMS took advantage of other VOs dropping out and claimed a huge number of WNs over the weekend. In general job performance has been good, with just a couple of clear efficiency drops or failure spikes throughout the week.
Transfers:
Periods of excellent transfer rate to buffer and tape. Some 'file exists' errors, likely due to network disruption - Katy is investigating whether clean-up is necessary if the auto-mechanism is not effective.
Disk transfer failures have calmed with Echo as destination (could be the other end of the transfer in any case). With Echo as source, errors still look bad - investigating.
-
13:45
→
13:50
LHCb Operations Report 5m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
Operational issues:
- Network issues affect LHCb transfers (including transfers over LHCOPN), as well as jobs. GGUS ticket.
- LHCOPN transfers were affected e.g. yesterday due to problems with hostname resolution (e.g. https://fts3-lhcb.cern.ch/fts3/ftsmon/#/job/43c41a4c-40af-11f0-b9e9-fa163e4e8fd9).
CVMFS:
- RAL Stratum-1 as an official EESSI repository mirror?
-
13:50
→
13:55
ALICE Operations Report 5m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
-
13:55
→
14:00
LSST Operations Report 5m
Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
- Data movement for DC2 now complete thanks to the WN lent for the purpose
- RC2 data ingestion into the Metadata service is under way: multiple sections are now ingested, and work continues on the others
- Want to get this done soon for data from pipeline tests to be part of an LSST Technote next month
- Some issues with SLAC infrastructure have meant the VOMS server at SLAC is not reliable - defaulting to the read-only VOMS at Lancaster
- The transition to an IAM/VOMS server was raised, and RAL was asked whether it could run it, given its existing technical knowledge of the service
- While neither data movement to RAL nor jobs have been greatly affected, LSST have noticed issues with RAL FTS transfers returning 500 errors, and it's not clear whether this is now an FTS issue or a site core issue that FTS noticed around a month or so ago
-
14:00
→
14:01
Tier-1 Projects 1m
-
14:15
→
14:25
Antares Upgrade 10m
New EOS nodes
Repack Progress
Speakers: George Patargias, Thomas Byrne
-
14:25
→
14:35
XRootD Development 10m
Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
-
14:35
→
14:45
-
14:45
→
14:46
AOB 1m
-
14:46
→
14:55
Summary of Operational Status and Issues 9m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-
14:55
→
15:00
Any other Business 5m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore