RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:30 13:31
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      SAM test issues:

      1. Timeout failures on svc20 (AAA server) on Friday. Jyothish removed it from the cluster. Telegraf and Icinga were also down. Jyothish has a ticket open with Fabric.
      2. Network problems on Saturday.
      3. Since item 2, the other AAA servers and the manager have failed the 'federation' test fairly consistently. Restarting the usual services (Katy and Jyothish) has not fixed it.
      4. The ARC-CE xrootd-access test requires AAA, so it has failed intermittently because of item 3. Fortunately the CEs are not all failing the test at the same time, so we do not get a red mark in the summary.
      5. The new token tests for CEs are generally working, but the 'basic' test is in warning because jobs land almost entirely on 2018/9 WNs, which do not have IPv6 (Tom Birkett might comment).
      6. The 'Connection' test for Antares endpoints is in warning due to no IPv6. How are the tests for the new EOS nodes going?

      Job efficiency dropped sharply during the network issue on Saturday. 

      Suspect CMS is running empty pilots again: I am seeing major monitoring discrepancies (as of Tuesday night). I have messaged the Submission Infrastructure team.

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      News:

      • 2025 data distribution has started

      Issues:

      • Network outage last Saturday
        • Major network outage caused a lot of transfer/job failures
        • Although (it seems) only external connectivity was lost, ECHO redirectors died as well, causing local upload failures too
          • Is this due to the packet marking mechanism?
        • IPv6 connectivity is still missing (GGUS:683377)
          • This delays production output validation, which essentially runs stat calls; these are delayed because xrootd prefers IPv6.

      CVMFS:

      • squid0[56] addition
        • These squid servers should be used in production; to do so, they need to be added to the (cms|atlas|cvmfs)-squid aliases
        • Previous attempt to add them caused problems (GGUS:683332)
          • PTR records were added to the reverse zone, which caused issues
          • The change was reverted
        • Last week the addresses were added again (to the forward zone only, as intended), but only partially:
          • only to the cvmfs-squid alias
          • only IPv4 addresses
          • only to internal DNS
        • Waiting for Fabric/DI to proceed on FAB-1101
      • RAL as an official EESSI repository mirror
        • Any opinions? Technically seems to be possible.
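
A minimal sketch of how the partial alias update described above could be checked from a given host: it reports which address families (IPv4/IPv6) each alias is missing in the forward zone. The fully qualified names in the usage note are placeholders (the notes give only the short alias names), and the helper names `address_families` and `missing_families` are hypothetical. The result reflects whichever DNS view the machine running it uses, so internal and external DNS may give different answers.

```python
import socket

# Map socket address families to readable labels.
_FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def address_families(hostname, resolver=socket.getaddrinfo):
    """Return the set of families ('IPv4', 'IPv6') that hostname resolves to.

    `resolver` defaults to socket.getaddrinfo; it is injectable so the
    logic can be exercised without network access.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # Name does not resolve at all in this DNS view.
        return set()
    # Each getaddrinfo result is (family, type, proto, canonname, sockaddr).
    return {_FAMILY_NAMES[r[0]] for r in results if r[0] in _FAMILY_NAMES}


def missing_families(hostnames, resolver=socket.getaddrinfo):
    """Return {hostname: [missing families]} for names lacking IPv4 or IPv6."""
    report = {}
    for host in hostnames:
        missing = set(_FAMILY_NAMES.values()) - address_families(host, resolver)
        if missing:
            report[host] = sorted(missing)
    return report
```

Possible usage, with placeholder domain names: `missing_families(["cms-squid.example.org", "atlas-squid.example.org", "cvmfs-squid.example.org"])`. An empty dictionary would mean every alias resolves over both IPv4 and IPv6 from that host. This only checks forward records; it does not verify which squid servers sit behind each alias.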
    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      2025/26 tape allocation was added to ALICE accounting.

    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
      • Moved 1.6 million files for DC2 since yesterday using a 60-core WN; data movement should be done tomorrow or Friday morning
      • IngestD update is being deployed today (now)
      • LSST:UK meeting tomorrow for general updates
      • Still awaiting fix for DC2 job failure
        • w14 was fixed and merged, and now works
        • w18 is failing; unclear whether the cause is code, data movement or something else
    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:15 14:25
      Antares Upgrade 10m

      New EOS nodes
      Repack Progress

      Speakers: George Patargias, Thomas Byrne
    • 14:25 14:35
      XRootD Development 10m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
    • 14:35 14:45
      Utilizing GPUs 10m
      Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:45 14:46
      AOB 1m
    • 14:46 14:55
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 14:55 15:00
      Any other Business 5m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore