RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)


Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:30 13:34
      Site Operations 4m
    • 13:34 13:35
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      The failing SAM tests for data access via CEs reported last week turned out to be an ETF problem that affected many ARC-CE sites for CMS. SAM tests have been generally green since the fix on Thursday.

      Production jobs: some big spikes in running cores in the last week, but performance remains good.

      A few tape write failures to investigate. 

      Issue with /store/unmerged/ on Echo not being fully cleaned up: CMS keeps files awaiting Merge jobs in this 'directory', and these files are not managed by Rucio. Cleanup jobs delete unmerged files and typically work well at RAL, but some files remain. CMS has a mechanism to delete these leftovers after a certain period, once it has checked that they are no longer needed. That mechanism relies on listing directories (ls), which does not work on Echo, yet the associated test is always green! Files have built up over the years; we can delete the majority of them. We are considering a long-term solution.
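      The age-and-protection check described above can be sketched as follows. This is a minimal illustration, not CMS's actual cleanup code: the function name `select_stale_unmerged`, the 14-day threshold, the input format (an iterable of path and modification-time pairs, however that listing is obtained) and the list of protected prefixes are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

UNMERGED_PREFIX = "/store/unmerged/"


def select_stale_unmerged(
    entries: Iterable[Tuple[str, datetime]],
    protected_prefixes: Iterable[str],
    now: datetime,
    max_age: timedelta = timedelta(days=14),  # hypothetical threshold
) -> List[str]:
    """Return unmerged paths older than max_age that are not protected."""
    protected = tuple(protected_prefixes)
    stale = []
    for path, mtime in entries:
        if not path.startswith(UNMERGED_PREFIX):
            continue  # only the unmerged area is considered
        if path.startswith(protected):
            continue  # assumed to be still needed by an active workflow
        if now - mtime > max_age:
            stale.append(path)
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    listing = [
        ("/store/unmerged/run1/old.root", now - timedelta(days=30)),
        ("/store/unmerged/run1/new.root", now - timedelta(days=1)),
    ]
    print(select_stale_unmerged(listing, [], now))
```

      The sketch only selects candidates; actual deletion, and the question of how to obtain the listing on Echo, are deliberately left out.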

      DC27: 50% of HL-LHC challenge. Proposed for last week of Feb and first week of March. 

      CMS were testing FTS4 last week and continue this week, working closely with the FTS team. Generally successful so far; we found a few things that need fixing or had not been fully developed yet. We need to keep up the pressure to make sure it is production-ready in the autumn.

      Pledges: all looks up to date for CMS; the tape pledge was given early, but the CPU pledge appears not to have increased. (Pledges are here: https://indico.cern.ch/event/1598195/contributions/6736058/attachments/3197677/5692007/tier1expts2026-v1.pdf)

       

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      News:
       - Many WGProduction jobs ran at RAL on Monday, with a read rate of more than 100 GB/s from ECHO. All worked fine!

      Operational issues:

      • Corrupted ECHO files found (GGUS:1002197)
        • 2 recent files, one uploaded on the 20th of January (during the ECHO incident)
        • ~30 old ones, corresponding to 2023 incidents
        • ~270 corrupted due to incorrect user activity (at least it seems so -- files from this user are corrupted at many sites)
        • 2 files with incorrect metadata
      • Resource confirmation request (GGUS:1002186)
      • ECHO gateway NICs were overloaded last Thursday
        • Seems to correspond to an increase in recovery traffic
          • Is the reason behind this increase known?
    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      The ALICE redirector was unavailable from last Friday afternoon: the redirector machines were turned off before the DNS name switch.

    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Thomas Birkett, Timothy Noble (Science and Technology Facilities Council STFC (GB))
      • Found the reason why some jobs were not running at RAL
        • An LSST code issue was causing jobs to fail immediately
          • The code replaced all '//' with '/', which left no valid paths to files
          • Fixed in the next version, deployed on Friday
          • Can be seen here as a build-up of jobs followed by a sudden drop, then retries
      • Transfers of RAW data to RAL continue: 9 TB in the last week, continuing at the same rate
      • Failures in Butler transfers prompted a request to use the RAL FTS again, as the US-based one is struggling
        • No issues from the FTS side
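      The '//' replacement problem reported above can be illustrated with a short sketch. The notes do not say which kind of path was affected or how the fix was implemented; the example assumes URL-style paths with a scheme separator, and the function names `naive_normalise` and `normalise_keep_scheme` and the `example.org` URL are hypothetical.

```python
import re


def naive_normalise(path: str) -> str:
    """Blindly replace every '//' with '/', as in the reported bug."""
    return path.replace("//", "/")


def normalise_keep_scheme(path: str) -> str:
    """Collapse repeated slashes only after the '://' scheme separator."""
    scheme, sep, rest = path.partition("://")
    if not sep:
        # No scheme present: treat the whole string as a plain path.
        return re.sub(r"/{2,}", "/", path)
    return scheme + sep + re.sub(r"/{2,}", "/", rest)


if __name__ == "__main__":
    url = "https://example.org/data//file.fits"
    print(naive_normalise(url))        # scheme separator is damaged
    print(normalise_keep_scheme(url))  # scheme separator is preserved
```

      Whether collapsing slashes after the host is safe depends on the protocol, since some give a double slash there its own meaning; the sketch does not attempt to handle that case.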
    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:02 14:07
      Antares Upgrade 5m
      Speakers: George Patargias, Thomas Byrne
    • 14:08 14:13
      XRootD Development 5m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
    • 14:14 14:19
      Utilizing GPUs 5m
      Speakers: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:25 14:26
      AOB 1m
    • 14:27 14:36
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Science and Technology Facilities Council STFC (GB)), Darren Moore, Thomas Birkett
    • 14:45 14:50
      Any other Business 5m
      Speakers: Brian Davies (Science and Technology Facilities Council STFC (GB)), Darren Moore