RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 14:00 14:01
      Major Incidents Changes 1m
    • 14:05 14:06
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore, Kieran Howlett (STFC RAL)
    • 14:10 14:11
      Experiment Operational Issues 1m
    • 14:15 14:16
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Following the CMS Rucio upgrade, CMS FTS transfers were using gridftp. This was not specific to RAL, but RAL was one of the sites that saw it (thanks to Jyothish for noticing!). The Rucio upgrade has now been rolled back and gridftp transfers have returned to their usual rate, i.e. roughly zero in FTS and just a couple of transfers per bin in Vande monitoring, which are the WN uploads from the RAL T1 batch farm.

      Note: the FTS team has announced that gridftp support on the CERN CMS FTS instance will end on May 7th!

      Katy to move batch farm uploads at RAL from gridftp to davs ASAP.

      Number of cores: other VOs are low on cores; CMS is not. In fact, CMS monitoring suggests the 12k 'cap' has been released. However, this does NOT appear in Vande, so it may be an artefact, or possibly related to the pilot overloading.

      It was pointed out today that the OPN is saturated and that it is the IPv4 hosts (the WNs) that are drawing the data. There is a corresponding rise in 'average data read' by CMS jobs. There is also a big jump in read time (possibly more than proportional to the data quantity) and a drop in efficiency, taking us below the CMS T1 average.

      Gateway migration: AAA gateway certificates were sorted out and all tests are green again as of this week. The issue was that the job that pulled the certificate was broken.

      CMS jobs running on the test CEs / pre-prod farm: to be checked.

      Katy has pushed for EL9 jobs but CMS seem slow to act.

    • 14:20 14:21
      VO-Liaison ATLAS 1m
      Speakers: Dr Brij Kishor Jashal (RAL, TIFR and IFIC), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)

       

      • RAL was running at around its pledge on Monday, but has lately been well below it (an impact of draining the 2021 worker nodes); see Job Accounting (UK Cloud).
      • Upcoming QMUL Data Centre Refurbishment: ADCINFR-268.
      • Actions from RAL (and ATLAS) on EL8/EL9 job submission?
      • 2024Q1 Resource Review Meeting next week!
    • 14:25 14:26
      VO Liaison Others 1m
    • 14:30 14:31
      VO Liaison LHCb 1m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
      • Because of the batch farm drain, LHCb receives less CPU power than pledged by RAL
        • For the last week we have (see plot attached): 174.2 * 365 * 12.7 / 7 = 115357.7 HS23; the (minimal) pledge is 140 kHS23
      • Vector read optimisation
        • Some ATLAS job types have lower efficiency on the preprod farm (where the change is applied), which is suspicious
      • Lost files on Antares: follow-up
        • Can we ban removals on certain directories (containing raw files) completely?
      • ETF tests for storage are failing on ECHO
        • Tests were failing because stat calls were not working for top-level "directories"
        • Jyothish fixed that problem; the tests now fail because of the missing "directories"
          • I want to upload a file instead of a directory to ECHO to fool this test
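
The LHCb delivered-power figure quoted above can be checked with a short calculation. This is only a sketch: the notes do not say what each factor means, so reading 174.2 as a usage total for the week and 12.7 as an HS23 conversion factor is an assumption, and all function and variable names are hypothetical.

```python
# Sketch of the arithmetic behind the LHCb CPU figure in these notes.
# Assumption: 174.2 is a usage total for the 7-day period and 12.7 is an
# HS23 conversion factor; the notes give only the bare numbers.

def delivered_hs23(period_total, hs23_factor, days_in_period=7, days_per_year=365):
    """Scale a per-period total to an HS23 figure, as in the notes' formula."""
    return period_total * days_per_year * hs23_factor / days_in_period

delivered = delivered_hs23(174.2, 12.7)    # formula as written in the notes
pledge = 140_000                           # 140 kHS23 (minimal pledge)
fraction_of_pledge = delivered / pledge    # share of the pledge delivered

print(f"delivered: {delivered:.1f} HS23")
print(f"fraction of pledge: {fraction_of_pledge:.1%}")
```

On these numbers the delivered power comes to a little over four fifths of the minimal pledge, consistent with the shortfall attributed to the batch farm drain.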
    • 14:35 14:38
      VO Liaison LSST 3m
      Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
    • 14:40 14:41
      VO Liaison APEL 1m
      Speaker: Thomas Dack
    • 14:45 14:46
      AOB 1m
    • 14:50 14:51
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore