RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:30 13:31
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Generally smooth operations; job efficiencies are good again. Long input read times for some jobs need to be checked: the data for these jobs was being read from offsite (not just premix libraries but also primary datasets). These are primarily Processing jobs reading data from CERN, which has always worked well in the past, so such high input read times are unusual.

      SAM tests on Antares were failing due to an EOS node hardware failure, and the test files were not in the correct place. Production was unaffected, but CMS still put us into 'Rucio drain', which we have yet to escape. Tom Byrne asked Katy to have the SAM test files retransferred on Monday. Katy performed the 'declare bad' procedure on Monday afternoon. This removed the replica according to Rucio, but because we do not run the reaper on tape sites (except during tape deletion campaigns), the file was probably never physically removed. Therefore, when the Antares team "re-enabled this morning the antares-eos01 filesystems as read-only" on Tuesday morning, the tests went green again. In the meantime, the new copy of the SAM test files cannot reach RAL because we are in Rucio drain. I am now trying to persuade CMS to remove us from Rucio drain.
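
      The sequence above can be illustrated with a toy model. This is only a sketch: the class and function names are hypothetical and are not the Rucio API, and it assumes the SAM test reads the file directly from storage without consulting the catalogue.

```python
# Toy model only: all names here are hypothetical, not the Rucio API.
class TapeSite:
    def __init__(self):
        self.catalogue = set()  # replicas the catalogue believes exist
        self.storage = set()    # files physically present on storage
        self.readable = True    # False while the hosting filesystem is down

    def add_file(self, name):
        self.catalogue.add(name)
        self.storage.add(name)

    def declare_bad(self, name):
        # Removes the catalogue entry only; physical deletion is left to the reaper.
        self.catalogue.discard(name)

    def run_reaper(self):
        # Deletes physical files that no longer have a catalogue entry.
        self.storage &= self.catalogue

    def sam_test_passes(self, name):
        # Assumption: the test reads straight from storage, ignoring the catalogue.
        return self.readable and name in self.storage


def replay_incident(reaper_enabled):
    """Return (test failed during outage, test passes after re-enable)."""
    site = TapeSite()
    site.add_file("sam-test-file")
    site.readable = False               # hardware failure on the node
    failed_during_outage = not site.sam_test_passes("sam-test-file")
    site.declare_bad("sam-test-file")   # replica removed from the catalogue
    if reaper_enabled:
        site.run_reaper()               # not run on tape sites in normal operation
    site.readable = True                # filesystems re-enabled as read-only
    return failed_during_outage, site.sam_test_passes("sam-test-file")
```

      Under these assumptions, `replay_incident(False)` ends with a passing test because the old file is still on storage, whereas with the reaper enabled the test would stay red until a new copy arrived.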

      CMS also went into production job drain, so we have picked up a lot of User Analysis jobs, but I re-enabled Production this morning. CMS has been running on a large proportion of the farm since the weekend.

      Mention again the proposed metadata tape challenge.

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Operational issues:

      • LHCb drained on Saturday due to problems with LHCb Bookkeeping.
        • The number of running jobs is still a bit lower than usual, probably due to CMS taking over the farm?
      • A few failed uploads to ECHO from external sites yesterday afternoon and this morning.
        • The errors are hostname resolution failures -- were there any known DNS-related issues?
      • A few buggy WG Productions were submitted by LHCb, resulting in job failure spikes
        • Linked to the xrootd 5.1.1 client, which cannot properly execute vector reads against xrootd-based storage.

       

      News:

      • Token-based ETF test for ARC CEs added to LHCb preprod environment (see e.g. here)
        • So far the tests are failing for almost all sites
        • At RAL, the LHCb token configuration seems to be missing from the ARC config (GSTSM-521).
    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:15 14:25
      Antares Upgrade 10m

      New EOS nodes
      Repack Progress

      Speakers: George Patargias, Thomas Byrne
    • 14:25 14:35
      XRootD Development 10m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
    • 14:35 14:45
      Utilizing GPUs 10m
      Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:45 14:50
      SSD Storage Evaluation 5m
    • 14:50 14:55
      Echo deployment 5m
    • 15:00 15:01
      AOB 1m
    • 15:01 15:10
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 15:10 15:15
      Any other Business 5m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore