RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:30
      Experiment Operational Issues
    • 1
      VO-Liaison ATLAS
      Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 2
      VO Liaison CMS
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Katy is at CERN for O&C week and will not attend the liaison meeting today. 

      SAM test errors on Friday were caused by disk gateway problems, which stemmed from VMware machine back-ups and high read load from LHCb. Various improvements have been put in place to make the system more robust.

      AAA is seeing occasional periods of SAM test errors although traffic is not high. ceph-svc20 has been the worst this week; Jyothish is updating XRootD and priority scheduling.

      UK mini-DC: I talked to Alessandra and she doesn't want to hurry the tape testing given the EOS nodes have only just arrived. She thinks we have enough tests for Echo to perform in the week of the 3rd March, and then we should be able to schedule tape tests for another week. 

    • 3
      VO Liaison LHCb
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Operational issues:

      • ECHO redirectors overload last Friday (GGUS 681943)
        • Redirectors were overloaded, probably because of the high number of LHCb Sprucing jobs
        • Fixed by a few server tweaks
        • Writeable WN gateways could be helpful to remove some load from the redirectors


      News:

      • Preprod farm now has writeable WN gateways
        • LHCb jobs are already using them!
    • 4
      VO Liaison ALICE
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
    • 5
      VO Liaison LSST
      Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
      • All components are deployed at RAL for the MultiSite testing that is to happen in the next few weeks
        • Butler now contactable from BatchFarm
        • IngestD configured and ready for MultiSite test
        • Kafka picking up messages from US
        • Current tests passing green
      • Potential issues that RAL could face
        • Lancaster has run some tests with actual data and they have seen a large amount of I/O on their CephFS mount on the worker nodes
          • Due to many small files
          • Swapping to DAVS has changed the job I/O behaviour and reduced this, but the I/O is still high
          • Lancs considering using Ceph over CephFS for this workflow
      • Job Slot limit increased to 1000 for when analysis work comes in

      Jobs are running well (SLAC / S3DF is in downtime, hence no jobs today).

      Transfers were failing, but now seem to be working after network changes made last night for FTS.

    • 6
      VO Liaison APEL
      Speaker: Thomas Dack
    • 7
      WP-D - GPU, Data Management, Other
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 15:00
      Major Incidents Changes
    • 8
      Summary of Operational Status and Issues
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 15:20
      AOB
    • 9
      Any other Business
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore