RAL Tier1 Experiments Liaison Meeting

Name: RAL Tier1 Experiments Liaison Meeting
Start: 2025-02-05T13:30:00+00:00
End: 2025-02-05T15:50:00+00:00
Location: RAL R89

Wednesday 5 Feb 2025, 13:30 → 15:50 Europe/London

Access Grid (RAL R89)

66811541532

Alastair Dewhurst

- 13:30 → 13:31
  
  Experiment Operational Issues 1m
- 13:35 → 13:45
  
  VO-Liaison ATLAS 10m
  
  Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
- 13:45 → 13:55
  
  VO Liaison CMS 10m
  
  Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
  
  Katy is at CERN for O&C week and will not attend the liaison meeting today.
  
  SAM test errors on Friday were caused by disk gateway problems, stemming from VMware machine back-ups and a high read load from LHCb. Various improvements have been put in place to make the system more robust.
  
  AAA is seeing occasional periods of SAM test errors, although traffic is not high. ceph-svc20 has been the worst this week; Jyothish is updating XRootD and the priority scheduling.
  
  UK mini-DC: I talked to Alessandra and she doesn't want to hurry the tape testing, given that the EOS nodes have only just arrived. She thinks we have enough tests for Echo to run in the week of 3 March, and we should then be able to schedule the tape tests for another week.
- 13:55 → 14:05
  
  VO Liaison LHCb 10m
  
  Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
  
  Operational issues:
  
  ECHO redirector overload last Friday (GGUS 681943)
  
  The redirectors were overloaded, probably because of the high number of LHCb Sprucing jobs
  
  Fixed with a few server tweaks
  
  Writeable WN gateways could help remove some of the load from the redirectors
  
  News:
  
  Preprod farm now has writeable WN gateways
  
  LHCb jobs are already using them!
- 14:10 → 14:20
  
  VO Liaison ALICE 10m
  
  Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
- 14:20 → 14:30
  
  VO Liaison LSST 10m
  
  Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
  
  All components deployed at RAL for the MultiSite testing due to happen in the next few weeks
  
  Butler now reachable from the BatchFarm
  
  IngestD configured and ready for MultiSite test
  
  Kafka picking up messages from the US
  
  Current tests all passing (green)
  
  Potential issues that RAL could face:
  
  Lancaster has run some tests with actual data and has seen a large amount of I/O on the CephFS mount on its worker nodes
  
  This is caused by many small files
  
  Switching to DAVS has changed the job I/O behaviour and reduced this, but the I/O is still high
  
  Lancaster is considering using Ceph rather than CephFS for this workflow
  
  Job slot limit increased to 1000 in preparation for when analysis work comes in
  
  Jobs running well (SLAC/S3DF is in downtime, hence no jobs today)
  
  Transfers were failing, but now seem to be working after last night's network changes for FTS
- 14:30 → 14:40
  
  VO Liaison APEL 10m
  
  Speaker: Thomas Dack
- 14:45 → 14:55
  
  WP-D - GPU, Data Management, Other 10m
  
  Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
- 15:00 → 15:01
  
  Major Incidents Changes 1m
- 15:05 → 15:15
  
  Summary of Operational Status and Issues 10m
  
  Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
  
  Weekly Report 05 February 2025.docx
  
  Weekly Report 05 February 2025.pdf
- 15:20 → 15:21
  
  AOB 1m
- 15:22 → 15:32
  
  Any Other Business 10m
  
  Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
