RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 13:00 13:01
      Major Incidents Changes 1m
    • 13:01 13:02
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 13:02 13:03
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 13:04 13:05
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:05 13:06
      Experiment Operational Issues 1m
    • 13:15 13:16
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Christmas went well for CMS, with no particular problems. I noticed some SAM CE test errors relating to the RAL-based AAA service, starting at midnight at New Year, so I restarted services on the proxies and the manager; things have looked better since.

      There has been some progress on the long-standing problem of LogCollect jobs always failing at RAL: https://ggus.eu/?mode=ticket_info&ticket_id=141120

      'Newer' campaigns are considerably more successful, whereas older/legacy campaigns still fail 100% of the time.

      Before Christmas I was investigating why CMS consistency checking is not removing 'dark data' from Echo. There are ~3500 such files in total, of which only ~30 were stub files; I removed these. I could not see deletion attempts at RAL for the non-stub files, so I wonder whether this is a Rucio issue. I need to check the Rucio daemon pod logs more carefully to see whether these files ever had a deletion attempt.
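
      For reference, 'dark data' here means files present on the storage but not known to the Rucio catalogue. A minimal sketch of that comparison, assuming two plain lists of paths (the function name and the example paths are hypothetical, and this is not the actual CMS consistency-checking code):

```python
def find_dark_files(storage_paths, catalogue_paths):
    """Return paths present on storage but absent from the catalogue."""
    # Set difference: everything the storage holds that the catalogue
    # does not know about. Sorted so the result is reproducible.
    return sorted(set(storage_paths) - set(catalogue_paths))


if __name__ == "__main__":
    # Hypothetical example paths, for illustration only.
    storage = ["/store/a.root", "/store/b.root", "/store/c.root"]
    catalogue = ["/store/a.root", "/store/c.root"]
    print(find_dark_files(storage, catalogue))
```

      In practice the two inputs would come from a storage dump and a catalogue dump taken at comparable times; files written between the two dumps would otherwise be misreported as dark.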

      I'm currently looking at some errors on transfers from Echo to Antares (150 this morning):

      TRANSFER [19] Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3005] /PATH  --server IP|HOST:PORT/PATH IP|HOST:PORT/PATH
    • 13:16 13:17
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)

      Generally quiet over the period; ATLAS ran at approximately pledge (averaged over the period).

      (Would like to have a proper discussion about the farm job allocations, e.g. fairshare, ATLAS partitioning, soon.)

      No specific issues with transfers.

      Will be changing the Oxford XCache prefetch and request buffer sizes (to match the needs of Virtual Placement), and will monitor for any impact on the external gateways at RAL.

    • 13:20 13:21
      VO Liaison LHCb 1m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
    • 13:25 13:28
      VO Liaison LSST 3m
      Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
    • 13:30 13:31
      VO Liaison Others 1m
    • 13:31 13:32
      AOB 1m
    • 13:32 13:33
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))