RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:00 13:01
      Major Incidents Changes 1m
    • 13:01 13:02
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB)), Kieran Howlett (STFC RAL)
    • 13:02 13:03
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 13:04 13:05
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:05 13:06
      Experiment Operational Issues 1m
    • 13:15 13:16
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Katy was at CHEP for the last 2 meetings.

      Echo had problems from Friday until yesterday. The cause was originally thought to be the reweighting of new disk hardware, and was then also attributed to the vRead change hitting Echo with more requests than normal; the number of IOPS was too high. SAM tests were red on Friday and Saturday. Katy put CMS into drain because jobs were failing at a high rate (many more stage-out errors). Transfers were also failing. On Sunday the tests were green once the load was removed, and Katy put CMS back into production.

      On Monday and Tuesday SAM tests failed again and CMS went back into drain automatically. On Tuesday afternoon the WN-xrootd-access test (which accesses Echo) continued to fail, while all other tests were green after the vRead changes were removed. The xrootd-access test files were themselves accessible. The xrootd-access tests started passing again about 5 hours after the other tests went green. This delay in passing tests after the end of an incident has been observed several times before. There is a suspicion that it is related to the AAA redirector being blacklisted for too long; is this a known issue?

      Batch farm upgrades have been ongoing for the last week and a half, with several half-batch-farm drains. CMS are currently still capped at 8k cores due to the suspected pressure on the network in recent weeks. The cap should be lifted when we move LHCONE off Janet.

      To Do: test Tape REST API

       

    • 13:16 13:17
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)

      ATLAS recovered from the weekend's issue with Echo

      • SHEF and OX were also affected
      • Some cleanup of residual files may be needed

       

      Ran the first test of the REST API this morning with (test) production ATLAS traffic:

      Writes (e.g. https://fts3-atlas.cern.ch:8449/fts3/ftsmon/#/job/02904e96-f495-11ed-8ea4-fa163e5a92fb), with archiveinfo API calls observed in the EOS logs

      Will continue with read tests. 

      • Once confirmed, ATLAS will be keen to use this for production. ATLAS may also wish to try to remove multihop (discussions ongoing).

       

       

    • 13:20 13:21
      VO Liaison LHCb 1m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
      • Echo problems due to an increased IOPS rate after the vector read patch was applied
        • Fixed by rolling back the patch
        • Several corrupted files as a result
      • Problems with uploads to Antares
        • Fixed
      • Request to replace service certificate with host certificate on the vobox
        • Security implications should be considered
      • Vector read
        • See slides attached
    • 13:25 13:28
      VO Liaison LSST 3m
      Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
    • 13:30 13:31
      VO Liaison Others 1m
    • 13:31 13:32
      AOB 1m
    • 13:32 13:33
      Any Other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))