RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Videoconference
Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 12:38 12:39
      Major Incidents Changes 1m
    • 12:39 12:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 12:40 12:41
      GGUS /RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 12:41 12:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 12:42 12:43
      Experiment Operational Issues 1m
    • 12:44 12:45
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      New pledge values apply from 1 April.

      XRootD RPM deployment:

      - Dev Ceph cluster is down
      - VMs are prevented from accessing the prod cluster

      Echo Downtime:
      - Batch farm to stop accepting new submissions from tonight
      - Want to take the opportunity to switch more jobs to Harvester and multi-job pilots
      - Tape access expected to multihop via CERN for the period.

      Antares:
      - Delaying the T0 export retest until the MGM 'fix' is confirmed
      - Many (~25%) 'Operation Expired' errors due to the antares-tpc01 xrootd service, affecting writes to Antares
      - This might not explain the recall errors (same error message).


    • 12:45 12:46
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      SAM tests are failing because of the WebDAV tests.

      Tape Challenge: it sounds like some fraction of the data chosen to be recalled for the tape challenge may be on broken/stuck tapes; external engineers are due to fix this this afternoon (30 March).

      Recalls in the tape challenge are probably also affected by other factors:

      1. An EOS upgrade is required to fix a problem where, if one file in an FTS batch of requests is missing, all requests in the batch fail with a 'this file doesn't exist'-type error.

      2. An upgrade of Rucio to the forthcoming 1.28 release is required to fix another problem (when resubmissions are triggered, Rucio is no longer aware that multihop jobs consist of two coupled jobs).

      3. To be confirmed: a possible problem with the server certificate in CMS-Rucio, which may have expired; this might explain why Rucio is unable to cancel FTS requests (I have 2 examples of how I think this was broken).

      Job efficiencies are OK, though a bit below average.

    • 12:50 12:51
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
    • 12:55 12:58
      VO Liaison LSST 3m
      Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
    • 13:00 13:01
      VO Liaison Others 1m
    • 13:05 13:06
      Experiment Planning 1m
    • 13:10 13:11
      Euclid 1m
    • 13:15 13:16
      SKA 1m
    • 13:20 13:30
      Dune/protoDune 10m
    • 13:30 13:31
      AOB 1m
    • 13:35 13:36
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))