RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)


Description

Please attend via the following Zoom meeting:

https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09

 

    • 12:38 12:39
      Major Incidents Changes 1m
    • 12:39 12:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 12:40 12:41
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 12:41 12:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 12:42 12:43
      Experiment Operational Issues 1m
    • 12:44 12:45
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      Hostname env for WN containers: https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=398494 
       - Open for 2 weeks; deployment hoped for later this week?

      RAL-LCG2_TEST job failures:
      https://ggus.eu/index.php?mode=ticket_info&ticket_id=151098 
      RAL-LCG2_TEST is used to push more analysis jobs through; jobs are submitted via Harvester central_B (not aCT).
      Jobs appear to terminate abruptly.

      There are a number of "LRMS error: (-1) RemoveReason: Job removed by SYSTEM_PERIODIC_REMOVE due to job running more than once" errors; likely an ATLAS issue?

    • 12:46 12:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      We have been running with a cap of 3k since the end of last week. We have a mix of Processing and Production jobs, with an average efficiency of ~50% over the last 5 days. This follows a fix made by Tom Byrne in the middle of last week to the WN xrootd containers. I am pushing for 'permission' to increase the cap.

      I am still seeing some jobs with very low efficiencies (<1%), which eventually fail.
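
      A minimal sketch of how such jobs could be picked out of a set of job records. It assumes efficiency means CPU time divided by (wall-clock time x cores), which is not stated above; the Job record, its field names and the helper functions are hypothetical and do not correspond to any CMS or RAL monitoring API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Job:
    # Hypothetical job record; field names are illustrative only.
    job_id: str
    cpu_seconds: float
    wall_seconds: float
    cores: int = 1


def cpu_efficiency(job: Job) -> float:
    """CPU time / (wall time * cores); assumed definition of efficiency."""
    if job.wall_seconds <= 0 or job.cores <= 0:
        return 0.0
    return job.cpu_seconds / (job.wall_seconds * job.cores)


def low_efficiency_jobs(jobs: Iterable[Job], threshold: float = 0.01) -> List[Job]:
    """Return the jobs whose efficiency is below the threshold (default 1%)."""
    return [job for job in jobs if cpu_efficiency(job) < threshold]


def mean_efficiency(jobs: Iterable[Job]) -> float:
    """Average efficiency weighted by core-seconds, so long jobs count more."""
    valid = [j for j in jobs if j.wall_seconds > 0 and j.cores > 0]
    total_core_seconds = sum(j.wall_seconds * j.cores for j in valid)
    if total_core_seconds == 0:
        return 0.0
    return sum(j.cpu_seconds for j in valid) / total_core_seconds


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        Job("a", cpu_seconds=1800, wall_seconds=3600),
        Job("b", cpu_seconds=10, wall_seconds=3600),
        Job("c", cpu_seconds=14400, wall_seconds=7200, cores=4),
    ]
    for job in low_efficiency_jobs(sample):
        print(job.job_id, f"{cpu_efficiency(job):.4f}")
    print(f"mean efficiency: {mean_efficiency(sample):.3f}")
```

      The mean is weighted by core-seconds rather than taken per job, so a few short, very inefficient jobs do not dominate the average.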

      I have one of the premix libraries at RAL, so I am comparing the campaign that uses that dataset with other campaigns that use premix offsite. So far there is no obvious advantage to having the premix onsite, but I want to study it for longer.

      The firewall change is now not expected until 21st April.

      Vector read: status not known.

    • 12:48 12:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))

      LHCb

      1. Number of running jobs at RAL
         - https://ggus.eu/?mode=ticket_info&ticket_id=150679
         - Still open, but presumably solved now.

      2. Some files fail to be downloaded at RAL
         - https://ggus.eu/?mode=ticket_info&ticket_id=150898
         - Data corruption issue affecting ~30 files
         - Many of the files have now been deleted.

      3. ECHO streaming issue
         - https://ggus.eu/?mode=ticket_info&ticket_id=142350
         - Development of the xrootd-ceph vector read interface is ongoing
           - Testing against the gateways that have the fix deployed for testing

      DUNE

      Nothing to report operationally.

      - Interested in http(s) access to ECHO

    • 12:52 12:53
      VO Liaison Others 1m
    • 12:53 12:54
      Experiment Planning 1m
    • 12:54 12:55
      Dune/protoDune 1m
    • 12:55 12:56
      Euclid 1m
    • 12:56 12:57
      SKA 1m
    • 12:57 12:58
      AOB 1m
    • 12:58 12:59
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))