RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Description

Please attend via the following Zoom meeting:

https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09

    • 12:38 12:39
      Major Incidents Changes 1m
    • 12:39 12:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 12:40 12:41
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 12:41 12:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 12:42 12:43
      Experiment Operational Issues 1m
    • 12:44 12:45
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      O(3k) job failures overnight from 9 problematic WNs (no running xrootd containers, but still accepting jobs).

      Interest expressed (ATLAS and NA62) in the expected CTA timeline.

      TPC HTTP pull bug fix available, but not in time for 5.1.0.
       - Running through normal tests now.

    • 12:46 12:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      AAA-related SAM tests were completely broken for a couple of days. This turned out to be because the manager came back from a reboot without an IPv6 address. I will go back to watching for intermittent failures related to Echo access (including via AAA), and will try running SAM tests by hand. Ian J also asked me for ways to test vector reads, which I suspect are related to these SAM test failures (and likely to many other failures and inefficiencies in CMS jobs at RAL). I'm hoping we can try this on a test machine in the next 1-2 weeks.

      I also need to continue following up the multiple repeat queries appearing in the redirector logs (which are probably causing a problem, and are definitely filling up the machine with logs within ~20 days).

      The IPv6 side of the firewall change was done yesterday. I did a quick test of transfers from Nebraska and Florida before and after the change. The rate after the change was better.

      270 MB file from Florida: before 42 s, after 12 s.

      4 GB file from Nebraska: before 201 s, after 141 s.

      The proof of the pudding will be the IPv4 change (Monday 8th March).
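To make the before/after comparison easier to read, the timings quoted above can be converted into approximate throughput figures. The following is only a rough sketch: it takes the file sizes and durations exactly as reported, assumes 4 GB means 4000 MB (decimal units), and uses hypothetical names (`TRANSFERS`, `summarise`) that are not part of any RAL or CMS tooling. Two single transfers are not a statistically meaningful sample, so the numbers are indicative at best.

```python
# Timings as reported in the minutes: (source site, size in MB, seconds before, seconds after).
# Assumption: "4GB" is treated as 4000 MB (decimal units).
TRANSFERS = [
    ("Florida", 270, 42, 12),
    ("Nebraska", 4000, 201, 141),
]


def summarise(transfers):
    """Return (site, MB/s before, MB/s after, speed-up factor) for each transfer."""
    rows = []
    for site, size_mb, before_s, after_s in transfers:
        rate_before = size_mb / before_s
        rate_after = size_mb / after_s
        speedup = before_s / after_s  # same file size, so the ratio of times is the speed-up
        rows.append((site, round(rate_before, 1), round(rate_after, 1), round(speedup, 2)))
    return rows


if __name__ == "__main__":
    for site, before, after, speedup in summarise(TRANSFERS):
        print(f"{site}: {before} MB/s before, {after} MB/s after ({speedup}x faster)")
```

On these figures the Florida transfer goes from roughly 6.4 MB/s to 22.5 MB/s (about 3.5x), and the Nebraska transfer from roughly 19.9 MB/s to 28.4 MB/s (about 1.4x).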

       

      With Darren I have moved the AAA Vande dashboard into the new area. I am in touch with Christos and planning to add a new plot here to monitor requests to the redirector machine based at RAL. This will be an incomplete picture, but better than nothing.

    • 12:48 12:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))

      LHCb

      1. Streaming from ECHO issue
        • https://ggus.eu/?mode=ticket_info&ticket_id=142350
        • Waiting for information on development of fix to vector reads
        • Other tests of alleviation measures
          • Issues with job statuses in DIRAC - being investigated
      2. Low number of running jobs
        • https://ggus.eu/?mode=ticket_info&ticket_id=150679
        • Under investigation
        • Number of running jobs went down from 9.5K to ~3K over the last few weeks
        • No obvious cause identified so far
      3. FTS library missing
        • https://ggus.eu/?mode=ticket_info&ticket_id=150653
        • RAL FTS does not support macaroons
        • RAL FTS to be upgraded at some point

      DUNE

      • Nothing special operationally to report
      • From meeting on Monday
        • Confirmed that jobs that run in the UK (not too many so far) do read input data from UK storage if available
        • Planning to change the data composition of UK storage following this.
    • 12:52 12:53
      VO Liaison Others 1m
    • 12:53 12:54
      Experiment Planning 1m
    • 12:54 12:55
      Dune/protoDune 1m
    • 12:55 12:56
      Euclid 1m
    • 12:56 12:57
      SKA 1m
    • 12:57 12:58
      AOB 1m
    • 12:58 12:59
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))