RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Description

Please attend via the following Zoom meeting:

https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09

 

    • 12:38 12:39
      Major Incidents / Changes 1m
    • 12:39 12:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 12:40 12:41
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 12:41 12:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 12:42 12:43
      Experiment Operational Issues 1m
    • 12:44 12:45
      VO Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      The fix for the failing XRootD containers, which were killing O(?k) jobs, is hopefully now in place in production.

      Currently observing high failure rates and lost wall time in certain Group Production tasks. These tasks use direct I/O, and spot checks of the logs show I/O access issues; further investigation is needed. The problem is most apparent from 17th March.

      There is still an open question on the discrepancy between Vande and ATLAS monitoring (after nominal scaling).


      - DB upgrades (one planned for 24th March was postponed):

      30 March - CERN (ATONR to ATLR): upgrade of GG for Oracle 19c compatibility
      30 March - CERN ATLR: database upgrade to Oracle 19c
      31 March - IN2P3 (Lyon) DBAMI@CC: database upgrade to Oracle 19c
      31 March - TRIUMF (Canada) DBATL@CC / TR3D@TRIUMF: database upgrade to Oracle 19c
      Note: because of the time zone, the TRIUMF upgrade will take place on the evening of 30 March.
      https://cern.service-now.com/service-portal?id=outage&n=OTG0062675

    • 12:46 12:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
    • 12:48 12:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))

      LHCb

      1. Number of running jobs at RAL
         - https://ggus.eu/?mode=ticket_info&ticket_id=150679
         - Seems "solved" for now
         - Currently running ~8K jobs

      2. Some files fail to be downloaded at RAL
         - https://ggus.eu/?mode=ticket_info&ticket_id=150898
         - Data corruption issue affecting ~30 files

      3. ECHO streaming issue
         - https://ggus.eu/?mode=ticket_info&ticket_id=142350
         - Development of xrootd-ceph vector read interface ongoing
           - Testing against the gateways that have the fix deployed
         - Stopping "mitigation" work
           - Multiple changes to the buffer sizes showed no improvement.
           - Moving to use this infrastructure (DIRAC + RAL) to stress-test
             the fix above

      DUNE

      Nothing to report operationally.

      - Interested in HTTP(S) access to ECHO

    • 12:52 12:53
      VO Liaison Others 1m
    • 12:53 12:54
      Experiment Planning 1m
    • 12:54 12:55
      DUNE/ProtoDUNE 1m
    • 12:55 12:56
      Euclid 1m
    • 12:56 12:57
      SKA 1m
    • 12:57 12:58
      AOB 1m
    • 12:58 12:59
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))