RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Description

Please attend via the following Zoom meeting:

https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09

    • 12:38 12:39
      Major Incidents Changes 1m
    • 12:39 12:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 12:40 12:41
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 12:41 12:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 12:42 12:43
      Experiment Operational Issues 1m
    • 12:44 12:45
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      * ATLAS needs to run more single-core analysis jobs
      - https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397775

      * ATLAS hostname env for WN containers
      - https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=398494

      * Oxford XCache: done on RAL side
      - https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397191

      Discrepancy between Vande 100% CPU (for ATLAS) and ATLAS monitoring (cf. Vande * 11.7/10).
      - To be understood.

      ATLAS is slowly increasing the number of running single-core jobs (to ~3k).

      Vector Reads:
      CMS SAM test code can run on gw683 and gw691:
      - See at what frequency the problem can be triggered;
      - In parallel, try some lower-level tests.

    • 12:46 12:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      SAM tests are OK, with only occasional failures. Transfers seem fine.

      However, real jobs are failing at a very high rate and with extremely low efficiency: failure rates are 60-80% and efficiencies are <1% for many jobs. These jobs mostly fail with FileOpen or FileRead errors. CMS L1s have asked me to organise stopping Processing-type jobs at RAL, as these are the culprits.

      I changed the redirector fallback from the UK alias to the European alias. This seemed to reduce the number of FileOpen errors, but the total number of failures remained high: FileOpen errors were replaced by FileRead errors.

    • 12:48 12:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
    • 12:52 12:53
      VO Liaison Others 1m
    • 12:53 12:54
      Experiment Planning 1m
    • 12:54 12:55
      DUNE/ProtoDUNE 1m
    • 12:55 12:56
      Euclid 1m
    • 12:56 12:57
      SKA 1m
    • 12:57 12:58
      AOB 1m
    • 12:58 12:59
      Any Other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))