RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

    • 12:38 12:39
      Major Incidents Changes 1m
    • 12:39 12:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 12:40 12:41
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 12:41 12:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 12:42 12:43
      Experiment Operational Issues 1m
    • 12:44 12:45
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      * ATLAS in drain since 05:00 (no new jobs since around 03:30):
         A-REX issue? Resolved around 11:00?
      - Follow up on why it affects all CEs

      * RAL (unified queue) subsequently (and currently) set to TEST by HammerCloud (HC):
      """Diag from worker : Condor HoldReason: None ; Condor RemoveReason: removed by SYSTEM_PERIODIC_REMOVE due to job remote status outdated time exceeded (3600*4)."""

    • 12:46 12:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      CMS went into drain a week ago. This was fixed on Thursday by reverting a change originally made in October, but it is still unclear exactly what happened.

      https://ggus.eu/index.php?mode=ticket_info&ticket_id=150207

      I am seeing some odd dips in running cores over the last two days and today: we appear to drop 10-20% of cores in monit-grafana, but this is not observed in Vande.

      Otherwise, we are running at 400% of pledge due to ATLAS problems with ARC-CE 01 overnight.

      I am seeing xrootd-type SAM test failures on all ARC-CEs. These may be related to reading files from Echo; I need to investigate.

      The debug/loadtest tests for tape have stopped again. I can see in Site Readiness that the FTS status has been corrected. We are still receiving no new transfers to tape, as we are at pledge. I asked whether any deletions could be done (in anticipation of the tape migration to Spectra in about one month) but have had no reply yet. I also provided the file dump to those doing the consistency checking, but have had no response there either.

      No update on the network changes planned by DI.

    • 12:48 12:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
    • 12:52 12:53
      VO Liaison Others 1m
    • 12:53 12:54
      Experiment Planning 1m
    • 12:54 12:55
      Dune/protoDune 1m
    • 12:55 12:56
      Euclid 1m
    • 12:56 12:57
      SKA 1m
    • 12:57 12:58
      AOB 1m
    • 12:58 12:59
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))