RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

    • 13:38 13:39
      Major Incidents Changes 1m
    • 13:39 13:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 13:40 13:41
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 13:41 13:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:42 13:43
      Experiment Operational Issues 1m
    • 13:44 13:45
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      * RT: #296706: Optimise XRootD checksums for TPC transfers from ECHO
      * RT: #313743: Update of SRR to v6 required specs
      * AD to update Corepower 

      Staging for the latest reprocessing campaign is going well. Comment from ATLAS:
      "Staging throughput at RAL has been high and stable (DDM plot attached). What are your secrets to reach such a good performance?"
       - It would be good to send back a summary of the current setup.

    • 13:46 13:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      George P updated the CMS AAA RAL redirector (VM name: xrootd-cms-uk) from SL6 to SL7. At first we could not work out why I was unable to make an xrdcp through the redirector, so we did not put it back in the UK alias. On Friday we figured out the config, and on Monday we put it back in the alias. Possibly related: a lot of jobs with 'secondary inputs' offsite failed over the weekend. The idea is that RAL can use the two other redirectors at IC, but I'm not sure whether they were not working or were overloaded. Either is a possible cause of the failures of Processing jobs (image attached).

      The job failure rate briefly improved, but has looked bad again over the last couple of days. Job efficiency is currently low (~40%) and data read time is high.

    • 13:48 13:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))

      Apologies - last week has been unconventional for me.

      LHCb

      1. Streaming data out of RAL
        • A fairly detailed study of the errors seen has now been done.
        • 80% of failures report "Operation Expired".
          • This issue is the first target for now.
          • This failure is always fatal.
        • GitHub issue opened with the XRootD developers:
          • https://github.com/xrootd/xrootd/issues/1259
      2. No staging so far since the move to the new tape robot. No errors seen.
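
      The failure breakdown in item 1 could be reproduced from a list of error strings with a tally like the one below. This is only a minimal sketch: the function name `failure_shares`, the sample data, and the "Other error" label are hypothetical, chosen to mirror the 80% figure quoted above, and do not come from the actual LHCb study.

```python
from collections import Counter


def failure_shares(messages):
    """Return {reason: fraction of all failures} for a list of error strings."""
    counts = Counter(messages)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders reasons from most to least frequent
    return {reason: n / total for reason, n in counts.most_common()}


# Hypothetical sample, not real transfer logs
sample = ["Operation Expired"] * 8 + ["Other error"] * 2
shares = failure_shares(sample)
for reason, share in shares.items():
    print(f"{reason}: {share:.0%}")
```

      Sorting by frequency puts the dominant failure mode first, which matches the approach of targeting the largest contributor before the rest.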

      DUNE

      1. Waiting for DUNE - CRIC development to take DUNE - ETF forward
      2. Waiting to see if the number of jobs coming in to RAL has increased.
    • 13:52 13:53
      VO Liaison Others 1m
    • 13:53 13:54
      Experiment Planning 1m
    • 13:54 13:55
      Dune/protoDune 1m
    • 13:55 13:56
      Euclid 1m
    • 13:56 13:57
      SKA 1m
    • 13:57 13:58
      AOB 1m
    • 13:58 13:59
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))