RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

    • 13:38 13:39
      Major Incidents Changes 1m
    • 13:39 13:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 13:40 13:41
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 13:41 13:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:42 13:43
      Experiment Operational Issues 1m
    • 13:44 13:45
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      * RT: #296706: Optimise XRootD checksums for TPC transfers from ECHO

       

      * Closed long-standing GGUS:145510 (stage-in/stage-out):
         Several factors contributed to the improvement; a change to the pilot code appears to have finally reduced the job error rate.
         Still observe significant differences in error rates between SSD and non-SSD (for the staging failures), but all are <1%.

      * Sub-quota values:
       - Aimed to remove additional layers of prioritisation by changing the ATLAS sub-group quotas. This did not work; perhaps other VOs manage to absorb the slots in negotiation cycles?
       - ATLAS currently 65% (nominal 109%)
       - CMS 250% (nominal 160%)
       - LHCb 153% (nominal 140%)
       - ALICE 194% (nominal 4 – 140%)
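      The gap between current and nominal shares can be compared across VOs with a small calculation. This is only an illustrative sketch: the figures are those quoted above, the minutes do not say what the percentages are measured against, ALICE is left out because its nominal value is given as a range, and the names `shares` and `usage_ratio` are made up for the example.

```python
# Share figures as quoted in the minutes (percent). What they are a
# percentage of is not stated there. ALICE is omitted because its
# nominal value is given as a range rather than a single number.
shares = {
    "ATLAS": (65.0, 109.0),   # (current, nominal)
    "CMS": (250.0, 160.0),
    "LHCb": (153.0, 140.0),
}


def usage_ratio(current, nominal):
    """Return the current share divided by the nominal share."""
    return current / nominal


ratios = {vo: usage_ratio(cur, nom) for vo, (cur, nom) in shares.items()}

# List the VOs from the most under-served to the most over-served.
for vo, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    status = "below" if ratio < 1.0 else "at or above"
    print(f"{vo}: {ratio:.2f} of nominal ({status} nominal)")
```

      On these numbers ATLAS is the only VO listed here that is running below its nominal share.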

      * ATLAS may exceed its (overall) 2020 tape pledge
         * Plan to clean up and delete secondary tape replicas (when is convenient for RAL?)
         * Discussions are also ongoing on file size.

      * Ceph-XRootD plugin fchmod fix (GLA)

       

      * TPC status
       - XRootD: smoke tests passing for RAL-LCG2 (prod) and RAL-CEPH (test).
         stress test
       - Checksum 'fix' needed (RT: #296706)

       - HTTP: using the test gateway
         - Main problems are around authorisation with the gridmap file
         - Setup with "http.gridmap" works with a Firefox download, but not with curl or davix-get
         - (VOMS would use http.secxtractor libXrdHttpVOMS.so)
         - Trying a patched version of libXrdHttpVOMS.so (to use the gridmap file), but no success yet

       - Another possibility: enable VOMS for testing of HTTP?
         - Is this complicated?

       

       

       

    • 13:46 13:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      No particular problems with operations this week. Job efficiency has been a bit up and down, but I have not found a particular cause yet. 

      The CMS 'batch allotment' is not accepting jobs; Jose has been looking into it. 

      I have also been studying the SSD vs non-SSD jobs. Jobs on SSD worker nodes appear to have fewer failures with error codes associated with FileRead. Both WN types have similar failure rates for FileOpen-type errors.

    • 13:48 13:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))

      LHCb:

      1. New ARC-6 CE testing ongoing
      2. ECHO streaming issue
        • Memory proxy has resulted in more jobs running and a lower proportion of failures (but a higher absolute number of failures!). Failure reasons seem unchanged.
        • XRD_LOGLEVEL set to Debug on new jobs, which may give extra insight into the failures.
        • Aiming to try no proxy (but still two servers) on Friday, followed by no proxy with one gateway.
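      For reference, XRD_LOGLEVEL is an environment variable read by the XRootD client, so it has to be set in the environment of the process that opens the files. The sketch below shows one way a wrapper could do that. How the LHCb jobs actually set it is not recorded here; the function name `run_with_xrd_debug` is hypothetical, and a child Python process stands in for the real job payload.

```python
import os
import subprocess
import sys


def run_with_xrd_debug(command):
    """Run `command` with XRD_LOGLEVEL=Debug added to a copy of the environment."""
    env = dict(os.environ)          # copy, so the parent environment is untouched
    env["XRD_LOGLEVEL"] = "Debug"   # picked up by XRootD client code in the child
    return subprocess.run(command, env=env, capture_output=True, text=True, check=False)


# Stand-in payload: a child Python process that reports the variable it sees.
# A real job would launch its XRootD client application here instead.
stand_in = [sys.executable, "-c", "import os; print(os.environ['XRD_LOGLEVEL'])"]
result = run_with_xrd_debug(stand_in)
print(result.stdout.strip())
```

      Setting the variable on a copy of the environment keeps the debug output confined to the jobs that are launched through the wrapper.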

      DUNE:

      1. Move to CRIC for ETF tests: waiting on development from Andrew McNab.
      2. Changes to the UCSD glide-in factory were made yesterday. Waiting to see whether we get better rates for jobs coming in to RAL.

       

    • 13:52 13:53
      VO Liaison Others 1m
    • 13:53 13:54
      Experiment Planning 1m
    • 13:54 13:55
      Dune/protoDune 1m
    • 13:55 13:56
      Euclid 1m
    • 13:56 13:57
      SKA 1m
    • 13:57 13:58
      AOB 1m
    • 13:58 13:59
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))