RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:00 13:01
      Major Incidents Changes 1m
    • 13:01 13:02
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB)), Kieran Howlett (STFC RAL)
    • 13:02 13:03
      GGUS /RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 13:04 13:05
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:05 13:06
      Experiment Operational Issues 1m
    • 13:15 13:16
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Some problems with Echo and its gateways over the last couple of days are visible in the SAM tests; Rob A currently attributes them to a bad OSD.

      CMS is currently capped at 6k cores because of the network saturation issue described earlier.

    • 13:16 13:17
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
      • HammerCloud (HC) test results have been poor recently, especially for template 1184 (plot attached).

       

      • The Storage Resource Reporting (SRR) is still incorrect: "Update Echo SRR with FY23/24 pledges" (https://stfc.atlassian.net/browse/CEPH-76)

       

      • Some ATLAS jobs are failing with "failed to close file descriptor: bad file descriptor" (https://stfc.atlassian.net/browse/GS-131)
    • 13:20 13:21
      VO Liaison LHCb 1m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Tickets:

      1. Request to move from a service certificate to a host certificate on the VO-box
        • Waiting for the security team
      2. Pilots are being killed for exceeding the memory limit
        • The memory limit has been enforced since 17 May
        • Going by its description, Condor's ResidentSetSize seems to be too "strict" a measure
        • Would it be possible to remove the limit?
      3. ETF tests have been failing since 17 May
        • The tests allocate 4 GiB, but the job memory limit is 3 GiB
      4. Vector reads
        • The configuration applied to lcg2268 differed from the intended one
          • Vector reads were still executed sequentially
          • This should now be fixed
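
      Ticket 4 concerns vector reads being executed sequentially instead of in parallel. As a rough illustration of that difference only (this is not the XRootD or Echo gateway implementation, nor the configuration applied to lcg2268), the Python sketch below serves a list of (offset, length) chunks either one after another or concurrently from a thread pool. The function names are hypothetical, and a local temporary file stands in for remote storage.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Chunk = Tuple[int, int]  # (offset, length) in bytes


def read_chunk(fd: int, offset: int, length: int) -> bytes:
    # os.pread reads at an absolute offset and does not move the shared
    # file position, so several threads can use the same descriptor safely.
    return os.pread(fd, length, offset)


def vector_read_sequential(fd: int, chunks: List[Chunk]) -> List[bytes]:
    # Each chunk is requested only after the previous one has completed.
    return [read_chunk(fd, offset, length) for offset, length in chunks]


def vector_read_concurrent(fd: int, chunks: List[Chunk], max_workers: int = 8) -> List[bytes]:
    # All chunks are submitted up front; pool.map still returns the
    # results in the same order as the input list.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: read_chunk(fd, c[0], c[1]), chunks))


if __name__ == "__main__":
    data = bytes(range(256)) * 16  # 4096 bytes of known content
    chunks = [(0, 10), (100, 5), (4000, 50)]
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        seq = vector_read_sequential(f.fileno(), chunks)
        con = vector_read_concurrent(f.fileno(), chunks)
    print("identical results:", seq == con)
```

      Both variants return the chunks in request order; only the scheduling differs. On a local disk the gain is small, whereas against a high-latency remote store overlapping the requests is what avoids paying the round-trip time once per chunk.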

      Operational issues:

      • Many uploads failed because of the gateway problems.
      • About 150 files have been lost as a result of the Echo incident.
      • PIC is experiencing routing issues; all PIC <-> RAL transfers are failing.
      • New CVMFS endpoints have been added to LHCb's VO card:
        • The following endpoints should be accessible:
          • /cvmfs/lhcb.cern.ch
          • /cvmfs/lhcb-condb.cern.ch
          • /cvmfs/lhcbdev.cern.ch
          • /cvmfs/unpacked.cern.ch
          • /cvmfs/cernvm-prod.cern.ch
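
      One way to confirm on a worker node that the repositories listed above are reachable is to list each mount point. The Python sketch below is an illustration under stated assumptions, not an official LHCb or RAL tool: `missing_repos` is a hypothetical helper, it assumes CVMFS is mounted under /cvmfs via autofs (so listing a directory triggers the mount), and it treats an empty directory as not mounted. The standard `cvmfs_config probe` command performs a similar check per repository name (e.g. `lhcb.cern.ch`).

```python
import os
from typing import Iterable, List

# Repositories that should be accessible according to LHCb's VO card.
LHCB_CVMFS_REPOS = [
    "/cvmfs/lhcb.cern.ch",
    "/cvmfs/lhcb-condb.cern.ch",
    "/cvmfs/lhcbdev.cern.ch",
    "/cvmfs/unpacked.cern.ch",
    "/cvmfs/cernvm-prod.cern.ch",
]


def missing_repos(paths: Iterable[str]) -> List[str]:
    """Return the paths that could not be listed or appear to be empty."""
    missing = []
    for path in paths:
        try:
            # Assumption: with autofs, listing the directory triggers the CVMFS mount.
            entries = os.listdir(path)
        except OSError:
            missing.append(path)
            continue
        # Assumption: a correctly mounted repository is never empty.
        if not entries:
            missing.append(path)
    return missing


if __name__ == "__main__":
    problems = missing_repos(LHCB_CVMFS_REPOS)
    if problems:
        for repo in problems:
            print(f"NOT accessible: {repo}")
    else:
        print("All listed CVMFS repositories are accessible.")
```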
    • 13:25 13:28
      VO Liaison LSST 3m
      Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
    • 13:30 13:31
      VO Liaison Others 1m
    • 13:31 13:32
      AOB 1m
    • 13:32 13:33
      Any Other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))