RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Videoconference
Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:00 13:01
      Major Incidents Changes 1m
    • 13:01 13:02
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 13:02 13:03
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 13:04 13:05
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:05 13:06
      Experiment Operational Issues 1m
    • 13:15 13:16
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Investigating failing tape transfers from RAL and elsewhere for CMS. Current suspicion falls on the FNAL FTS, which was recently upgraded. I ran a test with the CERN FTS on a subset of the same data: it had successful transfers where the FNAL FTS had none. The data is staging successfully from tape, but FTS is then not instructing it to move to Echo (at least, this is the current working theory). Steve Murray is looking at it; he says that the FNAL FTS is misconfigured for Antares.

      Also on tape failures: a few CMS tapes were 'disabled' this week (the script re-enabled them, but they still caused significant failures). Is this happening more than normal?

      Intermittent WebDAV SAM test failures in the last 2 days, coincident with critical status on a number of gateways, mainly svc01/02 and gw14/15.

    • 13:16 13:17
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)

      RAL in HC test overnight:
      - Stage-out failures (svc02) and a rack power-off triggered HC test failures. One of the HC tests stopped running, so RAL was not put back online.
      - The site has been forced online and we are following up; experts have now reinjected the tests.

      BNL -> RAL (and CNAF) transfers over the OPN have been very slow for ~1 week. However, the problem appears to be on the BNL side.

      Accounting differences observed between the VO monitoring and WLCG accounting figures, starting around September. See attached plot.

      DNS issues reappeared on Sunday morning. Fewer transfer failures were observed during this period, possibly (?) due to the TTL changes to the webdav alias.

    • 13:20 13:21
      VO Liaison LHCb 1m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
      •  Vector read status:
        • A new patch was developed and applied on lcg2270, with several features:
          • atomic cache reads
          • caching layer
          • timeout increase for readv operations
          • async read operations disabled
        • So far it looks good, but only 11 user jobs have been executed there.
        • The old patch has the following results: 5 user jobs failed due to read errors and 792 user jobs executed successfully (0.6 percent failure rate). Across the whole farm, the failure rate was approximately 1.7 percent for the same time period.
      • Dark data
        • The size of the dark data has been identified: 877 TB
        • Discussion is ongoing on how to delete this data; it may be better to do it from the site's side
      • DNS issue
        • Reappeared last Sunday, affected LHCb significantly
      • Upload Failures
        • Multiple peaks of failed uploads since yesterday afternoon, which seem to be related to gateway overload
      • Low number of running jobs
        • The number of running LHCb jobs was low throughout the weekend, due to fs tuning
        • Recovered now
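
As a rough check on the vector read figures above, the short Python sketch below reproduces the quoted 0.6 percent failure rate from the job counts, and estimates how likely 11 consecutive successful jobs would be at the old-patch and whole-farm rates. The function names are illustrative only, and the second calculation assumes that job failures are independent, which the notes do not state.

```python
def failure_rate(failed: int, succeeded: int) -> float:
    """Return the percentage of failed jobs among all finished jobs."""
    total = failed + succeeded
    if total == 0:
        raise ValueError("no finished jobs to compute a rate from")
    return 100.0 * failed / total


def chance_of_no_failures(rate_percent: float, jobs: int) -> float:
    """Probability that `jobs` jobs all succeed at the given failure rate.

    Assumption (not from the notes): each job fails independently.
    """
    return (1.0 - rate_percent / 100.0) ** jobs


# Job counts quoted above for the old patch.
old_patch_rate = failure_rate(failed=5, succeeded=792)
print(f"old patch: {old_patch_rate:.1f}% failure rate")  # old patch: 0.6% failure rate

# How often would 11 jobs all succeed even without any improvement?
for label, rate in [("old patch", old_patch_rate), ("whole farm", 1.7)]:
    p = chance_of_no_failures(rate, jobs=11)
    print(f"{label}: P(11 jobs, 0 failures) = {p:.2f}")
```

Under that independence assumption, 11 clean jobs would be the expected outcome at either rate, which is consistent with the caution above that the sample for the new patch is still too small to draw conclusions.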

       

    • 13:25 13:28
      VO Liaison LSST 3m
      Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
    • 13:30 13:31
      VO Liaison Others 1m
    • 13:31 13:32
      AOB 1m
    • 13:32 13:33
      Any Other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))