RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Description

Please attend via the following Zoom meeting:

https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09

    • 13:38 13:39
      Major Incidents Changes 1m
    • 13:39 13:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 13:40 13:41
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 13:41 13:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:42 13:43
      Experiment Operational Issues 1m
    • 13:44 13:45
      VO Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))
    • 13:46 13:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      The SAM tests have turned our site availability red on several days this week. I believe this is due to a combination of two factors:

      1. Sometimes the test results from the ARC CEs are not reported, leaving a blank for a couple of time slots, and sometimes this happens on several CEs at once. If another ARC CE happens to fail at the same time, the overall SAM status is red. If all the tests had run normally and at least one of them had been green, the overall status would have been green, since only one ARC CE needs to be green in a given time slot for the overall status to be green. A possibly related ticket describes the missing tests: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150482

      2. There may be a problem with the RAL-based redirector xrootd-cms-uk.gridpp.. Its current log file grew to more than 20 GB in less than 24 hours of running, and the machine was running out of disk space. I cleaned it up, but at the current level of logging I will need to do this regularly, or else greatly shorten how long logs are kept on the machine. As for the volume of log output, the current theory is that incoming requests are generally satisfied right away, but the log then continues with repeated 'do_have' queries, apparently after the transfer has succeeded. For one file chosen at random, the log I was checking (16 Feb) contained 10k references to it, while the proxy log shows only ~8 requests for that file (almost all from different sites) over the same period, all of which appear to have succeeded.
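
      As a rough illustration of the rule in point 1, the per-slot combination can be modelled as a logical OR across CEs. This is a simplified sketch, not the actual SAM/ETF availability algorithm: the `Status` values and the treatment of unreported results (ignored, so a slot containing only blanks and one failure comes out red, and an all-blank slot comes out unknown) are assumptions chosen to match the behaviour described above.

```python
from enum import Enum
from typing import Optional, Sequence


class Status(Enum):
    OK = "green"
    CRITICAL = "red"
    UNKNOWN = "blank"


def slot_status(ce_results: Sequence[Optional[Status]]) -> Status:
    """Combine per-CE test results for one time slot (simplified OR rule).

    None stands for a test result that was never reported (assumption).
    """
    reported = [r for r in ce_results if r is not None]
    if Status.OK in reported:
        # One green CE is enough to make the whole slot green.
        return Status.OK
    if Status.CRITICAL in reported:
        # No green CE and at least one failure: the slot turns red.
        return Status.CRITICAL
    # Nothing usable was reported in this slot.
    return Status.UNKNOWN


# Two CEs with missing results and one failing CE: the slot is red.
print(slot_status([None, None, Status.CRITICAL]))  # Status.CRITICAL
# The same slot if one of the missing tests had run and passed: green.
print(slot_status([Status.OK, None, Status.CRITICAL]))  # Status.OK
```

      The model makes the point in the text explicit: a missing result never helps, so blanks only become harmful when they coincide with a genuine failure on another CE.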
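
      For the log growth in point 2, one option is a periodic clean-up that removes old log files. A minimal sketch follows; the directory, the `*.log*` file pattern and the retention period are hypothetical and would need to match how the redirector actually names and rotates its logs. At the observed rate of more than 20 GB per day, keeping two days of logs would need over 40 GB of free space. A standard tool such as logrotate would usually be preferable in production; the sketch only illustrates the retention logic.

```python
import time
from pathlib import Path
from typing import List


def prune_old_logs(log_dir: Path, max_age_days: float, pattern: str = "*.log*") -> List[Path]:
    """Delete files in log_dir matching pattern that are older than max_age_days.

    Age is judged by modification time, so the log currently being written
    (recently modified) is never selected. Returns the removed paths.
    """
    cutoff = time.time() - max_age_days * 86400
    removed: List[Path] = []
    for path in sorted(log_dir.glob(pattern)):
        # Skip directories and anything modified after the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

      Example call, with a hypothetical log directory: `prune_old_logs(Path("/var/log/xrootd"), max_age_days=2)`.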

      Last week I completed the clean-up following the tape consistency check, deleting 111k files whose deletion may have failed last summer (July 2020?).
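
      A consistency check of this kind usually comes down to comparing a dump of what is on storage with a dump of what the experiment catalogue expects. A minimal sketch, assuming both dumps are plain lists of logical file names (one per line); the function name and input format are hypothetical, not the tool actually used for this check.

```python
from typing import Iterable, Set, Tuple


def compare_dumps(storage: Iterable[str], catalogue: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (dark, missing) sets from two lists of logical file names.

    dark:    on storage but not in the catalogue (candidates for deletion)
    missing: in the catalogue but not found on storage
    """
    # Strip whitespace/newlines and ignore blank lines in both dumps.
    on_storage = {name.strip() for name in storage if name.strip()}
    in_catalogue = {name.strip() for name in catalogue if name.strip()}
    return on_storage - in_catalogue, in_catalogue - on_storage
```

      Both arguments can be open file objects, e.g. `compare_dumps(open("tape_dump.txt"), open("catalogue_dump.txt"))` with hypothetical file names. Files written or deleted between the two dumps show up spuriously in one of the sets, so in practice the dumps are taken close together and recent files are excluded before anything is deleted.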

    • 13:48 13:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
    • 13:52 13:53
      VO Liaison Others 1m
    • 13:53 13:54
      Experiment Planning 1m
    • 13:54 13:55
      Dune/protoDune 1m
    • 13:55 13:56
      Euclid 1m
    • 13:56 13:57
      SKA 1m
    • 13:57 13:58
      AOB 1m
    • 13:58 13:59
      Any other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))