RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:00 13:01
      Major Incidents Changes 1m
    • 13:01 13:02
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB)), Kieran Howlett (STFC RAL)
    • 13:02 13:03
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 13:04 13:05
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:05 13:06
      Experiment Operational Issues 1m
    • 13:15 13:16
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      CMS monitoring showed a big drop in running slots on the evening of the 17th and the morning of the 18th, but the SI operator told me he was running tests again (these do not show up in CMS monitoring). Vande continued to show ~12k slots running continuously.

      CMS job efficiency has dropped off. We have also noticed some network saturation on LHCONE, which is being tentatively blamed on CMS remote reads. There is a correlation between the decrease in running slots mentioned above and a temporary drop-off in network traffic.

      One gateway had a problem yesterday that caused some red webdav and xrootd SAM tests. 

      RobH made a change to the RAL-based redirector: https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=481341 Initially nothing was working: an auth file needed adding (as explained in the config file), and then the IPv6 address turned out not to have been added (aquilon had this config but somehow did not add it to the host). The redirector now appears to be working, but Katy is monitoring the cms-aaa-manager01 logs to try to determine whether it is properly in contact with the redirector.

    • 13:16 13:17
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)

      Issue with the Harvester central_B instance yesterday; it affected jobs at UK sites (and IT, ES).

      A permissions issue with gw15 put ATLAS into test. HC was unable to get out of test.

      RAL has currently been forced back online, and HC experts are investigating.

      Tomorrow's ECHO DT is set to <4 hrs; ATLAS will carry on as usual (expect to go into HC test again).

    • 13:20 13:21
      VO Liaison LHCb 1m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Tickets:

      • Vector read
        • Large-scale test started on Monday
          • Patch has been applied to the 2017-dell tranche
          • Looks OK so far!
      • Environment variable removal request
        • The XrdSecGSISRVNAMES variable cannot be removed
        • Its removal does not completely solve the original issue (the warning message)
          • Some versions of xrootd will print the warning if the variable is missing
        • To prevent the warning, one should remove the XrdSecGSIDELEGPROXY variable instead
        • This variable is set in the LHCb environment, so T1 cannot change it
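
As an illustration of the suggestion above, here is a minimal Python sketch of launching a client process with XrdSecGSIDELEGPROXY dropped from its environment while XrdSecGSISRVNAMES is left in place. The helper name `env_without_delegproxy` is hypothetical, and whether this actually silences the warning depends on the xrootd version in use; the variable itself is set on the LHCb side, so this is not something T1 can apply.

```python
import os

# Variable names are taken from the minutes above.
DELEG_VAR = "XrdSecGSIDELEGPROXY"


def env_without_delegproxy(env=None):
    """Return a copy of the environment with XrdSecGSIDELEGPROXY removed.

    All other variables, including XrdSecGSISRVNAMES, are kept unchanged.
    The input mapping is not modified.
    """
    source = os.environ if env is None else env
    cleaned = dict(source)
    cleaned.pop(DELEG_VAR, None)  # no error if the variable is already absent
    return cleaned
```

The returned mapping could then be passed as the `env` argument of `subprocess.run` when starting the client command.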

      Operational issues

      • Upload failures due to the gateway shutdown last week
      • Upload failures due to the gateway issue yesterday
      • Failed FTS transfers between RAL and RU-Protvino-IHEP
    • 13:25 13:28
      VO Liaison LSST 3m
      Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
    • 13:30 13:31
      VO Liaison Others 1m
    • 13:31 13:32
      AOB 1m
    • 13:32 13:33
      Any Other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))