RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)


Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 13:00
      Major Incidents Changes
    • 1
      Summary of Operational Status and Issues
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB)), Kieran Howlett (STFC RAL)
    • 2
      GGUS/RT Tickets

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 3
      Site Availability

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:05
      Experiment Operational Issues
    • 4
      VO Liaison CMS
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      CMS monitoring showed a big drop in running slots on the evening of the 17th/morning of the 18th, but the SI operator told me he was running tests again (which do not show up in CMS monitoring). Vande continued to show ~12k slots running continuously.

      CMS job efficiency has dropped off. We have also noticed some network saturation on LHCONE, which is tentatively being blamed on CMS remote reads. There is a correlation between the decrease in running slots mentioned above and a temporary drop-off in network traffic.

      One gateway had a problem yesterday that caused some red WebDAV and XRootD SAM tests.

      RobH made a change to the RAL-based redirector: https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=481341 Initially nothing was working: an auth file needed adding (as explained in the config file), and then the IPv6 address turned out not to have been added (Aquilon had this config but somehow did not add it to the host). The redirector now appears to be working, but Katy is monitoring the cms-aaa-manager01 logs to determine whether it is properly in contact with the redirector.

    • 5
      VO-Liaison ATLAS
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)

      Issue with the Harvester central_B instance yesterday; it affected jobs at UK sites (and IT, ES).

      A permissions issue with gw15 put ATLAS into test, and HC was unable to get out of test.

      RAL has currently been forced back online, and HC experts are investigating.

      Tomorrow's ECHO DT is set to <4 hrs; ATLAS will carry on as usual (we expect to go into HC test again).

    • 6
      VO Liaison LHCb
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Tickets:

      • Vector read
        • Large-scale test started on Monday
          • Patch is applied to 2017-dell tranche
          • Looks OK so far!
      • Environment variable removal request
        • The variable XrdSecGSISRVNAMES cannot be removed
        • Its removal does not solve the original issue (warning message) completely
          • Some versions of xrootd will print the warning if the variable is missing
        • To prevent the warning, one should remove the XrdSecGSIDELEGPROXY variable instead
        • This variable is set in the LHCb environment, so T1 cannot change it
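
      For illustration only, a minimal Python sketch of the approach described above: launching a command in an environment from which XrdSecGSIDELEGPROXY has been dropped while XrdSecGSISRVNAMES is left untouched. The helper names (env_without, run_clean) are hypothetical and are not part of any LHCb or xrootd tooling; where the variable is actually set in the LHCb environment is not covered here.

```python
import os
import subprocess

# Variable named in the notes above; all other variables are left as they are.
VARS_TO_DROP = ("XrdSecGSIDELEGPROXY",)


def env_without(names, base=None):
    """Return a copy of the environment with the given variables removed."""
    env = dict(os.environ if base is None else base)
    for name in names:
        env.pop(name, None)  # no error if the variable is already absent
    return env


def run_clean(cmd, names=VARS_TO_DROP):
    """Run cmd (a list of arguments) in a child process lacking those variables."""
    return subprocess.run(cmd, env=env_without(names), check=False)
```

      The parent environment is not modified; only the child process sees the reduced set of variables.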

      Operational issues

      • Upload failures due to gateway shut down last week
      • Upload failures due to gateway issue yesterday
      • Failed FTS transfers between RAL and RU-Protvino-IHEP
    • 7
      VO Liaison LSST
      Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
    • 8
      VO Liaison Others
    • 13:31
      AOB
    • 9
      Any Other Business
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))