RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
13:00
→
13:01
Major Incidents Changes 1m
-
13:01
→
13:02
Summary of Operational Status and Issues 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB)), Kieran Howlett (STFC RAL)
-
13:02
→
13:03
GGUS /RT Tickets 1m
https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
-
13:04
→
13:05
Site Availability 1m
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
-
13:05
→
13:06
Experiment Operational Issues 1m
-
13:15
→
13:16
VO Liaison CMS 1m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
CMS monitoring showed a big drop in running slots on the evening of the 17th/morning of the 18th, but the SI operator told me he was running tests again (which do not show up in CMS monitoring). Vande continued to show ~12k slots running throughout.
CMS job efficiency has dropped off. We also see some network saturation on LHCONE, which is tentatively being blamed on CMS remote reads. There is a correlation between the decrease in running slots mentioned above and a temporary drop-off in network traffic.
One gateway had a problem yesterday that caused some red webdav and xrootd SAM tests.
RobH made a change to the RAL-based redirector: https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=481341 Initially nothing was working: an auth file needed adding (as explained in the config file), and then the IPv6 address turned out not to have been added (Aquilon had this config but somehow did not apply it to the host). The redirector now appears to be working, but Katy is monitoring the cms-aaa-manager01 logs to determine whether it is properly in contact with the redirector.
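A missing IPv6 address of the kind described above could be caught with a quick reachability probe. The following is only a sketch under stated assumptions: the hostname in the usage comment is hypothetical, port 1094 is assumed as the usual xrootd port, and the `resolver` and `connect` parameters exist purely so the logic can be exercised without network access. It is not the check that was actually used at RAL.

```python
import socket


def _tcp_connect(sockaddr, timeout):
    """Open and immediately close a TCP connection to an IPv6 socket address."""
    with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)


def reachable_over_ipv6(hostname, port, timeout=5.0,
                        resolver=socket.getaddrinfo, connect=_tcp_connect):
    """Return True if at least one IPv6 address of `hostname` accepts a TCP connection."""
    try:
        # Ask only for IPv6 stream results.
        results = resolver(hostname, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        # No IPv6 address could be resolved for the name.
        return False
    for entry in results:
        sockaddr = entry[4]  # (address, port, flowinfo, scope_id) for IPv6
        try:
            connect(sockaddr, timeout)
            return True
        except OSError:
            # Address resolves but is not reachable; try the next one.
            continue
    return False


# Example usage (hypothetical hostname, assumed xrootd port):
# reachable_over_ipv6("redirector.example.org", 1094)
```

A host whose name resolves to an IPv6 address that was never configured on the machine would return False here, because the connection attempt fails even though resolution succeeds.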
-
13:16
→
13:17
VO-Liaison ATLAS 1m
Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
Issue with Harvester central_B instance yesterday; affected jobs at UK sites (and IT, ES).
A permissions issue with gw15 put ATLAS into test. HC unable to get out of test.
RAL has currently been forced back online, and HC experts are investigating.
Tomorrow's ECHO DT is set to <4hrs; ATLAS will carry on as usual (expect to go into HC test again).
-
13:20
→
13:21
VO Liaison LHCb 1m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
Tickets:
- Vector read
  - Large-scale test started on Monday
    - Patch is applied to the 2017-dell tranche
    - Looks OK so far!
- Environment variable removal request
  - Variable XrdSecGSISRVNAMES cannot be removed
  - Its removal does not completely solve the original issue (warning message)
  - Some versions of xrootd will print the warning if the variable is missing
  - To prevent the warning, one should remove the XrdSecGSIDELEGPROXY variable instead
  - This variable is set in the LHCb environment, so T1 cannot change it
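To reproduce the effect of dropping a variable such as XrdSecGSIDELEGPROXY without touching the LHCb environment itself, a test command can be run with a filtered copy of the environment. This is a minimal sketch: the helper name `env_without` is made up for illustration, and the command in the usage comment is a placeholder rather than a specific client invocation.

```python
import os


def env_without(names, base=None):
    """Return a copy of the environment with the given variables removed.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    for name in names:
        env.pop(name, None)  # ignore variables that are not set
    return env


# Example usage (placeholder command, shown as a comment only):
# import subprocess
# subprocess.run(["some-xrootd-client-command"],
#                env=env_without(["XrdSecGSIDELEGPROXY"]))
```

Because only the child process sees the filtered environment, this kind of test leaves the environment set up by the experiment unchanged.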
Operational issues:
- Upload failures due to gateway shutdown last week
- Upload failures due to gateway issue yesterday
- Failed FTS transfers between RAL and RU-Protvino-IHEP
-
13:25
→
13:28
VO Liaison LSST 3m
Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
-
13:30
→
13:31
VO Liaison Others 1m
-
13:31
→
13:32
AOB 1m
-
13:32
→
13:33
Any other Business 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))