RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
14:00
→
14:01
Major Incidents Changes 1m
-
14:01
→
14:02
Summary of Operational Status and Issues 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore, Kieran Howlett (STFC RAL)
-
14:02
→
14:03
GGUS /RT Tickets 1m
https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
-
14:04
→
14:05
Site Availability 1m
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
-
14:05
→
14:06
Experiment Operational Issues 1m
-
14:15
→
14:16
VO Liaison CMS 1m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
No operational issues to report over the long (Easter) weekend.
Ceph-gw10 (AAA proxy machine) was migrated to the new network but started failing certificate tests (whilst token tests are green). Ceph-gw11 is on the list to be migrated too, but we need to fix gw10 first. AAA still 'works' at RAL with one proxy machine, but without it we will go into drain.
Pilot overloading was turned on on Tuesday 26th March (not earlier, as previously and incorrectly reported). Only a couple of days of data are available from Marco so far, due to queuing and pilots lasting 48 hours. arc-ce01, 02 and 03 were overloaded.
- the 29th, 85 vs 94% efficiency
- on the 29th we have 92 vs 100% efficiency
NOTE: These numbers are not consistent with the job efficiency in the usual Grafana plots - INVESTIGATING
A few things on the CMS to-do list:
- Migration away from gridftp. The biggest element of this is the uploads to Echo from WNs.
- Adoption of the Tape REST API
- Removal of lazy-download option..?
-
14:16
→
14:17
VO-Liaison ATLAS 1m
Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
- GGUS:165529:
- T0-T1 test repetition (RAL-LCG2); in progress.
- Queries have been addressed by RAL.
- The tests are done and documented. What's next for this ticket?
- All the T1 GGUS tickets are still there, none of them has been closed.
- Reminder – T1 written reports on DC24: sites should submit the reports to DOMA. The deadline is 26 April 2024.
- Who is/are going to submit the RAL report?
- The internal documentation is here.
- The corresponding GridPP51 presentations.
- RAL new pledge (216 K --> 239 K) is there now, job accounting link.
- Upcoming downtime @QMUL during 24-29 May.
- Power testing work in the building that houses QMUL cluster. In addition, some network work and tests of the cluster power off and on procedure will be carried out.
- Good time to point satellite sites to new homes before the main downtime in July/August: RHUL, IC, Brunel, SUSX, OX to RAL.
-
14:20
→
14:21
VO Liaison LHCb 1m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
LHCb:
- Lost files after DC24
- Still not very clear who is to blame: Antares or FTS
- It would be great to have more detailed logging on Antares
- Now, for example, it is difficult to identify the client's host for a particular request
- Dark data on ECHO
- has been removed
- Vector read optimisation
- "No prefetch" change passed CC, to be rolled out soon
- Do other VOs approve the change as well?
- Token authentication (GGUS 165051)
- Fix is rolled out, to be tested
- Currently I cannot generate LHCb tokens; trying to restore this ability
- New pledges: Antares updated, ECHO is not updated yet
ALICE:
- TPC test issue
- No issues since the application of the fix (turning on TCP keepalive)
-
14:25
→
14:28
VO Liaison LSST 3m
Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
-
14:30
→
14:31
VO Liaison Others 1m
-
14:31
→
14:32
AOB 1m
-
14:32
→
14:33
Any other Business 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-