RAL Tier1 Experiments Liaison Meeting
Access Grid, RAL R89
Please attend via the following Zoom meeting:
https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09
13:38 → 13:39  Major Incidents Changes (1m)
13:39 → 13:40  Summary of Operational Status and Issues (1m)
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
13:40 → 13:41  GGUS/RT Tickets (1m)
https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
13:41 → 13:42  Site Availability (1m)
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
13:42 → 13:43  Experiment Operational Issues (1m)
13:44 → 13:45  VO-Liaison ATLAS (1m)
Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))
ATLAS needs to run more single-core analysis jobs
- https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397775
- Will be direct IO; need for vectored reads
Noticed that 100% on Vande no longer corresponds to 100% * 11.7/10 on ATLAS monitoring (accounting for the corepower difference); this is obscured by the current changes.
- Some recent change to the batch workers?
- Some change to the absolute Fairshare values?
Echo read access for Oxford ATLAS XCache
- https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397191
TPC-http
- Bespoke checksum script on the Test Gateway to return the checksum
- Return of the '//' macaroon path normalisation issue (a short sketch follows below).
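For context on the '//' item above: macaroon path caveats are compared against the request path as strings, so an un-normalised double slash can make an otherwise-valid path fail the check. A minimal sketch of the normalisation involved (the helper and caveat handling here are illustrative assumptions, not the gateway's actual code):
import re

def normalise(path):
    # Collapse runs of '/' so '//atlas//file' becomes '/atlas/file'
    return re.sub(r"/+", "/", path)

def path_allowed(request_path, caveat_prefix):
    # Illustrative check: is the normalised request path under the caveat's path prefix?
    req = normalise(request_path).rstrip("/")
    prefix = normalise(caveat_prefix).rstrip("/")
    return req == prefix or req.startswith(prefix + "/")

print(path_allowed("//atlas/file", "/atlas"))  # True once normalised
print("//atlas/file".startswith("/atlas/"))    # False without normalisation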
13:46 → 13:47  VO Liaison CMS (1m)
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
CMS is running 'at pledge', having been capped while the LHCb problem is fixed; LHCb is now running at 200% of its pledge. Most CMS-only nodes are empty.
SAM tests are looking much better this week, although no change or fix was applied. However, I see a large number of job failures and very low efficiency; the failures are mostly FileOpen or FileRead. I have an example 'step chain' (i.e. multi-step) job to try, and I want to run it on one of the empty CMS-only nodes, hopefully this week.
After talking to Chris Brew, we think there is a problem with the /etc/hosts file in the CMS Docker config. He says you can't repeat the same IP address across separate entries like this:
172.28.1.1 xrootd.echo.stfc.ac.uk
172.28.1.1 ceph-gw10.gridpp.rl.ac.uk
172.28.1.1 ceph-gw11.gridpp.rl.ac.uk
He said I should ask for a change to:
172.28.1.1 xrootd.echo.stfc.ac.uk ceph-gw10.gridpp.rl.ac.uk ceph-gw11.gridpp.rl.ac.uk
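A quick way to confirm the combined entry behaves as intended inside the container (the hostnames are those above; the check itself is just an illustrative snippet, not part of the CMS configuration):
import socket

# All three names should resolve to the same gateway address once the
# single combined /etc/hosts line is in place.
for name in ("xrootd.echo.stfc.ac.uk",
             "ceph-gw10.gridpp.rl.ac.uk",
             "ceph-gw11.gridpp.rl.ac.uk"):
    print(name, "->", socket.gethostbyname(name))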
13:48 → 13:49  VO Liaison LHCb (1m)
Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
LHCb
- Low number of running jobs
  - https://ggus.eu/?mode=ticket_info&ticket_id=150679
  - Seems fixed after limits were put on CMS and ATLAS
  - Not a permanent solution, but it seems to have allowed LHCb jobs to be picked up by the batch system (?)
- ECHO streaming issue
  - Waiting for release of the fix to vector reads
  - Timescale?
- Trying to understand the discrepancy between storage use reported by RAL and by DIRAC
  - Currently a 20% discrepancy; large since 2019 (LHCb move to ECHO). See the short calculation after the table.
  - Date: DIRAC vs RAL (Grafana)
    - 31/12/2020: 5.61 PB vs 6.46 PB
    - 31/12/2019: 5.62 PB vs 6.42 PB
    - 31/12/2018: 4.55 PB vs 4.54 PB
    - 08/02/2018: 4.13 PB vs 4.10 PB
    - 31/12/2016: 3.61 PB vs 3.09 PB
    - 18/08/2016: 3.17 PB vs 3.22 PB
    - 31/12/2015: 3.13 PB vs 3.12 PB
    - 25/08/2015: 2.30 PB vs 2.34 PB
    - 18/01/2015: 2.23 PB vs 2.28 PB
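For reference, the relative difference can be computed directly from the figures above (the headline 20% may refer to more recent numbers than those listed; the snippet is purely illustrative):
# DIRAC vs RAL (Grafana) storage in PB, taken from the table above
figures = {
    "31/12/2020": (5.61, 6.46),
    "31/12/2019": (5.62, 6.42),
    "31/12/2018": (4.55, 4.54),
}

for date, (dirac, ral) in figures.items():
    diff = 100.0 * (ral - dirac) / dirac
    print(f"{date}: RAL is {diff:+.1f}% relative to DIRAC")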
DUNE
- Normal operations
- Testing Dynafed access to RAL storage to transfer data between RAL and Fermilab (a transfer sketch follows below)
  - Is Dynafed supported?
  - Or other protocols supporting http(s)?
- Low number of running jobs
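On the http(s) question above: whichever endpoint is exposed (Dynafed or a plain WebDAV gateway), a transfer can be exercised with an authenticated GET/PUT. The sketch below uses Python requests with placeholder URLs and a bearer token; the endpoints, paths and credential handling are purely illustrative, not an agreed setup:
import requests

# Placeholder endpoints - substitute the real Dynafed/WebDAV URLs and credentials
SRC = "https://source.example.org/dune/file.root"
DST = "https://dest.example.org/dune/file.root"
HEADERS = {"Authorization": "Bearer <token>"}

# Stream the file from the source and PUT it to the destination over HTTPS
with requests.get(SRC, headers=HEADERS, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    put = requests.put(DST, headers=HEADERS,
                       data=resp.iter_content(chunk_size=1 << 20), timeout=300)
    put.raise_for_status()
print("copy complete:", put.status_code)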
13:52 → 13:53  VO Liaison Others (1m)
13:53 → 13:54  Experiment Planning (1m)
13:54 → 13:55  Dune/protoDune (1m)
13:55 → 13:56  Euclid (1m)
13:56 → 13:57  SKA (1m)
13:57 → 13:58  AOB (1m)
13:58 → 13:59  Any other Business (1m)
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))