RAL Tier1 Experiments Liaison Meeting

Name: RAL Tier1 Experiments Liaison Meeting
Start: 2021-03-03T13:30:00+00:00
End: 2021-03-03T14:30:00+00:00
Location: RAL R89

Wednesday 3 Mar 2021, 13:30 → 14:30 Europe/London

Access Grid (RAL R89)


Description

Please attend via the following Zoom meeting:

https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09

- 13:38
  
  Major Incidents Changes
- 1
  
  Summary of Operational Status and Issues
  
  Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
  
  RT1EL-20210303.docx
  
  RT1EL-20210303.pdf
- 2
  
  GGUS/RT Tickets
  
  https://tinyurl.com/T1-GGUS-Open
  https://tinyurl.com/T1-GGUS-Closed
- 3
  
  Site Availability
  
  https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
  
  https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
  
  http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
- 13:42
  
  Experiment Operational Issues
- 4
  
  VO Liaison ATLAS
  
  Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))
  
  20210303_ipv46_update.pdf
  
  ATLAS needs to run more single-core analysis jobs
  - https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397775
  
  - Will be direct IO; need for vectored reads
  
  Noticed that 100% on Vande no longer corresponds to 100% × 11.7/10 on ATLAS monitoring (the factor accounts for the corepower difference). The cause is obscured by the current changes.
  - Some recent change to batch workers?
  - Some change to absolute Fairshare values?
  
  Echo Read access for Oxford ATLAS XCache
  - https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397191
  
  TPC-http
  - Bespoke checksum script on Test Gateway to return checksum
  - Return of the '//' macaroon path normalisation issue.
- 5
  
  VO Liaison CMS
  
  Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
  
  CMS is running 'at pledge' because it has been limited while the LHCb problem is fixed; LHCb is now running at 200% of its pledge. Most CMS-only nodes are empty.
  
  SAM tests are looking much better this week, although no change or fix was applied. However, I see a large number of job failures and very low efficiency. The failures are mostly FileOpen or FileRead. I have an example 'step chain' job (i.e. a multi-step job) that I want to try on one of the empty CMS-only nodes, hopefully this week.
  
  After talking to Chris Brew, we think there is a problem with the /etc/hosts file for the CMS docker config. He says the same IP address cannot be listed on separate lines like this:
  
  172.28.1.1 xrootd.echo.stfc.ac.uk
  
  172.28.1.1 ceph-gw10.gridpp.rl.ac.uk
  
  172.28.1.1 ceph-gw11.gridpp.rl.ac.uk
  
  He said I should ask for a change to:
  
  172.28.1.1 xrootd.echo.stfc.ac.uk ceph-gw10.gridpp.rl.ac.uk ceph-gw11.gridpp.rl.ac.uk
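The proposed change above amounts to collapsing every hosts entry that shares an IP address into a single line. A minimal sketch of that transformation follows, assuming a plain hosts-style text as input; the function name `merge_hosts_entries` is hypothetical and is not part of any existing RAL or CMS tooling. Comments and blank lines in the input are dropped for brevity.

```python
def merge_hosts_entries(text: str) -> str:
    """Collapse hosts-file lines that share an IP address into one line per IP.

    Hypothetical helper for illustration only. The order of first appearance
    is kept for both the IP addresses and the hostnames.
    """
    merged = {}  # ip -> list of hostnames (dicts keep insertion order)
    for raw in text.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        bucket = merged.setdefault(ip, [])
        for name in names:
            if name not in bucket:
                bucket.append(name)
    return "\n".join(f"{ip} {' '.join(names)}" for ip, names in merged.items())


if __name__ == "__main__":
    current = """\
172.28.1.1 xrootd.echo.stfc.ac.uk
172.28.1.1 ceph-gw10.gridpp.rl.ac.uk
172.28.1.1 ceph-gw11.gridpp.rl.ac.uk
"""
    print(merge_hosts_entries(current))
```

Run on the three lines quoted above, the sketch produces the single-line form that was requested.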
- 6
  VO Liaison LHCb
  
  Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
  LHCb
  
  Low number of running jobs
  
  https://ggus.eu/?mode=ticket_info&ticket_id=150679
  
  Seems fixed after limits were put on CMS and ATLAS

  Not a permanent solution, but this seems to have allowed LHCb jobs to be picked up by the batch system (???)
  
  ECHO streaming issue
  
  Waiting for release of fix to vector reads
  
  Timescale?
  
  Trying to understand discrepancy between storage used reported by RAL vs DIRAC
  
  Currently a 20% discrepancy; it has been large since 2019 (LHCb move to ECHO)
  
  Date : DIRAC vs RAL (Grafana)
  
  31/12/2020: 5.61 vs 6.46PB
  31/12/2019: 5.62 vs 6.42PB
  31/12/2018: 4.55 vs 4.54PB
  08/02/2018: 4.13 vs 4.10PB
  31/12/2016: 3.61 vs 3.09PB
  18/08/2016: 3.17 vs 3.22PB
  31/12/2015: 3.13 vs 3.12PB
  25/08/2015: 2.30 vs 2.34PB
  18/01/2015: 2.23 vs 2.28PB
  
  DUNE
  
  Normal operations
  
  Testing dynafed access to RAL storage to transfer data between RAL and Fermilab
  
  Is dynafed supported?
  
  Or other protocols supporting http(s)?
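For the LHCb storage discrepancy listed under item 6 (DIRAC vs RAL Grafana), the sketch below computes the relative difference for each date. The figures are copied from the table in these minutes. Taking the DIRAC value as the baseline is an assumption: the minutes do not say how the quoted percentage was derived, so the results need not match it. The names `USAGE_PB` and `relative_excess_percent` are hypothetical.

```python
# (date, DIRAC usage in PB, RAL Grafana usage in PB), copied from the minutes.
USAGE_PB = [
    ("31/12/2020", 5.61, 6.46),
    ("31/12/2019", 5.62, 6.42),
    ("31/12/2018", 4.55, 4.54),
    ("08/02/2018", 4.13, 4.10),
    ("31/12/2016", 3.61, 3.09),
    ("18/08/2016", 3.17, 3.22),
    ("31/12/2015", 3.13, 3.12),
    ("25/08/2015", 2.30, 2.34),
    ("18/01/2015", 2.23, 2.28),
]


def relative_excess_percent(dirac_pb: float, ral_pb: float) -> float:
    """Percentage by which the RAL figure exceeds the DIRAC figure.

    Assumption: DIRAC is used as the baseline (denominator).
    """
    return (ral_pb - dirac_pb) / dirac_pb * 100.0


if __name__ == "__main__":
    for date, dirac, ral in USAGE_PB:
        print(f"{date}: RAL vs DIRAC {relative_excess_percent(dirac, ral):+.1f}%")
```

A positive value means RAL reports more space used than DIRAC. Using the RAL figure as the baseline instead would give somewhat different percentages.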
- 7
  
  VO Liaison Others
- 13:53
  
  Experiment Planning
- 8
  
  DUNE/protoDUNE
- 9
  
  Euclid
- 10
  
  SKA
- 13:57
  
  AOB
- 11
  
  Any Other Business
  
  Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))