RAL Tier1 Experiments Liaison Meeting
Access Grid, RAL R89
13:00 → 13:01  Major Incidents Changes (1m)
13:01 → 13:02  Summary of Operational Status and Issues (1m)
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
13:02 → 13:03  GGUS / RT Tickets (1m)
https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
13:04 → 13:05  Site Availability (1m)
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
13:05 → 13:06  Experiment Operational Issues (1m)
13:15 → 13:16  VO Liaison CMS (1m)
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
Katy is planning to remove the gsiftp tests from the contribution to the SAM status.
CMS saw spikes in failures on writes to Echo on the 18th and 23rd, during the DNS issues. The SAM status also failed on those days because of the storage tests (gsiftp and webdav).
Large numbers of (Processing-type) jobs are failing, but this is also seen at other sites.
13:16 → 13:17  VO Liaison ATLAS (1m)
Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
GGUS: 160156
The last DNS outage 'reset' most transfers. As of this morning there were 33k submitted transfers to write into Echo, plus 12k files being recalled from Antares (via Echo).
Very few failures (O(100)) in the last 24 hours where the source file had been evicted prior to transfer; we can (hopefully) resolve the ticket this afternoon if no further issues arise.
DNS: failed name resolution from external hosts:
Last Wednesday morning and Monday evening; ~220k transfers failed (not started).
GOCDB was also affected (were any other ancillary services affected?).
13:20 → 13:21  VO Liaison LHCb (1m)
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
- The reading part of the tape challenge finished last week. Results look promising: the expected throughput was 1.93 GB/s, and we achieved roughly 1 GB/s more than this (~2.9 GB/s).
- There was a major LHCb Dirac update on Monday, which introduced some issues. It was recovered within several hours, but there were a lot of failed jobs due to this.
- Low number of running LHCb jobs due to an insufficient number of production requests.
- A consistency check identified some dark and lost data. The dark data was removed and the lost files were re-replicated (all data operations were done by the LHCb Computing team).
Tickets:
- Slow checksums (stats):
  - Still waiting
- Deletion problems:
  - Solved
- Problems with simultaneous access to the same file on Echo:
  - On hold; tests are ongoing at Glasgow
- Vector read:
  - One more test: what happens to the LHCb application if a vector read request returns "wrong" data (i.e. not the data that was requested). This was tested (with the same patch, but once in 1000 vector reads it shifts one of the requested chunks by 1 byte), and it seems the application crashes; see the sketch after this list.
  - A dedicated patched WN for production LHCb jobs is being prepared.
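As a rough illustration only (not the actual patch or storage gateway code), here is a minimal Python sketch of the fault-injection idea described in the vector-read item above; faulty_vector_read, read_chunk and FAULT_RATE are hypothetical names.

    # Hypothetical sketch of the fault injection described above, not the real patch.
    import random

    FAULT_RATE = 1.0 / 1000  # roughly once in 1000 vector reads

    def faulty_vector_read(read_chunk, requests):
        """Serve a vector read given as [(offset, length), ...] via read_chunk(offset, length).

        With probability FAULT_RATE, one requested chunk is shifted by 1 byte,
        so the caller receives data it did not ask for."""
        requests = list(requests)
        if requests and random.random() < FAULT_RATE:
            i = random.randrange(len(requests))
            offset, length = requests[i]
            requests[i] = (offset + 1, length)  # shift this chunk by one byte
        return [read_chunk(offset, length) for offset, length in requests]

    # Example: read two chunks from a local file standing in for the storage back end.
    if __name__ == "__main__":
        with open("example.dat", "rb") as f:
            def read_chunk(offset, length):
                f.seek(offset)
                return f.read(length)
            chunks = faulty_vector_read(read_chunk, [(0, 64), (1024, 128)])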
13:25 → 13:28  VO Liaison LSST (3m)
Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
13:30 → 13:31  VO Liaison Others (1m)
13:31 → 13:32  AOB (1m)
13:32 → 13:33  Any other Business (1m)
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))