https://tinyurl.com/T1-GGUS-Open
https://tinyurl.com/T1-GGUS-Closed
https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
New pledge values from April 1st
Xrootd RPM deployment:
- Dev Ceph cluster is down
- VMs are prevented from accessing the production cluster
Echo Downtime:
- Batch farm to stop accepting new submissions from tonight
- Want to take the opportunity to switch more jobs to Harvester and multi-job pilots
- Tape access is expected to multihop via CERN for the duration of the downtime.
Antares:
- Delaying T0 export retest until MGM 'fix' is confirmed
- Many (~25%) 'Operation Expired' errors due to the antares-tpc01 xrootd service, affecting writes to Antares
- This might not explain the recall errors (which show the same error message).
SAM tests are failing; the failures are in the webdav tests.
Tape Challenge - some fraction of the data chosen to be recalled for the tape challenge may be on broken/stuck tapes; the affected tapes are due to be fixed by external engineers this afternoon (30 March).
Recalls in the tape challenge are probably also affected by other factors:
1. An EOS upgrade is required to fix a problem where, if one file in an FTS batch of requests is missing, every request in the batch fails with a 'this file doesn't exist' type error.
2. An upgrade of Rucio to the forthcoming 1.28 release is required to fix another problem: when resubmissions are triggered, Rucio is no longer aware that multihop jobs consist of two coupled jobs.
3. To be confirmed - a possible problem with a server certificate in CMS-Rucio, which may have expired; this might explain Rucio's inability to cancel FTS requests (I have 2 examples of how I think this was broken). A sketch of a certificate check follows this list.
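As a minimal sketch (not the procedure actually used, and with a placeholder hostname rather than the real CMS-Rucio endpoint), one way to check remotely whether a service certificate has expired is to attempt a verified TLS handshake and report the result, e.g. in Python:

    import socket
    import ssl
    from datetime import datetime, timezone

    def check_cert(host: str, port: int = 443) -> None:
        """Report whether the TLS certificate presented by host:port verifies,
        and print its expiry date when the handshake succeeds."""
        ctx = ssl.create_default_context()
        try:
            with socket.create_connection((host, port), timeout=10) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    cert = tls.getpeercert()
            # 'notAfter' looks like 'Apr  1 12:00:00 2022 GMT'
            expiry = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
            print(f"{host}: certificate OK, expires {expiry.replace(tzinfo=timezone.utc)}")
        except ssl.SSLCertVerificationError as err:
            # An expired certificate is reported here as 'certificate has expired'
            print(f"{host}: verification failed: {err.verify_message}")

    check_cert("rucio.example.org")  # placeholder hostname, not the real endpoint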
Job efficiencies are OK, though a bit below average.