RAL Tier1 Experiments Liaison Meeting

Name: RAL Tier1 Experiments Liaison Meeting
Start: 2021-03-10T12:30:00+00:00
End: 2021-03-10T13:30:00+00:00
Location: RAL R89

Wednesday 10 Mar 2021, 12:30 → 13:30 Europe/London

Access Grid (RAL R89)


Description

Please attend via the following Zoom meeting:

https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09

- 12:38 → 12:39
  
  Major Incidents Changes 1m
- 12:39 → 12:40
  
  Summary of Operational Status and Issues 1m
  
  Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
  
  RT1EL-20210310.docx
  
  RT1EL-20210310.pdf
- 12:40 → 12:41
  
  GGUS/RT Tickets 1m
  
  https://tinyurl.com/T1-GGUS-Open
  https://tinyurl.com/T1-GGUS-Closed
- 12:41 → 12:42
  
  Site Availability 1m
  
  https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
  
  https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
  
  http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
- 12:42 → 12:43
  
  Experiment Operational Issues 1m
- 12:44 → 12:45
  
  VO-Liaison ATLAS 1m
  
  Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))
  
  * ATLAS needs to run more single-core analysis jobs
    - https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397775
  
  * ATLAS hostname env for WN containers
    - https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=398494
  
  * Oxford XCache: done on RAL side
    - https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=397191
  
  Discrepancy between Vande 100% CPU (for ATLAS) and ATLAS monitoring (cf. Vande * 11.7/10):
  - to be understood
  
  ATLAS is slowly increasing its number of running single-core jobs (to ~3k).
  
  Vector reads:
  The CMS SAM test code can be run on gw683 and gw691:
  - See how frequently the problem can be triggered;
  - In parallel, try some lower-level tests.
- 12:46 → 12:47
  
  VO Liaison CMS 1m
  
  Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
  
  SAM tests are OK, with just occasional failures. Transfers seem fine.
  
  However, real jobs are failing at a very high rate and efficiency is extremely low. Failures are 60-80% and efficiencies are <1% for many jobs; these jobs mostly fail with FileOpen or FileRead errors. Processing-type jobs are the culprits, so the CMS L1s have asked me to organise stopping them from running at RAL.
  
  I changed the redirector fallback from the UK alias to the European alias. This seemed to reduce the number of FileOpen errors, but the total number of failures remained high: FileOpen errors were replaced by FileRead errors.
- 12:48 → 12:49
  
  VO Liaison LHCb 1m
  
  Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
- 12:52 → 12:53
  
  VO Liaison Others 1m
- 12:53 → 12:54
  
  Experiment Planning 1m
- 12:54 → 12:55
  
  Dune/protoDune 1m
- 12:55 → 12:56
  
  Euclid 1m
- 12:56 → 12:57
  
  SKA 1m
- 12:57 → 12:58
  
  AOB 1m
- 12:58 → 12:59
  
  Any Other Business 1m
  
  Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))