RAL Tier1 Experiments Liaison Meeting

Name: RAL Tier1 Experiments Liaison Meeting
Start: 2021-01-20T12:30:00+00:00
End: 2021-01-20T13:30:00+00:00
Location: RAL R89

Wednesday 20 Jan 2021, 12:30 → 13:30 Europe/London

Access Grid (RAL R89)

Description

Please attend via the following Zoom meeting:

https://ukri.zoom.us/j/98562731547?pwd=UU9Wb2xCL05tWmROT1h6SUlWdUJ3dz09

- 12:38 → 12:39
  
  Major Incidents Changes 1m
- 12:39 → 12:40
  
  Summary of Operational Status and Issues 1m
  
  Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
  
  RT1EL-20210120.docx
  
  RT1EL-20210120.pdf
- 12:40 → 12:41
  
  GGUS/RT Tickets 1m
  
  https://tinyurl.com/T1-GGUS-Open
  https://tinyurl.com/T1-GGUS-Closed
- 12:41 → 12:42
  
  Site Availability 1m
  
  https://lcgwww.gridpp.rl.ac.uk/utils/availchart/
  
  https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
  
  http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden
- 12:42 → 12:43
  
  Experiment Operational Issues 1m
- 12:44 → 12:45
  
  VO-Liaison ATLAS 1m
  
  Minutes
  
  Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))
  
  * ATLAS in drain since 0500 (no new jobs since around 0330):
  AREX issue? Resolved 1100?
  - Follow up on why it affects all CEs
  
  * RAL (unified queue) was subsequently set to TEST by HC (HammerCloud) and remains in TEST:
  """Diag from worker : Condor HoldReason: None ; Condor RemoveReason: removed by SYSTEM_PERIODIC_REMOVE due to job remote status outdated time exceeded (3600*4)."""
- 12:46 → 12:47
  
  VO Liaison CMS 1m
  
  Minutes
  
  Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
  
  CMS went into drain a week ago. This was fixed on Thursday by reverting a change originally made in October, but exactly what happened is still a mystery.
  
  https://ggus.eu/index.php?mode=ticket_info&ticket_id=150207
  
  I have seen some odd dips in running cores over the last two days and today. We appear to drop 10-20% of cores in monit-grafana; however, this is not observed in Vande.
  
  Otherwise, we are running at 400% of pledge due to ATLAS problems with ARC-CE 01 overnight.
  
  I am seeing xrootd-type SAM test failures on all ARC-CEs. They are possibly related to getting files from Echo; I need to investigate.
  
  The debug/loadtest tests for the tape have stopped again. I can see in the Site Readiness report that the FTS status has been corrected. We are still receiving no new transfers to tape as we are at pledge. I asked whether any deletions could be done (in anticipation of the tape migration to Spectra in about one month) but have had no reply yet. I also provided the file dump to those doing the consistency checking, but have had no response there either.
  
  No update on the network changes planned by DI.
- 12:48 → 12:49
  
  VO Liaison LHCb 1m
  
  Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
- 12:52 → 12:53
  
  VO Liaison Others 1m
- 12:53 → 12:54
  
  Experiment Planning 1m
- 12:54 → 12:55
  
  Dune/protoDune 1m
- 12:55 → 12:56
  
  Euclid 1m
- 12:56 → 12:57
  
  SKA 1m
- 12:57 → 12:58
  
  AOB 1m
- 12:58 → 12:59
  
  Any Other Business 1m
  
  Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
