US ATLAS Tier 2 Technical

Name: US ATLAS Tier 2 Technical
Start: 2026-04-29T11:00:00-04:00
End: 2026-04-29T12:00:00-04:00

Wednesday 29 Apr 2026, 11:00 → 12:00 US/Eastern

Fred Luehring (Indiana University (US)), Rafael Coelho Lopes De Sa (University of Massachusetts (US)), Shawn Mc Kee (University of Michigan (US))

Description

Meeting to discuss technical issues at the US ATLAS Tier 2 sites. The primary audience is the US Tier 2 site administrators, but anyone interested is welcome to attend.

Fred Luehring

luehring@iu.edu

+1 812 855 1025

67453565657

- 11:00 → 11:10
  Introduction 10m
  
  Speakers: Fred Luehring (Indiana University (US)), Rafael Coelho Lopes De Sa (University of Massachusetts (US))
  News:
  
  Keep the capacity and services spreadsheets updated. Keep CRIC and OSG topology updated when servers are added or retired.
  
  AMD CPU benchmarking ongoing by Fred.
  
  Upcoming meetings:
  
  dCache workshop [May 6th-7th at NIKHEF]
  
  CHEP 2026 [May 23rd-29th in Bangkok, Thailand]
  
  HTC2026 [June 9th-12th in Madison, Wisconsin] - US ATLAS face-to-face on Tuesday and Wednesday (June 9th and 10th).
  
  ATLAS S&C week #84: end of June, more information to come
  
  Open tickets:
  
  Infrastructure tickets [all on hold until the next downtime, when CRIC and Rucio will be updated]
  
  ggus:1002243 MWT2: RSE basepath prefix
  
  ggus:1002244 NET2: RSE basepath prefix
  
  ggus:1002233 AGLT2: RSE basepath prefix
  
  ggus:1001568 SWT2/OU: xrootd version higher than 5.7.0 needed
  
  ggus:3559 SWT2/OU: Dual-stack
  
  Operations:
  
  Site production during the previous 2 weeks: AGLT2, MWT2, NET2, SWT2 (CPB, OU), TW
  
- 11:10 → 11:20
  TW-FTT 10m
  
  Speakers: Eric Yen, Felix.hung-te Lee (Academia Sinica (TW)), Yi-Ru Chen (Academia Sinica (TW))
  Scheduled downtime: site shutdown from 01:00 UTC on 28 April for high-voltage switchgear maintenance.
- 11:20 → 11:30
  AGLT2 10m
  
  Speakers: Daniel Hayden (Michigan State University (US)), Philippe Laurens (Michigan State University (US)), Shawn Mc Kee (University of Michigan (US)), Dr Wendy Wu (University of Michigan)
  Deleted 105 TB of dark data from datadisk, including the rucio, SAM, and DUMPS directories.
  
  Preparing for the downtime on 4/30, 09:00-14:00.
  
  Plan to update the firmware and kernel on all worker nodes and storage nodes, then reboot them.
  
  The new dCache release is not yet out, but we will go ahead with the downtime for the planned work and do the dCache update at another time, without a downtime.
- 11:30 → 11:40
  MWT2 10m
  
  Speakers: Aidan Rosberg (Indiana University (US)), David Jordan (University of Chicago (US)), Farnaz Golnaraghi (University of Chicago (US)), Fengping Hu (University of Chicago (US)), Fred Luehring (Indiana University (US)), Judith Lorraine Stephen (University of Chicago (US)), Robert William Gardner Jr (University of Chicago (US))
  UIUC PM on 04/15/2026
  
  dCache pools were overloaded on 04/17/2026, mainly by one user's jobs.
  
  Set offline briefly on 04/19/2026. A few worker switches maxed out briefly
  
  IU Networking updated the campus-to-LHCONE VRF BGP peerings on 04/21/2026.
  
  dCache upgrade is planned for 05/04, assuming the patched version is released Thursday
  
  Dark data on 04/28: Hiro's tests in /pnfs/uchicago.edu/atlasdatadisk/hiro/DAVS. Cleaned up, now down to less than 10 TB.
- 11:40 → 11:50
  NET2 10m
  
  Speakers: Eduardo Bach (University of Massachusetts (US)), Rafael Coelho Lopes De Sa (University of Massachusetts (US)), William Axel Leight (University of Massachusetts Amherst)
  
  Smooth running except that a power sag on the 19th at the MGHPCC caused the cooling to shut down, putting us into an unscheduled downtime. We recovered fine, though we did find a bug in CRIC which caused the storage to continue to be marked as in downtime for a couple of days after the downtime ended.
  
  Load balancing on ESnet international links fixed for NET2. Mini data challenges with PRG to resume next week (or the week after).
- 11:50 → 12:00
  SWT2 10m
  
  Speakers: Andrey Zarochentsev (University of Texas at Arlington (US)), Horst Severini (University of Oklahoma (US)), Kaushik De (University of Texas at Arlington (US)), Mark Sosebee (University of Texas at Arlington (US)), Zachary Thomas Booth (University of Texas at Arlington (US))
  SWT2_CPB:
  
  We rebuilt four storage servers from EL7 to EL9.
  
  There were some transfer errors caused by these rebuilds.
  
  No data has been lost. Backups were made before rebuilds.
  
  We checked and verified data was not lost after rebuilds were complete.
  
  We have one R740xd2 storage server left to rebuild, which will be rebuilt today (4/29).
  
  We have two ME4084 storage arrays (connected to R640) that we plan to migrate data off, rebuild on EL9, and then put back into production.
  
  Once data is migrated from these servers, all of our storage nodes in production will be EL9.
  
  Changes were made in CRIC to remove the reliability and availability monitoring of SWT2_CPB_SE_TEST-WEBDAV-gridftp.swt2.uta.edu and SWT2_CPB-CE-HTCONDOR-CE-test03.swt2.uta.edu (part of the test cluster) so that they do not affect the availability and reliability shown for the SWT2_CPB production site.
  
  We changed the in_report and in_monitored values in CRIC to “False”.
  
  OU:
  
  Running smoothly
  
  Still waiting on feedback from OSCER admins and OneNet folks on the three open tickets. Will follow up again and ask for updates
