US ATLAS Computing Facility
US/Eastern
Description
Facilities Team Google Drive Folder
Zoom information
Meeting ID: 996 1094 4232
Meeting password: 125
Invite link: https://uchicago.zoom.us/j/99610944232?pwd=ZG1BMG1FcUtvR2c2UnRRU3l3bkRhQT09
13:00 → 13:10
WBS 2.3 Facility Management News 10m
Speakers: Robert William Gardner Jr (University of Chicago (US)), Dr Shawn McKee (University of Michigan ATLAS Group)
13:10 → 13:20
OSG-LHC 10m
Speakers: Brian Lin (University of Wisconsin), Matyas Selmeci
Releases
- HTCondor-CE 5.1.0, HTCondor 9.0.0 (upcoming): recommended update to support both tokens + GSI!
- Frontier Squid security fix (already deployed in ATLAS, thanks DevOps!)
- HTCondor 8.8.13
- XRootD 5.2.0 RC1 available in upcoming-testing
- HTCondor Week May 24-28 registration open! https://agenda.hep.wisc.edu/event/1579/
- WLCG CE + Pilot Factory Hackathon June 3-4: https://indico.cern.ch/event/1032742/
13:20 → 13:35
Topical Reports
Convener: Robert William Gardner Jr (University of Chicago (US))
13:20
Scale Validation of the XRootD Monitoring pipeline 15m
Derek Weitzel will cover the work to verify the transfer accounting of XRootD and the OSG's XRootD Monitoring Collector pipeline, which is replacing the legacy GLED collector currently hosted at UCSD. The single largest issue found with the monitoring is unreliable communication between the XRootD instances and the collectors, because the monitoring stream is sent over UDP.
Speaker: Derek Weitzel (University of Nebraska Lincoln (US))
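Because UDP gives no delivery guarantee, a collector can only notice loss indirectly. The sketch below is one way to illustrate the idea, and it is not taken from the collector's code. It assumes, purely for illustration, that each packet from a sender carries an 8-bit sequence counter that wraps at 256 and that packets arrive in order; the function name `count_missing` and the sample values are hypothetical.

```python
def count_missing(seqs, modulus=256):
    """Estimate how many packets from one sender never arrived.

    seqs: sequence numbers in arrival order, each in range(modulus).
    Assumes in-order arrival and fewer than `modulus` consecutive losses;
    reordered packets would be miscounted as loss.
    """
    missing = 0
    prev = None
    for seq in seqs:
        if prev is not None:
            # Distance from the previous packet, allowing for wrap-around.
            gap = (seq - prev) % modulus
            # gap == 1 means consecutive packets; a larger gap means
            # gap - 1 packets were skipped. gap == 0 (duplicate) is ignored.
            if gap > 1:
                missing += gap - 1
        prev = seq
    return missing


# Hypothetical arrival order: 255 and 2, 3 never arrived.
received = [253, 254, 0, 1, 4]
lost = count_missing(received)
total = lost + len(received)
print(f"lost {lost} of {total} packets")
```

A counter like this only gives a lower bound on loss per sender, since a long outage that spans a full wrap of the sequence number is invisible to it.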
13:35 → 13:40
WBS 2.3.1 Tier1 Center 5m
Speakers: Eric Christian Lancon (CEA/IRFU,Centre d'etude de Saclay Gif-sur-Yvette (FR)), Xin Zhao (Brookhaven National Laboratory (US))
13:40 → 14:00
WBS 2.3.2 Tier2 Centers
Updates on US Tier-2 centers
Convener: Fred Luehring (Indiana University (US))
13:40
AGLT2 5m
Speakers: Philippe Laurens (Michigan State University (US)), Dr Shawn McKee (University of Michigan ATLAS Group), Prof. Wenjing Wu (Computer Center, IHEP, CAS)
- Added 3 C6420 nodes (288 cores) to be shared between the UM Tier3 and Tier2.
- Site started draining (65% usage) on 6 May; this was reported to ADC and a second gatekeeper was added. Usage started to ramp up after 24 hours.
- Had low transfer efficiency and job stage-in errors (27% job failure) over the weekend, with file access timeouts. The solution was to restart all dCache services.
- Updated dCache from 6.2.15 to 6.2.21 on Monday; this seems to have helped: transfer errors decreased greatly and job efficiency increased.
- Will also test adding 50% more memory to one of the MSU dCache pool nodes with the most pools and files, as those still seem to cause more errors than they should.
- Updated 3 squid servers (UM site) to 4.15-1.1 to address the recent security alert; MSU has updated one squid server to SL7, with another one still to be updated.
13:45
MWT2 5m
Speakers: David Jordan (University of Chicago (US)), Judith Lorraine Stephen (University of Chicago (US))
13:50
NET2 5m
Speaker: Prof. Saul Youssef (Boston University (US))
13:55
SWT2 5m
Speakers: Dr Horst Severini (University of Oklahoma (US)), Mark Sosebee (University of Texas at Arlington (US)), Patrick Mcguigan (University of Texas at Arlington (US))
14:00 → 14:05
WBS 2.3.3 HPC Operations 5m
Speakers: Doug Benjamin (Duke University (US)), Lincoln Bryant (University of Chicago (US))
14:05 → 14:20
WBS 2.3.4 Analysis Facilities
Convener: Wei Yang (SLAC National Accelerator Laboratory (US))
14:05
Analysis Facilities - BNL 5m
Speaker: William Strecker-Kellogg (Brookhaven National Lab)
14:10
Analysis Facilities - SLAC 5m
Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
- 14:15
14:20 → 14:40
WBS 2.3.5 Continuous Operations
Convener: Ofer Rind
- BNL and SWT2 XRootD testbeds have been updated with the latest test version and are undergoing stress tests. BNL monitoring is set up.
- Ongoing discussion about VOMS-IAM testing; awaiting development on VOMS import
- Frontier-Squid DevOps meeting today
- Latest security update from OSG-testing applied to SLATE squids
- Discussion of update process
- Non-SLATE squids still running at some sites - need an audit of configs across sites
- Mark S. is compiling information on downtime problems, planning for discussion at the ADC weekly in two weeks
- Also looking at site CRIC configurations, e.g. NET2
- Jobs still not getting brokered to ANALY_BNL_VP queue, unclear why (some suggestions from Rod)
14:20
US Cloud Operations Summary: Site Issues, Tickets & ADC Ops News 5m
Speakers: Mark Sosebee (University of Texas at Arlington (US)), Xin Zhao (Brookhaven National Laboratory (US))
14:25
Service Development & Deployment 5m
Speakers: Ilija Vukotic (University of Chicago (US)), Robert William Gardner Jr (University of Chicago (US))
14:40 → 14:45
AOB 5m