US ATLAS Computing Facility
Facilities Team Google Drive Folder
Zoom information
Meeting ID: 993 2967 7148
Meeting password: 452400
Invite link: https://umich.zoom.us/j/99329677148?pwd=c29ObEdCak9wbFBWY2F2Rlo4cFJ6UT09
-
-
13:00
→
13:05
WBS 2.3 Facility Management News 5m
Speakers: Alexei Klimentov (Brookhaven National Laboratory (US)), Dr Shawn Mc Kee (University of Michigan (US))
We need to prepare for the (pre-)scrubbing. A WBS 2.3 L3 template has been shared: https://docs.google.com/presentation/d/1mU1eDQQxIE3Lm6qqZsLFPZ-EtmjrgbxXxggr61gJau4/edit?usp=sharing
- The target is to have draft slides by June 9th (to be confirmed)
The 5-year evolution spreadsheets for the Tier-2 facility are complete but still need updates and final numbers
- Each Tier-2 should be working on spending plans for a possible end of CA distribution (see Tier-2 Spending Proposal)
- Tier-2 managers will meet Friday to discuss
HTC25 is fast approaching. We have a draft agenda started at https://agenda.hep.wisc.edu/event/2297/timetable/#20250605.detailed
- Comments welcome
The LHCOPN/LHCONE meeting proposed shutting off IPv4 for LHCOPN
- The HEPiX IPv6 working group discussed this today; we want to see whether ATLAS/BNL and CMS/FNAL are willing to try it, with the expectation that any IPv4 traffic fails over to LHCONE
- Phil Demar is asking CMS and FNAL if they are willing to try this in the next month or two. Shawn is tasked with doing the same for ATLAS and BNL.
-
13:05
→
13:10
OSG-LHC 5m
Speakers: Brian Hua Lin (University of Wisconsin), Matyas Selmeci
-
13:10
→
13:30
WBS 2.3.1: Tier1 Center
Convener: Alexei Klimentov (Brookhaven National Laboratory (US))
-
13:10
Tier-1 Infrastructure 5m
Speaker: Jason Smith
-
13:15
Compute Farm 5m
Speaker: Thomas Smith
-
13:20
Storage 5m
Speaker: Carlos Fernando Gamboa (Brookhaven National Laboratory (US))
-
13:25
Tier1 Operations and Monitoring 5m
Speaker: Ivan Glushkov (Brookhaven National Laboratory (US))
-
13:30
→
13:40
WBS 2.3.2 Tier2 Centers
Updates on US Tier-2 centers
Conveners: Fred Luehring (Indiana University (US)), Rafael Coelho Lopes De Sa (University of Massachusetts (US))
- Lack of work caused significant disruption over the past week.
- Over the weekend there were only SCORE_HIMEM jobs, which would not broker to a site unless meanRSS was set to 3000 MB. We set this at AGLT2, MWT2, and, I believe, SWT2_CPB, and these sites refilled.
- There were also a large number of Exotics group jobs that failed at all Tier-2 sites because they were looping.
- EL9 updates/FY24 equipment installs continue at MSU and UTA.
- MSU believes that a Satellite will allow them to finish.
- CPB has been struggling with zombie condor entries.
- There is a ticket open with the condor team about the issue.
- Tier-2 PIs will meet on Friday to discuss procurement with both FY25 and end-of-grant special funds.
- I need to get with Rafael on pre-scrubbing slides.
-
13:40
→
13:50
WBS 2.3.3 Heterogeneous Integration and Operations
HIOPS
Convener: Rui Wang (Argonne National Laboratory (US))
-
13:45
Integration of Complex Workflows on Heterogeneous Resources 5m
Speakers: Doug Benjamin (Brookhaven National Laboratory (US)), Xin Zhao (Brookhaven National Laboratory (US))
-
13:50
→
14:10
WBS 2.3.4 Analysis Facilities
Conveners: Ofer Rind (Brookhaven National Laboratory), Wei Yang (SLAC National Accelerator Laboratory (US))
-
13:50
Analysis Facilities - BNL 5m
Speaker: Qiulan Huang (Brookhaven National Laboratory (US))
- Managed to load the user name and group name when creating the pod with an init container
- The dCache NFS client still needs NFSv4 identity mapping to be configured properly
- Further work is needed on the idmap on the OpenShift worker node or pod
- The test of pulling/pushing images from/to the SDCC Quay service is done
- Customize the Alma9 base image and build it to register on SDCC Quay.
- Tom Smith is deploying the accounting monitoring that was missing from the A9 Tier-3 pool and interactive hosts
-
13:55
Analysis Facilities - SLAC 5m
Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
-
14:10
→
14:25
WBS 2.3.5 Continuous Operations
Convener: Ofer Rind (Brookhaven National Laboratory)
-
14:10
ADC Operations, US Cloud Operations: Site Issues, Tickets & ADC Ops News 5m
Speaker: Ivan Glushkov (Brookhaven National Laboratory (US))
- Preparing for deployment of FTS update at BNL (v14.0.1 to be released next week) - will allow for token testing during data challenge
- Varnish at BNL now functional on OpenShift with Quay image; still some network routing to deploy
- DDM moved the BNL VP queue xcache to the ESnet server
- Ongoing discussions of Varnish deployment and management
- CRIC permissions were updated (more info)
- BNL-OSG2_DATADISK protocol priorities to be changed from 0 to null.
-
14:15
Services DevOps 5m
Speaker: Ilija Vukotic (University of Chicago (US))
- XCaches
  - multiple issues in the UK cloud.
  - ESnet xcache is operational, but no monitoring is coming from it.
  - will try to build a new image this week
- VP
  - BNL_VP is trying to use the ESnet xcache
- Varnish
  - starting to build the neo_frontier infrastructure on the OpenStack k8s cluster at CERN
  - asked SWT2 to deploy their own Varnish
  - all Varnishes removed from WLCG monitoring. Dedicated Varnish monitoring meeting on Friday at 9:30 AM CST
- CREST
  - NTR
- ServiceX/Y
  - updated all the components to 1.6.1
  - testing the RDataFrame code generator and transformers
- AF
  - cleaned up images and their naming.
  - added Python 3.12 to login nodes.
-
14:20
Facility R&D 5m
Speaker: Lincoln Bryant (University of Chicago (US))
- Armada seems to be working locally on the stretched k8s, and we are investigating the auth components needed to send tasks to another cluster
- We are actively debugging/trying to understand EOS user authentication.
- Kerberos is a nonstarter; X.509 might be tricky because the EOS containers are all EL7 (!) and we're trying to understand the CA/cert situation
- "plain" OAuth2 deprecated, with support shifting to SciToken-based auth
- Not quite clear how to bridge the gap from Keycloak to SciTokens, still working on it
- Coffea Casa JupyterHub should be working on https://coffea-casa.hl-lhc.io/ , with caveats:
- Must have a UChicago AF account already, to get your /home, /data, and access to HTCondor
- Still working on:
- General ATLAS users coming from IAM without a UChicago AF account
- Only get Jupyter, no persistence
- Probably will crash right now if you try it
- HTCondor pool on the stretched cluster
- Mounting NFS/Ceph over the WireGuard interface within K8S
- Jupyter limited to UChicago nodes at the moment, where we can mount locally
-
14:25
→
14:35
AOB 10m