US ATLAS Computing Facility (Possible Topical)
Facilities Team Google Drive Folder
Zoom information
Meeting ID: 993 2967 7148
Meeting password: 452400
Invite link: https://umich.zoom.us/j/99329677148?pwd=c29ObEdCak9wbFBWY2F2Rlo4cFJ6UT09
-
-
13:00
→
13:05
WBS 2.3 Facility Management News 5m
Speakers: Alexei Klimentov (Brookhaven National Laboratory (US)), Dr Shawn Mc Kee (University of Michigan (US))
We will have a WLCG mini-capability discussion on Friday at 2:30 PM Eastern / 1:30 PM Central, including both USATLAS and USCMS (as well as ESnet). Zoom info: https://unl.zoom.us/j/96665685001 Please join if you are interested.
- Optimization & Host Tuning Draft Plan: https://docs.google.com/document/d/1eGUYg30GMawxL_dhMhCcU7_9E4JWfjlXiM_BhG8i7dQ/edit?tab=t.0#heading=h.yam1bisflruw
- Monitoring Capabilities Draft: https://docs.google.com/document/d/1qvCWvJw5dT3NmJBl-SF4v-quuCVXpzPM4iWAaKWpW8M/edit?usp=sharing
- Initial November 5th meeting notes: https://docs.google.com/document/d/1zuHdDeMfp0lsFMphy0_WwnFAdihpSCX_PPe_yFTG2R8/edit?tab=t.0#heading=h.2noi3geixvlq
During today's updates it would be good to hear about plans for the facility regarding holiday coverage (if any).
The HEPiX Techwatch WG had some discouraging news on disk and memory pricing and on supply chain delays. This will likely have an adverse impact on our purchases until at least 2028.
-
13:05
→
13:10
OSG-LHC 5m
Speakers: Brian Hua Lin (University of Wisconsin), Matyas Selmeci
- No releases planned until next year
- Google is disallowing use of certs for client auth in Chrome (https://knowledge.digicert.com/alerts/sunsetting-client-authentication-eku-from-digicert-public-tls-certificates). IGTF CAs that are also respected by Chrome by default will need to adjust or be dropped from Chrome.
- OSG-LHC PEP
- Blueprints for OSG Accounting future planning, XRootD monitoring architecture authentication
- Integration tests of containers
- Cybersecurity tabletop
- Capability challenges as part of mini DC challenges
- Support enabling SciTags on US ATLAS/CMS
- Remove last vestiges of X.509
-
13:10
→
13:20
Rucio/SENSE at NET2: integration, demonstration, and next steps 10m
This presentation will cover the NET2 experience at SC25 and discuss near term plans for SENSE/Rucio work in the facility.
Speaker: Rafael Coelho Lopes De Sa (University of Massachusetts (US)) -
13:20
→
13:40
WBS 2.3.1: Tier1 Center
Convener: Alexei Klimentov (Brookhaven National Laboratory (US))
-
13:20
Tier-1 Infrastructure 5m
Speaker: Jason Smith
-
13:25
Compute Farm 5m
Speaker: Thomas Smith
All HTCondor workers have been upgraded to AlmaLinux 9.7.
Need to schedule upgrade downtimes for the HTCondor CEs.
Gratia reporting was not functioning; packages were updated and reporting was restored. The gap in reporting was filled retroactively with no further intervention needed. Thanks to Derek for reporting the issue.
- 13:30
-
13:35
Tier1 Operations and Monitoring 5m
Speaker: Ofer Rind (Brookhaven National Laboratory)
-
13:40
→
13:50
WBS 2.3.2 Tier2 Centers
Updates on US Tier-2 centers
Conveners: Fred Luehring (Indiana University (US)), Rafael Coelho Lopes De Sa (University of Massachusetts (US))
- Good running over the last two weeks
- TW-FTT has had transfer troubles with certain European countries.
- Today they have some sort of general network problem.
- Otherwise a very quiet period with full production.
- Please, please submit the tier 2 site operations before you leave for the end of year holidays.
- We will follow up with the sites on administrative coverage over the holidays.
- My guess is this will look like it normally does: best effort.
- Need to revisit the Tier 2 equipment projects ASAP because the CA will likely be reviewed during January.
- The current federal budget continuing resolution expires at the end of January, and NSF wants to get the external reviews of the CA done in January.
- The goal is to have the CA ready to be presented at the National Science Board summer meeting.
-
13:50
→
14:00
WBS 2.3.3 Heterogeneous Integration and Operations
HIOPS
Convener: Rui Wang (Argonne National Laboratory (US))
-
13:50
HPC Operations 5m
Speaker: Rui Wang (Argonne National Laboratory (US))
-
13:55
Integration of Complex Workflows on Heterogeneous Resources 5m
Speaker: Doug Benjamin (Brookhaven National Laboratory (US))
Debugging an error seen on the GPU queue: HammerCloud jobs are recording an error in stage-out. "File transfer timed out during stage-out: hc_test:ced04002-0d9e-4fab-b478-b8fb314d3e43_49036.1.job.log.tgz to BNL-OSG2_SCRATCHDISK, copy command timed out: TimeoutException: Unknown time-out related error, see batch log for more info, timeout=None seconds')]:failed to transfer files using copytools=['rucio'] "
Trying to understand why user jobs are not starting.
-
14:00
→
14:20
WBS 2.3.4 Analysis Facilities
Convener: Wei Yang (SLAC National Accelerator Laboratory (US))
-
14:00
Analysis Facilities - BNL 5m
Speaker: Qiulan Huang (Brookhaven National Laboratory (US))
- Following up with users (inactive users with data and active users with no data) and waiting for the policy decision on handling users’ storage areas.
- Starting work on user quota management.
- Testing the new federated JupyterHub services for FCC and DUNE; the same changes will be applied to ATLAS after they are puppetized.
-
14:05
Analysis Facilities - SLAC 5m
Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
- 14:10
-
14:20
→
14:40
WBS 2.3.5 Continuous Operations
Conveners: Ivan Glushkov (Brookhaven National Laboratory (US)), Ofer Rind (Brookhaven National Laboratory)
- A bug was introduced in the latest version of the pilot: it divides maxwdir by a hard-coded 8 rather than by PQ.corecount (ATLASPANDA-1575). This mostly affects the SCORE queues. To be fixed once Paul is back.
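The arithmetic behind this pilot bug can be sketched in a few lines of Python. This is only an illustration, assuming the limit is meant to be maxwdir divided by the queue's core count, as the note suggests. The function names and the 16000 value are hypothetical and are not taken from the pilot code.

```python
# Illustrative sketch only; this is NOT the pilot's actual code.
# Assumption: the intended limit is maxwdir / PQ.corecount, and the
# buggy version divides by a fixed 8 regardless of the queue's core count.

HARD_CODED_DIVISOR = 8  # the constant mentioned in the meeting note


def intended_share(maxwdir: float, corecount: int) -> float:
    """Limit as presumably intended: scale maxwdir by the queue's core count."""
    if corecount < 1:
        raise ValueError("corecount must be at least 1")
    return maxwdir / corecount


def buggy_share(maxwdir: float, corecount: int) -> float:
    """Limit as described in the note: corecount is ignored."""
    if corecount < 1:
        raise ValueError("corecount must be at least 1")
    return maxwdir / HARD_CODED_DIVISOR


if __name__ == "__main__":
    maxwdir = 16000.0  # hypothetical value, for illustration only
    for corecount in (1, 8):
        print(
            f"corecount={corecount}: "
            f"intended={intended_share(maxwdir, corecount)}, "
            f"buggy={buggy_share(maxwdir, corecount)}"
        )
```

Under these assumptions a single-core queue ends up with one eighth of the intended limit, while an 8-core queue is unaffected, which is consistent with the note that the SCORE queues are mostly hit.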
-
14:20
ADC Operations, US Cloud Operations: Site Issues, Tickets & ADC Ops News 5m
Speaker: Kaushik De (University of Texas at Arlington (US))
-
14:25
Services DevOps 5m
Speaker: Ilija Vukotic (University of Chicago (US))
-
14:30
Facility R&D 5m
Speaker: Robert William Gardner Jr (University of Chicago (US))
-
14:35
Cybersecurity plan(s) 5m
Speakers: Robert William Gardner Jr (University of Chicago (US)), Shigeki Misawa (Brookhaven National Laboratory (US))
- 14:40 → 14:50