US ATLAS Computing Facility
13:00 → 13:10
WBS 2.3 Facility Management News 10m
Speakers: Eric Christian Lancon (BNL), Robert William Gardner Jr (University of Chicago (US))
Hope everyone is enjoying their summer! Eric on vacation this week.
- Topical speakers: don't forget to sign up!
- https://docs.google.com/document/d/1NIc67p3AB2RkYjJsP6Nx_lwPXFX03w1n2SFOgCU47ro/edit?usp=sharing
- Sep 4 meeting currently open
- Focus at the moment is procurement and completing Tier2 tasks (Shawn)
- The September GDB and pre-GDB will be located at Fermilab
- Attempting to organize a federated edge security session with WLCG (new working group forming)
- We will have a meeting in two weeks; Eric will chair
13:20 → 13:40
Topical Report
13:40 → 14:25
US Cloud Status
13:40
US Cloud Operations Summary 5m
Speaker: Mark Sosebee (University of Texas at Arlington (US))
13:50
AGLT2 5m
Speakers: Philippe Laurens (Michigan State University (US)), Dr Shawn McKee (University of Michigan ATLAS Group), Prof. Wenjing Wu (Computer Center, IHEP, CAS)
Incidents:
1) The gatekeeper that receives the HammerCloud (HC) jobs stopped working. We had no monitoring for the condor-ce service, so we did not notice right away.
2) A large fraction of analysis jobs failed and the site received a GGUS ticket. We found that a script we use to clean up zombie files left behind by jobs killed by HTCondor was also deleting the work directories of running jobs. The bug surfaced with the switch to pilot2. We have fixed the script.
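For illustration only, a minimal sketch of a guard that avoids this class of cleanup bug. The minutes do not show the actual AGLT2 script, so everything here is an assumption: the function names, the idea that job work directories sit directly under one base directory, the age threshold, and the premise that the caller can obtain the list of running jobs' work directories from the batch system. The sketch skips any directory that is, contains, or sits inside a running job's directory, and defaults to a dry run.

```python
import os
import shutil
import time


def find_stale_workdirs(base_dir, running_dirs, min_age_seconds, now=None):
    """Return directories under base_dir that look safe to remove.

    running_dirs is a caller-supplied list of work directories of jobs
    that are still running (hypothetical input; how it is obtained is
    site specific and not covered here).
    """
    now = time.time() if now is None else now
    running = {os.path.realpath(p) for p in running_dirs}
    stale = []
    for entry in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, entry)
        # Only consider real directories, never symlinks.
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        real = os.path.realpath(path)
        # Skip anything that is, contains, or is inside a running job's dir.
        if any(
            r == real
            or r.startswith(real + os.sep)
            or real.startswith(r + os.sep)
            for r in running
        ):
            continue
        # Skip recently modified directories as an extra safety margin.
        if now - os.path.getmtime(path) < min_age_seconds:
            continue
        stale.append(path)
    return stale


def remove_stale_workdirs(base_dir, running_dirs,
                          min_age_seconds=86400, dry_run=True):
    """Remove stale directories; with dry_run=True only report them."""
    stale = find_stale_workdirs(base_dir, running_dirs, min_age_seconds)
    if not dry_run:
        for path in stale:
            shutil.rmtree(path, ignore_errors=True)
    return stale
```

Running with `dry_run=True` first and reviewing the returned list is a cheap way to catch a changed directory layout (such as after a pilot version change) before anything is deleted.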
Hardware:
Identified the storage servers we can retire from the Tier2, based on the age of the hardware and the number of failures it has had. Worked out the items for the purchase.
Service:
Set up a new replication server for the dCache database.
13:55
MWT2 5m
Speakers: Judith Lorraine Stephen (University of Chicago (US)), Lincoln Bryant (University of Chicago (US))
Upgraded frontier-squid site-wide to 4.8-1.1.
Discussed MWT2 retirement and purchasing plans.
Still working on getting the dCache nodes dual-stacked. Needed to get external IPv6 PTR records set up from UC ITS. This was completed yesterday.
Setting up temporary IU and UIUC SLATE nodes to test SLATE frontier-squid configuration (see Lincoln's talk).
14:00
NET2 5m
Speaker: Prof. Saul Youssef (Boston University (US))
We've reinstalled the NET2 squid with new software, both for security and to clear a low-level GGUS ticket about too many failovers. Seems to have worked.
Lots of work on migrating to NESE. A GridFTP Docker container with Wei's GridFTP (with Adler callout) works. Lots of work and testing still to do. ADC informed. Will have two DATADISK space tokens during the transition.
Smooth operations and full site otherwise.
We have two open GGUS tickets. Both can be closed.
14:05
SWT2 5m
Speakers: Dr Horst Severini (University of Oklahoma (US)), Kaushik De (University of Texas at Arlington (US)), Mark Sosebee (University of Texas at Arlington (US)), Patrick Mcguigan (University of Texas at Arlington (US))
OU:
- all working well
- in the process of reconfiguring OU xrootd storage for automatic space group assignment and http-over-xrootd
UTA:
1) Migration of UTA_SWT2 to CentOS7 completed
2) All equipment from recent hardware purchase received - planning deployment schedule
3) Systems running well post-CentOS7 upgrades
14:10
HPC Operations 5m
Speaker: Doug Benjamin (Duke University (US))
14:15
Analysis Facilities - SLAC 5m
Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
14:20
Analysis Facilities - BNL 5m
Speaker: William Strecker-Kellogg (Brookhaven National Lab)
14:25 → 14:30
AOB 5m