US ATLAS Computing Facility (Possible Topical)

US/Eastern
Description

Facilities Team Google Drive Folder

Zoom information

Meeting ID:  993 2967 7148

Meeting password: 452400

Invite link:  https://umich.zoom.us/j/99329677148?pwd=c29ObEdCak9wbFBWY2F2Rlo4cFJ6UT09

 

 

    • 13:00 13:05
      WBS 2.3 Facility Management News 5m
      Speakers: Alexei Klimentov (Brookhaven National Laboratory (US)), Dr Shawn Mc Kee (University of Michigan (US))

      We are waiting on news about funding (for FY26 and the end-of-CA funds).  One interesting possibility was mentioned by the NSF program officers: they asked whether we needed a no-cost extension (NCE), which we previously thought was NOT possible.  We may want to discuss whether this would be useful.  Will pricing and availability get worse as we move into and through 2027?

      The USATLAS meeting in Madison is coming up June 9-10 (part of HTC26 https://chtc.cs.wisc.edu/events/2026/06/09/throughput-computing-week).  Our planning notes are at: https://docs.google.com/document/d/1buI_ganzv1rGv-XpQrS1841GknTZSano6Vl2Imw6mp8/edit?usp=sharing

      Lots of activity related to GENESIS proposals underway.  News / updates?

      Lots of meetings coming up:  LHCOPN/LHCONE (next week in Montreal), HEPiX and CHEP, to name a few before June.

       

    • 13:05 13:10
      OSG-LHC 5m
      Speakers: Brian Hua Lin (University of Wisconsin), Matyas Selmeci
      • Release (this week)
      • Coordinating non-X.509 based access to Topology data with CRIC and GGUS teams
      • Topology Facility renames that will affect the downtime UI
        • Columbia University_Nevis Labs -> Columbia University
        • UC Irvine -> University of California, Irvine
        • University of Massachusetts - Amherst -> University of Massachusetts Amherst
        • University of Texas Arlington -> The University of Texas at Arlington
        • University of Texas at Austin -> The University of Texas at Austin
        • University of Texas at Dallas -> The University of Texas at Dallas
    • 13:10 13:30
      WBS 2.3.1: Tier1 Center
      Convener: Alexei Klimentov (Brookhaven National Laboratory (US))
      • 13:10
        Tier-1 Infrastructure 5m
        Speaker: Jason Smith
      • 13:15
        Compute Farm 5m
        Speaker: Thomas Smith

        Nothing major to report

        The last week has seen smooth operations and full occupancy

      • 13:20
        Storage 5m
        Speaker: Carlos Fernando Gamboa (Brookhaven National Laboratory (US))
        • Operations focused on recent ATLAS staging/injection activities
          • From HPSS stats:
            • Over 4 days, ATLAS injected 1.14 PiB (282,417 files) at an average injection rate of 3.4 GiB/s. Concurrently, ATLAS staged 364.16 TiB (81,867 files) at 1.06 GiB/s.
        • Integration instance on dCache 11.2.3 
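
        As a rough cross-check of the HPSS figures above, the quoted average rates can be recomputed from the volumes and the window length. This is only a sketch: it assumes the "4 days" window is exactly 4 × 86,400 seconds, and the constant and function names are made up for illustration, not taken from any HPSS tooling.

```python
# Sanity-check the average HPSS rates quoted in the notes.
# Assumption: the reporting window is exactly 4 days; the real window
# used for the HPSS statistics may be slightly different.

GIB_PER_TIB = 1024
GIB_PER_PIB = 1024 * 1024
WINDOW_SECONDS = 4 * 24 * 60 * 60  # 345,600 s


def average_rate_gib_per_s(volume_gib: float, seconds: float) -> float:
    """Return the mean transfer rate in GiB/s for a volume moved in `seconds`."""
    return volume_gib / seconds


# Volumes as quoted in the notes, converted to GiB.
injected_gib = 1.14 * GIB_PER_PIB
staged_gib = 364.16 * GIB_PER_TIB

inject_rate = average_rate_gib_per_s(injected_gib, WINDOW_SECONDS)
stage_rate = average_rate_gib_per_s(staged_gib, WINDOW_SECONDS)

print(f"inject: {inject_rate:.2f} GiB/s")
print(f"stage:  {stage_rate:.2f} GiB/s")
```

        Under that assumption the sketch gives about 3.46 GiB/s for injection and 1.08 GiB/s for staging, close to the quoted 3.4 and 1.06. The small gap would be consistent with a reporting window slightly longer than four days, though the notes do not say so.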
      • 13:25
        Tier1 Operations and Monitoring 5m
        Speaker: Ofer Rind (Brookhaven National Laboratory)
        • A simultaneous derivations campaign and a high volume of data writing are putting some pressure on the HPSS service; we reallocated and added some resources to help handle the load.  Some residual staging failures are due to an attempt to move HPSS batch directories to NFS volumes last week.
    • 13:30 13:40
      WBS 2.3.2 Tier2 Centers

      Updates on US Tier-2 centers

      Conveners: Fred Luehring (Indiana University (US)), Rafael Coelho Lopes De Sa (University of Massachusetts (US))
      • Good running recently
        • TW-FTT has been running very few job slots recently.
        • Short central issue on April 1.
        • NET2 was down on March 31.
      • Funding status is looking good but the deal is not signed - from John Hobbs on April 1 after the JOG:

        I had a talk with Aamir [Ali - NSF Program Manager] yesterday. He said that it's "highly, highly probable" that we'll get the full year 5 amount. Also, surprisingly, he asked if we'll need a no cost extension to spend down the funding. This is a major shift. I told him that as long as we knew early enough what the total funding would be, we were planning to spend down in the current NSF year. But this does give us a little wiggle room.

        • There will likely be FY26 funding in the usual amounts
        • Each Tier 2 will receive about $1.4 million in end-of-CA funding for infrastructure or servers.
        • With the unspent FY25 funds, sites will have up to $2 million to spend on equipment and infrastructure.
      • Please get your quarterly reporting in by COB on 4/17/2026.
    • 13:40 13:50
      WBS 2.3.3 Heterogeneous Integration and Operations

      HIOPS

      Convener: Rui Wang (Argonne National Laboratory (US))
      • 13:40
        HPC Operations 5m
        Speaker: Rui Wang (Argonne National Laboratory (US))

        Perlmutter: fixing the Harvester setup for production queues

        TACC: working on the BNL HPC datadisk permissions

      • 13:45
        Integration of Complex Workflows on Heterogeneous Resources 5m
        Speaker: Doug Benjamin (Brookhaven National Laboratory (US))
    • 13:50 14:10
      WBS 2.3.4 Analysis Facilities
      Convener: Wei Yang (SLAC National Accelerator Laboratory (US))
      • 13:50
        Analysis Facilities - BNL 5m
        Speaker: Qiulan Huang (Brookhaven National Laboratory (US))
      • 13:55
        Analysis Facilities - SLAC 5m
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
      • 14:00
        Analysis Facilities - Chicago 5m
        Speaker: Fengping Hu (University of Chicago (US))

        REANA has seen its first production usage since deployment, with hundreds of workflows running per day on the Kubernetes backend.

        • Configuration updates include enabling CERN GitLab image pulls, adding CVMFS access, enabling user notifications, increasing shared storage capacity, and tuning the concurrent workflow limits.
    • 14:10 14:30
      WBS 2.3.5 Continuous Operations
      Conveners: Ivan Glushkov (Brookhaven National Laboratory (US)), Ofer Rind (Brookhaven National Laboratory)
      • 14:10
        ADC Operations, US Cloud Operations: Site Issues, Tickets & ADC Ops News 5m
        Speaker: Kaushik De (University of Texas at Arlington (US))
        • Low number of running jobs last Wednesday due to the Harvester CephFS migration (CSOPS-2430)
        • Questions about Nevis storage monitoring and usage - revisit retiring the T3?
        • SWT2 (Andrey and Zach) found an issue: the ALRB apptainer for user jobs is being created in the home directory. ATLAS jobs should not be accessing home, since this could create a scalability issue. A GGUS ticket has been opened (GGUS-Ticket-ID: #1002282), and there is a long thread on this topic. Stay tuned.
      • 14:15
        Services DevOps 5m
        Speaker: Ilija Vukotic (University of Chicago (US))

        XCaches - mostly OK. A node in the UK had to be restarted.

        Varnishes - all OK 

        Frontiers

        • all OK
        • working on changing lxplus settings
        • should start removing old Frontiers from CRIC (now second choice)

         

        AI

        • OpenClaw running on our DGX Spark does periodic monitoring
          • ServiceX instances on River
          • HTCondor on AF
          • River k8s cluster
          • All Varnish and Frontier instances - reports to ADAM-varnish Mattermost channel 
          • more to come...
      • 14:20
        Facility R&D 5m
        Speaker: Robert William Gardner Jr (University of Chicago (US))
      • 14:25
        Cybersecurity plan(s) 5m
        Speakers: Robert William Gardner Jr (University of Chicago (US)), Shigeki Misawa (Brookhaven National Laboratory (US))
    • 14:30 14:40
      AOB 10m