US ATLAS Computing Facility

US/Eastern
Bi-weekly Facilities meeting
Videoconference room: US_ATLAS_Computing_Integration_and_Operations (extension 109263008, owner Robert William Gardner Jr)
    • 13:00 13:10
      WBS 2.3 Facility Management News 10m
      Speakers: Eric Christian Lancon (BNL), Robert William Gardner Jr (University of Chicago (US))
    • 13:10 13:20
      OSG-LHC 10m
      Speakers: Brian Lin (University of Wisconsin), Matyas Selmeci

      Packages Ready for Testing

      Other

      • gate04.aglt2.org is on HTCondor-CE 2.2.4, which has been unsupported in the OSG since May 2018
      • Are there meetings for ATLAS XCache operations?
    • 13:20 13:35
      Topical Report
      Convener: Robert William Gardner Jr (University of Chicago (US))
      • 13:20
        TBD 12m
    • 13:35 13:40
      Tier1 Center 5m
      Speakers: Eric Christian Lancon (CEA/IRFU,Centre d'etude de Saclay Gif-sur-Yvette (FR)), Xin Zhao (Brookhaven National Laboratory (US))
      • ART jobs now run fine on the ANALY_BNL_INTEL PQ after changes to the Harvester template.
      • A local tape stress test was run over the last week to better understand staging bottlenecks and possible improvements.
      • MAS discussion is ongoing; anyone interested can join the e-group atlas-adc-qos-mas@cern.ch
    • 13:40 14:00
      Tier2 Centers
      Convener: Shawn Mc Kee (University of Michigan (US))
      • 13:40
        AGLT2 5m
        Speakers: Philippe Laurens (Michigan State University (US)), Dr Shawn McKee (University of Michigan ATLAS Group), Prof. Wenjing Wu (Computer Center, IHEP, CAS)

        Operation:

          - One dCache server was accidentally powered off and has since been restored.
            Ticket 144087

          - Otherwise no problems.

        Hardware:

         - 3 more dCache nodes are now in production, with data migration in progress.
           In total UM has 5 new dCache nodes (Dell R740xd2), all in production now.  We have retired UMFS01/03/04/12/13 and will need to update the facility tracking spreadsheet at https://docs.google.com/spreadsheets/d/1YjDe4YdApHoB5_HbDnNwrG-ceJP3amNWMb_VzQEaxGI/edit#gid=0
           One more at MSU (MSUFS17) will be in production soon.

         - Added 3 worker nodes (C6420)
           Purchased with UM T3 funds, but shared by T2 and T3.
           CPU model: Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz
           192 additional cores in total.

        Services:

         - Progress on enabling IPv6 everywhere and registering in DNS.  This will allow us to check off the last red box on the IPv6 tracking spreadsheet at https://docs.google.com/spreadsheets/d/1d2FbmFoXZkBP_cAmJ5q5kWgdsGnWuyFT0ot1n9Gf4ns/edit#gid=0
             - All storage nodes and dCache services have been IPv6-enabled for a long time.
             - All UM T2 nodes are now also enabled and registered for IPv6.
      • 13:45
        MWT2 5m
        Speakers: David Jordan (University of Chicago (US)), Judith Lorraine Stephen (University of Chicago (US))

        UC

        • All new servers received and are in the process of being racked, cabled, and configured
        • SRM door crashed November 13. Restarting the SRM dCache domain fixed it.
        • Still seeing issues with the new pilot and timeouts. Much of this appears to be on the ANALY_MWT2_UCORE queue and is due to xcache and VP issues that Ilija is working on.
        • Planning on upgrading our CEs next week to 4.x for the job restart fix
        • Upgraded the atlasconnect login node to CentOS7

        IU

        • New equipment (xcache, compute) scheduled for delivery in the coming week

        UIUC

        • Waiting on purchasing decisions
        • Working on getting atlas jobs running on the secondary queue again
      • 13:50
        NET2 5m
        Speaker: Prof. Saul Youssef (Boston University (US))

        Operations smooth.

        NESE storage and networking gear is on the way (9 PB total, 6 PB for NET2), with likely 9 PB more soon, including 1 node for SLATE.

        Working with tape vendors on a possible NESE tape storage tier, which would be usable by NET2 as well.
      • 13:55
        SWT2 5m
        Speakers: Dr Horst Severini (University of Oklahoma (US)), Mark Sosebee (University of Texas at Arlington (US)), Patrick Mcguigan (University of Texas at Arlington (US))

        OU:

        - site running well

        - currently not full, investigating

        - transfer failures yesterday; restarting xrootd and gridftp on se1 fixed them, but the root cause is not known
    • 14:00 14:05
      HPC Operations 5m
      Speakers: Doug Benjamin (Duke University (US)), Marc Gabriel Weinberg (University of Chicago (US))
    • 14:05 14:20
      Analysis Facilities
      Convener: Wei Yang (SLAC National Accelerator Laboratory (US))
      • 14:05
        Analysis Facilities - BNL 5m
        Speaker: William Strecker-Kellogg (Brookhaven National Lab)
      • 14:10
        Analysis Facilities - SLAC 5m
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
      • 14:15
        ATLAS ML Platform & User Support 5m
        Speaker: Ilija Vukotic (University of Chicago (US))
    • 14:20 14:40
      Continuous Operations
      Convener: Robert William Gardner Jr (University of Chicago (US))
      • 14:20
        US Cloud Operations Summary: Site Issues, Tickets & ADC Ops News 5m
        Speakers: Mark Sosebee (University of Texas at Arlington (US)), Xin Zhao (Brookhaven National Laboratory (US))
      • 14:25
        Analytics Infrastructure & User Support 5m
        Speaker: Ilija Vukotic (University of Chicago (US))
      • 14:30
        Intelligent Data Delivery R&D (co-w/ WBS 2.4.x) 5m
        Speakers: Andrew Hanushevsky (STANFORD LINEAR ACCELERATOR CENTER), Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
    • 14:40 14:45
      AOB 5m