US ATLAS Computing Facility

US/Eastern
Videoconference Rooms
US_ATLAS_Computing_Integration_and_Operations
Name: US_ATLAS_Computing_Integration_and_Operations
Description: Bi-weekly Facilities meeting
Extension: 109263008
Owner: Robert William Gardner Jr
    • 13:00 13:10
      WBS 2.3 Facility Management News 10m
      Speakers: Eric Christian Lancon (BNL), Robert William Gardner Jr (University of Chicago (US))
    • 13:10 13:20
      OSG-LHC 10m
      Speakers: Brian Lin (University of Wisconsin), Matyas Selmeci
    • 13:20 13:35
      Topical Report
      Convener: Robert William Gardner Jr (University of Chicago (US))
      • 13:20
        Deploying Harvester 12m
        Speaker: Marc Gabriel Weinberg (University of Chicago (US))
    • 13:35 13:40
      Tier1 Center 5m
      Speakers: Eric Christian Lancon (CEA/IRFU,Centre d'etude de Saclay Gif-sur-Yvette (FR)), Xin Zhao (Brookhaven National Laboratory (US))
      • HPSS downtime today, 9am to 5pm, for tape library manager and CORE server upgrade.
      • MAS R&D: data popularity study based on dCache logs.
      • Plan to run a local tape test to understand staging bottlenecks.
      • Set up a new PQ to support ART jobs, which run only on Intel nodes.
    • 13:40 14:00
      Tier2 Centers
      Convener: Shawn McKee (University of Michigan (US))
      • 13:40
        AGLT2 5m
        Speakers: Philippe Laurens (Michigan State University (US)), Dr Shawn McKee (University of Michigan ATLAS Group), Prof. Wenjing Wu (Computer Center, IHEP, CAS)

        Hardware:

        Added 2 dCache servers (R740xd2), now in production; retired 2 old dCache storage nodes (MD1000).

        Added 4 Supermicro nodes to HTCondor, each with 72 HT cores; 2 run the formulus kernel and 2 run the regular SL7 kernel.

        Added 1 GPU node (R740, 2xV1000 GPUs, Integrated Matrox G200eW3 Graphics Controller) to the condor queue; the GPU is requested by the Tier3 user, and the CPUs are added to the condor queue.

        Service:

        Security patch for squid server applied.

        Smooth running; had 1 ticket closed (transfer failure to 2 Tier2 sites, dCache issue, not much the site can do).

      • 13:45
        MWT2 5m
        Speakers: David Jordan (University of Chicago (US)), Judith Lorraine Stephen (University of Chicago (US))

        Waiting for pilot2 fix to resolve outstanding GGUS ticket

        New UC equipment in the process of shipping

        IU equipment purchase submitted; should arrive by Thanksgiving

        Minor GPFS outage for UIUC nodes last Friday (7 Nov)

      • 13:50
        NET2 5m
        Speaker: Prof. Saul Youssef (Boston University (US))

        Investigating a problem where nodes mysteriously become unresponsive for about 5 minutes and then recover.

        Working out how to get acceptable host certs for NESE gridftp containers. 

        NESE endpoints will start off with gridftp, but we'll need to figure out what's next soon.

      • 13:55
        SWT2 5m
        Speakers: Dr Horst Severini (University of Oklahoma (US)), Mark Sosebee (University of Texas at Arlington (US)), Patrick Mcguigan (University of Texas at Arlington (US))

        UTA_SWT2

        • Electrical work at facility forced an outage
        • The electrical work will require a second outage when complete

        SWT2_CPB

        • Took a separate outage for maintenance on UPS
        • Did firmware updates on MD3XXXi storage servers to work on Event Index job problems
        • Had problems with the cluster NFS server (R510) when one drive failed in the OS RAID 1 setup. Now repaired.

        OU

        • Nothing to report, all running well

    • 14:00 14:05
      HPC Operations 5m
      Speakers: Doug Benjamin (Duke University (US)), Marc Gabriel Weinberg (University of Chicago (US))

      This week Marc presented on his experience deploying Harvester as part of PanDA integration with NSF supercomputers (see Topical presentation above). 

    • 14:05 14:20
      Analysis Facilities
      Convener: Wei Yang (SLAC National Accelerator Laboratory (US))
      • 14:05
        Analysis Facilities - BNL 5m
        Speaker: William Strecker-Kellogg (Brookhaven National Lab)

        Nothing new to report

      • 14:10
        Analysis Facilities - SLAC 5m
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
      • 14:15
        ATLAS ML Platform & User Support 5m
        Speaker: Ilija Vukotic (University of Chicago (US))
    • 14:20 14:40
      Continuous Operations
      Convener: Robert William Gardner Jr (University of Chicago (US))
      • 14:20
        US Cloud Operations Summary: Site Issues, Tickets & ADC Ops News 5m
        Speakers: Mark Sosebee (University of Texas at Arlington (US)), Xin Zhao (Brookhaven National Laboratory (US))
      • 14:25
        Analytics Infrastructure & User Support 5m
        Speaker: Ilija Vukotic (University of Chicago (US))
      • 14:30
        Intelligent Data Delivery R&D (co-w/ WBS 2.4.x) 5m
        Speakers: Andrew Hanushevsky (STANFORD LINEAR ACCELERATOR CENTER), Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
    • 14:40 14:45
      AOB 5m