US ATLAS Computing Integration and Operations

US/Eastern
Description
Notes and other material available in the US ATLAS Integration Program Twiki
    • 13:00 13:15
      Top of the Meeting 15m
      Speakers: Eric Christian Lancon (CEA/IRFU,Centre d'etude de Saclay Gif-sur-Yvette (FR)), Robert William Gardner Jr (University of Chicago (US))
    • 13:25 13:35
      Capacity News: Procurements & Retirements 10m
    • 13:35 13:45
      Production 10m
      Speaker: Mark Sosebee (University of Texas at Arlington (US))
    • 13:45 13:50
      Data Management 5m
      Speaker: Armen Vartapetian (University of Texas at Arlington (US))
    • 13:50 13:55
      Data transfers 5m
      Speaker: Hironori Ito (Brookhaven National Laboratory (US))
    • 13:55 14:00
      Networks 5m
      Speaker: Dr Shawn McKee (University of Michigan ATLAS Group)
    • 14:00 14:05
      FAX and Xrootd Caching 5m
      Speakers: Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US)), Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
    • 14:25 15:25
      Site Reports
      • 14:25
        BNL 5m
        Speaker: Eric Christian Lancon (CEA/IRFU,Centre d'etude de Saclay Gif-sur-Yvette (FR))
      • 14:30
        AGLT2 5m
        Speakers: Robert Ball (University of Michigan (US)), Dr Shawn McKee (University of Michigan ATLAS Group)

        All ordered equipment has arrived at the MSU site except for a Juniper EX9208 network card. At the UM site, 2 MD3460 were still in transit when this report was written; they were delivered just prior to this meeting.

        At MSU, all equipment has been racked, but only a single MD3060e has been brought online so far. At UM, the storage situation is the same; in addition, 10 of the R630 are in production, with another 2 held back for testing. A further 8 R630 are built and nearing production readiness, and the last 4 are being worked on today.

        In the rush to get the US-wide order templates ready, the R730xd template left out the 12 Gbps SAS HBA cards (2). Dell quoted these for us at $135.00 each, and we have received that special order for our two R730xd. One of these R730xd has been built and configured, apart from the attached storage, which is not yet on hand to install.

        Overall, production has been smooth since we replaced some problematic Cat7 cables in our VMware infrastructure two weeks ago.

        On Friday at 10 am there will be an annual power test affecting the MSU server room. WNs will be idled and shut down in advance. All dCache and server infrastructure systems are on UPS, which should weather the test as designed.

      • 14:35
        MWT2 5m
        Speakers: David Lesny (Univ. Illinois at Urbana-Champaign (US)), Lincoln Bryant (University of Chicago (US))

        MWT2-IU - 24 compute nodes received, racked.

        MWT2-UIUC: 24 nodes arrived, to be racked soon (Dave on vacation)

        Update on RBT and FY16 procurement at MWT2-UC: 

        Nearly all equipment has arrived and has been racked.  Now being cabled.

        (Exception is analytics cluster nodes)

        Item                             Quantity
        R730xd head nodes                7
        MD3460                           7
        MD3060e                          7
        R630 compute nodes               48

        Cluster services
        dCache head node                 1
        Hypervisor nodes                 4
        Hypervisor head node             1
        PerfSonar node (large SciDMZ)    1
        Analytics data nodes             5
        Analytics head nodes             3

         

        Infrastructure

        • UPS servicing (100kW, 40kW) - all batteries replaced
        • Electrical work for new equipment complete
        • New Juniper switching modules ordered (lead time is about 25 days, unfortunately)

      • 14:40
        NET2 5m
        Speaker: Prof. Saul Youssef (Boston University (US))

        1) The NESE project has been approved by the NSF.   This is a $4M project to create a regional Ceph cluster at MGHPCC with enough bandwidth to be used for NET2 main storage.

        2) P.O.s are out for FY16 and replacement funds; Dell hardware is on the way but has not yet arrived.

        3) Timeouts from McGill to NET2 still come and go occasionally. Someone on the WAN side has said they would look at it; we're keeping the ticket open in the meantime.

        4) HTCondor-CE pilots are running successfully at BU, and HU is running smoothly as well. We will probably switch over within a few more days.

        5) Harvard is running fine, but the SAM jobs that are used to compute availability are hanging.   After working through a couple of issues with Marian, the only remaining issue is likely to be a firewall problem at Harvard.  The Harvard guys are resolving that.

        6) We may have a short outage to switch our WAN traffic to new Cisco equipment at NoX. This is the next step we need to take to join LHCONE.

        7) I've been meaning to help Bob with his schedconfig stuff, but haven't got to that yet.

        8) Both sites have been full with smooth operations up to the global drop on Sunday.

      • 14:45
        SWT2-OU 5m
        Speaker: Dr Horst Severini (University of Oklahoma (US))
      • 14:50
        SWT2-UTA 5m
        Speaker: Patrick Mcguigan (University of Texas at Arlington (US))
      • 14:55
        WT2 5m
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    • 15:25 15:30
      AOB 5m