US ATLAS Computing Integration and Operations

US/Eastern
virtual room (your office)



    • 13:00 13:15
      Top of the Meeting 15m
      Speaker: Robert William Gardner Jr (University of Chicago (US))
    • 13:15 13:25
      Capacity News: Procurements & Retirements 10m
    • 13:25 13:35
      Production 10m
      Speaker: Mark Sosebee (University of Texas at Arlington (US))
    • 13:35 13:40
      Data Management 5m
      Speaker: Armen Vartapetian (University of Texas at Arlington (US))
    • 13:40 13:45
      Data transfers 5m
      Speaker: Hironori Ito (Brookhaven National Laboratory (US))
    • 13:45 13:50
      Networks 5m
      Speaker: Dr Shawn McKee (University of Michigan ATLAS Group)
    • 13:50 13:55
      FAX and Xrootd Caching 5m
      Speakers: Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US)), Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
    • 14:15 15:15
      Site Reports
      • 14:15
        BNL 5m
        Speaker: Michael Ernst

        Smooth operations over the course of the last 2 weeks

        - Utilization at capacity, mainly MCORE jobs

        - First round of disk procurement in progress, awaiting vendor bids

        Continuing work on the OpenStack setup at the facility

        - Had a series of deep technical meetings with Red Hat cloud experts, who are very interested in working with and learning from us

      • 14:20
        AGLT2 5m
        Speakers: Robert Ball (University of Michigan (US)), Dr Shawn McKee (University of Michigan ATLAS Group)

        Very little out of the ordinary has occurred in the past 2 weeks. We hope to bring the MSU R630 online later today: the MSU switch configuration was finally debugged at the end of last week, and all NIC configurations were in place as of early this morning.

        We will take an "at risk" SE downtime on Friday to upgrade the dCache RPM at our site. This will also give us an opportunity to examine an msufs02 NIC that has been bouncing since early January. As it is part of an active/active bond, it is not critical, but it is both limiting and annoying.


      • 14:25
        MWT2 5m
        Speakers: David Lesny (Univ. Illinois at Urbana-Champaign (US)), Lincoln Bryant (University of Chicago (US))

        UIUC monthly downtime.

        UPS event at UC this week - being addressed by Lincoln.

        Installation of new servers at IU at the end of the week.

        Working on 'smart data center' tool suite to move towards containerized services.

        Otherwise things have been smooth.


      • 14:30
        NET2 5m
        Speaker: Prof. Saul Youssef (Boston University (US))

        It has been a busy fortnight at NET2:

        o DELL equipment has arrived at MGHPCC except for 1 switch

        o We will borrow a switch and install everything next week, with operations starting soon after.

        o Augustine went to Ceph school, as I expect Ceph is likely in our future.

        o Large transfers of RAW data from NET2 to the other US T2s seem to be done. This is in preparation for re-reprocessing, which (as I understand it) may begin as soon as this weekend.

        o We have been having occasional problems with FAX: up to ~1000 files are open on the FAX node, the Linux load reaches ~1000, and errors start to appear in the FAX logs.

        o There is a problem with availability testing at Harvard that we have to track down.

        o We still need to transition to HTCondor.

        o Working with the MOC (Massachusetts Open Cloud) team to add NET2 worker nodes from a pool of MOC resources.

        o Working with BU networking on a WAN upgrade plan.

        o Smooth operations otherwise.

        - Saul

      • 14:35
        SWT2-OU 5m
        Speaker: Dr Horst Severini (University of Oklahoma (US))

        - not much to report, all systems working well

        - Lucille taking a 2 day downtime for OS updates

        - new OSCER cluster (Schooner) coming online, testing the OSG gatekeeper in the next week or two

        - started discussions with OU and OneNet about OU and LU LHCONE connectivity via ESnet


      • 14:40
        SWT2-UTA 5m
        Speaker: Patrick Mcguigan (University of Texas at Arlington (US))
        • Continuing to add memory to WNs and retire old machines at UTA_SWT2
        • Awaiting LHCONE turn-on by our networking group
        • Need to finalize WN purchase to close out FY15
        • We had an incident where the xrootd daemon on a storage host at SWT2_CPB died unexpectedly; this is the first occurrence in quite some time
        • Capacity spreadsheet WILL be fixed.
      • 14:45
        WT2 5m
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))

        Spent all FY15 funds.

        1. 11 CPU nodes arrived. Working on using OpenStack to manage them.
        2. Slow ramp-up after a VM is rebooted. Testing different VM configurations to identify the problem
        3. Still waiting for storage and another 11 CPU nodes to arrive.

        Power outage on 1/26.


    • 15:15 15:20
      AOB 5m