US ATLAS Computing Facility

US/Eastern
    • 13:00 13:10
      WBS 2.3 Facility Management News 10m
      Speakers: Eric Christian Lancon (BNL), Robert William Gardner Jr (University of Chicago (US))

      Hope everyone is enjoying their summer! Eric is on vacation this week.

    • 13:20 13:40
      Topical Report
    • 13:40 14:25
      US Cloud Status
      • 13:40
        US Cloud Operations Summary 5m
        Speaker: Mark Sosebee (University of Texas at Arlington (US))
      • 13:45
        BNL 5m
        Speaker: Xin Zhao (Brookhaven National Laboratory (US))
        • smooth operation in general 
        • PQs migrated to pilot2/singularity
        • another round of rolling upgrade of farm nodes to start soon, for kernel upgrade on WNs
      • 13:50
        AGLT2 5m
        Speakers: Philippe Laurens (Michigan State University (US)), Dr Shawn McKee (University of Michigan ATLAS Group), Prof. Wenjing Wu (Computer Center, IHEP, CAS)

        Incidents:

        1) The gatekeeper that receives the HC jobs stopped working. We had no monitoring for the condor-ce service, so we did not notice right away.

        2) A large fraction of analysis jobs failed, and the site received a GGUS ticket. We found that a script we use to clean up zombie files left behind by jobs killed by HTCondor was also deleting the work directories of running jobs. The bug surfaced with the switch to pilot2. We have fixed it in our script.
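
        The safeguard behind the fix for incident 2 can be sketched as follows. This is not the AGLT2 script: the function name `find_stale_workdirs`, the flat scratch layout, the age threshold, and the idea of passing in the work directories of running jobs (gathered, for example, from HTCondor) are all assumptions made for illustration.

```python
import time
from pathlib import Path


def find_stale_workdirs(scratch_root, active_dirs, min_age_seconds=86400, now=None):
    """Return work directories under scratch_root that look abandoned.

    A directory is a removal candidate only if it is not listed in
    active_dirs (work dirs of jobs that are still running) and has not
    been modified for at least min_age_seconds.
    """
    now = time.time() if now is None else now
    active = {Path(d).resolve() for d in active_dirs}
    stale = []
    for entry in sorted(Path(scratch_root).iterdir()):
        if not entry.is_dir():
            continue
        if entry.resolve() in active:
            continue  # never touch the work dir of a running job
        if now - entry.stat().st_mtime < min_age_seconds:
            continue  # too recent; may belong to a job that just started
        stale.append(entry)
    return stale
```

        The function only reports candidates and deletes nothing, so the list can be reviewed or logged before anything is removed.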

        Hardware:

        Identified the storage servers we could retire from the Tier2, based on the age of the hardware and the number of hardware failures. Worked out the items for the purchase.
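
        A ranking of this kind could be sketched as below. The report does not say how age and failure count were weighed against each other, so the ordering here (oldest first, ties broken by failure count) is an assumption, as are the `StorageServer` fields and the function name.

```python
from dataclasses import dataclass


@dataclass
class StorageServer:
    name: str            # hypothetical host label
    purchase_year: int
    failure_count: int   # hardware failures recorded so far


def retirement_order(servers, current_year):
    """Sort servers so the oldest come first; ties go to the one with more failures."""
    return sorted(
        servers,
        key=lambda s: (-(current_year - s.purchase_year), -s.failure_count),
    )
```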

         

        Service:

        Set up a new replication server for the dCache database.

      • 13:55
        MWT2 5m
        Speakers: Judith Lorraine Stephen (University of Chicago (US)), Lincoln Bryant (University of Chicago (US))

        Upgraded frontier-squid site-wide to 4.8-1.1.

        Discussed MWT2 retirement and purchasing plans.

        Still working on getting the dCache nodes dual-stacked. This required external IPv6 PTR records to be set up by UC ITS, which was completed yesterday.
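
        A quick way to check that such PTR records resolve is sketched below, using only the Python standard library. The helper names are illustrative, and `lookup_ptr` depends on whatever resolver the local host is configured to use.

```python
import ipaddress
import socket


def ptr_name(address):
    """Return the reverse-DNS name under which the PTR record for address lives."""
    return ipaddress.ip_address(address).reverse_pointer


def lookup_ptr(address):
    """Return the hostname from the PTR record, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
    except OSError:
        return None
    return hostname
```

        For an IPv6 address, `ptr_name` returns the nibble-reversed name under `ip6.arpa`, which is the name the DNS operator has to populate.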

        Setting up temporary IU and UIUC SLATE nodes to test SLATE frontier-squid configuration (see Lincoln's talk).

      • 14:00
        NET2 5m
        Speaker: Prof. Saul Youssef (Boston University (US))

        We've reinstalled the NET2 squid with new software, both for security and to clear a low-level GGUS ticket about too many failovers. This seems to have worked.

        Lots of work on migrating to NESE. A GridFTP Docker container with Wei's GridFTP (with Adler callout) works. Lots of work and testing still to do. ADC has been informed. We will have two DATADISK space tokens during the transition.

        Smooth operations and full site otherwise.

        We have two open GGUS tickets.  Both can be closed. 

      • 14:05
        SWT2 5m
        Speakers: Dr Horst Severini (University of Oklahoma (US)), Kaushik De (University of Texas at Arlington (US)), Mark Sosebee (University of Texas at Arlington (US)), Patrick Mcguigan (University of Texas at Arlington (US))

        OU:

        - all working well

        - in the process of reconfiguring OU xrootd storage for automatic space group assignment and http-over-xrootd

        UTA:

        1) Migration of UTA_SWT2 to CentOS7 completed

        2) All equipment from recent hardware purchase received - planning deployment schedule

        3) Systems running well post-CentOS7 upgrades

      • 14:10
        HPC Operations 5m
        Speaker: Doug Benjamin (Duke University (US))
      • 14:15
        Analysis Facilities - SLAC 5m
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
      • 14:20
        Analysis Facilities - BNL 5m
        Speaker: William Strecker-Kellogg (Brookhaven National Lab)
    • 14:25 14:30
      AOB 5m