US ATLAS Computing Integration and Operations

Description
Notes and other material available in the US ATLAS Integration Program Twiki
    • 1
      Top of the Meeting
      Speakers: Eric Christian Lancon (BNL), Robert William Gardner Jr (University of Chicago (US))
    • 2
      ADC news and issues
      Speakers: Robert Ball (University of Michigan (US)), Wei Yang (SLAC National Accelerator Laboratory (US))

      The bigpanda monitor will shortly move from http to https access. CERN SSO will then typically be used for access, but JSON output can still be scraped via http. The transition should take place within the next two weeks. For further details, see:
      https://indico.cern.ch/event/642827/contributions/2608310/attachments/1490643/2317018/httpS_for_bigpanda_monitoring.pdf

      Wei is leading an effort to deploy singularity in the US cloud. Participation is voluntary, and the underlying worker node (WN) OS should be centos7. For information on issues and procedures, see:
      https://twiki.cern.ch/twiki/bin/view/AtlasComputing/ContainersInUScloud

      From Andrej Filipcic, a brief summary:

      - The pilot code supporting singularity is now in production, and we are starting to test it at targeted sites (RAL, Manchester, ...).
      - singularity could also be started in the wrapper, but since the pilot code is ready, we will try to use that.
      - For now we continue to use catchall. Once we have more experience with site specifics, we will consider what should be moved to the site configuration (singularity.conf), what to AGIS, and whether we can simplify things such as using scratchdisk by relocating the bind mounts.
      - By September we should have tested most T1s and some big T2s, so that we have some input for the containers task force.
      - Container testing should be done in a similar way to the testing of the new mover.
      - We are following up with the HammerCloud team to implement singularity HC testing.
      - For performance reasons, we should migrate to an unpacked chroot. The image and the directory should be kept in sync, since we will need both (e.g. the image for HPC).
      - We should concentrate on centos7 sites. Later on we should also test the centos7 images.
      - At the pre-GDB, there was a discussion of whether to go with a non-suid singularity deployment. We also need to evaluate whether this is feasible for ATLAS. Some sites might want to use it in the future (it is not even available in centos7 at this point, possibly with RH7.4).
    • 3
      Production
      Speaker: Mark Sosebee (University of Texas at Arlington (US))
    • 4
      Data Management
      Speaker: Armen Vartapetian (University of Texas at Arlington (US))
    • 5
      Data transfers
      Speaker: Hironori Ito (Brookhaven National Laboratory (US))
    • 6
      Containers
      Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    • 7
      Networks
      Speaker: Dr Shawn McKee (University of Michigan ATLAS Group)
    • 8
      FAX and Xrootd Caching
      Speakers: Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US)), Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
    • 9
      HPCs integration
      Speaker: Taylor Childers (Argonne National Laboratory (US))
    • Site Reports
      • 10
        BNL
        Speaker: Xin Zhao (Brookhaven National Laboratory (US))

        All services running fine, no major issues

      • 11
        AGLT2
        Speakers: Robert Ball (University of Michigan (US)), Dr Shawn McKee (University of Michigan ATLAS Group)

        Retired two old MD1200 shelves, one each at UM and MSU, to use as a source of 2TB spare disks. All storage reporting has been updated accordingly.

        Beyond this all systems are normal, with nothing of significance to report.

      • 12
        MWT2
        Speakers: David Lesny (Univ. Illinois at Urbana-Champaign (US)), Lincoln Bryant (University of Chicago (US))
        • Stable operations
        • The ATLAS Connect login server (managed by Greg) had a CVMFS problem, which Dave solved
        • Signed new service contract for preventative maintenance of CRAC units at UC
        • Attempted an upgrade to the new Plexxi-based SciDMZ, but reverted to the original setup when the Juniper routing engine failed. Will attempt again soon.

      • 13
        NET2
        Speaker: Prof. Saul Youssef (Boston University (US))

        Issues:

        1) Some old CAs fail to authenticate to Bestman.  There is a fix from OSG (updated JGlobus) that we volunteered to test.

        2) Harvard was down for a day to migrate their puppet infrastructure.

        3) The downtime for item 2) failed to propagate to AGIS for an unknown reason. The OSG team is looking into it.

        4) Lots of NESE activity.  

        5) A GPFS client issue needs to be resolved before we can move to RH7 (and singularity).

        6) Moving to 6-hour reporting of space token sizes to address the DDM deletion issue that Armen noticed.

        7) Smooth running otherwise.

      • 14
        SWT2-OU
        Speaker: Dr Horst Severini (University of Oklahoma (US))

        - nothing to report, all sites running well

      • 15
        SWT2-UTA
        Speaker: Patrick Mcguigan (University of Texas at Arlington (US))
      • 16
        WT2
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    • 17
      AOB