US ATLAS Computing Integration and Operations

US/Eastern
Description
Notes and other material are available in the US ATLAS Integration Program Twiki.
    • 13:00 13:05
      Top of the Meeting 5m
      Speakers: Eric Christian Lancon (BNL), Robert William Gardner Jr (University of Chicago (US))
    • 13:05 13:15
      Singularity / CentOS 7 deployment in the US cloud 10m
      Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    • 13:15 13:20
      ADC news and issues 5m
      Speakers: Robert Ball (University of Michigan (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
    • 13:20 13:25
      Production 5m
      Speaker: Mark Sosebee (University of Texas at Arlington (US))
    • 13:30 13:35
      Data Management 5m
      Speaker: Armen Vartapetian (University of Texas at Arlington (US))
    • 13:35 13:40
      Data transfers 5m
      Speaker: Hironori Ito (Brookhaven National Laboratory (US))
    • 13:40 13:45
      Networks 5m
      Speaker: Dr Shawn McKee (University of Michigan ATLAS Group)
    • 13:45 13:50
      FAX and Xrootd Caching 5m
      Speakers: Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US)), Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
    • 13:50 13:55
      HPCs integration 5m
      Speaker: Taylor Childers (Argonne National Laboratory (US))

      Not much is going on in HPC development at the moment. Ongoing activities:

      • NERSC has just returned from its quarterly maintenance (Friday through Monday).
        • Allocations have been exhausted; we are working on returning to the standard grid setup (brokeron mode, without dedicated tasks) to run at the lower level expected from backfill queues at NERSC.
        • A new DVMFS setup, which provides CVMFS natively on the worker nodes, was deployed at NERSC during the shutdown. Early tests look promising.
      • Globus Online bulk-transfer plugins for Harvester are in development; they combine multiple files into a single transfer request.
        • Current tests show instabilities on the BNL Globus endpoint, which are being investigated.
        • ALCF has been offline while these tests continue.
      • Titan continues to run in backfill mode (ORNL_Titan_MCORE)
        • The workaround for the I/O issues seems to be working well.
        • Currently averaging about 25k jobs per day at 50 events per job (25k × 50 ≈ 1.25M, i.e. >1M events/day).
      • Titan has begun running jobs against the ALCC allocation (Titan_long_MCORE)
        • Jobs process 1000 events each, with a 12-hour run time.
        • Typical queue times are under 48 hours.
      • Tests of Harvester + mini-pilot will soon begin on Titan. This will bring the ALCF and OLCF sites into alignment on the same solutions; the same then needs to be done at NERSC.
    • 13:55 14:30
      Site Reports
      • 13:55
        BNL 5m
        Speaker: Xin Zhao (Brookhaven National Laboratory (US))
        • Services have been running smoothly recently.
      • 14:00
        AGLT2 5m
        Speakers: Robert Ball (University of Michigan (US)), Dr Shawn McKee (University of Michigan ATLAS Group)

        All dCache pool servers are now running SL7.3.  Open vSwitch is also installed and running on the public NICs.

        A cooling issue in the MSU server room is under repair today. We have been operating at reduced WN capacity for the past 2-3 weeks while assessing the issue and ordering parts. We hope to be back to full capacity within the day.

        Preparation for SL7 on the WNs continues.


      • 14:05
        MWT2 5m
        Speakers: David Lesny (Univ. Illinois at Urbana-Champaign (US)), Judith Lorraine Stephen (University of Chicago (US)), Lincoln Bryant (University of Chicago (US))

        Site is performing well; it has been full of jobs for the last two weeks.

      • 14:10
        NET2 5m
        Speaker: Prof. Saul Youssef (Boston University (US))
      • 14:15
        SWT2-OU 5m
        Speaker: Dr Horst Severini (University of Oklahoma (US))

        - not much to report, all sites running well

        - making slow progress with Singularity, but there still seem to be problems with bind mounts; continuing to investigate

        - have started running AES jobs on the new OSCER hep_killable queue, which overlays all OSCER-owned nodes; these seem to run well


      • 14:20
        SWT2-UTA 5m
        Speaker: Patrick Mcguigan (University of Texas at Arlington (US))
      • 14:25
        WT2 5m
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    • 14:30 14:35
      AOB 5m