US ATLAS Computing Integration and Operations

US/Eastern
Description
Notes and other material available in the US ATLAS Integration Program Twiki
    • 13:00 13:15
      Top of the Meeting 15m
      Speakers: Eric Christian Lancon (BNL), Robert William Gardner Jr (University of Chicago (US))
    • 13:15 13:20
      ADC news and issues 5m
      Speakers: Robert Ball (University of Michigan (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
    • 13:20 13:30
      Production 10m
      Speaker: Mark Sosebee (University of Texas at Arlington (US))
    • 13:30 13:35
      Data Management 5m
      Speaker: Armen Vartapetian (University of Texas at Arlington (US))
    • 13:35 13:40
      Data transfers 5m
      Speaker: Hironori Ito (Brookhaven National Laboratory (US))
    • 13:40 13:45
      Networks 5m
      Speaker: Dr Shawn McKee (University of Michigan ATLAS Group)

      Networking is part of the "Computing Models" component of the Community White Paper (CWP) and was discussed at the meeting in Annecy last week. Top-level summary: the experiments (including ATLAS) seem to be happy with the current and planned work in networking. The near-term focus should be on increasing visibility into our networks and making it easier to identify and localize network problems. Longer term, it is desirable to work on network programmability and on how the experiments might benefit from software control of, and interaction with, the network.

      There are ongoing efforts to create publicly accessible dashboards showing network metrics, FTS data and transfer information from the LHCOPN and LHCONE networks. A couple of dashboards are already accessible at CERN: http://monit-grafana-open.cern.ch/?orgId=16

      We need a campaign to clean up the perfSONAR instances, fixing problems with their updates and firewalls. 

    • 13:45 13:50
      FAX and Xrootd Caching 5m
      Speakers: Andrew Bohdan Hanushevsky (SLAC National Accelerator Laboratory (US)), Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))

      Addressing fine-grained authorization based on VO attributes.

      Two rare bugs in the Xrootd client's handling of metalink.

      Helping RAL use an xrootd proxy cache in front of their Ceph storage: this needs an N2N (name-to-name) mapping between object ID and storage path, handling of write pass-through in the cache, etc. Work has started.

    • 13:50 14:00
      OS performance testing 10m
      Speaker: Doug Benjamin (Duke University (US))
    • 14:00 14:15
      HPCs integration 15m
      Speaker: Taylor Childers (Argonne National Laboratory (US))
    • 14:15 14:25
      Singularity 10m
      Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    • 14:25 16:00
      Site Reports
      • 14:25
        BNL 5m
        Speaker: Xin Zhao (Brookhaven National Laboratory (US))
        • No major issues
        • The GUMS host certificate expired last week; certificates are now managed automatically with Puppet
        • Stack Clash vulnerability: interactive nodes are patched; batch nodes will be patched soon (this week) using Ksplice Uptrack

      • 14:30
        AGLT2 5m
        Speakers: Robert Ball (University of Michigan (US)), Dr Shawn McKee (University of Michigan ATLAS Group)

        We have no known issues.

        The MD3460/MD3060e disks did not appear to like being powered off for 3 days during the UM outage; there were 3 disk failures following power-up. This is just something to keep in mind.

        Auto-vacuum is not working on our dCache PostgreSQL instances, and we are still not sure that the parameters are correctly adjusted. The failure to vacuum caused a weekend outage that badly affected our June availability numbers. We have some advice from dCache support on how to modify the parameters and will check whether it is working; if not, we will set up explicit vacuuming via cron.
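
        As a rough illustration of the check described above (not the site's actual procedure), the sketch below flags tables whose autovacuum looks stale and builds the explicit VACUUM statements that a cron job could pass to psql. It relies only on the standard PostgreSQL view pg_stat_user_tables; the thresholds, the table names and the helper names tables_needing_vacuum and vacuum_statements are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Query against PostgreSQL's standard statistics view; run it with psql or any
# client and pass the resulting rows to tables_needing_vacuum().
VACUUM_STATUS_SQL = (
    "SELECT relname, n_dead_tup, last_autovacuum "
    "FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;"
)


def tables_needing_vacuum(rows, now, max_age=timedelta(days=1), min_dead_tuples=10_000):
    """Return names of tables with many dead tuples and no recent autovacuum.

    rows: iterable of (relname, n_dead_tup, last_autovacuum) tuples, where
    last_autovacuum is a timezone-aware datetime or None (never autovacuumed).
    The default thresholds are illustrative assumptions, not tuned values.
    """
    stale = []
    for relname, n_dead_tup, last_autovacuum in rows:
        too_old = last_autovacuum is None or now - last_autovacuum > max_age
        if n_dead_tup >= min_dead_tuples and too_old:
            stale.append(relname)
    return stale


def vacuum_statements(table_names):
    """Build explicit VACUUM statements, quoting each table name as an identifier."""
    return ['VACUUM (ANALYZE) "{}";'.format(name.replace('"', '""')) for name in table_names]


if __name__ == "__main__":
    # Hypothetical sample rows with an arbitrary fixed timestamp; real input
    # would come from running VACUUM_STATUS_SQL against the database.
    now = datetime(2017, 7, 5, 12, 0, tzinfo=timezone.utc)
    sample = [
        ("table_a", 50_000, None),
        ("table_b", 50_000, now - timedelta(hours=1)),
    ]
    for statement in vacuum_statements(tables_needing_vacuum(sample, now)):
        print(statement)
```

        A cron entry could run such a script periodically and pipe its output to psql; whether that is preferable to fixing the autovacuum parameters themselves depends on the advice from dCache support.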

         

      • 14:35
        MWT2 5m
        Speakers: David Lesny (Univ. Illinois at Urbana-Champaign (US)), Lincoln Bryant (University of Chicago (US))

        Site is full of jobs and operating well

        No major issues over the last two weeks

        All nodes are using an upgraded kernel to fix the Stack Clash vulnerability

        Site upgraded to OSG 3.3.25. Looking into going to OSG 3.4

      • 14:40
        NET2 5m
        Speaker: Prof. Saul Youssef (Boston University (US))
      • 14:45
        SWT2-OU 5m
        Speaker: Dr Horst Severini (University of Oklahoma (US))

        - nothing to report, all sites running well

         

      • 14:50
        SWT2-UTA 5m
        Speaker: Patrick Mcguigan (University of Texas at Arlington (US))

        No problems to report at UTA_SWT2

         

        SWT2_CPB:

        Availability numbers for the site suffered in June. The problem has been traced to the AGIS configuration that SAM relies on. We are in the process of fixing this.

      • 14:55
        WT2 5m
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    • 16:00 16:05
      AOB 5m