US ATLAS Tier 2 Technical

US/Eastern
Fred Luehring (Indiana University (US)), Robert William Gardner Jr (University of Chicago (US)), Shawn Mc Kee (University of Michigan (US))
Description

Meeting to discuss technical issues at the US ATLAS Tier 2 sites. The primary audience is the US Tier 2 site administrators, but anyone interested is welcome to attend.

Zoom Meeting ID
67453565657
Host
Fred Luehring
    • 10:00 10:10
      Top of the meeting discussion 10m
      Speakers: Fred Luehring (Indiana University (US)), Robert William Gardner Jr (University of Chicago (US)), Shawn Mc Kee (University of Michigan (US))
      • Good running in the last week - pretty rocky before that because CERN ran out of work during a holiday period.
        • AGLT2 and MWT2 were affected by a Varnish issue that causes the production system to mark successful jobs as failed.
      • Aiden Rosberg started at IU on May 6 as 50% MWT2 sysadmin and 50% WBS 2.3 R&D.
      • Judith will discuss her plans for providing training for Foreman and Puppet.
        • Please join the new provisioning channel on the BNL Mattermost.
      • Keep on provisioning EL9....
      • Network issues with the Taiwan Tier 2 are much improved.
        • Will put the TW-FTT (ASGC) online.
    • 10:10 10:20
      TW-FTT 10m
      Speakers: Felix.hung-te Lee (Academia Sinica (TW)), Han-Wei Yen

      1. During 19-27 May, the average inbound and outbound data volume was about 7 TB/day. The daily maximum was 13 TB inbound and 24 TB outbound. However, apparent bandwidth limits of 1 Gbps inbound and 3 Gbps outbound were observed. The inbound transfer failure rate increased when the transfer workload was higher.

      2. The local network service provider is checking the network capacity and quality for us. We are still waiting for their report.
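
For comparison with the quoted bandwidth limits, the daily volumes above can be converted to sustained average rates. The following is only a rough sketch: it assumes decimal terabytes (10^12 bytes) and traffic spread evenly over 24 hours, neither of which is stated in the report, and the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_gbps(tb_per_day, bytes_per_tb=10**12):
    """Convert a daily volume in TB to an average rate in Gbps.

    Assumes decimal TB by default and a uniform rate over the day.
    """
    bits_per_day = tb_per_day * bytes_per_tb * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


# Volumes quoted in the TW-FTT report (TB/day).
for label, volume in [("average", 7), ("max inbound", 13), ("max outbound", 24)]:
    print(f"{label}: {volume} TB/day ~ {tb_per_day_to_gbps(volume):.2f} Gbps")
```

Under these assumptions, 13 TB/day corresponds to roughly 1.2 Gbps sustained and 24 TB/day to roughly 2.2 Gbps, which is the same order as the limits quoted above.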

    • 10:20 10:30
      AGLT2 10m
      Speakers: Philippe Laurens (Michigan State University (US)), Shawn Mc Kee (University of Michigan (US)), Dr Wendy Wu (University of Michigan)

       

      • The UPS repair on May 1st went well, and we managed to get the site back online within 3 hours. 
      • New ticket about further CVMFS issues.
        Caught when the wrapper was unable to get the pilot from CVMFS.
      • Received a small order of new equipment at UM.
        Found out that the R760xd2 is about 5 inches longer than the R740xd2.
      • 20 UM WNs on EL9
    • 10:30 10:40
      MWT2 10m
      Speakers: David Jordan (University of Chicago (US)), Farnaz Golnaraghi (University of Chicago (US)), Fengping Hu (University of Chicago (US)), Fred Luehring (Indiana University (US)), Judith Lorraine Stephen (University of Chicago (US))
      • Had a GGUS ticket regarding a degraded squid.
      • Tested converting CentOS 8 / Stream 8 to AlmaLinux 8 (AlmaLinux receives security updates until 2029); the conversion was successful.
      • IU management hypervisors are upgraded to EL9.
      • UC management hypervisors are being upgraded to EL9.
      • 86% of the UC Storage is upgraded to EL9.
      • All IU and UC workers are upgraded to EL9.
    • 10:40 10:50
      NET2 10m
      Speakers: Eduardo Bach (University of Massachusetts (US)), Rafael Coelho Lopes De Sa (University of Massachusetts (US)), William Axel Leight (University of Massachusetts Amherst)
    • 10:50 11:00
      SWT2 10m
      Speakers: Horst Severini (University of Oklahoma (US)), Kaushik De (University of Texas at Arlington (US)), Mark Sosebee (University of Texas at Arlington (US)), Zachary Thomas Booth (University of Texas at Arlington (US))

      CPB:

      • Testing of a fourth DTN is almost done (delayed while waiting for the production role to be enabled for Zach) - it should be ready to bring online this week.
      • SWT2_GOOGLE_ARM PanDA queue getting utilized as jobs become available.
      • The LSM is no longer being used for SWT2_CPB. Possibly still some additional tweaks / optimizations to do.
      • RHEL9 migration: We have a few servers set up so far for testing. What would serve as a cluster frontend node is currently running Alma9 with Puppet and Foreman installed and functional. We are currently working on understanding and testing these new systems.
      • A student is working on the LOCALGROUPDISK "atime" and Alma9 projects.
      • GGUS tickets:
        https://ggus.eu/?mode=ticket_info&ticket_id=166754 (source file transfer errors to three clouds - destination transfers are fine)
        https://ggus.eu/?mode=ticket_info&ticket_id=164771 (support for storage tokens)
        • Both of these issues are being worked on.

      OU:

      • Running well, no major issues.
      • Outbound transfer failures to some IT and ES sites seem to be caused by HTTP/2 case incompatibility issues between XRootD and StoRM; see GGUS tickets #166759 and #166754. Andy is working on that.
      • OSCER will be fully upgraded from EL7 to EL9 by June 30.
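
If the "case" incompatibility noted for OU refers to header-name case (an assumption; the notes do not say), the general problem can be sketched as follows. HTTP/1.1 header names are case-insensitive, while HTTP/2 transmits them in lowercase, so a client that matches names exactly can miss a header it expects. The helper names and sample header values below are hypothetical and are not taken from XRootD or StoRM.

```python
def normalize_headers(raw_headers):
    """Return a dict keyed by lowercase header name.

    raw_headers is an iterable of (name, value) pairs as received.
    """
    return {name.lower(): value for name, value in raw_headers}


def get_header(headers, name, default=None):
    """Look up a header in a normalized dict, ignoring the case of name."""
    return headers.get(name.lower(), default)


# The same (made-up) response headers as an HTTP/1.1 and an HTTP/2 peer might send them.
http1_style = [("Content-Length", "1024"), ("Digest", "adler32=0a1b2c3d")]
http2_style = [("content-length", "1024"), ("digest", "adler32=0a1b2c3d")]

for raw in (http1_style, http2_style):
    headers = normalize_headers(raw)
    print(get_header(headers, "Content-Length"), get_header(headers, "Digest"))
```

Normalizing once on receipt keeps every later lookup independent of how the peer capitalized the header names.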