
ATLAS UK Cloud Support

Europe/London
Zoom

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

https://cern.zoom.us/j/98434450232

Password protected (same as (new) OPs Mtg)

Videoconference
ATLAS UK Cloud Support
Zoom Meeting ID
98434450232
Host
James William Walder

● Outstanding tickets

    • 154256 TEAM atlas UKI-SCOTGRID-GLASGOW less urgent NGI_UK in progress 2021-10-04 09:11:00 UKI-SCOTGRID-GLASGOW_CEPH has high failed jobs due to “File transfer timed out during stage-in” WLCG

      • Possibly due to the XCache rather than the Ceph data movement
    • 154235 TEAM atlas RAL-LCG2 less urgent NGI_UK in progress 2021-10-07 07:57:00 RAL-LCG2 transfers fail with "TRANSFER globus_ftp_client: the server responded with an error 451 Gen… WLCG

      • Nikhef went into downtime; still to be resolved, and still to be understood whether it is a networking issue.
    • 154200 TEAM atlas RAL-LCG2 less urgent NGI_UK in progress 2021-10-06 18:33:00 RAL-LCG2 deletion issues with error “The requested service is not available at the moment” WLCG

      • Ongoing
    • 153550 TEAM atlas UKI-SOUTHGRID-RALPP less urgent NGI_UK in progress 2021-09-29 15:32:00 Transfer failure at UKI-SOUTHGRID-RALPP with “Failed to select pool: All pools are full\n” error WLCG

      • Resolved by updating the URI to include the protocol and port
    • 153367 TEAM atlas RAL-LCG2 urgent NGI_UK in progress 2021-10-05 15:50:00 HTTPS on RAL CTA WLCG

      • Information updated
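
The fix noted for ticket 153550 (adding the protocol and port to the URI) can be checked with a quick script. This is only a minimal sketch: the helper name and the example hostname are hypothetical, and the actual endpoint and where it is configured are not recorded in these minutes.

```python
from urllib.parse import urlsplit


def has_scheme_and_port(uri: str) -> bool:
    """Return True if the URI names a protocol, a host and an explicit port.

    Hypothetical helper for illustration; not part of any site tooling.
    """
    parts = urlsplit(uri)
    try:
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return bool(parts.scheme) and bool(parts.hostname) and port is not None


if __name__ == "__main__":
    # Example endpoints (hostname is an assumption, not the real RALPP endpoint)
    for uri in (
        "se.example.org/atlas/data",            # no protocol, no port
        "https://se.example.org/atlas/data",    # protocol but no port
        "davs://se.example.org:443/atlas/data", # protocol and port
    ):
        print(uri, has_scheme_and_port(uri))
```

Only the last form passes the check, which matches the kind of fully qualified endpoint the ticket resolution describes.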

● CPU

    • RAL

      • Usual switching of jobs
    • Northgrid

      • N/a
    • London

      • QMUL: SSL timeouts; WebDAV runs on one of the two CPU sockets; being looked into
    • SouthGrid

      • Sussex with SIGXCPU errors
    • Scotgrid

      • Impacted by the stage-in issues

● Ongoing Items

  • CentOS7 - Sussex

    • SIGXCPU error in pilotlog
    • Lots of swap usage; consider changing the swap configuration
  • TPC with http

    • Lots to look at from the post-mortem; RAL import from CERN not really seeing writes from the Data Challenge
  • Storageless Site test (Oxford)

    • NA

 

 


● News round-table

  • Alessandra

    • NTR
  • Dan

    • Refurbishment: now integrated into building refurb.
  • Gerard

    • NTR
  • Matt

    • Sent apologies
  • Patrick

    • NTR
  • Sam

    • NTR
  • Vip

    • NTR

 

 

    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m

        New link for the site-oriented dashboard

      • Other new issues / tasks 5m

        Re-enabling GPU queue for QMUL

        CTA@RAL (Antares):
        https://its.cern.ch/jira/browse/ATLDDMOPS-5573

    • 10:20 10:40
      Ongoing Items 20m
    • 10:40 10:50
      News round-table 10m
    • 10:50 11:00