ATLAS UK Cloud Support

Europe/London
Zoom

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

https://cern.zoom.us/j/98434450232

Password protected (same as (new) OPs Mtg)

Videoconference
ATLAS UK Cloud Support
Zoom Meeting ID
98434450232
Host
James William Walder

● Outstanding tickets

    • 155576 TEAM atlas UKI-SOUTHGRID-RALPP less urgent NGI_UK in progress 2022-01-19 15:48:00 UKI-SOUTHGRID-RALPP frontier squid degraded EGI

      • Reboot of kernel; to be monitored, then close ticket
    • 155460 USER atlas UKI-SOUTHGRID-CAM-HEP less urgent NGI_UK in progress 2022-01-20 14:51:00 Failovers from Cambridge to CERN backup proxy EGI

      • Discussion ongoing on how to monitor the squid as an exception
    • 155430 TEAM atlas UKI-SCOTGRID-ECDF less urgent NGI_UK in progress 2022-01-13 14:27:00 UKI-SCOTGRID-ECDF transfer and deletion errors EGI

      • Awaiting ECDF site outcome of rebuilding the disk server
    • 155141 TEAM atlas UKI-LT2-Brunel less urgent NGI_UK in progress 2022-01-19 17:40:00 Transfers from UKI-LT2-Brunel fail with “Internal Server Error” EGI

      • Needs a consistency check to be run
    • 154806 TEAM atlas UKI-LT2-QMUL less urgent NGI_UK in progress 2022-01-15 14:43:00 UKI-LT2-QMUL SOURCE transfer failures: [13] Result (Neon): SSL handshake failed EGI

      • JW to try and update; sporadic failures continue
    • 154543 TEAM atlas UKI-SCOTGRID-ECDF urgent NGI_UK in progress 2021-12-08 12:35:00 DPM storage ACL configuration EGI

      • Lower priority
    • 154436 TEAM atlas RAL-LCG2 very urgent NGI_UK in progress 2022-01-19 22:13:00 RAL Echo Davs developments EGI

      • Moved to in progress; other tickets closed and redirected here where appropriate.
    • 153367 TEAM atlas RAL-LCG2 urgent NGI_UK on hold 2021-12-01 15:37:00 HTTPS on RAL CTA EGI

      • Likely to need davs working before this. Noted that both direct Xroot transfer from T0 and multihop will be needed.

● CPU

    • General Rucio issues leading to repeated HC downtimes from last night and ongoing

      • 503 errors when querying the Rucio database; some networking issues appear related
    • RAL

      • HC issues not setting site online over weekend; improved since
    • Northgrid

      • NTR
    • London

      • NTR; some continuing QMUL volatility
    • SouthGrid

      • Sussex running slightly increased, possibly due to site tweaks
    • Scotgrid

      • To look at Glasgow

● Ongoing Items

  • TPC with http

    • Several activities started; tuning ongoing
  • Storageless Site test (Oxford)

    • To understand VP, perhaps for BHAM and Ox
    • Job mix updated to push more IO to Oxford, and through the cache
  • LANCS Storage migration

    • Working on OSDs; possibility for next week

 

 


● News round-table

- Farewell to Patrick

    • 10:00 AM 10:20 AM
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m

        New link for the site-oriented dashboard

      • Other new issues / tasks 5m

        Re-enabling GPU queue for QMUL

        Multihop failures at RAL: failed intermediate steps are not overwritten;
        request XrootD devs to add an 'autorm' feature for http-TPC.

    • 10:20 AM 10:40 AM
      Ongoing Items 20m
    • 10:40 AM 10:50 AM
      News round-table 10m

    • 10:50 AM 11:00 AM