ATLAS UK Cloud Support

Europe/London
Zoom


Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

https://cern.zoom.us/j/98434450232

Password protected (same as (new) OPs Mtg)

● Outstanding tickets


    • 155473 TEAM atlas RAL-LCG2 less urgent NGI_UK in progress 2022-01-11 09:20:00 BU_ATLAS_Tier2 transfer and deletion errors EGI

      • IPv4 connectivity issues on new webdav aliased hosts
    • 155460 USER atlas UKI-SOUTHGRID-CAM-HEP less urgent NGI_UK in progress 2022-01-12 15:51:00 Failovers from Cambridge to CERN backup proxy EGI

      • Active discussions from site admins
    • 155430 TEAM atlas UKI-SCOTGRID-ECDF less urgent NGI_UK in progress 2022-01-12 16:37:00 UKI-SCOTGRID-ECDF transfer and deletion errors EGI

      • Data at risk, due to problems over Christmas
    • 155141 TEAM atlas UKI-LT2-Brunel less urgent NGI_UK in progress 2021-12-24 08:39:00 Transfers from UKI-LT2-Brunel fail with “Internal Server Error” EGI

      • JW to progress to a solution
    • 154806 TEAM atlas UKI-LT2-QMUL less urgent NGI_UK in progress 2021-12-25 04:28:00 UKI-LT2-QMUL SOURCE transfer failures: [13] Result (Neon): SSL handshake failed EGI

      • Server fell over on Christmas day
      • Plan is to add more ‘oomph’ to the server; however, it’s not the highest-priority item
    • 154543 TEAM atlas UKI-SCOTGRID-ECDF urgent NGI_UK in progress 2021-12-08 12:35:00 DPM storage ACL configuration EGI

      • Other urgent issues are delaying this
    • 154436 TEAM atlas RAL-LCG2 very urgent NGI_UK on hold 2021-12-08 13:25:00 RAL Echo Davs developments EGI

      • New webdavs endpoint with new gateways created. Available for more aggressive optimisation tuning and improvements
    • 153367 TEAM atlas RAL-LCG2 urgent NGI_UK on hold 2021-12-01 15:37:00 HTTPS on RAL CTA EGI

      • Needs to be tested

● CPU

    • RAL

      • Remains low; partly due to job scheduling when a large number of FTS file transfers are in the queue.
      • Also due to contention from other VOs
    • Northgrid

      • Largely ok
    • London

      • Some brief issues with QMUL
    • SouthGrid

      • BHAM: an outage of a few days (?), but back now.
      • Sussex: running well, but the site could be running more slots
    • Scotgrid

      • Durham: cooling and power issues over Christmas. SRR not readable, leading to overfilling of the storage.
        • Once SRR accessible, jobs started running and data reduced to below the total.
      • Gla: disk controller appears to have died; expected to be back online shortly.


● Ongoing Items

  • TPC with http

    • Davs optimisation at RAL to take priority with a new webdav alias available
  • Storageless Site test (Oxford)

    • Seeing TLS errors on the Xcache via xrootd; cache is passing through the data
  • LANCS Storage migration

    • JW to ensure endpoint is configured in CRIC
    • Site awaiting one last switch change to begin real testing

 

 


● News round-table

  • Alessandra

    • NTR
  • Dan

    • Storage for ATLAS by end of the month
    • Refurbishment remains some way off
  • Gerard

    • NTR
  • Matt

    • NTR
  • Patrick

    • NTR; Attempting to work out how to get the full number of slots to run at the site.
  • Peter

    • NTR

  • Sam

    • GLA now restarted.
  • Stephen

    • NTR
  • Vip

    • To arrange a discussion to track down Xcache problems

 

 

    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m

        New link for the site-oriented dashboard


      • Other new issues / tasks 5m

        Re-enabling GPU queue for QMUL

        Analysis facilities: understand the status and E&D for UK; feedback to Alessandra.

        Multihop failures RAL; no overwrite of failed intermediate steps

        RAL complete Echo rebalancing; Stop greedy deletions for Disk

    • 10:20 10:40
      Ongoing Items 20m

    • 10:40 10:50
      News round-table 10m

    • 10:50 11:00