ATLAS UK Cloud Support

Europe/London
Zoom

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

https://cern.zoom.us/j/98434450232

Password protected (same as the new Ops meeting)

● Outstanding tickets

    • 155214 USER atlas RAL-LCG2 less urgent NGI_UK in progress 2021-12-07 13:29:00 stuck staging request at RAL EGI

      • File now staged, but why stuck for so long?
    • 155199 TEAM atlas UKI-SCOTGRID-ECDF less urgent NGI_UK in progress 2021-12-07 13:08:00 Transfer and deletion error at UKI-SCOTGRID-ECDF as a destination EGI

      • Transfers appear to have returned to normal. No explanation yet for the cause
    • 155141 TEAM atlas UKI-LT2-Brunel less urgent NGI_UK in progress 2021-12-08 18:43:00 Transfers from UKI-LT2-Brunel fail with “Internal Server Error” EGI

      • Hardware back online, but some files need to be declared as lost.
      • Once files removed from the namespace, JW to delete
    • 154806 TEAM atlas UKI-LT2-QMUL less urgent NGI_UK in progress 2021-11-28 12:34:00 UKI-LT2-QMUL SOURCE transfer failures: [13] Result (Neon): SSL handshake failed EGI

      • Return to investigation once QMUL online
    • 154543 TEAM atlas UKI-SCOTGRID-ECDF urgent NGI_UK in progress 2021-12-08 12:35:00 DPM storage ACL configuration EGI

      • Site will get to it at some point
    • 154436 TEAM atlas RAL-LCG2 very urgent NGI_UK on hold 2021-12-08 13:25:00 RAL Echo Davs developments EGI

      • Update of status given in the meeting
    • 153367 TEAM atlas RAL-LCG2 urgent NGI_UK on hold 2021-12-01 15:37:00 HTTPS on RAL CTA EGI

      • Deferred to next year at this point; don’t want ATLAS commissioning tests to overlap with the migration

● CPU

    • RAL

      • Remains volatile. A networking issue over the weekend took out jobs and reduced capacity
    • Northgrid

      • MAN a bit variable
    • London

      • QMUL in DT, IC back to more normal running
    • SouthGrid

      • N/A. JW to check why no jobs are using the Xcache (answer: it's all Evgen)
    • Scotgrid

      • Blip for GLA; reason unclear

● Ongoing Items

  • TPC with http

    • Planning of Glasgow Ceph -> Cephfs migration
      • Possible options: ATLAS-orchestrated migration, or internal ‘rsync’
      • JW to understand ‘rate and depth’ of data churn
  • Storageless Site test (Oxford)

    • Discussions on Xcaches in general: funding them, sizing them, and their usefulness.
      • Also, whether there is anything to improve for Oxford
  • LANCS Storage migration

    • No update

● News round-table

  • Alessandra
    • NTR
  • Dan
    • Will be done with updates today
  • Gerard
    • NTR
  • Matt
    • Apologies for absence 
  • Patrick
    • Apologies for absence
  • Peter
    • NTR
  • Sam
    • NTR
  • Steven
    • NTR
  • Vip
    • NTR

● AOB

  • Next week should be the last meeting of the year

    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m

        New link for the site-oriented dashboard

      • Other new issues / tasks 5m

        Re-enabling GPU queue for QMUL
        https://atlas-cric.cern.ch/atlas/pandaqueue/detail/ANALY_QMUL_GPU/

        Upgraded dCache; investigating a dCache bug when copying output with NFS
        SRR was looking OK, but there were problems with the SRR service

        RAL-LCG2 capacity reduction: https://its.cern.ch/jira/browse/ATLDDMOPS-5585

        Xrootd 5.3.X (X>=3) needed for VP sites

        Liverpool: retiring disk servers; space reduction needed
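
The XRootD requirement noted above (5.3.X with X>=3) can be checked with a small script. The following is only a sketch: the function names are hypothetical, the accepted version-string formats are an assumption, and it reads the note literally, so releases outside the 5.3 series are reported as not covered rather than as failing.

```python
import re


def parse_version(text):
    """Return (major, minor, patch) from a string such as 'v5.3.4' or '5.3.3-1.el7'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_vp_requirement(text):
    """Literal reading of '5.3.X (X>=3)': must be in the 5.3 series with patch >= 3."""
    major, minor, patch = parse_version(text)
    return (major, minor) == (5, 3) and patch >= 3


if __name__ == "__main__":
    # Example version strings are illustrative, not taken from any site.
    for candidate in ("v5.3.2", "v5.3.4", "5.3.3-1.el7"):
        print(candidate, meets_vp_requirement(candidate))
```

Whether later series also satisfy the requirement is not stated in these notes, so the check would need adjusting once that is confirmed.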

    • 10:20 10:40
      Ongoing Items 20m
    • 10:40 10:50
      News round-table 10m
    • 10:50 11:00
      AOB 10m