ATLAS UK Cloud Support

Europe/London
Zoom

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

https://cern.zoom.us/j/98434450232

Password protected (same password as the (new) Ops meeting)

Videoconference: ATLAS UK Cloud Support (Zoom)
Zoom Meeting ID: 98434450232
Host: James William Walder
    • 10:00 AM - 10:20 AM
      Status 20m
      • Outstanding tickets 10m

          • 155008 TEAM atlas UKI-SCOTGRID-ECDF urgent NGI_UK in progress 2021-11-22 14:29:00 UKI-SCOTGRID-ECDF failing transfers due to expired certificate EGI

            • Transfers look better, but this still needs confirmation
          • 154940 TEAM atlas UKI-NORTHGRID-LANCS-HEP top priority NGI_UK in progress 2021-11-24 10:10:00 UKI-NORTHGRID-LANCS-HEP_DATADISK: Failed to stage-in file / File transfer timed out during stage-in EGI

            • Discussed how to proceed with decommissioning DPM and commissioning Ceph:
              • Create JIRA tickets for each action and prepare a test cluster
            • Previous tuning attempts did not help significantly
          • 154883 TEAM atlas UKI-NORTHGRID-LANCS-HEP less urgent NGI_UK in progress 2021-11-24 10:30:00 UKI-NORTHGRID-LANCS-HEP fails in transfers as destination EGI

            • As above
          • 154806 TEAM atlas UKI-LT2-QMUL less urgent NGI_UK in progress 2021-11-22 11:58:00 UKI-LT2-QMUL SOURCE transfer failures: [13] Result (Neon): SSL handshake failed EGI

            • Still in progress; failures appear to correlate with large batched transfer requests
          • 154543 TEAM atlas UKI-SCOTGRID-ECDF urgent NGI_UK in progress 2021-11-12 11:44:00 DPM storage ACL configuration EGI

            • Needs an update
          • 154436 TEAM atlas RAL-LCG2 very urgent NGI_UK on hold 2021-11-10 13:52:00 RAL Echo Davs developments EGI

            • Buffer
          • 154200 TEAM atlas RAL-LCG2 less urgent NGI_UK on hold 2021-11-25 09:27:00 RAL-LCG2 deletion issues with error “The requested service is not available at the moment” EGI

            • Testing for deletions working well so far
          • 153367 TEAM atlas RAL-LCG2 urgent NGI_UK on hold 2021-11-10 14:08:00 HTTPS on RAL CTA EGI

            • Awaiting testing
      • CPU 5m

        New link for the site-oriented dashboard

        • RAL

          • Jobs remain largely healthy, but below pledge
        • Northgrid

          • Manchester slightly below usual levels
        • London

          • QMUL: dips caused by HammerCloud (HC) test failures
        • SouthGrid

          • NTR
        • Scotgrid

          • NTR

      • Other new issues / tasks 5m

        Re-enabling the GPU queue for QMUL
        https://atlas-cric.cern.ch/atlas/pandaqueue/detail/ANALY_QMUL_GPU/

        Upgraded dCache; investigating a dCache bug when copying output via NFS
        SRR output was looking OK, but there were problems with the SRR service

        UKI-LT2-BRUNEL_DATADISK (offline; needs a new backplane)

        RAL-LCG2 capacity reduction: https://its.cern.ch/jira/browse/ATLDDMOPS-5585

        XRootD 5.3.X (X >= 3, i.e. 5.3.3 or later) needed for VP sites

        Liverpool: retiring disk servers; space reduction needed

        Wide variance in transfer speeds from CERN to RAL (via gridFTP); similar effects are seen at other T1s, but RAL per-file throughput is lower than average

    • 10:20 AM - 10:40 AM
      Ongoing Items 20m
      • CentOS7 - Sussex

        • Looking fine; should we aim to close this now?
      • TPC with http

        • NTR
      • Storageless Site test (Oxford)

        • Reverting to the original configuration after problems with direct-io

    • 10:40 AM - 10:50 AM
      News round-table 10m
      • Dan

        • NTR
      • Gerard

        • NTR
      • Steven

        • NTR
      • Matt

        • NTR
      • Patrick

        • NTR
      • Peter

        • NTR
      • Sam

        • Need to discuss the Ceph to CephFS migration and the possibilities for ATLAS.

    • 10:50 AM - 11:00 AM