ATLAS UK Cloud Support

Europe/London
Zoom

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

https://cern.zoom.us/j/98434450232

Password protected (same as (new) OPs Mtg)

Videoconference: ATLAS UK Cloud Support
Zoom Meeting ID: 98434450232
Host: James William Walder

● Outstanding tickets

  • 155856 TEAM atlas RAL-LCG2 less urgent NGI_UK in progress 2022-02-03 09:50:00 RAL-LCG2: deletion errors EGI

    • RAL disk is full, large numbers of deletions of small files.
    • DDM to try and launch larger file deletions.
    • Generally just very busy …
  • 155430 TEAM atlas UKI-SCOTGRID-ECDF less urgent NGI_UK in progress 2022-01-13 14:27:00 UKI-SCOTGRID-ECDF transfer and deletion errors EGI

    • Awaiting large file list to arrive, to be declared as lost
  • 155141 TEAM atlas UKI-LT2-Brunel less urgent NGI_UK in progress 2022-02-01 18:23:00 Transfers from UKI-LT2-Brunel fail with “Internal Server Error” EGI

    • Needing the ok from Brunel that they’ve cleaned the files from their namespace
  • 154806 TEAM atlas UKI-LT2-QMUL less urgent NGI_UK in progress 2022-01-28 09:16:00 UKI-LT2-QMUL SOURCE transfer failures EGI

    • SSL errors remain a large source of transfer failures
    • Restarting of some disk server needed
    • StoRM added some dev version to repo, causing SRR issues, now reverted
    • Additional storage being added to the site
  • 154543 TEAM atlas UKI-SCOTGRID-ECDF urgent NGI_UK in progress 2021-12-08 12:35:00 DPM storage ACL configuration EGI

    • Still needing an update
  • 154436 TEAM atlas RAL-LCG2 very urgent NGI_UK in progress 2022-02-03 08:15:00 RAL Echo Davs developments EGI

    • More file transfer failures reported due to known multihop limitations
  • 153367 TEAM atlas RAL-LCG2 urgent NGI_UK on hold 2021-12-01 15:37:00 HTTPS on RAL CTA EGI

    • Discussion on TAPE and Disk tests, following the recommendations of data challenges.

● CPU

    • RAL

      • Reduced capacity for patching campaign finished; ATLAS running well, so far …
    • Northgrid

      • NTR
    • London

      • QMUL returning after a few issues.
        • Dark data reported
        • StoRM dev version was in repo, also doing the SRR?
    • SouthGrid

      • Xcache issues, followed by power glitch on weekend
    • Scotgrid

      • XRootD service needed restarting. 503 Rucio errors masked this, so it was not spotted quickly

● Ongoing Items

  • TPC with http

    • ATLAS wanting to drop gridFTP by March:
      • Castor <-> Echo would still be ok (if not moved to CTA by then)
      • Echo writes remain on gridFTP (predominantly);
        • With additional hardware, should be able to switch once installed
      • Glasgow; gridFTP; move to cephfs, or, use patched davs version from RAL
        • Could consider to set up additional SE, and move data across managed by DDM
        • JW to liaise with DDM
  • Storageless Site test (Oxford)

    • NTR (discussed in Storage Mtg)
      • JW to get available plots together by end of Feb
  • LANCS Storage migration

    • Viable endpoint for functional tests should be available by today
    • JW to make final updates to CRIC and run checks before informing DDM

 

 


● News round-table

  • Alessandra

    • NTR
  • Dan

    • NTR (discussed above)
    • Alessandra helping with GPU queue setting
  • Gerard

    • NTR
  • Matt

    • Discussed endpoint availability
  • Sam

    • NTR
  • Stephen

    • NTR
  • Vip

    • NTR

 

 

    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m

        New link for the site-oriented dashboard

      • Other new issues / tasks 5m

        Strong aim to drop the gridFTP requirement shortly.
        Still using gridFTP: Glasgow, and RAL (writes); Castor <-> Echo remains ok.

    • 10:20 10:40
      Ongoing Items 20m

    • 10:40 10:50
      News round-table 10m

    • 10:50 11:00