ATLAS UK Cloud Support

Europe/London
Zoom

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

https://cern.zoom.us/j/98434450232

Password protected (same as (new) OPs Mtg)

● Status

  • 153405 TEAM atlas UKI-LT2-QMUL less urgent NGI_UK assigned 2021-08-04 21:38:00 UKI-LT2-QMUL squid degraded WLCG

      • Rebooted; may need new disks if the issue reappears
      • Ticket now solved
  • 153367 TEAM atlas RAL-LCG2 urgent NGI_UK in progress 2021-08-04 11:55:00 HTTPS on RAL CTA WLCG

      • Tracking ticket for the Tape tests
  • 153295 USER atlas RAL-LCG2 less urgent NGI_UK in progress 2021-08-02 09:37:00 stuck staging requests from RAL MCTAPE WLCG

      • 14 files declared as lost; unclear whether they were never correctly transferred/staged, or were otherwise lost.
  • 153277 TEAM atlas UKI-SCOTGRID-GLASGOW less urgent NGI_UK in progress 2021-07-28 13:14:00 UKI-SCOTGRID-GLASGOW_CEPH job stage-in failures WLCG

      • All should be back to normal now; the CRIC WAN/LAN settings are quite complex.

● CPU

  New link for the site-oriented dashboard

  • RAL

    • 2021 pledge values now applied; awaiting some steady state to assess fairshares
  • Northgrid

    • LANCS in downtime for updates
  • London

    • QMUL - more nodes online; user namespaces for Singularity needed some reboots
  • SouthGrid

    • OX - problems with XCache over the weekend; went back to a previous configuration
    • BHAM - no jobs running for the last couple of days
  • Scotgrid

    • Aiming for internal switch monitoring improvements
    • Recovering after the (above) GGUS issues

● Other new issues / tasks

    • Major Downtime for RAL T1 on 14/15th August

      • All services offline for site core infrastructure upgrades; all networking (etc.) down. It may be possible to shorten the downtime if all goes well.
      • The following weekend (21/22) expect minor disruptions, depending on the success of the 14th interventions.

● Ongoing Items

  • CentOS7 - Sussex

    • NTR
  • TPC with http

    • Both RAL and Glasgow look OK when acting as the source site; more problems when acting as the destination
  • Storageless Site test (Oxford)

    • Following on from the Storage Meeting discussions; it would be interesting to get accurate / latest numbers for ATLAS throughputs at sites, and for particular activities
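As background to the TPC-with-http item above, the sketch below shows how an HTTP third-party copy is formed under the WLCG WebDAV-TPC convention: the client sends an HTTP COPY to the "active" endpoint, naming the passive one in a Source header (pull mode: the destination fetches the file) or a Destination header (push mode: the source uploads it). This is an illustration only, not the RAL or Glasgow configuration; the URLs and token are placeholders.

```python
def tpc_copy_request(active_url, passive_url, mode="pull", passive_token=None):
    """Return (method, url, headers) for a WebDAV third-party-copy COPY.

    mode="pull": COPY is sent to the destination, which pulls from Source.
    mode="push": COPY is sent to the source, which pushes to Destination.
    """
    if mode == "pull":
        headers = {"Source": passive_url}
    elif mode == "push":
        headers = {"Destination": passive_url}
    else:
        raise ValueError("mode must be 'pull' or 'push'")
    if passive_token is not None:
        # The TransferHeader prefix asks the active party to strip the prefix
        # and forward the remainder (here an Authorization header) on the
        # inner transfer to the passive endpoint.
        headers["TransferHeaderAuthorization"] = "Bearer " + passive_token
    return ("COPY", active_url, headers)

# Pull-mode copy, i.e. the destination site does the work (placeholder URLs):
method, url, headers = tpc_copy_request(
    "https://dest.example.org/atlas/datafile",
    "https://src.example.org/atlas/datafile",
    mode="pull",
)
```

In pull mode the destination's WebDAV gateway performs the transfer, which is one reason source-side and destination-side behaviour can differ for the same pair of sites.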
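The per-site / per-activity throughput numbers mentioned for the storageless-site test could be aggregated from transfer records along these lines. The record fields, site names, and values below are invented for illustration; real figures would come from the ATLAS transfer monitoring.

```python
from collections import defaultdict

def throughput_by_site_activity(transfers):
    """Sum bytes and seconds per (site, activity); return rates in MB/s."""
    totals = defaultdict(lambda: [0, 0.0])  # (site, activity) -> [bytes, seconds]
    for t in transfers:
        key = (t["site"], t["activity"])
        totals[key][0] += t["bytes"]
        totals[key][1] += t["seconds"]
    return {k: b / s / 1e6 for k, (b, s) in totals.items() if s > 0}

transfers = [  # invented example records
    {"site": "UKI-SOUTHGRID-OX-HEP", "activity": "Analysis Input",
     "bytes": 5_000_000_000, "seconds": 500.0},
    {"site": "UKI-SOUTHGRID-OX-HEP", "activity": "Analysis Input",
     "bytes": 1_000_000_000, "seconds": 100.0},
    {"site": "RAL-LCG2", "activity": "Data Consolidation",
     "bytes": 2_000_000_000, "seconds": 50.0},
]
rates = throughput_by_site_activity(transfers)
# rates[("UKI-SOUTHGRID-OX-HEP", "Analysis Input")] -> 10.0 MB/s
```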

● News round-table

  • Dan

    • NTR
  • Gerard

    • NTR
  • Matt

    • NTR
  • Sam

    • NTR

● AOB

  • Expect to keep this meeting weekly over the summer

