ATLAS UK Cloud Support

Europe/London
Vidyo

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Videoconference Rooms

  • Name: ATLAS_UK_Cloud_Support_indico_233262
  • Description: Weekly ATLAS UK Cloud Support Meeting
  • Extension: 109233262
  • Owner: Tim Adye

Outstanding tickets

  • 147698 UKI-SCOTGRID-DURHAM less urgent assigned 2020-07-01 15:32:00 UKI-SCOTGRID-DURHAM squid down

    • Assigned; VM to be rebooted
  • 146771 UKI-SCOTGRID-ECDF less urgent reopened 2020-07-01 22:18:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”

    • Reopened; it had been hoped that the update to CentOS7 would resolve most issues
  • 146651 RAL-LCG2 urgent in progress 2020-05-27 10:43:00 singularity and user NS setup at RAL

    • Timescale and planning underway with Grid service
  • 146374 UKI-NORTHGRID-SHEF-HEP urgent on hold 2020-06-24 16:18:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE

    • On Hold
  • 145688 UKI-NORTHGRID-MAN-HEP less urgent in progress 2020-06-30 06:45:00 Very old version of squids at UKI-NORTHGRID-MAN-HEP

    • Almost complete; test squid is online. Will try the new version for a few days on one production squid, then roll out.
  • 145510 RAL-LCG2 urgent in progress 2020-06-29 07:33:00 RAL-LCG2: timeouts on stage-in/outs

    • Pilot update seems to have improved the situation; however, there was a spike in timeout activity. Will try to close.
  • 144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-06-09 07:59:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1

    • On Hold; access may become increasingly restricted
  • 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX

    • On Hold

CPU

  • RAL

    • A problem appeared in aCT. Fixed by 22:00, but it is taking time to reclaim the lost slots.
  • Northgrid

    • Durham: air conditioning is on but below full efficiency; it may take time to get jobs back into the queue from the backlog of other jobs.
  • London

    • QMUL: to investigate memory issues from jobs
  • SouthGrid

  • Scotgrid

Other new issues

Ongoing issues

  • CentOS7 - Sussex

    • On Hold
  • Grand Unified queues

    • On Hold

News round-table

 

    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m
      • Other new issues 5m
    • 10:20 10:40
      Ongoing issues 20m
    • 10:40 10:50
      News round-table 10m

       

       

    • 10:50 11:00
      AOB 10m