ATLAS UK Cloud Support

Europe/London
Vidyo

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))

Outstanding tickets

  • 148589 UKI-LT2-UCL-HEP less urgent in progress 2020-09-15 08:59:00 Failovers from UKI-LT2-UCL-HEP to CERN backup proxy
    • Waiting for a reply from the site; squid now monitored through GOCDB.
  • 148401 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-09-16 10:40:00 UKI-NORTHGRID-LANCS-HEP: globus_ftp_client failures
    • 17 further files declared lost; ZFS scrubbing continuing.
  • 148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-09-15 12:59:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
    • Deletions inside DPM ongoing.
  • 146771 UKI-SCOTGRID-ECDF less urgent in progress 2020-09-05 18:57:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”
    • Action: JW to check and close.
  • 146651 RAL-LCG2 urgent on hold 2020-08-10 10:59:00 singularity and user NS setup at RAL
    • On hold
  • 146374 UKI-NORTHGRID-SHEF-HEP urgent in progress 2020-09-11 13:35:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
    • On hold
  • 144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-08-10 09:54:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
    • On hold
  • 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX
    • On hold

CPU and transfers

  • UK accounting page not working.
    • JW to follow up.
  • RAL
    • No issues.
  • NorthGrid
    • Manchester: move to ARC 6; one of the ‘hacks’ (used with ARC 5 to remove some memory limits) reverted.
  • London
    • QMUL out of downtime; additional storage needs to be added ASAP.
      • Affects GridFTP and SRM, but not XRootD.
      • With the new storage, SCRATCHDISK will be enabled (set to a reasonable size again).
  • SouthGrid
  • ScotGrid

Other new issues

Ongoing issues

  • TPC http
    • NTR

News round-table

  • Vip
    • DPM upgrade in a couple of weeks; the site will enter downtime.
  • Dan
    • (Update on QMUL Lustre migration added to the CPU section.)
  • Matt
    • MD off next week.
  • Alessandra
    • NTR
  • Sam
    • NTR
  • Gareth
    • Question on moving to storageless sites:
      • Are there values for the required cache sizes (storage per job slot)?
        • E.g. for XCache, as per-site requirements.
      • Example: BHAM (80 TB) => OX (200 TB) from HS06 scaling.
      • AF: Not much from the UK on using XCache; CMS is a good place to look.
  • Tim
    • Noted that AF closed a number of Jira tickets.
  • JW
    • NTR

AOB

    • 10:00 - 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m
      • Other new issues 5m
    • 10:20 - 10:40
      Ongoing issues 20m
    • 10:40 - 10:50
      News round-table 10m
    • 10:50 - 11:00
      AOB 10m