ATLAS UK Cloud Support

Europe/London
Vidyo

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))

● Outstanding tickets

  • 148968 | UKI-NORTHGRID-LANCS-HEP | less urgent | in progress | 2020-10-14 19:49:00 | UKI-NORTHGRID-LANCS-HEP: deletion and transfer failures
  • 148342 | UKI-SCOTGRID-GLASGOW | less urgent | in progress | 2020-10-09 11:53:00 | UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
    • "No route to host" transfer errors for the DPM storage. To be investigated.
  • 146651 | RAL-LCG2 | urgent | on hold | 2020-08-10 10:59:00 | singularity and user NS setup at RAL
    • On hold
  • 146374 | UKI-NORTHGRID-SHEF-HEP | urgent | in progress | 2020-09-11 13:35:00 | ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE
    • On hold
  • 144759 | UKI-SCOTGRID-GLASGOW | less urgent | on hold | 2020-08-10 09:54:00 | High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1
    • On hold
  • 142329 | UKI-SOUTHGRID-SUSX | top priority | on hold | 2020-06-04 14:05:00 | CentOS7 migration UKI-SOUTHGRID-SUSX
    • On hold

● CPU

  • RAL

    • Additional slots as a result of CMS issues
  • Northgrid

    • LANCS: set offline because of disk issues
  • London

    • QMUL: transient issue.
  • SouthGrid

    • Some RALPP fluctuations
  • Scotgrid

    • GLA bringing more CPUs online (new Dell nodes). The last two had cvmfs cache issues requiring a manual fix.
      • missing sub-dirs
      • OX noted that on some of their nodes the cvmfs cache is getting full, which can result in blacklisting
    • Durham: DPM disk server unwell; DPM fills up logs. Should be resolved by now.
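
The cvmfs cache-full condition noted above could be spotted early with a simple check on each worker node. The following is only a minimal sketch, not a description of what any site runs: the function names and the 90% threshold are hypothetical, and it measures the filesystem holding the cache directory rather than the CVMFS-managed quota itself.

```python
import shutil


def cache_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def is_nearly_full(path, threshold=0.9):
    """True if the filesystem holding `path` is at or above `threshold` used.

    The 0.9 default is an illustrative assumption, not a site policy.
    """
    return cache_usage_fraction(path) >= threshold
```

For example, `is_nearly_full("/var/lib/cvmfs")` would check the default CVMFS cache location; sites that set a different `CVMFS_CACHE_BASE` would pass that path instead.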

● Ongoing issues

  • CentOS7 - Sussex

    • NTR

  • Grand Unified queues

    • NTR


● News round-table

  • Vip
    • No downtime next week; however, a few WNs will be taken offline for work.
  • Dan
    • NTR
  • Matt
    • Will present “T2 operations in Covid” at GridPP45.
  • Peter
    • Noted generally poor audio; not observed by others.
    • If it continues next week, we will consider moving to Zoom (again).
  • Sam
    • cephc05 running fine as the production machine. c02, used for dev work, to be updated with the forked xrootd-ceph shortly.
    • Next week's Storage meeting will be cancelled because of the overlap with GridPP.
  • Tim
    • NTR
  • JW
    • Work on TPC-http with Ceph continues; new problem with stripe alignment.

 

    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m
      • Other new issues 5m
    • 10:20 10:40
      Ongoing issues 20m
    • 10:40 10:50
      News round-table 10m
    • 10:50 11:00
      AOB 10m