ATLAS UK Cloud Support

Europe/London
Vidyo

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))

Outstanding tickets

  • 148169 UKI-SCOTGRID-ECDF less urgent in progress 2020-08-05 10:25:00 Failovers from jobs running at UKI-SCOTGRID-ECDF_CLOUD to CERN backup proxy

    • In contact with the Cloud-Scheduler admins
  • 148120 UKI-SOUTHGRID-RALPP less urgent in progress 2020-08-05 14:02:00 UKI-SOUTHGRID-RALPP: authorization failures

    • Can close?
  • 147979 UKI-NORTHGRID-MAN-HEP less urgent in progress 2020-08-04 09:28:00 UKI-NORTHGRID-MAN-HEP timeout transfer errros and also deletion errors

    • In progress
  • 147841 UKI-SCOTGRID-GLASGOW less urgent reopened 2020-08-04 09:23:00 UKI-SCOTGRID-GLASGOW: deletion problems

    • James to give list to Sam for current set
    • Sam will pick off db entries for residual files in DPM pointing to disk039
  • 146771 UKI-SCOTGRID-ECDF less urgent in progress 2020-08-06 07:47:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”

    • Subtle ACL differences between VOMS roles in DPM (groupnames) and ACLs of the parent directory
    • Why restarting basically works appears to be a peculiar subtlety.
      • All experts actively engaged
  • 146651 RAL-LCG2 urgent on hold 2020-07-23 20:02:00 singularity and user NS setup at RAL

    • Need to provide a timescale to have this done (situation with BNL?)
      • Push the priority up the agenda
  • 146374 UKI-NORTHGRID-SHEF-HEP urgent in progress 2020-07-22 14:53:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE

    • Few jobs got through this week
    • Very little effort can be allocated to this: a fraction of one person, and zero allocated to Elena.
  • 144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-06-09 07:59:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1

    • on hold
  • 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-06-04 14:05:00 CentOS7 migration UKI-SOUTHGRID-SUSX

    • on hold

CPU

  • RAL

    • Stable; arc-ce6 should finish this week
  • Northgrid

    • LANCS
      • switch to aCT;
      • still had networking issues until late Tuesday.
        • possible firewall blacklisting issue.
      • current status appears ok
  • London

    • QMUL back after downtime and Singularity bind points.
      • To test final configuration
  • SouthGrid

  • Scotgrid

    • GLA: Largely stable, but still some AC issues in DC
    • ECDF: fluctuates due to issues seen in the related GGUS tickets

Other new issues

Ongoing issues

  • CentOS7 - Sussex

  • Grand Unified queues

News round-table

  • Vip

    • NTR
  • Dan

    • ATLAS storage > 0.5 PB over last few months
    • Once the new Lustre is commissioned, this will increase significantly
  • Matt

    • Looking fine
  • Sam

    • NTR
  • Tim

    • NTR
  • JW

    • TPC http smoke test configured; issues with macaroons

AOB

ATLAS TWiki for storage recommendations, including ACLs, etc.

    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m
      • Other new issues 5m
    • 10:20 10:40
      Ongoing issues 20m

      On hold

    • 10:40 10:50
      News round-table 10m
    • 10:50 11:00
      AOB 10m

      ATLAS TWiki page outlining recommended ACLs for Endpoints, etc.