ATLAS UK Cloud Support

Europe/London
Zoom

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

Meeting to be held via Zoom (https://ukri.zoom.us/j/97404730356)
Password protected (same as the Ops meeting)

Outstanding tickets

  • 149362 | UKI-SOUTHGRID-RALPP | urgent | in progress | 2020-11-19 10:11:00 | ATLAS CE failures on UKI-SOUTHGRID-RALPP-heplnx207
    • Waiting to drain and reinstall CE (on Monday)
  • 148342 | UKI-SCOTGRID-GLASGOW | less urgent | in progress | 2020-11-12 17:24:00 | UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
    • Will look today and update
  • 146651 | RAL-LCG2 | urgent | on hold | 2020-10-16 11:56:00 | singularity and user NS setup at RAL
    • on hold
  • 142329 | UKI-SOUTHGRID-SUSX | top priority | on hold | 2020-11-05 10:52:00 | CentOS7 migration UKI-SOUTHGRID-SUSX
    • on hold

CPU

  • RAL

  • Northgrid

    • Low CPU efficiency still seen at SHEF; configured to a reasonable (expected) baseline setting, to iterate from there.
      • MC generation is good; network / file transfer appears to be what limits the efficiency.
  • London

    • OX: move of WNs went OK
  • SouthGrid

  • Scotgrid

    • cvmfs issues; rpm updates may have corrupted the cache, with autofs segfaulting
      • Kernel bug (in the newer 5.x series) being discussed on the cvmfs mailing lists (may also have been backported to older kernels?)
    • 15 out of 60 lost to cvmfs issues; rebooting and bringing them back online.

Other new issues

  • Kibana usage: reported to be difficult for anything beyond basic tasks.
  • Apfmon: using CRIC for information; a manual sync with AGIS was done to update and collect all CEs

Ongoing issues

  • CentOS7 - Sussex

    • no update
  • Glasgow DPM decommissioning

    • Sam to create LOCALGROUPDISK pool on Ceph this week
  • TPC with http

    • RAL test gateway updated to 5.0.3 with some additional patches; working as before
  • ECDF -

    • Discussed in QoS
      • JBOD vs RAID was brought up, along with the question of how the site would like to proceed.
  • Oxford storageless tests

    • Disks needed for primary disk servers; replaced with 3TB disks. Vip to be provided with ARC configs for setting up new queues.

News round-table

(NTR)

  • Vip
    • NTR
  • Dan
    • NTR
  • Peter
    • NTR
  • Sam
    • NTR
  • Gareth
    • GR is the main person moving hardware at the moment.
  • JW
    • NTR

AOB

    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m
      • Other new issues / tasks 5m
      • Enable CEs in PanDA queues 20m

        Adding CEs to the RALPP and Glasgow PanDA queues during the CRIC migration

    • 10:20 10:40
      Ongoing Items 20m
    • 10:40 10:50
      News round-table 10m
    • 10:50 11:00
      AOB 10m