ATLAS UK Cloud Support

Europe/London
Zoom

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

Meeting to be held via Zoom (https://ukri.zoom.us/j/97404730356)
Password protected (same as OPs Mtg)

Outstanding tickets

  • 149362 UKI-SOUTHGRID-RALPP urgent in progress 2020-11-13 08:45:00 ATLAS CE failures on UKI-SOUTHGRID-RALPP-heplnx207
    • Issue in adding new CE with AGIS/CRIC (see below)
    • Site to take CE into downtime on Monday for general cleanup
  • 148968 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-11-19 06:53:00 UKI-NORTHGRID-LANCS-HEP: deletion and transfer failures
    • gridFTP restarted; looking better, but will keep an eye on it
    • Other non-Lancs issues with Italian sites add a bit of confusion
      • Napoli issue: https available only on LHCONE (via certain IPvX?), whereas
      • gridFTP is available on non-LHCONE
  • 148342 UKI-SCOTGRID-GLASGOW less urgent in progress 2020-11-12 17:24:00 UKI-SCOTGRID-GLASGOW with transfer efficiency degraded and many failures
    • Sam to take a look at problem files
  • 146651 RAL-LCG2 urgent on hold 2020-10-16 11:56:00 singularity and user NS setup at RAL
    • no update
  • 142329 UKI-SOUTHGRID-SUSX top priority on hold 2020-11-05 10:52:00 CentOS7 migration UKI-SOUTHGRID-SUSX
    • no update

CPU

  • RAL

  • Northgrid

  • London

    • Server with occasional bad memory issue
      • To discuss with manufacturer to attempt a proper fix (not just a BIOS update)
  • SouthGrid

  • Scotgrid

    • Durham: a priority user takes all the priority, causing loss of ATLAS jobs.
      • Some job loss also from HC test failures caused by missing files
    • GLA: CVMFS update for CMS; some unintended consequences caused problems
    • GLA: Bringing additional capacity online slowly; aiming for full capacity before Christmas.
      • High IOPS in the Ceph cluster probably identified as coming from offsite xroot direct-access reads
        • User from CERN, accessing scratchdisk
        • Switch off to see if this solves the issue.

Other new issues

  • Recent Switcher problem with AGIS/CRIC sending many emails

    • Concerns raised on appropriate use of mailing lists.
    • ATLAS UK has three lists: cloud-support, UK comp operations, and UK comp users.
      • The comp users list has been unused for 5 years, and it was agreed to remove it.
      • Cloud support remains the most active discussion list and will be unchanged.
      • The comp operations list carries the daily summary and Switcher notifications. Non-automated traffic is on the order of 1 email per year, and those emails may have been intended for cloud support.
        • It was decided to keep the Switcher and daily summary in this list. A simple filter can remove any unwanted emails.
  • Queues:

    • Long-term queues that are not disabled, but not running production:
      • UK ANALY_MANC_TEST_SL7: Still needed
      • UK ANALY_QMUL_GPU_TEST: -> could be renamed to non-test
      • RAL-LCG2_TEST: -> not actively used (see comment from Peter)
      • RAL-LCG2_UCORE: Can be disabled
      • UKI-NORTHGRID-LANCS-HEP_TEST (see comment from Peter)
      • UKI-NORTHGRID-MAN-HEP_TEST; testbed -> keep
      • UKI-SCOTGRID-GLASGOW_CEPH_TEST: keep
      • UKI-SOUTHGRID-OX-HEP_TEST: (see comment from Peter)
      • UKI-SOUTHGRID-SUSX_UCORE: not test, should become production, might want renaming
      • Peter uses TEST queues for dev test work monitoring
      • QM test queue might be useful

Ongoing issues

  • CentOS7 - Sussex

    • no update
  • Datadisk; watermark reduced.

  • LOCALGROUP disk

    • New pool to be created shortly
  • TPC:

    • Naples -> moved to DPM 1.14.2; networking blocks 443 on IPv6, IPv4 open on general network

    • Affects whole of UK (e.g. Lancs, MAN)

    • Retry transfer failures

    • Vulnerability in DPM and dCache

      • Believe all UK DPM sites are up to date (or not affected)
      • dCache issue announced in appropriate channels

News round-table

  • Vip
    • Had to leave before end; NTR
  • Dan
    • NTR
  • Matt
    • NTR; away for next week’s meeting
  • Peter
  • Alessandra
    • NTR
  • Gareth
    • The two CEs recently added to Glasgow will stay in downtime for the time being.
    • JW to check they are included correctly in CRIC / AGIS.
  • JW
    • NTR
  • Sam
    • Final talk available for Workshop.
    • Positive comments on updates to talk draft
      • Tables are now much better

AOB

NTR
