ATLAS UK Cloud Support

Europe/London
Zoom

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))
Description

https://cern.zoom.us/j/98434450232

Password protected (same as the (new) Ops meeting)

Videoconference: ATLAS UK Cloud Support
Zoom Meeting ID: 98434450232
Host: James William Walder

● Outstanding tickets

  • 150820 UKI-LT2-RHUL less urgent in progress 2021-03-03 15:18:00 UKI-LT2-RHUL: 0% Transfer and deletion efficiencies
    • Power incident at the centre; possible problems bringing some hardware back up.
  • 150775 UKI-SCOTGRID-GLASGOW less urgent in progress 2021-03-04 04:45:00 UKI-SCOTGRID-GLASGOW transfer and deletion errors
    • External link dropping to 1 Gb/s
    • Many routes off campus; might be the cause of the issue
    • Associated with large numbers of small files being transferred in via FTS
    • Restarting the gridFTP services ‘cures’ the problem
  • 149362 UKI-SOUTHGRID-RALPP urgent in progress 2021-02-18 20:00:00 ATLAS CE failures on UKI-SOUTHGRID-RALPP-heplnx207
    • No progress
  • 146651 RAL-LCG2 urgent on hold 2021-02-16 17:37:00 singularity and user NS setup at RAL
    • No progress
  • 142329 UKI-SOUTHGRID-SUSX top priority on hold 2021-01-20 20:29:00 CentOS7 migration UKI-SOUTHGRID-SUSX
    • Issue with A-REX due to a change of hypervisor
    • Networking should be ok now.
    • Potential issues for
    • A few (3 or 4) different hardware sets; monitor each set as it is provisioned.
    • Patrick to update the ticket once the network is confirmed

● CPU

  • General problem with non-aCT sites due to a Harvester issue on Sunday evening, causing worldwide job reductions

  • Wednesday: another dip in central production, presumed due to a Condor update (for ARC / GDPR fixes)

  • RAL

    • Job reductions due to:
      • Capped at 100% to allow more jobs for LHCb
      • ATLAS demand for more SCORE user jobs
  • Northgrid

    • Lancs: fairshare (stable at O(3k)); may want to try to bounce other users.
  • London

    • OK
  • SouthGrid

    • OK
  • Scotgrid

    • OK
  • BHAM: also not running for LHCb; no update.

  • CAM to go into a long downtime; its long-term future is to be discussed.


● Ongoing Items

  • CentOS7 - Sussex

    • Making good progress now (see above).
  • TPC with http

    • Moving to WebDAV for WAN / LAN / TPC.
    • LHCb: enabled everywhere except RAL, GLA and QMUL(?)
    • CMS: similar; exceptions are RAL, GLA (~no local storage) and QMUL (uses IC)
    • DPM / XRootD: different libcurl versions
      • Push for XRootD sites now
      • Discussion on tokens occurred
      • Users still tend to complain that they do not like the grid.
    • GLA: DirectIO possible if WAN XRootD is enabled; needs a cache in place.
      • LAN - just to test that LAN works
      • WAN to be main focus
    • QMUL: prefer to wait for new hardware (should already work, but the server is not quite powerful enough).
      • gridFTP: one main server that redirects for the actual transfers
      • WebDAV: might need some configuration to do the same as gridFTP; however, one powerful node should work fine.
  • Storageless site test / storage decommissioning (Oxford)

    • Sam to send Vip the updated config for rules refinement today.
    • ECDF ES XRootD cache monitoring is up, but XCache transfers are not being seen.
  • ECDF volatile storage

    • Rob reconfigured the site
    • JW to make the necessary ATLAS updates
  • Glasgow DPM Decommissioning

    • LOCALGROUPDISK done; residual data remain on DATADISK
  • ATLAS: Site Availability/Reliability reports: Glasgow

    • Is this done?

● News round-table

  • Vip

    • On leave next week
  • Dan

    • NTR
  • Matt

    • Disk server rebooted but has not come back up:
      • Couple of TB data
      • 11k files might be declared lost
  • Peter

    • (Needed to leave before end)
  • Alessandra

    • NTR
  • Sam

    • NTR
  • Gareth

    • Will be leaving GridPP in April
  • JW

    • NTR
  • Patrick

    • NTR
  • Rob

    • NTR
    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m
      • CPU 5m

        New link for the site-oriented dashboard

      • Other new issues / tasks 5m
    • 10:20 10:40
      Ongoing Items 20m
    • 10:40 10:50
      News round-table 10m
    • 10:50 11:00
      AOB 10m