ATLAS UK Cloud Support

Europe/London
Vidyo

Tim Adye (Science and Technology Facilities Council STFC (GB)), Stewart Martin-Haugh (Science and Technology Facilities Council STFC (GB))
Videoconference room: ATLAS_UK_Cloud_Support_indico_233262
Description: Weekly ATLAS UK Cloud Support Meeting
Extension: 109233262
Owner: Tim Adye

● Outstanding tickets

GGUS #145614: Timeouts were also caused by the CERN FTS falling over. Changed the memory allocation for xrootd and added a timeout of 2 minutes. Transfers are now all green.

GGUS #145688: Many UK sites have been accessing the CERN Squids directly rather than using the RAL Stratum 1. Manchester tries to contact the RAL Squid over IPv6 and falls back to IPv4. Connections to RAL are now being seen.
Alessandra will contact Jose at RAL again to check whether IPv6 is enabled for the Squid (we believe it is) and whether RAL is using the OS Squid or the Frontier version (we believe the latter).
Vip has upgraded the Squids at Oxford, but they are still using IPv4.
Details can be seen on the WLCG Squid monitor page: http://wlcg-squid-monitor.cern.ch/awstats/bin/awstats.pl?month=02&year=2020&output=allhosts&config=cvmfsbproxy.cern.ch&framename=index
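
For reference, the IPv6-then-IPv4 behaviour described for Manchester can be illustrated with a minimal Python sketch. This is not the client or Squid code used at any site. The function names, the injectable `resolve` and `connect` parameters, and the host `squid.example.org` are hypothetical and chosen for illustration only; port 3128 is the usual Squid default.

```python
import socket


def order_addresses(addrinfo):
    """Put IPv6 entries ahead of IPv4 ones, keeping the order within each family."""
    v6 = [a for a in addrinfo if a[0] == socket.AF_INET6]
    v4 = [a for a in addrinfo if a[0] == socket.AF_INET]
    return v6 + v4


def connect_with_fallback(host, port, timeout=5.0,
                          resolve=socket.getaddrinfo, connect=None):
    """Try each resolved address, IPv6 first.

    Returns (family_name, connection) for the first address that accepts
    a connection. `resolve` and `connect` are injectable (an assumption of
    this sketch) so the logic can be exercised without network access.
    """
    if connect is None:
        def connect(family, sockaddr):
            s = socket.socket(family, socket.SOCK_STREAM)
            s.settimeout(timeout)
            try:
                s.connect(sockaddr)
            except OSError:
                s.close()
                raise
            return s

    last_error = None
    infos = resolve(host, port, type=socket.SOCK_STREAM)
    for family, _type, _proto, _canon, sockaddr in order_addresses(infos):
        try:
            conn = connect(family, sockaddr)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next address
            continue
        return ("IPv6" if family == socket.AF_INET6 else "IPv4"), conn
    raise ConnectionError(
        f"no address for {host}:{port} accepted a connection"
    ) from last_error
```

Calling `connect_with_fallback("squid.example.org", 3128)` against a real proxy host would report which address family was actually used, which is the same information the minutes refer to when noting that a site is "still using IPv4".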


● CPU

QMUL HammerCloud: not as many HC jobs are running across the board. At QMUL, the jobs are failing with SIGCONT.


● Glasgow Ceph storage

Fixed a few niggles, but as of a few days ago, Grid transfers had problems talking to the Ceph pool.
The xrootd cache endpoint still works for reads, but writes do not work.
xrootd occasionally segfaults.
Using a locally compiled version of xrootd (the latest as of 6 months ago); it will have to be rebuilt when upgrading to Nautilus.


● AOB

Alessandra: NETR
Dan: NETR
Elena:
    Problem with Rucio (mentioned on the Rucio-support list).
    Is the recommendation to update Singularity to 3.5.3? Will ask Alessandra.
Gareth:
    Will need to have fully moved to the new server room + Ceph by June/July, requiring a move of all storage by then.
James: NETR
Matt: jobs running out of memory at Lancaster - same user as in January.
Sam: NETR
Stewart: NETR
Tim: NETR
Vip: NETR

    • 10:00 10:20
      Status 20m
    • 10:20 10:40
      Ongoing issues 20m
      • CentOS7 - Sussex 5m
      • Glasgow Ceph storage 5m
      • Grand Unified queues 5m
    • 10:40 10:50
      News round-table 10m
    • 10:50 11:00
      AOB 10m