ATLAS UK Cloud Support



Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))

Outstanding tickets

  • 147299 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-06-03 23:12:00 UKI-NORTHGRID-LANCS-HEP: deletion errors

    • Heading on-site to understand the problem; possibly the disk has died, with ~10TB data loss
  • 146918 UKI-SCOTGRID-ECDF less urgent in progress 2020-06-02 10:46:00 Failovers from jobs running at UKI-SCOTGRID-ECDF_CLOUD to CERN backup proxy

    • Pushed back due to other Edinburgh priorities
  • 146771 UKI-SCOTGRID-ECDF less urgent on hold 2020-06-02 10:30:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”

    • Pushed back due to other Edinburgh priorities
  • 146651 RAL-LCG2 urgent in progress 2020-05-27 10:43:00 singularity and user NS setup at RAL

    • Work ongoing to use unprivileged mode.
  • 146525 UKI-NORTHGRID-SHEF-HEP urgent on hold 2020-05-15 16:12:00 UKI-NORTHGRID-SHEF-HEP: evicted jobs

    • Active interactions with NORDIGRID mailing lists; discussion on the deprecation of LCMAPS and its possible replacements
  • 146374 UKI-NORTHGRID-SHEF-HEP urgent on hold 2020-05-15 16:11:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE

    • As above
  • 145688 UKI-NORTHGRID-MAN-HEP less urgent on hold 2020-04-02 09:20:00 Very old version of squids at UKI-NORTHGRID-MAN-HEP

    • On hold
  • 145510 RAL-LCG2 urgent on hold 2020-05-13 13:07:00 RAL-LCG2: timeouts on stage-in/outs

    • Will aim to close this week
  • 144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-02-17 09:51:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1

    • On hold
  • 142329 UKI-SOUTHGRID-SUSX top priority reopened 2020-06-01 08:27:00 CentOS7 migration UKI-SOUTHGRID-SUSX

    • To be put on hold; awaiting rollout of all nodes; might require physical access
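Each ticket line above packs the same fields in a fixed order: ticket ID, site, priority, status, last-update timestamp and subject. As a reading aid, here is a minimal Python sketch that splits one such line into those fields. The names `Ticket` and `parse_ticket`, the field names, and the lists of priority and status values are assumptions made for illustration (only values that appear in these minutes are included); this is not part of any ticketing tool.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Ticket(NamedTuple):
    ticket_id: int
    site: str
    priority: str
    status: str
    updated: datetime
    subject: str


# Assumption: only the values seen in these minutes, not an exhaustive list.
PRIORITIES = ("top priority", "less urgent", "urgent")
STATUSES = ("in progress", "on hold", "reopened")

TICKET_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<id>\d+)\s+(?P<site>\S+)\s+"
    r"(?P<priority>" + "|".join(PRIORITIES) + r")\s+"
    r"(?P<status>" + "|".join(STATUSES) + r")\s+"
    r"(?P<updated>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<subject>.+?)\s*$"
)


def parse_ticket(line: str) -> Optional[Ticket]:
    """Return the fields of a ticket line, or None for any other line."""
    m = TICKET_RE.match(line)
    if m is None:
        return None
    return Ticket(
        ticket_id=int(m["id"]),
        site=m["site"],
        priority=m["priority"],
        status=m["status"],
        updated=datetime.strptime(m["updated"], "%Y-%m-%d %H:%M:%S"),
        subject=m["subject"],
    )
```

Note lines under each ticket (for example "On hold") do not start with a ticket number, so `parse_ticket` returns `None` for them.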


  • RAL

  • Northgrid

    • MAN - walltime units changed from seconds to minutes (within ATLAS); change reverted.
    • Lancaster’s drop was due to an IPv6 problem over the weekend
  • London

  • SouthGrid

  • Scotgrid
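The MAN item above comes down to a unit mismatch: a walltime value written in one unit and read in another is off by a factor of 60, in whichever direction the mix-up goes. A small Python sketch of that arithmetic follows. The function name and the 48-hour example limit are hypothetical and are not taken from the ATLAS configuration.

```python
SECONDS_PER_MINUTE = 60


def walltime_as_seconds(value: int, unit: str) -> int:
    """Normalise a walltime limit to seconds; unit is 'seconds' or 'minutes'."""
    if unit == "seconds":
        return value
    if unit == "minutes":
        return value * SECONDS_PER_MINUTE
    raise ValueError(f"unknown walltime unit: {unit!r}")


# Hypothetical limit: 48 hours, written down in seconds.
limit = 48 * 3600

intended = walltime_as_seconds(limit, "seconds")
misread = walltime_as_seconds(limit, "minutes")  # same number, wrong unit

print(intended / 3600)  # 48.0 (hours)
print(misread / 3600)   # 2880.0 (hours)
```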

Other new issues

  • Request sent from ATLAS to restart squids due to residual issues with DB / frontier problems from previous weeks

Ongoing issues

  • CentOS7 DPM Lancs

    • No change to plans
  • CentOS7 - Sussex

    • As in GGUS discussion
  • Glasgow Ceph storage

    • xroot messages and troubleshooting are tricky.
    • External - should be OK (gridFTP, maybe also external xrootd).
    • Internal - bandwidth: 30GB/s over 3x 10GB links.
  • Grand Unified queues

    • Awaiting Sheffield

News round-table

  • Vip
  • Dan
    • LCMAPS will be deprecated; what will the replacement be?
    • Updated mount points - perhaps higher rates of failures
  • Matt
    • NTR
  • Peter
    • Re-opening questions at sites; lots of online teaching; re-opening will be cautious
  • Alessandra
    • NTR
  • Sam
    • NTR
  • Gareth
    • NTR
  • Tim
    • TPC: initially running on the wrong server; now on the test server (more allowed connections)
    • RAL as source is fine; RAL as destination fails, with two transfers trying to access the same file
    • As source, RAL is not the active party: TPC uses pull mode, so the destination fetches from the source
  • JW
    • NTR
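Tim's TPC notes describe pull mode: the destination is the active party and fetches the file from the source, which is consistent with RAL working as a source while failing as a destination. The toy Python sketch below only illustrates that division of roles. The class, method and endpoint names are invented for illustration and do not correspond to any real TPC or storage API.

```python
class Endpoint:
    """Toy storage endpoint that records which role it played."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.log = []

    def serve(self, path):
        # Passive role: the source only hands over data when asked.
        self.log.append(f"serve {path}")
        return self.files[path]

    def pull(self, source, path):
        # Active role: the destination initiates the transfer and fetches.
        self.log.append(f"pull {path} from {source.name}")
        self.files[path] = source.serve(path)


# Hypothetical endpoints and file path.
ral = Endpoint("RAL")
remote = Endpoint("REMOTE")
remote.files["/atlas/example.root"] = b"payload"

# RAL as destination: RAL is the active party and does the work.
ral.pull(remote, "/atlas/example.root")

print(ral.log)     # ['pull /atlas/example.root from REMOTE']
print(remote.log)  # ['serve /atlas/example.root']
```

In this picture, a failure that appears only when RAL is the destination points at the active (pulling) side of the transfer.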