ATLAS UK Cloud Support

Europe/London
Vidyo

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))

Outstanding tickets

  • 147299 UKI-NORTHGRID-LANCS-HEP less urgent in progress 2020-06-03 23:12:00 UKI-NORTHGRID-LANCS-HEP: deletion errors

    • Heading on-site to understand the problem; possibly the disk has died, with ~10TB data loss
  • 146918 UKI-SCOTGRID-ECDF less urgent in progress 2020-06-02 10:46:00 Failovers from jobs running at UKI-SCOTGRID-ECDF_CLOUD to CERN backup proxy

    • Pushed back due to other Edinburgh priorities
  • 146771 UKI-SCOTGRID-ECDF less urgent on hold 2020-06-02 10:30:00 UKI-SCOTGRID-ECDF deletion failures with “The requested service is not available at the moment.”

    • Pushed back due to other Edinburgh priorities
  • 146651 RAL-LCG2 urgent in progress 2020-05-27 10:43:00 singularity and user NS setup at RAL

    • Work ongoing to use unprivileged mode.
  • 146525 UKI-NORTHGRID-SHEF-HEP urgent on hold 2020-05-15 16:12:00 UKI-NORTHGRID-SHEF-HEP: evicted jobs

    • Active interactions with NorduGrid mailing lists; discussion of the deprecation of LCMAPS and its possible replacements
  • 146374 UKI-NORTHGRID-SHEF-HEP urgent on hold 2020-05-15 16:11:00 ATLAS pilot jobs idle on UKI-NORTHGRID-SHEF-HEP CE

    • As above
  • 145688 UKI-NORTHGRID-MAN-HEP less urgent on hold 2020-04-02 09:20:00 Very old version of squids at UKI-NORTHGRID-MAN-HEP

    • On hold
  • 145510 RAL-LCG2 urgent on hold 2020-05-13 13:07:00 RAL-LCG2: timeouts on stage-in/outs

    • Will aim to close this week
  • 144759 UKI-SCOTGRID-GLASGOW less urgent on hold 2020-02-17 09:51:00 High traffic from UKI-SCOTGRID-GLASGOW on RAL CVMFS Stratum1

    • On hold
  • 142329 UKI-SOUTHGRID-SUSX top priority reopened 2020-06-01 08:27:00 CentOS7 migration UKI-SOUTHGRID-SUSX

    • To be put on hold; awaiting rollout of all nodes; might require physical access

CPU

  • RAL

  • Northgrid

    • MAN - walltime units changed from seconds to minutes (within ATLAS); change reverted.
    • Lancaster’s drop was due to an IPv6 problem over the weekend
  • London

  • SouthGrid

  • Scotgrid

Other new issues

  • Request sent from ATLAS to restart squids due to residual issues from the DB / Frontier problems of previous weeks

Ongoing issues

  • CentOS7 DPM Lancs

    • No change to plans
  • CentOS7 - Sussex

    • As in GGUS discussion
  • Glasgow Ceph storage

    • xroot messages and troubleshooting are tricky.
    • External - should be ok (gridFTP, maybe also xrootd external).
    • Internal - bandwidth: 30GB/s (3x 10GB links).
  • Grand Unified queues

    • Awaiting Sheffield

News round-table

  • Vip
  • Dan
    • LCMAPS will be deprecated; what will the replacement be?
    • Updated mount points - perhaps higher rates of failures
  • Matt
    • NTR
  • Peter
    • Questions about sites re-opening; lots of online teaching; re-opening will be cautious
  • Alessandra
    • NTR
  • Sam
    • NTR
  • Gareth
    • NTR
  • Tim
    • TPC: initially running on the wrong server; now on the test server (more allowed connections)
    • RAL as source is fine; RAL as destination fails, with two transfers trying to access the same file
    • When RAL is not the destination it is not the active party: TPC uses pull mode, so the destination fetches from the source
  • JW
    • NTR