
ATLAS UK Cloud Support



Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))




    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m

        - 146947   UKI-NORTHGRID-LANCS-HEP    
        Timeout issues, disk servers getting overloaded; two new servers in preparation.

        - 146918   UKI-SCOTGRID-ECDF
        Squid failovers; in progress

        - 146910 UKI-LT2-RHUL
        Restrictions on physical access limit what can be done to fix OS problems on disk servers

        - 146771 UKI-SCOTGRID-ECDF
        (Once downtime is over): Can ATLAS switch to using IPv4 http(s) to delete files against our DPM head node?
        (i.e. can ATLAS avoid deleting files over IPv6?)

        - 146651 RAL-LCG2 singularity and user NS setup at RAL
        AF and RAL experts put in contact with each other

        - 146588 RAL-LCG2 Failovers from RAL-LCG2 to CERN CVMFS
        After upgrading all our squid servers to version 4, we are now going to evaluate, and potentially fix, the ACL setup in their configuration.

        - 146525 UKI-NORTHGRID-SHEF-HEP
        Some problems in ARC setup; asking for help. To be set on hold.

        - 146523 UKI-NORTHGRID-MAN-HEP  timeouts 
        External network connection was being overloaded by another VO directly contacting FNAL; should be better now.

        - 146374 UKI-NORTHGRID-SHEF-HEP ATLAS pilot jobs idle on
        As above: To be set on hold

        - 146159 UKI-SCOTGRID-GLASGOW Unaccessible files
        Main problematic disk lost, awaiting final loss declaration by ATLAS. The other two disk servers are also causing problems, which hampers the much sought-after move to Ceph.

        - 145688 UKI-NORTHGRID-MAN-HEP Very old version of squids at
        No progress from last week

        - 145510 RAL-LCG2: timeouts on stage-in/outs
        Investigation of CPU efficiency definitions

        - 144759 UKI-SCOTGRID-GLASGOW High traffic from UKI-SCOTGRID-GLASGOW
        Ticket will be updated


      • CPU 5m

          On Weds. MCORE queue disabled; fairshare reduced, scheduler issues

        ## Northgrid
        MAN Microboone - Copying from FNAL -> stopped; looking better now


      • Other new issues 5m
    • 10:20 10:40
      Ongoing issues 20m

      CentOS7 - Sussex

      - gfal2 missing dependency

      Most likely source is a missing HEP_OSlibs (some preference was expressed for removing this dependency in the long term).
      Site to check installed RPMs (also e.g. lsb_release).
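
The RPM check asked of the site could be scripted along the following lines. This is only a minimal sketch, not an agreed procedure: the package list is an assumption (HEP_OSlibs is named in the minutes, gfal2 is assumed to be the library meant in the item above, and redhat-lsb-core is assumed to be the package providing lsb_release on CentOS 7), and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import Callable, Iterable, List


def rpm_installed(name: str) -> bool:
    """Return True if `rpm -q NAME` exits with status 0 (package installed)."""
    result = subprocess.run(
        ["rpm", "-q", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_packages(
    names: Iterable[str],
    is_installed: Callable[[str], bool] = rpm_installed,
) -> List[str]:
    """Return the subset of `names` that is not installed, in input order."""
    return [name for name in names if not is_installed(name)]


if __name__ == "__main__":
    # Assumed package list; adjust to whatever the site actually needs.
    wanted = ["HEP_OSlibs", "gfal2", "redhat-lsb-core"]
    if shutil.which("rpm") is None:
        print("rpm not found on this host; nothing checked")
    else:
        for name in missing_packages(wanted):
            print(f"missing: {name}")
        # Independent of the RPM database: is the command itself on PATH?
        print("lsb_release on PATH:", shutil.which("lsb_release") is not None)
```

The lookup function is passed in as a parameter so the logic can be exercised on a machine without rpm; on a worker node the default simply shells out to `rpm -q`.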

      Glasgow Ceph storage

      XRootD -> hotfix release (previously 10.11; now 12 hotfix release, which fixed VOMS).

      Grand Unified queues

      Awaiting Shef, then can close


    • 10:40 10:50
      News round-table 10m

      - Vip
      Going diskless; planning in preparation, will announce to ATLAS shortly

      - Dan
      One server that reboots; trying to set up a watchdog

      - Matt
      Downtime at end of June; will also include SL6/7 CentOS upgrade (->Unique); 1-2 days expected

      - Gareth
      Drop the space token storage size made

        Next step for singularity user NS -> sites to move.


    • 10:50 11:00
      AOB 10m

      No objections raised to continuing with Vidyo for the time being.