ATLAS UK Cloud Support

Europe/London
Vidyo

Tim Adye (Science and Technology Facilities Council STFC (GB)), James William Walder (Science and Technology Facilities Council STFC (GB))

 GGUS

- 146947   UKI-NORTHGRID-LANCS-HEP    
Timeout issues, disk servers getting overloaded; two new servers in preparation.

- 146918   UKI-SCOTGRID-ECDF
Squid failovers; in progress

- 146910 UKI-LT2-RHUL
Restrictions on physical access limit what can be done to fix OS problems on disk servers

- 146771 UKI-SCOTGRID-ECDF
(once downtime over):  Can ATLAS switch to using IPv4 http(s) to delete files against our DPM head node?
(i.e. can ATLAS not delete files over IPv6?)

- 146651 RAL-LCG2 singularity and user NS setup at RAL
AF and RAL experts put in contact with each other

- 146588 RAL-LCG2 Failovers from RAL-LCG2 to CERN CVMFS
After upgrading all our squid servers to version 4, we will now proceed with the evaluation (and potential fix) of the ACL setup in their configuration.

- 146525 UKI-NORTHGRID-SHEF-HEP
Some problems in the ARC setup; asking for help. To be set on hold.

- 146523 UKI-NORTHGRID-MAN-HEP  timeouts 
External network connection was being overloaded by another VO directly contacting FNAL; should be better now.

- 146374 UKI-NORTHGRID-SHEF-HEP ATLAS pilot jobs idle on
As above: To be set on hold

- 146159  UKI-SCOTGRID-GLASGOW Unaccessible files
Main problematic disk lost, awaiting final loss declaration by ATLAS. The other two disk servers are also causing problems, which hampers the much sought-after move to Ceph.

- 145688 UKI-NORTHGRID-MAN-HEP Very old version of squids at
No progress since last week

- 145510 RAL-LCG2: timeouts on stage-in/outs
Investigation of CPU efficiency definitions

- 144759 UKI-SCOTGRID-GLASGOW High traffic from UKI-SCOTGRID-GLASGOW
Ticket will be updated

 

CPU

RAL:
  On Wednesday the MCORE queue was disabled; fairshare reduced, scheduler issues.

Northgrid:
  MAN: MicroBooNE copying from FNAL has stopped; looking better now.

 

Other new issues

NTR

 

Ongoing issues

CentOS7 - Sussex

- gfal2 missing dependency

The most likely cause is a missing HEP_OSlibs (https://gitlab.cern.ch/linuxsupport/rpms/HEP_OSlibs/-/blob/el7/README-el7.md); some preference was expressed for removing this dependency in the long term.
Site to check installed RPMs (also e.g. lsb_release).
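
A rough sketch of how the site could do the RPM check, offered only as a suggestion. The package list is an assumption: HEP_OSlibs comes from the discussion above, gfal2 is the affected library, and redhat-lsb-core is assumed to be the EL7 package that provides lsb_release. The helper names are hypothetical.

```python
import shutil
import subprocess

# Assumed package names; adjust to what the site actually expects.
REQUIRED = ("gfal2", "HEP_OSlibs", "redhat-lsb-core")


def missing_packages(installed_names, required=REQUIRED):
    """Return the required packages that are absent from the installed names."""
    installed = {name.strip() for name in installed_names if name.strip()}
    return [pkg for pkg in required if pkg not in installed]


def installed_rpm_names():
    """Return installed package names from `rpm -qa`, or None if rpm is not found."""
    if shutil.which("rpm") is None:
        return None
    result = subprocess.run(
        ["rpm", "-qa", "--queryformat", r"%{NAME}\n"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    names = installed_rpm_names()
    if names is None:
        print("rpm not found; run this on the worker node")
    else:
        print("missing:", missing_packages(names) or "none")
```

Run on a worker node, this prints any of the assumed packages that are not installed.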

Glasgow Ceph storage

XRootD -> hot-fix release (previously 10.11; now the 12 hot-fix release, which fixed VOMS).

Grand Unified queues

Awaiting Shef, then can close

 

News round-table

- Vip
Going diskless; planning in preparation, will announce to ATLAS shortly

- Dan
One server keeps rebooting; trying to set up a watchdog

- Matt
Downtime planned for end of June; will also include the SL6/7 CentOS upgrade (->Unique); 1-2 days expected

- Gareth
Drop the space token storage size made

- Alessandra
Next step for singularity user NS -> sites to move.

- JW
NTR

- Tim
NTR

    • 10:00 10:20
      Status 20m
      • Outstanding tickets 10m

      • CPU 5m

      • Other new issues 5m
    • 10:20 10:40
      Ongoing issues 20m

    • 10:40 10:50
      News round-table 10m

    • 10:50 11:00
      AOB 10m

      No objections raised to continuing with Vidyo for the time being.