Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description

- This is the weekly GridPP ops & sites meeting
- The intention is to run the meeting in Vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6
-- The PIN is 1234.
- To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone for dial-in numbers.
-- The London (UK) service is on +44 (0)161 306 6802. Phone bridge ID 1001002
-- The meeting extension is 109308582. PIN 1234

Chair: Jeremy
Minutes:
Apologies:

Videoconference Rooms

Name: GridPP-Operations
Description:
- This is the weekly GridPP ops & sites meeting
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/
-- Join the meeting in the Janet(UK) Community area.
-- Direct link: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=MDMaM82v2nD2Du999sD99D
- The phone bridge number is +44 (0)161 306 6802. The phone bridge ID is 1001002 with code: 4880.
Extension: 109308582
Owner: Alessandra Forti

GridPP Ops minutes 23 Jan 2018
==============================

Experiments
-----------

LHCb: Broadcast about the new cvmfs mount point (/cvmfs/lhcb-condb.cern.ch/), but
      mounting should be automatic if cvmfs repos are automounted.
      Lost files ticket at IC, due to a lost server. LHCb needs to follow up:
      https://ggus.eu/?mode=ticket_info&ticket_id=132692

CMS: xrootd problems. Tickets about xrootd, related to the presence of IPv6
     support at the site.

ATLAS: Storage overloading ticket at Glasgow? Increased number of
       connections allowed per SE.
       Ticket for Sheffield, closed and now reopened.
       Ticket opened for RAL at the weekend, about ARC CE 03 instability.
       Deletion errors ticket at RAL.
       RAL ticket about problem with transfers, due to overload?
       IC configuration change so can write to QMUL disk. Should be OK
       within the existing QMUL 10Gb/s link.

Others: https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator
        Request for more capacity for SOLID.

GridPP DIRAC status: Only a couple of sites have not run recently.
                     GFAL vs DIRAC problem still being understood.
                     Birmingham going to remove CREAM GridPP site
                     and rely on Vac GridPP site.

Meetings and updates
--------------------
(Points not already mentioned on this week's bulletin)
http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

General updates
---------------
Do CMS and ATLAS Singularity requirements match? Glasgow will be
a test of this as they have both.

Tier-1
------
ATLAS CASTOR at RAL back and believed to be OK now.

Security
--------
https://wiki.egi.eu/wiki/SVG:Meltdown_and_Spectre_Vulnerabilities has links.
Intel have a fix for the instability introduced by the microcode changes.
Sites are requested to monitor the situation, e.g. via that wiki page, and apply
appropriate updates on the timescales requested.

Services
--------
Perfsonar tests show order-of-magnitude differences between sites. Some
sites have been contacted for more information.
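
As a rough illustration of this kind of comparison (not the analysis actually used), the sketch below flags sites whose throughput differs from the median by an order of magnitude or more. The function name, site names and figures are all made up for illustration; they are not perfSONAR output or a perfSONAR API.

```python
from statistics import median


def flag_outliers(throughput_mbps, factor=10.0):
    """Return site names whose throughput is at least `factor` times
    above or below the median across all sites (hypothetical helper)."""
    if not throughput_mbps:
        return []
    mid = median(throughput_mbps.values())
    if mid <= 0:
        raise ValueError("median throughput must be positive")
    flagged = []
    for site, rate in throughput_mbps.items():
        ratio = rate / mid
        # An order-of-magnitude gap in either direction is worth a follow-up.
        if ratio >= factor or ratio <= 1.0 / factor:
            flagged.append(site)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up numbers purely for illustration; not real measurements.
    example = {"site-a": 9000.0, "site-b": 8000.0, "site-c": 700.0, "site-d": 9500.0}
    print(flag_outliers(example))
```

Comparing against the median rather than the mean keeps one very fast or very slow site from shifting the baseline.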

 

Steve Lloyd's network tests are also available:
http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest_lcg.html

Care is needed when interpreting these tests, because sites may optimise for
SE to remote SE transfers rather than SE to remote WN at a random site. E.g.
WNs might be behind NAT with a good connection internally but a poorer route
to the WAN. Some experiments (e.g. CMS and LHCb) are streaming already, e.g.
LHCb failover, or stripping at larger Tier-2s where data is streamed in and
out of the WNs without using site storage.

Tickets
-------
See detailed Bulletin comments.

GDB review
----------
Please look at the agenda https://indico.cern.ch/event/651349/ for
links to slides.

Chatroom log
------------

Daniela Bauer: (23/01/2018 11:04)
https://ggus.eu/?mode=ticket_info&ticket_id=132692

Raja Nandakumar: (11:12 AM)
Thanks Daniela

Jeremy Coles: (11:16 AM)
https://www.gridpp.ac.uk/wiki/LZ

John Hill: (11:19 AM)
Aren't RHUL in downtime?

Duncan Rand: (11:19 AM)
Yes.. they are in downtime for Network maintenance for last 5 days.

Jeremy Coles: (11:20 AM)
Yes. But intermittent.
https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

David Crooks: (11:35 AM)
http://operations-portal.egi.eu/vapor/resources/GL2ResSummaryServicesDetail?ngi=NGI_UK
https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
https://wiki.egi.eu/wiki/SVG:Meltdown_and_Spectre_Vulnerabilities

Mark Slater: (11:41 AM)
I'm afraid I've got to head off - email me if there's any bham specific stuff!

Jeremy Coles: (11:45 AM)
http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest_lcg.html

Paige Winslowe Lacesso: (11:59 AM)
Sorry sorry, must go - email me if any brizzle-specific data

Jeremy Coles: (12:01 PM)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=132876
https://indico.cern.ch/event/651349/

Daniela Bauer: (12:21 PM)
Sorry, I've got to go.

Jeremy Coles: (12:21 PM)
https://indico.cern.ch/event/686369/

 
