Operations team & Sites

Name: Operations team & Sites
Start: 2018-01-23T11:00:00+00:00
End: 2018-01-23T12:30:00+00:00
Location: EVO - GridPP Operations team meeting

Tuesday 23 Jan 2018, 11:00 → 12:30 Europe/London

Description

- This is the weekly GridPP ops & sites meeting.
- The intention is to run the meeting in Vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6
  -- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone for dial-in numbers.
  -- The London (UK) service is on +44 (0)161 306 6802. Phone bridge ID 1001002.
  -- The meeting extension is 109308582. PIN 1234.

Chair: Jeremy
Minutes:
Apologies:

GridPP Ops minutes 23 Jan 2018

==============================

Experiments

-----------

LHCb: Broadcast about new cvmfs mount point (/cvmfs/lhcb-condb.cern.ch/), but

mounting should be automatic if cvmfs repos are automounted.

Lost files ticket at IC, due to a lost server. LHCb needs to follow up:

https://ggus.eu/?mode=ticket_info&ticket_id=132692
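
For sites that want to confirm that the new LHCb repository at /cvmfs/lhcb-condb.cern.ch/ really is picked up by the automounter, a minimal sketch of a check is given below. It assumes only that listing the directory prompts autofs to mount the repository on demand. The function name `cvmfs_repo_available` and its `root` parameter are hypothetical and are not part of any CVMFS tooling.

```python
import os


def cvmfs_repo_available(repo, root="/cvmfs"):
    """Return True if the repository directory can be listed and is non-empty.

    Hypothetical helper for illustration only; it is not a CVMFS API.
    """
    path = os.path.join(root, repo)
    try:
        # Listing the directory is the access that prompts autofs to mount
        # the repository; a failed or missing mount raises OSError.
        entries = os.listdir(path)
    except OSError:
        return False
    # An empty directory usually means a bare mount point with nothing mounted.
    return len(entries) > 0


if __name__ == "__main__":
    repo = "lhcb-condb.cern.ch"
    state = "available" if cvmfs_repo_available(repo) else "NOT available"
    print(f"/cvmfs/{repo}: {state}")
```

Run on a worker node, this prints whether the repository could be listed. A negative result only indicates that the directory was missing, unreadable or empty, so the site's cvmfs client configuration would still need to be inspected by hand.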

CMS: xrootd problems. Tickets about xrootd related to presence of IPv6 support

at the site.

ATLAS: Storage overloading ticket at Glasgow? Increased number of

connections allowed per SE.

Ticket for Sheffield, closed and now reopened.

Ticket opened for RAL at the weekend, about ARC CE 03 instability

Deletion errors ticket at RAL.

RAL ticket about problem with transfers, due to overload?

IC configuration change so that it can write to QMUL disk. Should be ok

within the existing QMUL 10Gb/s link.

Others: https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator

Request for more capacity for SOLID

GridPP DIRAC status: only a couple of sites have not run recently

GFAL vs DIRAC problem still being understood

Birmingham is going to remove its CREAM GridPP site

and rely on its Vac GridPP site

Meetings and updates

--------------------

(Points not already mentioned on this week's bulletin)

http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

General updates

---------------

Do CMS and ATLAS Singularity requirements match? Glasgow will be

a test of this as they have both.

Tier-1

------

ATLAS CASTOR at RAL is back and believed to be ok now.

Security

--------

https://wiki.egi.eu/wiki/SVG:Meltdown_and_Spectre_Vulnerabilities has links

Intel have a fix for the instability introduced by the microcode changes.

Sites are requested to monitor the situation, eg via that wiki page, and apply

appropriate updates on the timescales requested.

Services

--------

perfSONAR tests show order-of-magnitude differences between sites. Some

sites have been contacted for more information.

Steve Lloyd's network tests also available:

http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest_lcg.html

Have to be careful because sites may optimise for SE to remote SE rather

than SE to remote WN at a random site, eg WNs might be on NAT with a good

connection internally but a poorer route to the WAN. Some experiments (eg

CMS and LHCb) are streaming already, eg LHCb failover or when stripping

at larger Tier-2s, where data is streamed in and out of the WNs without

using site storage.

Tickets

-------

See detailed Bulletin comments

GDB review

----------

Please look at the agenda https://indico.cern.ch/event/651349/ for

links to slides

Chatroom log

------------

Daniela Bauer: (23/01/2018 11:04)

https://ggus.eu/?mode=ticket_info&ticket_id=132692

Raja Nandakumar: (11:12 AM)

Thanks Daniela

Jeremy Coles: (11:16 AM)

https://www.gridpp.ac.uk/wiki/LZ

John Hill: (11:19 AM)

Aren't RHUL in downtime?

Duncan Rand: (11:19 AM)

Yes.. they are in downtime for Network maintenance for last 5 days.

Jeremy Coles: (11:20 AM)

Yes. But intermittent.

https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

David Crooks: (11:35 AM)

http://operations-portal.egi.eu/vapor/resources/GL2ResSummaryServicesDetail?ngi=NGI_UK

https://wiki.egi.eu/w/index.php?title=IPV6_Assessment

https://wiki.egi.eu/wiki/SVG:Meltdown_and_Spectre_Vulnerabilities

Mark Slater: (11:41 AM)

I'm afraid I've got to head off - email me if there's any bham specific stuff!

Jeremy Coles: (11:45 AM)

http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest_lcg.html

Paige Winslowe Lacesso: (11:59 AM)

Sorry sorry, must go - email me if any brizzle-specific data

Jeremy Coles: (12:01 PM)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=132876

https://indico.cern.ch/event/651349/

Daniela Bauer: (12:21 PM)

Sorry, I've got to go.

Jeremy Coles: (12:21 PM)

https://indico.cern.ch/event/686369/

- 11:00 → 11:01
  Ops meeting minutes 1m
  - This is a reminder that this is an important task. The minute taker gives access to the discussions for those not present and provides a reference for others to refer back to afterwards.
  - The team composition has been changing. If everybody contributes then the task comes around less often.
  - Please extract actions from the meeting and add them to our table here: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items#Action_list.
  - Recent allocations: See above link. The page should be updated each week by the minute taker (if they don't, the task will keep coming to them!).
  - Upcoming allocations:
  23rd Jan
  30th Jan
  6th Feb
- 11:01 → 11:20
  Experiment problems/issues 19m
  Review of weekly issues by experiment/VO
  - LHCb
  - CMS
    T1: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
    T2: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel
    CERN blames their IPV6 problems on Brunel: https://ggus.eu/?mode=ticket_info&ticket_id=132876
    AAA Problems at Bristol: https://ggus.eu/?mode=ticket_info&ticket_id=132990
    More xrootd fun at RAL-LCG2: https://ggus.eu/?mode=ticket_info&ticket_id=132802
  - ATLAS
  - Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.
  - GridPP DIRAC status [Andrew McNab]
    -- https://www.gridpp.ac.uk/gridpp-dirac-sam
- 11:20 → 11:40
  Meetings & updates 20m
  With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
  - General updates
  - WLCG ops coordination
  - Tier-1 status
  - Storage and data management
  - Tier-2 Evolution
  - Accounting
  - Documentation
  - Interoperation
  - Monitoring
  - On-duty
  - Security
  - Services
  - Tickets
  - Tools
  - VOs
  - Site updates
- 11:40 → 12:20
  Discussion topics 40m
  - Review GDB topics: https://indico.cern.ch/event/651349/
  - Follow-up from HEPSYSMAN
- 12:20 → 12:25
  Actions & AOB 5m
  - Q417 Tier-2 reports reminder