
EVO - GridPP Operations team meeting



- This is the weekly GridPP ops & sites meeting.
- The intention is to run the meeting in Vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6
  -- The PIN is 1234. To join via phone, see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone for dial-in numbers.
  -- The London (UK) service is on +44 (0)161 306 6802. Phone bridge ID 1001002.
  -- The meeting extension is 109308582. PIN 1234.

Chair: Jeremy
Minutes:
Apologies:

GridPP Ops minutes 23 Jan 2018






LHCb: Broadcast about new cvmfs mount point (/cvmfs/lhcb-condb.cern.ch/), but
       this should be automatic if cvmfs repos are automounted.
       Lost files ticket at IC, due to a lost server. LHCb needs to follow up.


CMS: xrootd problems. Tickets about xrootd related to the presence of IPv6 support
     at the site.

ATLAS: Storage overloading ticket at Glasgow? Increased number of
       connections allowed per SE.
       Ticket for Sheffield: closed and now reopened.
       Ticket opened for RAL at the weekend about ARC CE 03 instability.
       Deletion errors ticket at RAL.
       RAL ticket about a problem with transfers, due to overload?
       IC configuration change so that it can write to QMUL disk. Should be OK
       within the existing QMUL 10 Gb/s link.


Others: https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator
        Request for more capacity for SOLID.


GridPP DIRAC status: only a couple of sites have not run recently.
                     GFAL vs DIRAC problem still being understood.
                     Birmingham is going to remove its CREAM GridPP site
                       and rely on its Vac GridPP site.


Meetings and updates


(Points not already mentioned in this week's bulletin)



General updates


Do CMS and ATLAS Singularity requirements match? Glasgow will be
a test of this as it supports both.




ATLAS CASTOR at RAL back and believed to be ok now.




https://wiki.egi.eu/wiki/SVG:Meltdown_and_Spectre_Vulnerabilities has relevant links.

Intel have a fix for the instability introduced by the microcode changes.

Sites are requested to monitor the situation, e.g. via that wiki page, and apply
appropriate updates on the timescales requested.




perfSONAR tests show order-of-magnitude differences between sites. Some
sites have been contacted for more information.


Steve Lloyd's network tests are also available:



We have to be careful because sites may optimise for SE to remote SE rather
than SE to remote WN at a random site, e.g. WNs might be behind NAT with a good
connection internally but a poorer route to the WAN. Some experiments (e.g.
CMS and LHCb) are already streaming, e.g. LHCb failover, or stripping
at larger Tier-2s, where data is streamed in and out of the WNs without
using site storage.




See detailed Bulletin comments


GDB review


Please look at the agenda https://indico.cern.ch/event/651349/ for
links to slides.


Chatroom log



Daniela Bauer: (23/01/2018 11:04)


Raja Nandakumar: (11:12 AM)

Thanks Daniela

John Hill: (11:19 AM)

Aren't RHUL in downtime?

Duncan Rand: (11:19 AM)

Yes, they have been in downtime for network maintenance for the last 5 days.

Jeremy Coles: (11:20 AM)

Yes. But intermittent. 


Mark Slater: (11:41 AM)

I'm afraid I've got to head off - email me if there's any Bham-specific stuff!

Paige Winslowe Lacesso: (11:59 AM)

Sorry, sorry, must go - email me if there's any Brizzle-specific data.

Daniela Bauer: (12:21 PM)

Sorry, I've got to go.

    • 11:00 AM - 11:01 AM
      Ops meeting minutes 1m
      • This is a reminder that this is an important task. The minutes give those not present access to the discussions and provide a reference for everyone to refer back to afterwards.

      • The team composition has been changing. If everybody contributes then the task comes around less often.

      • Please extract actions from the meeting and add them to our table here: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items#Action_list.

      • Recent allocations: see the link above. The page should be updated each week by the minute taker (if they don't, the task will keep coming back to them!).

      • Upcoming allocations:

      23rd Jan
      30th Jan
      6th Feb

    • 11:01 AM - 11:20 AM
      Experiment problems/issues 19m

      Review of weekly issues by experiment/VO

      • LHCb

      • CMS
        T1: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
        T2: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel
        CERN blames their IPv6 problems on Brunel: https://ggus.eu/?mode=ticket_info&ticket_id=132876
        AAA Problems at Bristol: https://ggus.eu/?mode=ticket_info&ticket_id=132990
        More xrootd fun at RAL-LCG2: https://ggus.eu/?mode=ticket_info&ticket_id=132802

      • ATLAS

      • Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.

      • GridPP DIRAC status [Andrew McNab]
        -- https://www.gridpp.ac.uk/gridpp-dirac-sam

    • 11:20 AM - 11:40 AM
      Meetings & updates 20m

      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

      • General updates
      • WLCG ops coordination
      • Tier-1 status
      • Storage and data management
      • Tier-2 Evolution
      • Accounting
      • Documentation
      • Interoperation
      • Monitoring
      • On-duty
      • Security
      • Services
      • Tickets
      • Tools
      • VOs
      • Site updates
    • 11:40 AM - 12:20 PM
      Discussion topics 40m
      • Review GDB topics: https://indico.cern.ch/event/651349/
      • Follow-up from HEPSYSMAN
    • 12:20 PM - 12:25 PM
      Actions & AOB 5m
      • Q417 Tier-2 reports reminder