HEPiX IPv6 working group F2F meeting

Europe/Zurich
513/1-024 (CERN)

Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
Description

Timings are approximate. Draft agenda - suggestions for topics welcome.

Please register if you plan to attend in person.

IPv6 WG F2F – CERN 2/3 May 2019 - Notes

Agenda: https://indico.cern.ch/event/797500/

Thursday 2nd May 2019 Afternoon

(notes by Martin)

F2F: Edoardo, Dave K, Martin, Duncan, Costin, Coralie, Francesco, Andrea, Bruno, Marian, Kars

Online: Jiri Chudoba, Catalin Condurache.

Review minutes and actions and other urgent topics: (DK et al).

F2F in January, and two other Vidyo conferences since; EM chaired the second of these. F2F: can we incentivise the T2s to move? March 7th VC: Francesco’s IPv6 dCache patch needs more testing. Traffic level on IPv6 at BNL is small because only a few of the US ATLAS T2s are dual stack; not much we can do to incentivise. CentOS7 for latest perfSONAR nodes. April 7th VC: RAL packet loss. Fermilab still need to turn on IPv6. Abstract for CHEP 2019. CHEP 2018 paper submission – done, awaiting publication.

DK: CHEP abstracts – what are we going to concentrate on in the coming year? When do we disband?  Are there still problems to solve?

Round Table updates: All

Jiri Chudoba: Replaced old Cisco router with new Catalyst 9500. 3 x 10Gb -> 100Gb. Everything worked. Now IPv6 works but performance isn’t sufficient – limited by NAT and by the table sizes in the routers. No issues with ATLAS for a few days, but then jobs were not working optimally – not all cert authorities offer CRLs via IPv6, so have implemented a local Squid to cache them. ALICE problem: cannot use IPv6 for this, so have an external NAT box. Question: smallest IPv4 subnet on LHCONE? Very small is OK – no rule, but /28 is probably the current smallest. Summary: IPv6 very useful. Also support DUNE + …

Catalin Condurache: Nothing from RAL. Brief update from colleagues in Romania: no useful developments – the routing test failed for all traffic when they tried to do IPv6 over LHCONE. Andrea has no further information. Need to add information to the relevant ticket.

Edoardo: Started to deploy Juniper routers to replace Brocade/HP kit – finding and fixing several bugs (again) in the new kit, specifically DHCPv6. Lxplus is now IPv6.

Duncan: Seems that ATLAS needs to use SW v21 to make IPv6-only work. Dual stack works. Rediscovered problem with ATLAS FTS hosts designated as site-unknown in site monitoring – using the wrong VO feed for monitoring. ATLAS distributed computing team have no interest in this. Tim and Duncan have discovered that Globus-Direct is IPv4 only – widely used in the US. This would be a problem in a world where the US HPC facilities are accessed using Globus tools. On the developers’ list of things to do, but not high priority at the moment.

Bruno: ALICE is not ready yet, but the site is ready with full dual-stack on the ALICE components. IPv4 showing better performance than IPv6 between the same two end points (SARA, local).

Francesco: Frascati have closed their ticket; assume they have done the IPv6 rollout. Rome site has ATLAS done but CMS on hold! Need to push a bit. Still waiting on news of testing of the dCache patches. There is a simple patch set and two more sophisticated patch sets, in case the system is sensitive to the simple patch.

Marian: Later

Kars: Wide area connectivity bandwidth now 100Gb with IPv6. 40-50% IPv6.

Andrea: No updates from CMS or on the Tier 2 uptake.

Costin: 60% of ALICE storage on IPv6. Bad news: slow degrading of connectivity – usually default routes disappear – and the 2 minute Xrootd failover causes 4 min delays on every file open. Fun debug. Tuning the timeouts to much less than the defaults, client-side. Users are seeing these issues. Discussion: how is the loss of routes happening? Manually set routes are disappearing from the storage hosts. Storage sites at Bari & Subatech.

Martin: RAL still has problem with packet loss on IPv6 traffic on to Janet.
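The client-side timeout tuning Costin described can be illustrated generically. The following is a minimal sketch, not Xrootd's own configuration or code: it assumes a plain TCP client, and the function name and the 5 second default are hypothetical. It tries each resolved address in turn, IPv6 first, with a short per-attempt timeout, so a dead IPv6 route costs seconds rather than minutes before IPv4 is tried.

```python
import socket


def connect_with_fallback(host, port, per_attempt_timeout=5.0):
    """Try every resolved address, IPv6 first, with a short timeout each.

    Hypothetical helper for illustration only; the timeout value is an
    assumption, not a recommended setting.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Stable sort: IPv6 candidates first, IPv4 candidates afterwards.
    infos.sort(key=lambda info: 0 if info[0] == socket.AF_INET6 else 1)

    last_error = None
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(per_attempt_timeout)
            sock.connect(sockaddr)
            return sock  # first address that answers wins
        except OSError as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise OSError(f"all addresses failed for {host}:{port}") from last_error


# Example call (hypothetical host name):
#   sock = connect_with_fallback("storage.example.org", 1094)
```

The worst-case delay is the number of resolved addresses multiplied by the per-attempt timeout, which is why shortening the timeout matters when a route silently disappears.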

Tea/Coffee

Tier 2 status: Andrea Sciaba

Andrea discussed the T2 IPv6 conversion status. Overall 65%. Some sites delayed due to not having a sufficient business/science case to move. Discussion on motivating the sites.

Monitoring including perfSONAR & ETF: Marian Babik

perfSONAR news: 4.1.6 is the last of the 4.1 series. The next release, 4.2, has pre-emptive scheduling and gridftp. Ongoing campaign to update all endpoints to CC/CentOS7 and 4.1. UK and FR meshes in good shape. Various issues to resolve elsewhere. Plans for 100Gb/s capable perfSONAR – various sites have more end points with more than 10Gb connectivity. For 100Gb, Mellanox ConnectX cards OK, CPUs important, USNET have documented tuning etc. Some updates for central services: maddash more stable, new collector in production. New projects SAND and IRIS-HEP started, objective to publish metrics direct from toolkits, new analytics and visualisation. Note on OCRE cloud testing (GÉANT project to procure resources from public cloud providers). Review of developments to the perfSONAR configuration (done, dual-stack mesh removed), monitoring (new IPv4/IPv6 efficiency tests) and dashboards (maddash and Grafana improvements). Notes on ETF activities: ATLAS IPv6-only more stable, SAM ready – aggregating IPv6 results and computing either IPv6-only or IPv4/IPv6 profiles.

Abstract for CHEP2019

Discussion of what to put forward as an abstract for CHEP 2019. See later.

Friday 3rd May 2019 Morning

(notes by Duncan)

F2F: Kars, Bruno, Francesco, Martin, David, Duncan, Edoardo, Costin. Apologies: Andrea.

Online: Jiri, Dave Crooks (for security discussion)

Review agenda

Dave would like to discuss the methodology for calculating the proportion of traffic going via IPv6. Clearly need to decide what the CHEP abstract should be and what the future focus of the group should be.

Data Transfer & network monitoring - Is IPv6 preference working?

Stats show a big increase in March to 36%, from 23% in February, as seen at the CERN T0 (sFlow data from the router). Traffic preference for T0-T1 is via OPN, not LHCONE. PIC cannot do VLAN separation, so the preferred method of monitoring traffic isn’t practical. Not obvious which site may be responsible, but could be BNL. FTS – discussion of failure state to report status of transfers as IPv4 or IPv6 or fail. Jiri – true that most OPN traffic is FTS? Except Xrootd; ALICE is not using FTS. Duncan showed results from Nebraska, which is the only SE that has the relevant Xrootd patch to record and report the IPv4/IPv6 stats correctly. Need to encourage the installation of the patch to obtain a useful view of Xrootd traffic. ALICE need to upgrade the AliRoot client to use Xrootd v4; the old client needs to be completely upgraded to be able to use v4. Will be this year. OSG Xrootd developers are aware of the need to fix up some of the dashboards.
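One simple way to compute the proportion of traffic going via IPv6 is to sum bytes by address family. The following is a minimal sketch, not the method used for the CERN figures: it assumes flow records reduced to (address, bytes) pairs, the function name is hypothetical, and the sample values are invented documentation addresses and byte counts.

```python
import ipaddress
from collections import Counter


def ipv6_share(flows):
    """Return the fraction of bytes carried over IPv6.

    `flows` is an iterable of (address, byte_count) pairs; this record
    layout is an assumption made for illustration.
    """
    totals = Counter()
    for addr, nbytes in flows:
        # ip_address() parses either family; .version is 4 or 6.
        totals[ipaddress.ip_address(addr).version] += nbytes
    total = totals[4] + totals[6]
    if total == 0:
        return 0.0  # no traffic seen, avoid dividing by zero
    return totals[6] / total


# Invented sample: 250 bytes over IPv6, 750 bytes over IPv4.
sample = [("2001:db8::1", 250), ("192.0.2.10", 750)]
print(f"{ipv6_share(sample):.0%}")  # prints 25%
```

Counting bytes rather than flows or transfers gives a different answer when transfer sizes vary, so the unit should be stated alongside any percentage.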

IPv6 Security: Discussion of Internet Society document

Francesco introduced the Internet Society document. Called attention to the 2nd para of the Introduction. The statements seem to be very much aimed at scaring security officers. Francesco points out there are no statistics that show incident numbers have gone up because of running IPv6. Noted that RA Guard (recommended mitigation) on Cisco is broken and doesn’t work, and it seems it may not be fixed. Paper doesn’t cite Cisco book. Overall flavour of the paper is not optimistic. RA is not generally used in server-environment DCs, but CERN have seen VMs with virtual switches doing RA. Rogue DHCP servers are a problem whether one is on IPv4 or IPv6. Yes, the code is new, but it’s not a fundamental change from what we already do. Code will get fixed. DC: next steps? DK: probably discuss with Tim Chown.

Tea/Coffee

Dates of Next Meetings:

Next F2F: 17/18 September 2019.

VC: 1hr Thu 27th June, 16:00 CEST.

VC: 1hr Thu 25th July, 16:00 CEST.

Tier 0/1, LHCOPN and LHCONE status: Bruno Hoeft (KIT)

No change in status. Russia KI-KRI – acknowledged ticket but no one actually engaging with IPv6; have peering but no services behind it. Affects ALICE and CMS? ALICE @ KISTI missing from Grafana – they are IPv6 enabled but no traffic registered because (probably) it’s ALICE and they’re not using IPv6. KISTI has IPv6 perfSONAR results. When Fermilab turns on IPv6 (soon), only the Russians will be left.

DK: Andrea is doing calculations based on experiment pledges. Has data, but may be overestimating. Need to ask other experiments to track how much data is transferred by v4/v6. Costin: ALICE can do it (once the change is underway).

No more news on LHCOPN/ONE.

Plans for CHEP2019 - Adelaide - Nov 2019.

Agreed that paper should be about moving to IPv6 only. Move LHCOPN to IPv6 only. Could use multiple IPv6 addresses on servers to distinguish the traffic groups, useful in LHCONE as more and more communities join. Discussion of issues surrounding IPv6-only WNs. Try to push IPv6-only WNs. Might need NAT64 on borders. Moving NATed IPv4 to dual stack with un-NATed IPv6 would gain performance as the load on the NAT reduces. KIT has NATed WNs – could act as test site for dual-stack IPv6 WNs? Dual-stack IPv6 will come with SL7 in the config system. Lots of the CVMFS SW payloads are still using the default config, which says IPv4.
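For the NAT64 option mentioned above, IPv6-only WNs reach IPv4-only services through an IPv6 address that embeds the IPv4 one. The following is a minimal sketch of that address mapping using the RFC 6052 well-known prefix 64:ff9b::/96; the function name is hypothetical, only /96 prefixes are handled, and which prefix a site would actually deploy is not stated in these notes.

```python
import ipaddress

# RFC 6052 well-known prefix; a site may use its own prefix instead.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    # The low 32 bits of a /96 network address are zero, so OR is enough.
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


print(synthesize_nat64("192.0.2.33"))  # prints 64:ff9b::c000:221
```

In a deployment this synthesis is normally done by a DNS64 resolver, with the NAT64 gateway at the border translating the packets; the snippet only shows the address arithmetic.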

Thrust of paper: the transition to dual stack is almost complete but with a long tail => T1 transition done. Now that storage is dual stack, can entertain IPv6-only WNs. QML strategy with NAT.

DK will draft a proposed abstract and send it round.

AOB

    • 14:00 18:00
      Session 1 600/R-002

      • 14:00
        Introductions, agenda, note takers 10m
      • 14:10
        Review minutes and actions and other urgent topics 10m

        Matters arising at previous meetings:

      • 14:20
        Roundtable updates 40m
      • 15:00
        Tier 2 status 30m

        Analysis of tickets and their responses

        Speaker: Andrea Sciaba (CERN)
      • 15:30
        Coffee 30m
      • 16:00
        Monitoring including perfSONAR & ETF 45m
        Speaker: Marian Babik (CERN)
      • 16:45
        Abstract for CHEP2019 45m
    • 09:00 13:00
      Session 2 513/1-024

      • 09:30
        Review agenda 5m
      • 09:35
        Data Transfer & network monitoring - Is IPv6 preference working? 25m
      • 10:00
        IPv6 Security 30m

        Discussion of Internet Society document

      • 10:30
        Coffee 30m
      • 11:00
        Tier 0/1, LHCOPN and LHCONE status 30m
        Speaker: Bruno Heinrich Hoeft (KIT - Karlsruhe Institute of Technology (DE))
      • 11:30
        Other issues 20m

        Plans for CHEP2019 - Adelaide - Nov 2019.

      • 11:50
        AOB and next meetings 10m
      • 12:00
        Close meeting 1m