HEPiX IPv6 Working Group - virtual F2F meeting

Europe/Zurich
Zoom

David Kelsey (Science and Technology Facilities Council STFC (GB))
Description

In place of the normal face-to-face meeting at CERN - a fully virtual Zoom meeting.

Please register to say you will attend the Zoom meeting - connection details will be available to those who register. 

The agenda is DRAFT - still being defined

Videoconference: hepix-ipv6-wg (Zoom Meeting ID: 64006864374)
Host: Edoardo Martelli

October 14, 2021, 15:00-18:00
First session of another IPv6 meeting held over two half-days, on Zoom.

(notes by Francesco P)

In attendance: Marian Babik, Andrey Bobyshev, Nick Buraglio, Tim Chown, Dave Kelsey, Edoardo Martelli, Raja Nandakumar, Kars Ohrenberg, Francesco Prelz, Duncan Rand, Andrea Sciabà.

Agenda at: https://indico.cern.ch/event/1083277/

Dave reviews the agenda.

20' for a first round of roundtable updates:

 

Francesco P. (INFN): No news from INFN-Turin on the storage IPv6 migration. They stopped even sending apologies via e-mail... No worker-node PCAP file was dropped in for IPv4 application analysis. Just a reminder, the dropbox is available here (link to CERNbox): https://l.infn.it/ipv6dumpdrop
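
(For anyone new to the dropbox: the first-pass analysis on such a dump is simply counting IPv4 vs IPv6 frames. Below is a minimal, standard-library-only Python sketch of that; it assumes the classic, non-pcapng pcap format with an Ethernet link layer, and VLAN-tagged frames would need extra handling.)

    import struct
    import sys

    # Count IPv4 vs IPv6 Ethernet frames in a classic libpcap file.
    def count_address_families(path):
        counts = {"IPv4": 0, "IPv6": 0, "other": 0}
        with open(path, "rb") as f:
            magic = struct.unpack("<I", f.read(4))[0]
            # 0xa1b2c3d4 (usec) / 0xa1b23c4d (nsec) mean little-endian records
            endian = "<" if magic in (0xA1B2C3D4, 0xA1B23C4D) else ">"
            f.read(20)  # rest of the 24-byte global header
            while True:
                rec = f.read(16)  # per-packet record header
                if len(rec) < 16:
                    break
                _, _, incl_len, _ = struct.unpack(endian + "IIII", rec)
                frame = f.read(incl_len)
                if incl_len < 14:  # truncated capture, no EtherType
                    continue
                ethertype = struct.unpack("!H", frame[12:14])[0]
                if ethertype == 0x0800:
                    counts["IPv4"] += 1
                elif ethertype == 0x86DD:
                    counts["IPv6"] += 1
                else:
                    counts["other"] += 1
        return counts

    if __name__ == "__main__":
        print(count_address_families(sys.argv[1]))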

 

Apologies from Martin B.; the RAL report was sent via e-mail:

+--------[Martin's e-mail, pasted below]---------+

The Tier1 is progressing with the implementation of the new Leaf/Spine network - we now have connectivity between the new L/S network and the Legacy Network via the SCD Super Spine (no hosts on the L/S network as yet). The CTA (new tape service) network is also connected to the Super Spine and it has some hosts with Tier1 addresses on it as well as its own separate address space. We now have connectivity between the L/S network and the site core, and are currently working on connectivity to the border routers. This takes time as there are change control processes to go through and scheduling to avoid collisions with other central networking change windows. When the connectivity is fully implemented we anticipate changing the connectivity for the Tier1 over to use the new exit routers in all cases, and the legacy side will access the 'outside' via the Super Spine to the new L/S network. This will provide the L/S/SS link and external links over 100Gbps pipes and remove all of the current exit choke points (40Gbps).

I think I reported last time that our current T1 <-> Site Core pipe (40Gbps) now carries both IPv4 and IPv6 traffic, removing the 10Gbps choke for IPv6. We had hoped that this would solve the problem of IPv6 not switching to the standby path in the event of an issue - this has not proved to be the case. Given these links will be retired in the next few weeks, we chose not to spend effort investigating this issue.

We intend to have a ritual immolation of the Extreme x670v routers on their retirement, no flowers by request. The OPNR s4810p units will be allowed a graceful retirement.

 

Edoardo M.: No IPv6 news from CERN.

 

Kars O.: Things are happening at DESY, but nothing related to IPv6.

 

Bruno H.: There are slight improvements here and there, but we are still working on getting workable administrative access to our network equipment via IPv6. There is still legacy hardware that will never support IPv6.

 

Duncan R.: No news from Imperial College.

 

Tim C.: No news besides the IETF activity that will be reported tomorrow.

 

Dave K.: May want to ping GridPP and see what they are up to, or what they are actually monitoring.

 

Duncan R.: Can try to revive the links between Imperial and Queen Mary, and perhaps provide some WN PCAP dump for analysis.

 

Francesco P.: To add a figure besides our generalised lack of news, the usual Google statistics (https://www.google.com/intl/en/ipv6/statistics.html) show that the linear increase in IPv6 adoption continues; the global IPv6 traffic fraction, as seen by Google, has now crossed the 35% mark.

 

Dave K.: Any news from the experiments? Dimitrios and ATLAS aren't here.

 

Raja N. shows a few slides (wearing first his LHCb hat, then his DUNE hat...):

https://indico.cern.ch/event/1083277/contributions/4554779/attachments/2328044/3966367/ipv6F2F-12Oct2021.pdf

 

Duncan will encourage the addition of more IPv6-only worker nodes at Brunel.

 

Dave K.: Is the dCache IPv6-only testbed something that the developer runs?

 

Raja N.: I suspect that Fermilab has a dual-stack dCache instance available, and this is used for testing from IPv6-only.

 

Andrea S.: No news from CMS - CMS is good!

 

Marian B. talks about the packet-marking activities at the Research Network Technical WG and shows these slides: https://indico.cern.ch/event/1083277/contributions/4554784/attachments/2328133/3966503/Research%20Network%20Technical%20WG%20update%20IPv6.pdf

 

Tim C.: You mentioned you could include other information in the UDP fireflies, of interest to the far end - or to people in the middle - security allowing.

 

Marian B.: There is a Linux tool called 'ss' that gets a full dump of all open sockets. I hope to be able to add the flows and explore various options. We definitely want to discuss this.
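
(A minimal sketch of the kind of socket inventory 'ss' provides, for the curious; the field positions assume the default 'ss -tn' output layout on Linux.)

    import subprocess

    # Collect (local, peer) endpoints of open TCP sockets via 'ss'.
    def open_tcp_flows():
        out = subprocess.run(["ss", "-t", "-n"], capture_output=True,
                             text=True, check=True).stdout
        flows = []
        for line in out.splitlines()[1:]:  # first line is the column header
            fields = line.split()
            if len(fields) >= 5:
                # Columns: State Recv-Q Send-Q Local-Addr:Port Peer-Addr:Port
                flows.append((fields[3], fields[4]))
        return flows

    if __name__ == "__main__":
        for local, peer in open_tcp_flows():
            print(local, "->", peer)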

 

Tim C.: The technique has a lot of potential, but the interesting question is what should be included in the metadata.

 

Dave K.: Whichever of the two techniques is used, monitoring has to be done at different places. What can cause UDP and TCP traffic to take different paths?

 

Marian B.: It's unlikely that a TCP transfer takes a completely different path than UDP.

Dave K.: Are the two techniques somehow in competition ?

Marian B.: We will be pursuing both techniques. We need components with the ability to access the network packets at various places. The extension header with the destination label is new.

 

Tim C.: This issue has come up within the IETF too, to help with the monitoring of different applications. The nice thing about the fireflies is that they can use more than the 20 bits allowed by the packet header.

 

Marian B.: The 2-million UDP fireflies that ESnet has captured provide info on both IPv4 and IPv6.
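
(To make the concept concrete for the minutes: a firefly is essentially a small UDP packet carrying JSON metadata about a flow, emitted alongside the transfer so that collectors along the path can attribute the traffic. The sketch below is illustrative only - the destination port and the JSON field names are assumptions, not the subgroup's agreed format.)

    import json
    import socket
    import time

    # Send a single 'firefly' UDP packet describing a data flow.
    # Port number and JSON schema here are illustrative assumptions.
    def send_firefly(collector, src_ip, dst_ip, experiment, activity,
                     port=10514):
        payload = json.dumps({
            "version": 1,
            "flow-lifecycle": {"state": "start", "start-time": time.time()},
            "flow-id": {"afi": "ipv6", "src-ip": src_ip, "dst-ip": dst_ip},
            "context": {"experiment-id": experiment, "activity-id": activity},
        }).encode()
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            s.sendto(payload, (collector, port))

    # Example (hypothetical collector host):
    # send_firefly("collector.example.org", "2001:db8::1", "2001:db8::2", 1, 2)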

 

Francesco P.: Out of curiosity: did the usage of the term 'firefly' for this kind of marker UDP packet start within this working group? I had never heard the term before.

 

Marian B.: Fundamentally yes, it was invented here.

 

Tim C.: A better name could be 'tracer packets', like 'tracer bullets'.

 

Phil DeMar shares a few slides on the IPv6 mandate for US labs:
  (to be uploaded: *** add link here)
  (Andrey comments on the details of slide 7)

 

Questions on Phil's talk:

 

Duncan R.: Why were the US national labs left out last time around?

 

Phil D.M.: They probably knew it would be hard, and that DOE could have used some more time to get ready.

 

Dave K.: Similar to WLCG, and this group's approach, where we threaten to go IPv6-only so as to get at least dual-stack.

 

Cybersecurity concerns are prevalent everywhere.

 

1+1 > 2 in terms of vulnerability.

 

Nick Buraglio introduces himself:

I'm leading the implementation of the ESnet IPv6-only program. I wrote the implementation plan, which is now pending DOE approval - it will be submitted to the US OMB (Office of Management and Budget). My background is in supercomputing in science; I worked at NCSA and the University of Illinois, so I keep a keen eye on the needs of the scientific community.

ESnet has been working for a year and a half to make the network management layer IPv6-only. My office has been IPv6-only for about 18 months (since the beginning of the pandemic). We have gained plenty of experience, from Layer 1 all the way up through the network stack.

I've been involved with IPv6 since 2002, so I understand this is an iterative process. You cannot write a document with nice plans without making provisions for changes. Every document has to be a living document, and be adjusted promptly in case of need. We want this process to succeed as best we can: the labs will likely not be excluded from this process.

One caveat: there is one exclusion in the memo: National Security Systems, as they include specialty, uncommon items. However, science is similar, in a different way.

 

Note: My office is behind a DNS64/NAT64, as certain things "out there" just don't work with IPv6 only.

 

Francesco P.: We've been discussing time and again the class of applications that cannot work behind NAT64 (IPv4 literals in the payload or in databases, etc.). What did you find in your experience?

 

Nick B.: House-developed applications are always the "long pole in the tent", and there's no incentive to change them. We experienced significantly fewer problems this time around w.r.t. the first time DNS64/NAT64 was tested 10 years ago. The biggest problem we encountered was with Spotify (which is important in campus environments): the desktop application doesn't work behind NAT64; the mobile app is better. GitHub, too, is irksome here and there. 'Tectonic shifts' are occurring in that organisation...

 

( - 10 minutes coffee break - )

 

As we are a bit behind with the agenda, and Andrey may want to hear this today, Ben Jones presents the slides on IPv6-only right away: https://indico.cern.ch/event/1083277/contributions/4554779/attachments/2328044/3966761/UpdatesIPv6-2.odp

 

Andrea S.: How likely is it that the new computer centre will be IPv6-only?

 

Ben J.: There are things we need that will be IPv4-only, as far as I can see. There's no way we can be out of AFS by the time the new computing centre is turned on.

 

Andrea S.: Doesn't AFS support IPv6 now?

 

Ben J.: The new kernel module may support that, but we haven't checked it out.

 

Edoardo M.: It's on the roadmap, but not implemented yet.

 

Ben J.: Fascinating to see how this "impossible" task is now progressing.

 

Francesco P., wearing his small CERN/AD experiment data 'manager' hat: I was threatened with unpredictable consequences if we didn't move out of AFS 6 years ago, and we moved out... How come AFS is still an issue?

 

Ben J.: I've seen plans to move out of AFS go nowhere for most of my career in IT. Very realistically, it's not going away - for certain things it's just too good... It's still the filesystem used for batch submission.

 

Dave K.: Where?

 

Ben J.: In the ATLAS Tier-0. And... CERN is not running out of IPv4 addresses - it is actually handing them back. But probably we need an optimist here, not a cynic...

 

Tim C.: If you tweak the address selection recipe, the fraction of IPv6 data transfers may increase. There may be transfers that 'could' be using IPv6.
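
(On glibc systems the "recipe" is the RFC 6724 policy table, tunable via /etc/gai.conf; applications see its effect in the order getaddrinfo() returns candidates. A quick check of that ordering, with www.cern.ch as an arbitrary example host:)

    import socket

    # Print candidate addresses in the order the resolver prefers them;
    # on a healthy dual-stack host the IPv6 result should come first.
    for family, _, _, _, sockaddr in socket.getaddrinfo(
            "www.cern.ch", 443, proto=socket.IPPROTO_TCP):
        print("IPv6" if family == socket.AF_INET6 else "IPv4", sockaddr[0])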

 

Nick B.: Getting IPv6 deployed is largely an issue of management buy-in, and management will buy in if they are forced to (e.g. by the US Government) or if there is a business driver (better performance, etc.) that may cause it to turn into a fire that needs to be taken care of. Otherwise the process will keep lagging.

 

Tim C.: The carrot of getting access to IPv6-only resources or running out of IPv4 addresses may have to turn into a stick.

 

Nick B.: At NCSA the biggest issue was that it was a big AFS shop. We put IPv6 everywhere it could go, but not "elsewhere".

 

Edoardo shows the CERN graph of LHCONE+LHCOPN traffic, showing a 50% IPv4 share: https://twiki.cern.ch/twiki/bin/view/LHCOPN/LHCOPNEv4v6Traffic

 

Dave K.: Why is that, given that 100% of LHCOPN should be IPv6 capable? Is that because the LHCONE data are included?

 

Tim C.: More data could be fed into NetSage, of which I'm a big fan. The NetSage ingest pipeline can also be run in a local container, if privacy concerns have to be addressed.

 

Francesco P.: I can also look at a few PCAP snippets - or other flow data - if they are shareable and/or available.

 

Dave K.: It would be nice to see the LHCOPN data only, disentangled from LHCONE.

 

Edoardo M.: Ok.

 

Andrea S. shows the Tier-2 migration status, as of this morning, on the usual page: https://twiki.cern.ch/twiki/bin/view/LCG/WlcgIpv6#WLCG_Tier_2_IPv6_deployment_stat

 

Dave K.: Didn't US-ATLAS, one of the main 'culprits', have a plan to be finished by the end of the year?

 

Andrea S.: There is a Google doc maintained by US-ATLAS, referenced in the page above (https://docs.google.com/spreadsheets/d/1d2FbmFoXZkBP_cAmJ5q5kWgdsGnWuyFT0ot1n9Gf4ns/edit?usp=sharing). There has been some progress w.r.t. last year, but I don't know how often this document is updated. Some sites just don't see the need to use IPv6, so there's no workable stick that can be used. Some Tier-2s are living inside a campus, and have no leverage to impose IPv6 on the campus.

 

Bruno H.: Freiburg plans to be IPv6 ready by the end of 2021 or in 2022. Progressing slowly but steadily...

 

Dave K.: No news on Tier-1 status?

 

Bruno H.: They are all done. The Kurchatov Institute in Russia, the last missing T1 site, communicated that they are now IPv6 ready and tested.

 

------------------------------

HEPiX IPv6 meeting minutes: 15 Oct 2021

notes by Raja N

Monitoring including perfSONAR & ETF
Speaker: Marian Babik (CERN)

  •  Tim Chown: Throughput is poor for many sites in the data challenge, possibly because the sites are not taking part. How do you target sites which perform poorly, rather than just targeting sites which work well?
  •  Marian Babik: Networking is now a critical service. Experiments are encouraged to report issues. More fine-grained meshes are being created to have a fine-grained view.
  • Also use flow information from IPv6 routers(?)
  • Are we monitoring the right protocols and ends?
  • We primarily lack the instrumentation to monitor all the network transfers, what fraction is IPv6, …
  •  Dave K.: Most monitors assume traffic is IPv4 by default, so estimates of the IPv6 fraction are likely minimum values.
  •  Marian Babik: Monitoring link: Grafana
  • Each site needs to add its information into sFlow
  • Some sites have WNs behind NAT. Not a thing at CERN, but some sites do have it. That muddles up the experiment contribution to a given throughput.

IPv4 transfers, FTS, XrootD monitoring etc
Discussion

  •  Dave K.: We have traffic statistics in the dashboard from Marian above.
  • LHCOPN with the new configuration has KIT, RAL, NDGF, NL-T1, IN2P3
  • These sites have newer links and run IPv4 and IPv6 through separate VLANs
  • Not yet possible to do in Spain, maybe possible with the new 100Gbps link. Edoardo to have a look
  • CNAF do not want to split. The counters have statistics and are visible to the IT admins, but not to CERN admins.
  • For Tier-1s which support multiple VOs, we cannot separate by experiment.
  • Chase up with Tier-1s which have a large IPv4 component?
  • The KIT WNs are all dual stack. However, the transfers seem to be over IPv4, from monitoring by Edoardo
    • Possibly due to Singularity, which may not have an IPv6 address?
    • To be followed up with experiments - the experiments run their own Singularity versions.
    • Also to be followed up with sites which run containers. So, just dual-stacking WNs does not solve the whole problem
    •  Bruno to follow up with KIT
    •  Dave K. to follow up with RAL
    •  Raja N. to follow up with LHCb
    •  Andrea: CMS is probably kosher in this
    • Some of the traffic is to hypervisors at CERN
    • Look at Edoardo's monitoring and follow up with a few hosts. That may point us to some more generic issues.
    • e.g. one transfer from EOS-CTA -> KIT went via IPv4. Why?
    • Transfer between CEs and WNs is IPv4 at KIT, even though both are dual-stacked. Will take time to investigate (Bruno H.)
    •  Tim C.: We could reverse-lookup the DNS of a few of these machines and see if they have only IPv4 addresses in the system (a quick sketch follows this list)
    • Mostly KIT, Brookhaven, CNAF, India among the “top talkers”
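
(A minimal sketch of Tim's check, done forward rather than reverse: does a host name publish an AAAA record at all? The host names below are placeholders.)

    import socket

    # Report whether each host resolves to any IPv6 (AAAA) address.
    def has_aaaa(host):
        try:
            socket.getaddrinfo(host, None, family=socket.AF_INET6)
            return True
        except socket.gaierror:
            return False

    for host in ["wn001.example.org", "ce01.example.org"]:  # placeholders
        print(host, "has AAAA" if has_aaaa(host) else "IPv4 only")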

[Break for 15 minutes]

IPv6 at IETF - DHCPv6 and other issues
Speaker: Tim Chown

  • RFC 8981: Allows hosts to have temporary “privacy” IP addresses. So, some WNs can have them if implemented correctly. (A sketch for spotting them on Linux follows this list.)
    • On by default. Can be turned off
    • Can have devices with only temporary addresses
  • No DHCPv6 in Android, and DHCPv6 has no default gateway option. A problem for IPv6-only.
  • RFC 4291: Subnets are only 64 bits in size (why? explained by RFC 7421). Cannot have e.g. /112 subnets
  • RIPE-554-bis document: Advice on specifying IPv6 requirements during procurement
    • Focus on enterprise scenarios
    • Useful for WLCG sites
    • iOS has a feature like this.
  • New routers / switches procured by CERN have RA Guard. We also have DHCP Guard and probably now DHCPv6 Guard.
  • Need hardware to support the needs - choose carefully from the vendors, use ones who have implemented things properly. Cisco is better than Juniper Networks?
  • In Italy no commercial ISP offers IPv6. Mystery.
  • Using NAT (of various types) to move from one provider to another in IPv4. There is a solution (prefix-based mesh?) that is implemented by Cisco.
  • Discussion on some edge cases - some of which can get features implemented by vendors if there is a big enough community requesting them.
  • We possibly have the following issues. Is there a way of tracking them? Answer: not yet. Moving to a new version of Drupal; once it is up, we can look at adding this to a knowledge base.
    • Protocol issues
    • Vendor implementation of protocol issues
    • Others
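
(Following up on the RFC 8981 item above: on a Linux WN, iproute2 flags such addresses with the keyword 'temporary', so a quick inventory looks like this sketch.)

    import subprocess

    # List global IPv6 addresses on this host, marking RFC 8981 temporary ones.
    out = subprocess.run(["ip", "-6", "addr", "show", "scope", "global"],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("inet6 "):
            fields = line.split()
            kind = "temporary" if "temporary" in fields else "stable"
            print(kind, fields[1])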

Conference submissions
HEPiX, ISGC2022, TNC22?

  • CHEP has been postponed until 2023
    • Probably will have good progress by then and can write a paper
  • HEPiX: Dave K. will submit the standard draft (next week).
  • ISGC2022: Planning to be in-person (next March). Probably going to be virtual again.
    • Can give an update report.
    • Gave a talk last year. Not much has changed since then.
    • Probably can talk about container issues, if we make progress on the topic by then.
      • No hurry
  • TNC22: Probably a poster; not clear if there is enough information for a talk
    • University / site networking progress / measurement probably good topic for TNC?
    • Still planned as an on-site meeting.
      • Again - no hurry

Roundtable updates (continued)

  • Finished yesterday. Nothing else …

Next meetings:

  • 1 hour meeting:
    • Thursday, 2 Dec 2021. 16:00 CET
  • Traditional F2F meeting in January
    • Keep it virtual
    • 2 half days - both afternoons so that US participation is easier.
    • Tuesday / Wednesday (18, 19 Jan 2022), 3 PM - 6 PM CET on each day

AOB, future plans

  • Thursday, 14 October
    • Session 1
      • 1. Welcome, agenda, note taker(s)
      • 2. Roundtable updates
        News and short reports from all WG members - sites, national networks, experiments.
        This is part 1 - to be continued tomorrow morning.
      • 3. Update from RNTWG & Packet Marking subgroup
        Speaker: Marian Babik (CERN)
      • 4. IPv6 at Fermilab
        Speaker: Philip DeMar (FNAL)
      • 16:15 Break
      • 5. Brief introduction to the IPv6 working group activities
        For the benefit of first-time attendees. Not actually shown during the meeting, but available as a "reference".
        Speaker: David Kelsey (Science and Technology Facilities Council STFC (GB))
      • 6. Status of Tier1/Tier2/LHCOPN/LHCONE
        Speakers: Dr Andrea Sciabà (CERN), Bruno Heinrich Hoeft (KIT - Karlsruhe Institute of Technology (DE)), Edoardo Martelli (CERN)
      • 7. IPv6-only testing at CERN and elsewhere
        Including a short report from Raja on DUNE @ Fermilab.
        Speakers: Ben Jones (CERN), Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
  • Friday, 15 October
    • Session 2
      • 8. Monitoring including perfSONAR & ETF
        Speaker: Marian Babik (CERN)
      • 9. IPv4 transfers, FTS, XrootD monitoring etc
        Discussion
      • 10. IPv6 at IETF - DHCPv6 and other issues
        Speaker: Tim Chown
      • 10:30 Break
      • 11. Conference submissions
        HEPiX, ISGC2022, TNC22? CHEP has been postponed until 2023.
      • 12. Roundtable updates (continued)
      • 13. AOB, future plans and next meetings
        Deployment of dual-stack WNs