HEPiX IPv6 Working Group - virtual F2F meeting

Europe/Zurich
Zoom

David Kelsey (Science and Technology Facilities Council STFC (GB))
Description

In place of the normal face to face meeting at CERN - a virtual Zoom meeting.

Please register to say you will attend the Zoom meeting - connection details will be sent to those who register.

Videoconference: hepix-ipv6-wg
Zoom Meeting ID: 64006864374
Host: Edoardo Martelli

IPv6 meeting notes 2021-06-29 - Day 1 - Zoom meeting

(notes by Duncan Rand)

Two day meeting. Usually F2F at CERN. Agenda: https://indico.cern.ch/event/1012420/

Present:

Apologies:

Roundtable updates

Bruno: Rolling out IPv6 across the KIT campus. The Tier-1 is almost done already, including the worker nodes (WN).

Shawn: AGLT2 recently spent a week in downtime - re-cabled everything. Put in new devices - WAN & LAN. Encountered an IPv6 neighbour discovery problem - it only affected a few storage nodes - perhaps random holes in the route table. The Cisco devices apparently interfered with the Juniper device's IPv6 neighbour discovery. One host affected was the perfSONAR MaDDash host. Hopefully the site will be fully transitioned to 100G by the end of July.

Tim: Nothing much to report. Put out new versions of Jisc IPv6 technical guide - also noticed after talking to the GEANT compendium service editor that a few (15%) European NRENs don’t support IPv6 - perhaps they have WLCG sites? Discussion regarding GridPP sites.

Shawn: Presentation on packet marking: https://indico.cern.ch/event/1012420/contributions/4248574/attachments/2273105/3860901/Packet%20Marking%20WG%20Status.pdf

Edoardo: perhaps write an RFC on this? Would be a good idea but would also take a long time. Have a technical specification - a bit like an RFC - could evolve into an RFC perhaps.

UDP fireflies - do they have to be IPv6? No - they could be either IPv4 or IPv6. They were suggested because of the technical challenges of writing to part of the header in Linux. Also, a lot of NRENs have a policy that they don't inspect packets; it is easier to give permission to inspect the user-space firefly packets. Will we do both? Possibly with IPv6. Certain networks might not be able to capture packet flow labels, so they could use the fireflies. There is still an advantage with flow labels. The problem with fireflies is that they are intermittent and need to be correlated. Also, dCache might have problems interacting with the sockets it opens as it uses Java. Different storage providers use different languages - e.g. xrootd uses C++. Another issue is whether tc and eBPF will scale up to 100 Gbps.
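For readers unfamiliar with flow labels: the IPv6 flow label is a 20-bit header field, so any marking scheme has to fit its identifiers into those bits. The following is only a minimal sketch of that bit packing. The split into 9 experiment bits, 6 activity bits and 5 spare bits, and the function names, are assumptions made for illustration and are not taken from the working group's specification.

```python
from typing import Tuple

FLOW_LABEL_BITS = 20   # width of the IPv6 flow label field
EXPERIMENT_BITS = 9    # hypothetical split, for illustration only
ACTIVITY_BITS = 6      # hypothetical split, for illustration only
SPARE_BITS = FLOW_LABEL_BITS - EXPERIMENT_BITS - ACTIVITY_BITS  # 5 bits left over


def pack_flow_label(experiment: int, activity: int, spare: int = 0) -> int:
    """Pack the three fields into one 20-bit integer, experiment in the top bits."""
    for name, value, bits in (
        ("experiment", experiment, EXPERIMENT_BITS),
        ("activity", activity, ACTIVITY_BITS),
        ("spare", spare, SPARE_BITS),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (experiment << (ACTIVITY_BITS + SPARE_BITS)) | (activity << SPARE_BITS) | spare


def unpack_flow_label(label: int) -> Tuple[int, int, int]:
    """Recover (experiment, activity, spare) from a 20-bit flow label."""
    if not 0 <= label < (1 << FLOW_LABEL_BITS):
        raise ValueError("flow label must fit in 20 bits")
    spare = label & ((1 << SPARE_BITS) - 1)
    activity = (label >> SPARE_BITS) & ((1 << ACTIVITY_BITS) - 1)
    experiment = label >> (ACTIVITY_BITS + SPARE_BITS)
    return experiment, activity, spare
```

Actually writing the label onto outgoing packets is the hard part mentioned in the discussion and is not covered by this sketch.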

 

Bruno: status of Tier-1. No change. Fermilab FTS server is dual-stack but prefers IPv4 at the moment. What about storage at Tier-1s, e.g. at RAL new storage is dual-stack but Castor is IPv4 only. CTA and Echo will be behind gateways - not accessed directly. Suspect there are other Tier-1s out there that still have legacy IPv4 devices. Only 42% is IPv6 at the moment.

Dave: Do we need a more concerted campaign? There is no compulsion to use IPv6.

As experiments are moving to HTTP-TPC are they moving back to IPv4? Perhaps new equipment being put online is being put on as IPv4 only.

What is the percentage of IPv6 traffic on LHCOPN? Want to turn off IPv4 on LHCOPN. Will try to look in more detail at the traffic of a few Tier-1s e.g. RAL, KIT, CERN.

Andrea: Tier-2s - nothing has changed in the plots because no ticket has been closed. News from USATLAS sites - see slides.

-----------------------------------------------------------------------------------------------------------------------------

IPv6 meeting notes 2021-06-30 - Day 2 - Zoom meeting

(notes by Francesco Prelz)

In attendance: Luis Alvarez, Marian Babik, Martin Bly, Jiri Chudoba, Dave Kelsey, Edoardo Martelli, Mihai Patrascoiu, Francesco Prelz, Duncan Rand, Andrea Sciabà.

 

Agenda is reviewed. The status of various testing and monitoring efforts will now be covered.

 

Marian Babik presents on ETF and perfSONAR

(slides at: https://indico.cern.ch/event/1012420/contributions/4248579/attachments/2273554/3861744/ETF%20%40IPv6%20F2F.pdf)

 

The info available on ETF monitoring (now based on checkMK 2.0) is shown in a live demo:
https://etf-10.cern.ch/etf/check_mk

 

Answers to questions by Dave K.:

* Passing the information from ETF to Sitemon to compute site availability is still to be implemented.

* CheckMK has very good, native, straightforward support for IPv6, also for Kubernetes.

Dave K.: Thank you for keeping up: IPv6 is just one minor change in a tremendous flow of constantly changing software.

Marian Babik also presents on perfSONAR

(slides at https://indico.cern.ch/event/1012420/contributions/4248579/attachments/2273554/3861745/perfSONAR%20Monitoring%20Update%20IPv6%20F2F.pdf)

 

Dave K.: Why are IPv6 measurements available only from GEANT? Haven't the NRENs been supporting IPv6 for a long time?

Marian B.: The NRENs have no problems in supporting IPv6. What takes a long time is deploying the perfSONAR technology on available resources (e.g. network namespaces have to be set up).

The colours in the reports may be changing, as 1 to 5 Gb/s thresholds are not that significant with 100 Gb/s links.

On the upcoming WLCG data challenges: do they cover IPv6?

Andrea S.: Probably not for CMS.

Marian B.: Different experiments have different ideas of what they would like to challenge.

Dave K.: The challenge should happen this calendar year?

Marian B.: Yes, supposedly in September, with all experiments testing in parallel to stress the network infrastructure.

Dave K.: Will we be able to tell which fraction of the challenge happened on IPv6?

Marian B.: We are still having issues at the level of being able to measure the total traffic per site!

 

Jiri C.: Who is maintaining the dashboard and Elasticsearch sites?

Marian B.: Dashboard is maintained by Shawn. The Elasticsearch contact is Illya from UChicago.

Mihai P. presents the FTS monitoring activity.

(slides at https://indico.cern.ch/event/1012420/contributions/4248578/attachments/2273568/3861762/FTS_IPv6_Reporting.pdf)

 

Andrea S.: Are changes required from the xrootd developers?

Mihai P.: All that's needed is there.

Dave K.: Is the development that's starting in September just for the httpd plugin/backend?

Mihai P.: No, also for the xroots plugin. We need XXX (sorry - couldn't get it) implemented in xrootd in order for GFAL to detect the transfer.

Dave K.: If IPv6 is FALSE it doesn't mean it's IPv4. There was a discussion or ticket proposing to add a third, 'unknown' status; otherwise IPv6 seems more efficient because the failed transfers aren't counted.

Mihai P.: The IPv6 flag is relevant only for the gridftp plugin.

Duncan R.: there was indeed a proposal to change the FTS status to three-state.

Mihai P.: we can add a new boolean, or change the current IPv6 flag.

Dave K.: Is there an open ticket on this issue?

Mihai P.: No ticket - just a discussion so far.

Dave K.: Can Duncan turn this into a ticket ?

Duncan R.: Yes.
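To illustrate why a third state matters for the statistics, here is a minimal sketch in which a nullable flag stands in for the proposed 'unknown' status. The names `IPVersion`, `classify` and `ipv6_fraction` are hypothetical and do not reflect the actual FTS schema or code.

```python
from enum import Enum
from typing import Iterable, Optional


class IPVersion(Enum):
    IPV4 = "ipv4"
    IPV6 = "ipv6"
    UNKNOWN = "unknown"   # e.g. the transfer failed before a connection was observed


def classify(ipv6_flag: Optional[bool]) -> IPVersion:
    """Map a nullable flag to three states; None means the protocol was never determined."""
    if ipv6_flag is None:
        return IPVersion.UNKNOWN
    return IPVersion.IPV6 if ipv6_flag else IPVersion.IPV4


def ipv6_fraction(flags: Iterable[Optional[bool]]) -> Optional[float]:
    """Fraction of transfers over IPv6, counting only those whose protocol is known."""
    known = [s for s in (classify(f) for f in flags) if s is not IPVersion.UNKNOWN]
    if not known:
        return None
    return sum(s is IPVersion.IPV6 for s in known) / len(known)
```

With a plain boolean, every undetermined transfer is silently counted as IPv4; with the third state it is excluded from both sides of the comparison.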

Dave K.: As the development is targeted for September, we are unlikely to be tracking the September data challenge with this, right?

Mihai P.: This is not going to be ready for the September data challenge.

Duncan R.: There was discussion (with Brian Bockelman and others) about tracking HTTP traffic, as the different data transfer streams could be on different protocols.

Dave K.: Isn't there a way to express a policy preference, with the default being "no protocol preference"? Can we express an IPv6 preference? We see a lot of traffic between dual-stack instances that seems to prefer IPv4.

Mihai P.: In some cases one of the endpoints was actually IPv4-only.

Dave K.: Who made the decision that the default is "no preference"?

Andrea S.: In practice it looks like a random choice?

Mihai P.: In practice it depends on the actual FTS server, I suspect. CERN will go with IPv6. BNL is IPv4 only, and will do IPv4.

Dave K.: Wasn't it just Fermilab that had issues with IPv4?

Mihai P.: Perhaps it was Fermilab, indeed.

Dave K.: Sorry to insist, but the preference for IPv6 should be configured.

Mihai P.: That would be good. The default is "do nothing". Doing something requires work.

Dave K.: When we attempted, 3 years ago, to go dual-stack, both ourselves and the management board assumed that traffic would naturally start flowing through IPv6.... The resistance against IPv6 at Fermilab was due to some legacy experiment. We have many places where IPv4 ends up being preferred almost by accident.

Mihai P.: Storage endpoints and FTS servers have specific IP policy preferences that can be configured. For the issues with HTTP: we are more familiar with the other plugins, and haven't investigated it thoroughly.
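As a generic illustration of what "expressing an IPv6 preference" can mean on a dual-stack client, the sketch below reorders resolver results so that IPv6 addresses are tried first. It uses only the Python standard library. The function names are hypothetical, and this is not how FTS, GFAL or any storage endpoint implements its policy.

```python
import socket
from typing import List, Tuple

# Shape of one entry returned by socket.getaddrinfo()
AddrInfo = Tuple[int, int, int, str, tuple]


def order_by_preference(infos: List[AddrInfo], prefer_ipv6: bool = True) -> List[AddrInfo]:
    """Put the preferred address family first, keeping resolver order within each family."""
    preferred = socket.AF_INET6 if prefer_ipv6 else socket.AF_INET
    # sorted() is stable, and False sorts before True, so preferred entries come first
    return sorted(infos, key=lambda info: info[0] != preferred)


def resolve(host: str, port: int, prefer_ipv6: bool = True) -> List[AddrInfo]:
    """Resolve a host and return candidate addresses in the order they should be tried."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return order_by_preference(infos, prefer_ipv6)
```

A client would then attempt to connect to each returned address in turn, falling back to IPv4 only if no IPv6 address is reachable.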

Duncan R.: Do people remember whether it's HTTP Third Party Copy (httpTPC) or xrootd that's about to be used/preferred in WLCG?

Mihai P.: The trend is mostly towards httpTPC, with some areas where xrootd is used.

Duncan R.: Alice will stay with xrootd, I suppose.

Mihai P.: yes.

 

After the coffee break, dates for the next meeting are discussed:

 

Due to the usual uncertainties about the ability to travel, we decide to reserve three half-day sessions: all day on Thursday, October 14th, 2021, and Friday morning, October 15th. As the dates get near, just two half-day sessions out of the three will be selected. The CERN IT auditorium, where just 15 people are currently admitted, is booked for these dates.

 

A 1-hour update call is proposed for Tuesday, September 7th, 2021, 16-17 (note: subsequently changed to Wed 22 Sep 2021 at 14-15 Central European Time). This can also be used to negotiate the agenda for the October meeting.

We then continue with the agenda, and deal with the situation in the US with President Trump's directive on the timetable to go IPv6-only for all federal agencies, DOE labs included.

After resisting the drive to go dual-stack, the US labs may now be driving the transition to IPv6-only...

By the end of FY 2023 (i.e. 30 September 2023) 20% of the services are supposed to be IPv6-only, and 80% by the end of FY 2025 (all details at https://www.whitehouse.gov/wp-content/uploads/2020/11/M-21-07.pdf).

Phil DeMar is quoted as believing that just because it's a directive it doesn't mean it will happen.

 

Duncan R.: NAT64 and DNS64 will have to be used to connect to IPv4 anyway.

Francesco P.: 464XLAT is just useful for services and protocols where IPv4 literals are stored, transferred and signalled.

Dave K.: What about a few IPv4 sites that didn't move in a sea of IPv6?

Francesco P.: Likely the issue preventing those sites from moving was person-power. There's no zero-person-power solution to deploy some form of '646XLAT' to let IPv4 islands talk to a mostly-IPv6 internet. The required effort would be comparable to enabling IPv6.

Martin B.: Gatewaying IPv6 to IPv4 will not be a priority.
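For context on the NAT64/DNS64 mechanism mentioned above: DNS64 lets an IPv6-only client reach an IPv4-only server by synthesizing an IPv6 address that embeds the IPv4 address under a NAT64 prefix. The sketch below shows that mapping for the well-known prefix 64:ff9b::/96 defined in RFC 6052. The function names are hypothetical and only /96 prefixes are handled.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052; a site may use its own prefix instead
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4: str, prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6: str, prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address from a synthesized /96 NAT64 address."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in prefix:
        raise ValueError(f"{addr} is not inside {prefix}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)
```

This only covers the address mapping. It does not help when an application passes IPv4 literals around, which is the case Francesco raises for 464XLAT.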

Dave K.: Most of the Tier-2 centres that didn't move to dual-stack storage are from the US, as we saw yesterday.

Duncan R.: The summary in the US directive says that federal agencies should make "large portions" of their networks IPv6-only.

Dave K.: Large organisations have already made a management decision to have an IPv6-only backbone. That would be easier to deal with at the management level.

Andrea is asked whether a discussion on the US federal IPv6 requirement is going to happen at the July management board.

Andrea S.: The management board meeting is on July 20th, but there's no agenda yet.

Martin B.: The US document, page 3, section 4, requires a plan by September 2021. Systems that cannot be moved to IPv6 will have to be replaced or retired. It doesn't say that IPv4 will be banned entirely.

Duncan R.: I suppose that many of the IT assets are on networks that don't communicate with anyone. Otherwise you cannot have IPv6-only hosts - they won't be able to communicate.

Dave K.: Perhaps the message for the management is that we should now push to dual-stack everything, and not just storage. Also, turning off IPv4 in LHCOPN should be done sooner rather than later.

Andrea S.: Some smart cache could be deployed to hide IPv4-only sites. This doesn't look like a big issue to me.

Dave K.: If DOE goes IPv6-only we can cope. It was our hope. We've been putting in 10 years of work towards this goal!

Andrea S.: In a data-lake environment you usually access only data that are in your region.

Dave K.: WLCG is more and more using direct access from the worker node to remote storage.

Andrea S.: The US decision helps us to continue lobbying for IPv6, but if they go IPv6-only it will not be a catastrophe.

Items for the completion of the round-tables update:

Martin B.: The big news at RAL is 'forklifting' new routing code for the 100 Gb/s infrastructure. It was supposed to happen mid-July, but will happen mid-August, with two weekends of significant disruption.

Dave K.: What are the plans to enable IPv6 on the site? I don't have it in my office....

Martin B.: You are asking the wrong person. There was a big security audit because of a successful ransomware attack. Two-factor authentication had to be added.

Dave K.: People are encouraged to drop PCAP snippets to the CERNbox write-only mailbox accessible at https://l.infn.it/ipv6dumpdrop to help with the search of forgotten IPv4 services.
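As a rough idea of a first pass over such a capture, the sketch below tallies the frames in a classic pcap file by Ethertype, so that any remaining IPv4 traffic stands out. It assumes an Ethernet capture in the classic pcap format (not pcapng) and ignores VLAN tags. The function name is hypothetical and this is not the group's actual tooling.

```python
import struct
from collections import Counter
from typing import BinaryIO

ETHERTYPES = {0x0800: "ipv4", 0x86DD: "ipv6"}
LITTLE_ENDIAN_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG_ENDIAN_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def count_ip_versions(stream: BinaryIO) -> Counter:
    """Count frames per IP version in a classic pcap stream with Ethernet link type."""
    header = stream.read(24)                      # fixed-size global header
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    if header[:4] in LITTLE_ENDIAN_MAGICS:
        endian = "<"
    elif header[:4] in BIG_ENDIAN_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    if linktype != 1:                             # 1 is the Ethernet link type
        raise ValueError("only Ethernet captures are handled in this sketch")

    counts: Counter = Counter()
    while True:
        record = stream.read(16)                  # per-packet record header
        if len(record) < 16:
            break
        _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        frame = stream.read(incl_len)
        if len(frame) < incl_len:
            break                                 # truncated final packet
        if len(frame) < 14:
            counts["other"] += 1
            continue
        (ethertype,) = struct.unpack("!H", frame[12:14])
        counts[ETHERTYPES.get(ethertype, "other")] += 1
    return counts
```

Finding which services are responsible for the IPv4 frames would still require looking at addresses and ports, which this sketch does not do.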
