HEPiX IPv6 Working Group - in person - Santiago de Compostela, Spain
Instituto Galego de Fisica de Altas Enerxias - IGFAE
This is still very much a draft agenda. Suggestions and offers of talks are still welcome.
Timing of breaks during the morning and afternoon and timing of the lunch sessions may change. Content and timing of agenda topics are also subject to change. We are likely to adjust the agenda following discussion at the start of the meeting.
We plan to meet for two full days. All of Wednesday 29th and all of Thursday 30th January 2025.
We encourage attendees to arrange to stay 3 nights in a hotel; travel in on Tuesday 28th Jan and depart on Friday 31st Jan.
We will strive to make Zoom connectivity available for those of you who cannot attend in person - but there are no guarantees yet.
Please register for attendance - being sure to say Yes or No to the question "will you attend in person?"
Venue Instituto Galego de Fisica de Altas Enerxias - IGFAE
Address Rúa de Xoaquín Díaz de Rábago, 15705 Santiago de Compostela, A Coruña, Spain
Map URL https://www.google.es/maps/place/Instituto+Galego+de+Fisica+de+Altas+Enerxias+-+IGFAE/
HEPiX IPv6 Working Group meeting - Santiago De Compostela, Spain - 29/30 January 2025
Day 1 - Meeting - Wednesday 29th January 2025 - starting at 09:30
(notes by Francesco Prelz)
Agenda: https://indico.cern.ch/event/1486358/
In attendance at IGFAE: Andrea Sciaba' (AS), Bruno Hoeft (BH), Costin Grigoras (CG), David Kelsey (DK), Edoardo Martelli (EM), Francesco Prelz (FP), Jose Flix Molina (PF), Marcos Seco Miguelez (MM), Carmen Misa Moreira (CM), Martin Bly (MB)
Via Zoom: Tim Chown (TC), Duncan Rand (DR), Christopher Walker (CW)
In the afternoon (remote): Garhan Attebury (GA), Mihai Patrascoiu (MP), Borja Garrido Bear (BG)
Start with roundtable introductions
General news
The only 'momentous' occasion was Tony Cass's keynote at CHEP in Krakow on the state of WLCG networking, where he said "we *must* turn off IPv4 before the Hi-Lumi LHC".
Good that this is coming when there is still so much trouble in identifying/logging IPv4 vs. IPv6 traffic. We should continue to aim at getting rid of IPv4 on the WAN.
Agenda for the day is reviewed (https://indico.cern.ch/event/1486358/).
DK: *The* most urgent thing is having a plan for the CHEP paper. We need to take steps and decide what we want to say. We aren't even able to measure IPv4 vs IPv6 data traffic, apart from LHCone/LHCopn stats.
"History and aims" of the working group
This is meant for newcomers; since there are not many of them, a long presentation is not warranted.
Some slides were prepared here and are skimmed through:
Considerations emerging from the history:
- Attendance at IGTF and HEPiX in Lugano is discussed. An abstract should be submitted.
- The human factor in the IPv6 'obstacles' - which we refrained from mentioning on paper - is definitely still with us.
- Coffee Break -
Roundtable updates
PF gives a report on the Spanish sites:
On the matter of assigning traffic to individual WLCG experiments, the only Site in Spain that supports more than one experiment is the Tier-1 @PIC.
DK: Is Spain part of other research communities, such as SKA?
PF: Seville is part of SKA. They will have their own computing infrastructure, but we don't (yet) talk to them so much. Same for the gravitational-wave and astroparticle communities, etc.
DK: On the matter of sites not addressing the GGUS tickets, no response to a ticket usually means that the transition was not yet done.
AS: IPv6-only worker nodes can be tested by sending jobs that request a token.
DK: You described the relation between the various institutes - are they all national institutes or funded by different entities?
PF: The funding model is very different. There are regional and national scope sites. CIEMAT e.g. stems out of the nuclear fusion investments.
MM: IGFAE has two sources of funding - national and regional.
DK: What about the relation with the NREN(s)?
PF: There is one NREN (REDIris), but they only cover the last mile at a few sites (e.g. Madrid). PIC, IGFAE are connected to regional network infrastructures, with routing info (including LHCONE) propagated via eBGP.
BH presents the KIT status report:
https://indico.cern.ch/event/1486358/contributions/6309819/attachments/3004721/5296244/HEPiX-IPv6-2025-01-19.v0.1.pdf
CG: W.r.t. the ALICE finishing jobs failing to "call home": we start a Java process that communicates between the job payload and the coordinating service via port 17001 on localhost, which defaults to IPv4. There are two Java flags, one to prefer the IPv4 stack (defaults to false) and one to prefer IPv6 addresses (also defaults to false). Changing the latter, and leaving only the IPv6 version of 'localhost' in /etc/hosts, should address the issue. In fact, this was done recently, so jobs shouldn't fail anymore.
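The two JVM system properties referred to here are `java.net.preferIPv4Stack` and `java.net.preferIPv6Addresses`; a sketch of the resulting invocation (the jar name is a placeholder, not taken from the minutes):

```
# Both properties default to false; setting the second makes Java prefer
# IPv6 addresses (e.g. ::1 for localhost) when both families are available.
java -Djava.net.preferIPv4Stack=false \
     -Djava.net.preferIPv6Addresses=true \
     -jar jobagent.jar    # placeholder for the actual ALICE job agent
```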
AS presents an update of the status of the currently on-going CPU ticket campaign
as of "now":
There are a number of sites affected by the Skyway Infiniband router lack of IPv6 support (CNAF, Purdue, KIT, others):
DK: NVidia probably doesn't get IPv6 as a requirement from the HPC sites.
MB: Our CEPH farm talks IPv4 internally, and does protocol conversion only at the border - which is connected via a specific connection to each individual worker node.
CG: We have the same issue on the 40k core server farm at CERN, that uses
IPv4 internally (and Infiniband as well).
DK: Could this be a case that can be worked around with translation,
via a combination 464XLAT boxes or the like ?
MB: Depends on the scale of the resources that need to be connected.
If it is in the O(1000) scale, it becomes very difficult.
PM: IPv6-only compatibility, or compliance, should be a requirement
for new HPC sites.
HPC sites running similar machines can also be managed in
very different ways.
On the ticket processing time evolution graph:
MP: I would suggest weighting the graph by the amount of resources provided/managed by each site.
DK: Logistical question: this week the new GGUS is opening,
but at one stage it was said that the existing tickets would
not be transferred...
AS: They will be.
- Lunch break -
DK proposes that the discussion on the residual use of IPv4 should be continued. E.g.: what are we supposed to say about this in the CHEP paper (one or two concrete items to report on)?
Roundtable updates continue:
Updates from the UK? JISC, IETF:
TC: Use of the IPv6 flow label for WLCG. Not to be confused with WGLC (Working Group Last Call - part of the IETF process).
The Destination Options extension header is being explored as an alternative to the flow label. We are working (with Chris, Duncan) "with GridPP hats"
on Perfsonar meshes and the problems that arise. There is a UK test mesh here:
https://ps-mesh.perf.ja.net/grafana/d/aa15bb77-25d6-53d1-b3c7-bf51b6417afa/jisc-uk-tests?orgId=1
CW: The IPv4 latency measurements are not working, while the IPv6 ones are.
TC: There are other discrepancies in bandwidth, with IPv6 getting less
than IPv4. There are also IPv4/6 measurements being performed @SKA.
Difficult to enforce by policy - or by convincing developers to
comply. Richard Hughes-Jones mentioned in a SKA talk that it will be
their policy to support IPv6.
MB: The restriction on the amount of IPv4 address space available at RAL may
convince them.
DK: Rosie gave a SKA report at CHEP2023, omitting IPv6 - then she added
"we'll do whatever Rich says". There will be another SKA meeting in
March - another chance to convince them.
CW: We could persuade them that new sites should be IPv6-only.
DK: Tried that in the past, and received "why me and not them ?" responses.
Will continue to follow up on that.
Updates from CMS (AS): Nothing special - I asked my CMS colleagues. They asked
me to push on the transition agenda - they are eager to get IPv6 everywhere.
Due to manual network configuration, San Diego finds it easier to deploy IPv6
on new worker nodes and progressively retire the old ones.
DK: Shouldn't the experiment be helping us in pushing the sites, instead
of pushing you ?
Updates from CERN (EM, CM):
CM reports that the IPv6-only testbed Ben Jones was running on
CentOS 6 was updated to Alma9 and broke. CERN "LANDB" is reportedly not
working on IPv6.
FP: Why does that testbed even need LANDB ?
EM: Anyhow LANDB did not change - and it worked before.
CG: Does the configuration remove 127.0.0.1 from the loopback resolution
in /etc/hosts (as seen and done at KIT)?
R: Someone will check.
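A sketch of the /etc/hosts change described above (leaving only the IPv6 loopback mapping for 'localhost', as done at KIT and for the ALICE job agent):

```
# /etc/hosts: comment out the IPv4 mapping so that 'localhost'
# resolves only to the IPv6 loopback address.
#127.0.0.1   localhost
::1          localhost
```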
Testing Jumbo Frames on the US test cluster between CERN and Fermilab,
on IPv6. Were able to saturate the 400 Gb/s transatlantic link - no errors.
CMS is proposing to enable Jumbo Frames for the CMS production traffic.
DK: Jumbo frames are good for the carbon footprint - they should require
fewer cycles for routing.
EM: They also require no NAT at all. Definitely better. It could be
measured.
DK: We should call IPv4 the "legacy high carbon" protocol ?
CG: The "LHC"...?
Updates from RAL (MB):
Retirement of hardware has been slowing down for lack of resources
to buy new hardware. May have to "re-task" some old routers.
This way we should have IPv6-only capable network hardware for the
Batch & Storage system.
By mid-June we could have everything in the Tier-1 dual-stackable.
We are not far.
SKA will have another leaf-spine pod in the leaf-spine system at RAL.
Hardware replacement to reach other areas of RAL may require a multi-year
effort.
Updates from ALICE (CG):
Computing nodes are 70% dual-stacked.
In 4% of the cases IPv6 is explicitly disabled. A small number,
but we don't know why.
Incident with IPv6 SCITAGS: GEANT contacted CERN security after detecting
a spike of 20-30 Gb/s to two Romanian sites. This was legitimate
IPv6 traffic. As of now, SCITAGS fireflies are sent to a collector at CERN,
but not alongside the traffic: that would have helped in identifying
the traffic as legitimate.
DK: What is the normal destination of the UDP traffic?
CG: Each storage has to be configured with a specific target (collector).
DK: There have been talks of installing a collector c/o GEANT.
CG: Embedding them in the IPv6 header would address the issue for
everyone.
DK: There was a statement by Andy Hanushevsky at CHEP: IETF will never accept
this. He also says that all this IPv4 traffic is due to old xrootd
clients - but there are none @ALICE.
DK underscores the group's trust in TC to make the case with the IETF.
Updates from INFN (FP):
No news, unfortunately - and we've seen our dismal performance
in the GGUS ticket response. CNAF is in the middle of moving
equipment to the new location at the Bologna Technopole.
Still evaluating the real-life impact of IPv6-only on the desktop:
Without general-purpose 464 configuration, so far the following
list of reverse proxies for IPv4-only services (ipname:port) was needed:
GITHUB:
github.com:22; github.com:443; api.github.com:443; codeload.github.com:443;
objects.githubusercontent.com:443; ghcr.io:443; pkg.github.com:443;
INFN central Gitlab (this we could do something about but we don't):
baltig.infn.it:22; baltig.infn.it:443; baltig.infn.it:4567;
Docker images hosted on public.ecr.aws (other *.ecr.aws are dual-stack):
public.ecr.aws:443;
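As an illustration (not taken from the minutes), one hypothetical way to provide such a reverse proxy is nginx's stream module, assuming the IPv6-only clients resolve the target name to the proxy host via a local DNS override:

```
# Hypothetical nginx stream proxy: accept IPv6-only clients on port 443
# and forward to the IPv4-only upstream. One server block per endpoint.
stream {
    server {
        listen [::]:443;
        proxy_pass github.com:443;   # upstream reached over IPv4
    }
}
```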
Updates from the US/US-CMS (GA):
"Dishearteningly" nothing to report.
CHEP paper - CM set up an Overleaf template. Deadline is February 28th.
The presentations we gave (e.g. TechEX) were generally well received.
Big news: the CPU campaign.
It would be nice to measure the impact of moving the worker nodes to
dual-stack on the overall IPv6 usage. But we still lack detailed monitoring
information.
Then the possible subject of 'ipv6-mostly' techniques.
TC: Those are less likely to be relevant for CPUs and Storage Systems.
More for Android and portable devices in general.
Tentative assignments are proposed and discussed about the first drafters of
various sections:
AS for the CPU campaign
BH for reports on the Data Challenge
DK, with help from TC, for IPv6-only facilitating techniques.
The author list is then reviewed.
- Coffee Break -
We seem to be getting only GA from across the Atlantic - suspect no more
status reports from US sites will be coming.
Other topic for the paper: the standard problem of logging at LHCOPN and
telling apart IPv4 vs. IPv6 traffic.
DK: found a service at CERN giving the top talkers on SQUID
servers @CERN. Only 3 out of 50 showed an IPv6 address.
Stats are (publicly, apparently!) available here (NB: no https):
http://wlcg-squid-monitor.cern.ch/failover/displayHighBandwidth.html
Clarifications are being asked from the page maintainers on the meaning of this table.
It *does* (see discussion on Thursday morning) show that IPv4 is preferred
at most CVMFS and Frontier endpoints, and propagates through the proxy/cache
layers.
Starting the monitoring section at 5PM:
On FTS - MP has a few slides to show:
AS: During DC24 there were problems in CERN<>CNAF transfers, because
StoRM was not generating performance markers. Do you know if this bug
is now fixed (you said that all storage systems generate performance
markers)?
MP: Have to check.
AS: The symptom was that all CERN->CNAF transfers appeared to be IPv4.
MP: This was a combination of two bugs, one with the performance markers
and one in FTP where IPv4 was reported when the version information
was missing/unknown. This second one is now definitely fixed.
DK: You reported that everything in the CERN FTS servers is deployed quickly.
What about the other sites? - Are they up to date ?
MP: It's very likely - the bug was reported a long time ago. We expect
a new release to be propagated and deployed in two weeks' time.
On xrootd - BG shows the following slides:
CG: Was the fstream section of the monitoring events sent when a file
is closed modified in xrootd? There you could clearly see whether the
addresses were IPv4 or IPv6.
BG: The info is not received by MONIT, but by the monitoring
collector.
DK: Didn't quite understand the slide on MONALISA, where it
says it doesn't report IP-related information.
CG: It's the same as the monitoring collector - we aggregate on the basis of site
information, but the protocol information is discarded.
BG: This should be addressed in the medium term - likely by the "next"
Data Challenge (2026 ? 2027 ?)
BH: 2026 may be early for a DC that should prove something for 2030.
DK: Would be nice if we could monitor LHCONE - not just LHCOPN.
Hiro sent status slides and apologies for not registering/attending.
These will be shown on the following day.
Day 2 - minutes - Martin Bly
Thursday: Morning Session.
Present: Carmen, Dave, Francesco (until 11), Costin, Bruno, Martin, Edoardo, Pepe, Marcos (local contact).
Online: Tim, Duncan, Chris, Garhan, Andrea (not necessarily continuously)
(Francesco has to leave around 11am, at break)
Discussion: there is a place in the /proc filesystem that can show the total byte counts on all interfaces (combined) - could we make use of collecting this? Francesco will circulate details; Costin will test it out with ALICE jobs. (/proc/self, /proc/self6; self6 divided by interface.)
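The exact paths were not fully captured in the notes; on a stock Linux kernel the per-interface IPv6 byte counters live under /proc/net/dev_snmp6/<iface> (counters Ip6InOctets / Ip6OutOctets), while /proc/net/dev gives the combined per-interface totals. A minimal sketch under that assumption:

```python
import glob
import os

def parse_counters(text):
    """Parse 'Name value' counter lines, as found in /proc/net/dev_snmp6/<iface>."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def ipv6_octets_per_interface(base="/proc/net/dev_snmp6"):
    """Return {interface: (in_octets, out_octets)} for IPv6 traffic."""
    result = {}
    for path in glob.glob(os.path.join(base, "*")):
        with open(path) as f:
            c = parse_counters(f.read())
        result[os.path.basename(path)] = (c.get("Ip6InOctets", 0),
                                          c.get("Ip6OutOctets", 0))
    return result
```

A job wrapper could sample these counters at start and end and report the difference alongside the job accounting record.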
Review of agenda:
Possible presentation for roundtable from Hiro - may be given on line after 16:00. No others.
Planning for IPv6 only
Plans for ISGC2025 IPv6 talk.
Plans and next steps for the CHEP2024 paper.
Plans for HEPiX (Lugano).
Bruno will give the talk @ Lugano, Dave will create a joint abstract with Bruno. Dave will also send Pepe a proposal for a revised track name to more obviously include identity management.
-----
CVMFS Stratum-1 clients prefer IPv4 by default on dual-stack systems.
Frontier and CVMFS clients will prefer IPv4 unless told otherwise, unless the proxy is IPv6-only or is set to prefer IPv6. We should campaign to change the preferred-IP defaults to IPv6.
Need to get a clear idea from at least one mostly dual-stack site to see what is actually happening.
If the squids are dual-stack, then we can make them prefer IPv6 for internal site traffic.
Long discussion. The Squid docs show that squid-to-squid communications use the system preference, so we should make sure the system preference is set to IPv6.
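If the preference is to be forced on the Squid side, one relevant knob (assuming a reasonably recent Squid release) is the dns_v4_first directive, which must stay off for IPv6 destinations to be tried first:

```
# squid.conf - with dns_v4_first off (the default), Squid does not force
# IPv4 (A record) lookups ahead of IPv6 (AAAA) when a peer or origin
# resolves to both address families.
dns_v4_first off
```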
Traffic PIC-Dubna via CERN shown to be IPv4 despite hosts being dual stack. Pepe will investigate. Possible that Dubna has an old dCache that doesn't have the relevant patches. Bruno also sees outgoing traffic KIT->Dubna on IPv4, but also KIT->SARA.
Dates for next meetings and other urgent issues.
Next F2F - 21-22 May (Weds/Thurs), two half days, with possible informal session before lunch Weds if wanted
Future Zoom meetings:
Thursday 13 Feb at 16:00 CET/15:00GMT
Tuesday 11 Mar at 16:00 CET/15:00GMT
Break
Continuing discussion of monitoring IPv6 particularly on LHCONE. Is traffic generally split such that IPv4/IPv6 proportions can be measured?
Looked at the MONIT monitoring at CERN. There appears to be a lot of IPv4 traffic to Dubna (JINR).
Clearly more work to be done to understand traffic flows.
Planning for IPv6 only
Tony Cass's keynote statement: all storage data traffic on LHCOPN must be IPv6 before the start of HL-LHC. Experiments are constrained but use online farms for some data processing, so those need to go IPv6-only first. There is also the problem of Analysis Facilities - do they have IPv6 access? Are they seen in monitoring? Yes, if transferring via monitored data transfer protocols/mechanisms (FTS, XRootD, etc.). For ALICE, all traffic is visible in the monitoring (Costin). Pepe noted that user-driven, hand-launched transfers don't show up in the service monitoring.
Storage will be last to go IPv6 only.
Lunch
Action: Dave to ask Shawn McKee to add ingress and egress data for IPv6 for the various sites, measured in the same way as for total traffic - maybe (total, ipv4, ipv6), with both ipv4 and ipv6 set to 0 if splitting is not possible.
CHEP Paper:
Should finalise section titles and authors… further discussions. C.f. Overleaf for structure/notes.
Dave will finalise the author list; Garhan needs to be added.
- Introduction (Dave, Edoardo)
- Describe WG etc, ref to previous work, mention conclusions from previous paper, rationale for CPU campaign
- CPU ticket campaign (Andrea) - described the various problems
- Take material from HELP talk in Paris.
- Lessons learned from DC24 (Bruno, Carmen)
- including issues uncovered (anonymised). Measured use of IPv6. Include issues with SARA->KIT,JINR etc. Stuff about happy eyeballs-type behaviour.
- On-going use of IPV4 (Bruno, Carmen)
- Things we noticed after DC24, mention squid, loopback stuff, name resolution.
- Other observations (Edoardo)
- squids, frontier, cvmfs, GitHub
- resources that are IPv4 only
- other WLCG services going to IPv6
- Plans for IPv6-only (Dave)
- New ticketing campaign?
- Identify the unknowns
- IPv6-mostly (Tim)
- For user devices rather than configuring servers, describe IPv6-only
- Conclusion (Dave)
- References
For more details see Overleaf.
Review of slide from Hiro (BNL)
Notes on current status of IPv6 rollouts at BNL.
Continued discussion on ipv4/ipv6 traffic proportions, particularly re FNAL-CERN, and BNL-CERN.