HEPiX IPv6 Working Group - virtual F2F meeting

Europe/Zurich
Zoom

Description

In place of the normal face to face meeting at CERN - a virtual Zoom meeting.

Please register to say you will attend the Zoom meeting - connection details will be sent to those who register.

IPv6 F2F minutes - 20200929 - Day 1

Notes by Duncan Rand

https://indico.cern.ch/event/959585/

 

Present: D. Rand, D. Kelsey, S. McKee, J. Chudoba, B. Hoeft, M. Bly, Costin, D. Stockdale, F. Prelz, M. Babik, T. Chown.

 

Shawn delivered his Update from RNTWG - Packet Marking subgroup.

Starting with 8 different science domains (experiments): ATLAS, CMS, LHCb, ALICE, BelleII, SKA, LSST, DUNE and two applications: perfSONAR and XRootD.

Can we add in Pierre Auger please? Foreseen timescales: see bits being marked by end of 2020. Should be in XRootD relatively soon. Also working with iperf3 and perfSONAR developers.

Experiments want packets to be marked, so they might start to push IPv6 as a result. In discussion with dCache developers too. Right time to do this IPv6-only. On-path devices may also inspect packets. Is there an IETF document being prepared? Not such a bad idea to document it and feed back to IETF. See also RFC 8799 - limited domain document. How restricted is WLCG to R&E networks? WN in commercial clouds and Geogrid in Germany might be examples. Two main questions: 1) is there a really bad unforeseen problem? 2) can you influence experiments to take up this idea? Looks like it will be a very useful tool. By the way, every packet gets marked, not a sample.

 

Roundtable

 

Dave Kelsey reported input from Edoardo:

Edoardo: reported that CERN made a proposal for the new data centre to use IPv6-only addressing with non-routable IPv4. The proposal was rejected.

Kars: DESY Zeuthen is now dual-stacked.

 

Martin Bly (RAL):

100G connection CERN-RAL OPN is live for testing purposes. Looking at migrating to it. Programme to upgrade local networking at RAL Tier-1. Can we make it IPv6-only? General purpose internet will be the back-up if this link fails. Long running packet loss problem is ongoing. User community are complaining about slow IPv4 - mainly CMS.

 

Bruno H (KIT/Germany): Getting rid of IPv4 in our administration at KIT.

 

David Stockdale (Imperial College): Discussions of how many public IPv4 addresses we need, e.g. for wifi. Just brought on a new hall of residence with DNS64, NAT44 and NAT64. 47% of traffic was native v6, 20% NAT44. How to better understand the 25% of NAT44 traffic?
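
For reference, DNS64 lets IPv6-only clients reach IPv4-only services by synthesising AAAA records that embed the IPv4 address; the NAT64 gateway then translates the traffic. A minimal Python sketch of that address mapping follows. It assumes the RFC 6052 well-known prefix 64:ff9b::/96 (the prefix actually deployed at Imperial is not stated in these notes), and the helper names `synthesize_nat64` and `extract_ipv4` are hypothetical.

```python
import ipaddress

# Assumption: the RFC 6052 well-known prefix. A site may use its own /96 instead.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6, prefix=WELL_KNOWN_PREFIX):
    """Recover the embedded IPv4 address from a synthesised IPv6 address."""
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in prefix:
        raise ValueError("address is not inside the NAT64 prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    # 192.0.2.33 is a documentation address, used here only as an example.
    print(synthesize_nat64("192.0.2.33"))
```

Destinations inside the NAT64 prefix seen in flow records can be mapped back with `extract_ipv4`, which may help show which IPv4-only services clients are still reaching.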

 

Jiri Chudoba: Internal usage of v6 at Prague by ALICE - no major changes. Big discrepancy in the volume of v6 usage between ALICE and ATLAS, who use much more. Costin will look into it.

 

Costin: Most ALICE Tier-2s have upgraded to dual-stack, but not RCKI. Una in Mexico failing tests but not answering the ticket. 75% dual-stack.

 

Tim Chown: NTR

 

Marian Babik: Will talk about perfSONAR and ETF tomorrow.

 

Duncan Rand: Not much to report from the UK; several sites are still not dual-stack. Of the big sites, Glasgow storage is not yet dual-stack, but perfSONAR is.

 

Francesco Prelz: Not much possibility of setting up a testbed for IPv6-only testing at INFN Tier-1. Not much interest in IPv6 in Italy it seems. However, T2 mostly done - Torino only site not to add IPv6.

 

Tier-1 status

Russian colleagues: Dubna is now dual-stack, but colleagues in Moscow are still IPv4.

 

Tier-2 status

It seems we are now left with the sites which don’t answer the tickets - many of them.

ATLAS is on 62%. However DESY-Zeuthen is done.

 

IPv6 only testing:

Testbed at RAL?

What about dual-stack worker nodes, should we push for these?

Dual-stack to dual-stack still using v4? How to debug this?
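
One possible first step for the debugging question above is to check the order in which the local resolver offers addresses for the remote endpoint, since many clients simply try them in that order. The Python sketch below is only an illustration: the function names are hypothetical, and it inspects just the resolver layer. Transfer tools and middleware may apply their own protocol preference on top.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def summarize_families(infos):
    """Map getaddrinfo() result tuples to family names, preserving order."""
    return [FAMILY_NAMES.get(family, str(family)) for family, *_ in infos]


def family_order(host, port=443):
    """Return the address families for host in the order the resolver offers them."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return summarize_families(infos)


def prefers_ipv6(order):
    """True if the first address offered is IPv6."""
    return bool(order) and order[0] == "IPv6"


if __name__ == "__main__":
    import sys

    # Usage: python family_order.py <hostname>
    order = family_order(sys.argv[1])
    print(order, "-> IPv6 first" if prefers_ipv6(order) else "-> IPv4 first or no address")
```

If IPv6 is listed first on both ends and transfers still use IPv4, the preference is probably being set in the application or its configuration rather than by the operating system.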

 

Day 2 - 20200930 

Notes by Martin Bly

Monitoring etc.

Marian Babik:

perfSONAR:
288 active perfSONAR nodes. 5k+ routes. 4.2.4 is the latest version. 4.3.0 not released yet (python3 support).
Traces added to all latency meshes.
New meshes for additional communities.
Developers' F2F in June - plan to move to ELK stack.
Session at TechEXtra 2/11 (link in slides).
Meshes review:
OPN: SARA offline, issues with PIC and Russia site. IPv4 and IPv6 somewhat similar.
LHCONE: (tests sites to NRENs). Lots of grey, Geant NRENs are more capable end points.
(Sites can subscribe to alerts but it's a manual process - need to send an email.) Still working on network performance alerts. T1s not in the LHCONE mesh - Discussion - adding T1s to ONE mesh may cause confusion.
New 100GbE mesh. No tuning so only showing 10% link bandwidth.
No changes in platform architecture.
Planning to get clients to publish results directly, to reduce the delay before results become available.
Plans: 
    improved infrastructure monitoring.  
    Want to add IPv6 flow labels to all expt meshes. 
    Integration of IPv6 toolkit by SI6 networks - collaboration with Fernando Gont.
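
As background to the flow label plan: the flow label is a 20-bit field in the first 32-bit word of the IPv6 header, after the 4-bit version and the 8-bit traffic class (RFC 8200). The Python sketch below shows how a monitoring tool might read it from raw header bytes. The function names are hypothetical, and nothing here reflects how perfSONAR or the packet marking subgroup assign label values.

```python
import struct

FLOW_LABEL_MASK = 0xFFFFF  # the flow label is 20 bits wide


def parse_first_word(header):
    """Return (version, traffic_class, flow_label) from an IPv6 header."""
    if len(header) < 4:
        raise ValueError("need at least the first 4 bytes of the header")
    (word,) = struct.unpack("!I", header[:4])  # network byte order
    version = word >> 28
    if version != 6:
        raise ValueError("not an IPv6 header")
    traffic_class = (word >> 20) & 0xFF
    flow_label = word & FLOW_LABEL_MASK
    return version, traffic_class, flow_label


def build_first_word(traffic_class, flow_label):
    """Build the first 4 header bytes; only useful for testing the parser."""
    if not 0 <= traffic_class <= 0xFF:
        raise ValueError("traffic class must fit in 8 bits")
    if not 0 <= flow_label <= FLOW_LABEL_MASK:
        raise ValueError("flow label must fit in 20 bits")
    return struct.pack("!I", (6 << 28) | (traffic_class << 20) | flow_label)


if __name__ == "__main__":
    print(parse_first_word(build_first_word(0, 0x12345)))
```
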
Question: slide 7 (100G mesh): is it a snapshot?  Yes. Prague/Imperial shows issue, could be transient or a longer term issue.  Click through on mesh to see time series as usual. Brookhaven is 2 x 40G not 100G, will be 100G in future.

Experiments Test Framework (ETF).
Service availability monitoring (SAM) tests.
Rewrite of JESS (job submission system).
WN-uFrameWork - executes tests on worker nodes. Basic scheduling of tests.
Challenges and plans: ATLAS, CMS, LHCb running ipv6-only instances.
ATLAS requested ipv6 profile in monit.
Rolling out new ETF features (order ALICE, LHCb + ATLAS, CMS)
K8s prototyping ongoing, to be tested in QA in future.
Infrastructure keeps evolving - more complex landscape.

DPK reported some IPv6 numbers: ~52% of traffic IPv6, 76% of T2s using IPv6.  Heading towards IPv6 dual-stack everywhere and 50% traffic on each protocol.   How do we steepen the gradient? 

Bruno:  Small ipv6-only WN set: test the middleware stack, identify ipv4-only storage.  
Marian: Already known that some middleware and experiment stacks have ipv4-only elements.  Duncan / Andrea will investigate with the Brunel IPv6-only WN.  Raja reports that LHCb jobs work OK on it.

(Duncan in chat: dc2-grid-25.brunel.ac.uk https://bigpanda.cern.ch/site/UKI-LT2-Brunel_ipv6_TEST/
But it might be dc2-grid-70.brunel.ac.uk.  I’ll let you know.)

Packet marking would be one argument in favour.
Less work to run single stack ipv6 rather than dual stack.

(Break).

Edoardo: reason for rejection of ipv6-only in new CERN DC - IT colleagues needing to provision stuff.    Going to contract for a package datacentre, 2 years to build (ish). IPv4 negotiated down to 2000 addresses for temporary DC containers.  IP resources are not an issue at CERN, so they cannot be used as an argument for IPv6-only. 

Submissions to conference (DPK):
HEPiX October: IPv6 abstract accepted.
GDB clashes - IPv6 highlights
ISGC/HEPiX (2 weeks in March 2021): Call for papers for ISGC - tweak abstract from March 2020 meeting.

Next meetings:
Thursday 22 Oct 16:00 CERN time
Thursday 26 Nov 16:00 CERN time
F2F Virtual 15:00-17:30 Tuesday 19th, 09:30-12:00 Wednesday 20th CERN time.  
    Possible in-person F2F at CERN if travel possible and CERN accepting visitors.

DPK: 
    Encourage work to investigate why remaining data transfers are not going over IPv6.
    Encourage participants to set up ipv6-only WN testbeds.
        Bruno, Martin, maybe.  
        Not likely at Imperial because CMS believe it is more trouble than it's worth.
        Likely to be issues with provisioning - PXE over IPv6-only is rather problematic.
    DPK will see if EGI can allow Andrea to work on ....
