HEPiX IPv6 Working Group meeting.  CERN.  25-26 Jan 2018.

Day 1 - 25 January (notes by Ulf Tigerstedt)

 

.. starting a bit late, and having Vidyo problems.

Participants: see list of registered people on agenda page

https://indico.cern.ch/event/676532/

Remote: Brian Davies, Martin Bly, Fernando Munoz, Raja, Alastair, Tim Chown

 

Roundtable introductions

   * Ulf received thanks for participating in the WG (he is leaving at the end of January)

   * Brij Kishor Jashal bringing some Indian network knowledge

   * Brian Davies takes care of T2 storage at RAL

 

Agenda

No changes.

 

Reviewing actions

    - Most actions are done, but Raja's point has had no progress

 

Roundtable updates

   PIC: Not many changes due to holidays; 50% of the worker nodes are dual-stacked.

   CMS: Quiet. RAL CMS VO-box will get dualstack soon.

   Brian: Mixed results checking if ipv6 or ipv4 is better for perfsonar. Notable is a limit of 500 Mbit/s for some remote sites from RAL.

   Alastair: Atlas jobs working fine on ipv6-only workernodes (SiGNet).

   10% of files and 20% of throughput now go over IPv6. The event index for Atlas uses CERN STOMP servers that are IPv4-only. The CERN STOMP cluster is apparently large and hard to upgrade.

   Tim: Nothing.

   Ulf: vomsd ipv6 problems, might be local config problem.

   Edoardo: CERN has a problem where VMs boot up before dhcpd6 is ready to serve the correct address. EOS Atlas is now dual stack. 20% of traffic is now over IPv6.

   < coffee break >

   Rajan: <see slides>. So far a separate firewall is used for IPv6, since the old one does firewalling in CPU.

   Andrea: problems with VMs and IPv6 at CERN.

   Brij: Report on India networking

   Kars: Presumed nothing to report

   Francesco: Only one Italian site is completely unable to do IPv6; INFN Management is supposed to make a decision.

   Imperial: Some DPM sites have gone dual stacked.

   LHCOPN: KISTI still missing.

 

Tier 2 status

   14% of sites have completed deployment, 34% are in progress, 29% are on hold, and 22% have not replied. No objections by any site.

 

Training, documentation

   Do we need any documentation for WLCG-specific IPv6 issues?

   Should there be a page with info on where to get help?

   Should there be more "lessons learned" info for the storage software?

 

Day 2 - 26th January 2018 - notes by Francesco Prelz

Dave K. reviews the agenda: we should spend some time on planning for the CHEP submission.

* Herve Rousseau gives us an update on CERN Storage, actually EOS, "the only system that has seen some improvement". The Ceph S3 service is IPv6-only behind the curtains.

Slides: https://indico.cern.ch/event/676532/contributions/2769106/attachments/1590093/2516000/EOS_IPv6_status_26_01_2018.pdf

Dave K. asks if there are any monitoring/statistics measurements that could be collected - we realised that the FTS data transfer plots don't include xrootd, and we'd need to track both separately. Generally speaking, IPv6 overall data transfer statistics [at least] on CERN systems would be welcome.

Duncan R. volunteers to follow up on this, and mentions that HTTP-based transfers should be watched as well, but FTS+xrootd should account for the majority of transfers.

Dave K.: How does Ian Bird get his overall integrated transfer figures?

Duncan R.: Look at http://monit-grafana.cern.ch -> WLCG dashboard

Applying an 'IPv6' filter to the XROOTD statistics produces empty plots though.

Andrea S.: Asked Luca Magnoni about this, and he says that no IPv6-related info is obtained from xrootd.

Dave K.: Perhaps some optional logging should be enabled, as for FTS.

Some time is spent selecting various data from the Grafana dashboard.

Tim C. points out that Globus (still) says (in the "GT6.0 component guide") that IPv6 support in gridftp is 'experimental', and this may steer people away from it.

Francesco P.: Perhaps 'experimental' refers to the part of the 'extended' command set used in gridftp that is not documented in any standard.

Dave K.: we should ask the collaboration that negotiates with Globus (Brian Bockelman?) to have that statement updated.

* Plans towards the CHEP presentation/paper:

Dave K.: We have a lot to talk about at the time of the Tier-2 transition. I was thinking this might be our final presentation, but many suggested it probably won't be, especially as we finally look at the IPv6-only scenario.

   We should list what highlights should be given in the paper, and whether there is work to do.

   Tim was suggesting in his Jan 4th e-mail that emphasis should be given to the ipv6-only plans.

Tim C.: Performance measurements would help, even if they show no improvement but give arguments for possible reasons.

Dave K.: We do see better efficiency for IPv6: our initial explanation is that the 'best' sites moved to IPv6, but we may try checking this.

Tim C.: Lack of fragmentation should help with the performance. Lack of NATs/Transparent proxies should help. Should compare IPv4 native, with or without NAT, and IPv6 native. Larger headers could worsen performance (how widely are jumbo frames used?).
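
A rough sketch of the header-size point above, not something presented at the meeting: it estimates the fraction of on-wire bytes that carry TCP payload for full-size packets over Ethernet. Assumptions: minimum-size headers (IPv4 20 bytes with no options, IPv6 40 bytes with no extension headers, TCP 20 bytes with no options), 38 bytes of Ethernet framing per packet, and no VLAN tags. The function name payload_efficiency is illustrative only.

```python
# Assumed per-frame Ethernet cost in bytes:
# preamble 8 + header 14 + FCS 4 + inter-frame gap 12
ETHERNET_OVERHEAD = 38
# TCP header without options (assumption; timestamps etc. would add more)
TCP_HEADER = 20
# Minimum IP header sizes: no IPv4 options, no IPv6 extension headers
IP_HEADER = {"IPv4": 20, "IPv6": 40}


def payload_efficiency(mtu: int, ip_version: str) -> float:
    """Fraction of on-wire bytes that are TCP payload for a full-size packet."""
    payload = mtu - IP_HEADER[ip_version] - TCP_HEADER
    return payload / (mtu + ETHERNET_OVERHEAD)


if __name__ == "__main__":
    # Compare a standard MTU with a typical jumbo-frame MTU
    for mtu in (1500, 9000):
        v4 = payload_efficiency(mtu, "IPv4")
        v6 = payload_efficiency(mtu, "IPv6")
        penalty = 100 * (v4 - v6) / v4
        print(f"MTU {mtu}: IPv4 {v4:.4f}, IPv6 {v6:.4f}, IPv6 penalty {penalty:.2f}%")
```

Under these assumptions the IPv6 header costs roughly 1.4% of payload throughput at MTU 1500 and about 0.2% at MTU 9000, so the larger header alone is a small effect compared with the differences discussed for NAT and fragmentation.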

 

Duncan R. shows data in http://pprc.qmul.ac.uk/~lloyd/gridpp/ukgrid.html where significantly better data transfer performance is measured for xroot6 over xroot4; http6 over http4. Statistics are from the last 72 hours.

Dave K.: Are the 'Dirac Network test Result' figures different from perfsonar?

Dave K.: As the conference is close to holiday time, we'd probably better have a text draft ready by our next F2F meeting, in May/June.

Francesco P.: How much work can we afford to spend troubleshooting the performance differences observed in the Queen Mary statistics? The findings, if we were lucky, could constitute a significant performance 'carrot' to promote the transition, but dissecting the performance data would require setting up a small project and the collaboration of at least the network managers of the sites involved.

Tim C.: Hard facts are what would make the paper interesting.

Francesco P.: Will try to see, timesharing allowing, if the IPv4/IPv6 asymmetry can be reproduced in xrootd transfers between worker nodes in Milan and CNAF.

Dave K.: We have enough facts to present in the paper, with the T2 transition, anyway.

Dave K.: We can reuse the github arrangement that was set up for the previous papers. Francesco P. to create a new template.

After the coffee break Marian Babik gives his Monitoring (perfsonar + ETF) talk.

Slides: https://indico.cern.ch/event/676532/contributions/2769107/attachments/1590158/2516135/perfSONAR2FETF_IPv6_1.pdf

 

Dave K.: There were times when we had a lot of "orange" on the perfsonar dashboard, with timeouts, etc. Is this getting better?

Duncan R.: Not really. See:

   http://psmad.grid.iu.edu/maddash-webui/index.cgi?dashboard=Dual-Stack%20Mesh%20Config

Marian B.: These are the 'instabilities' referred to in the slides.

Dave K.: Who is supposed to act on these? Someone in Chicago?

* Picking dates for the next meetings.

Next face-to-face meeting: Tuesday-Wednesday June 5-6, 2018 at CERN.

Next phone meetings: Friday, March 2nd, 2pm CERN time.

                      Wednesday, May 2nd, 4pm CERN time.