HEPiX IPv6 working group F2F meeting
IPv6 Working Group F2F meeting - CERN - Day 1 - 15 Sep 2015
(minutes by Francesco Prelz)
Attendance:
In person: Kashif Hafeez; Bruno Hoeft; Tiju Idiculla; Dave Kelsey; Edoardo Martelli; Kars Ohrenberg; Francesco Prelz; Andrea Sciaba; Ulf Tigerstedt
Remotely:
Fernando Lopez, Alastair Dewhurst, Raja Nandakumar (plus many echoes of himself), Daniel Traynor (QMUL).
DaveK: reviews agenda for today and tomorrow on Indico.
Will resubmit the CHEP article draft tonight, merging the latest changes. The deadline for resubmission is unknown; hopefully it has not already passed.
Ongoing actions review:
AndreaS: Contacted Marian Babik about the dual-stack SAM3 monitoring. Something should happen in 'September'. Still on-going.
DaveK on the security best practices: will be covered by the security session tomorrow.
FrancescoP: All connections on xrootd seem to be correctly dual-stack and preferring IPv6. Would still like to understand why it was not working for the first half a day after installation (seemed to be looping on various IPv6 and IPv4 connections, then the effect silently disappeared). Still on-going. It would be nice to extend the testbed to other endpoints, but dual-stacking part of the production FAX infrastructure doesn't seem to be feasible. See the point on testbed below.
Roundtable site updates:
CERN (EdoardoM): Have seen an increase in support cases raised by IPv6 users, so it seems that awareness and general use are increasing. Support Desk received basic training on how to troubleshoot IPv6.
NDGF (UlfT): Plenty of news. We found three IPv4-only dCache nodes, and dual-stacked them during a dCache maintenance update window, and... everything went to pot. One exciting day. It turns out that the ARC client does not support DPASV in gridftp. The ARC CEs are dual-stacked and prefer IPv6, so the effect was that 600 MB/s of data traffic ended up flowing through just two virtual machines and everything ground to a halt. Also trying to see whether FTS3 works or not with a real dual-stack SRM endpoint: issues may appear when the second dual-stack SRM endpoint appears. Trying to understand whether there is a pending version upgrade that includes either the 'broken' or the 'fixed' version of Globus. IPv6 remains on, though, as the current mode of operation can be made to work, even if a bit slower than usual. There were also a few firewall rules that had to be changed during the downtime. ALICE doesn't seem to complain when it happens to run xrootd on dual-stack worker nodes.
AndreaS: There's a ticket (last update July 10th, status 'unresolved') in Jira mentioning IPv6-only transfers failing. Priority is minor. Could be raised a bit. See ticket here: https://its.cern.ch/jira/browse/DMC-681
CMS (AndreaS): Marian Babik answered. There's a Nagios meeting going on these days. An IPv6 item will be added to the agenda. They are happy to work on that. There is a section in the WLCG operation portal on common interest topics. Would be useful to have an overview IPv6 article to raise interest and point to existing documentation. No news to report from CMS proper.
DESY (KarsO): no real news, "stable operations", even after enabling dual stack on eduroam... Lucky him.
INFN (FrancescoP): CNAF still has part of their network on IPv6 (advertised on both LHCONEv6 and LHCOPNv6). The Milan Tier2 has general-purpose IPv6 routing (other than LHCONEv6) active, resulting from a somewhat ad-hoc and manual BGP configuration made by GARR. Question to Edoardo: What is the current supposed status of LHCOPNv6 and LHCONEv6? A: Should be stable and in production. No ripples expected. Could spend only a couple of days on xrootd testing since the last meeting, then had to move house and family (aching back squeaks in the background).
KIT (BrunoH): no real news from the WLCG side. The data exchange volume on IPv6 at KIT is growing, though not exponentially, with more experiments using it.
RAL T1 (Tiju): see slides attached to Agenda page:
https://indico.cern.ch/event/401132/session/0/contribution/2/attachments/1154640/1659113/GridPP35-RAL-IPv6.pdf
Items not in the slides:
Found an issue with Quattor and how it stores addresses in its DB. DaveK: To be posted into our knowledge base.
Heard that FTS3 can talk to an S3 endpoint. If so, will test it over IPv6.
LHCb (Remote Rajas): see slides attached to Agenda page:
https://indico.cern.ch/event/401132/session/0/contribution/2/attachments/1154640/1659093/LHCb-update.pdf
- 1/2 hour coffee break while we try to remove the echo creating the multiple Rajas.
ATLAS (AlastairDW): See slides. There's an ongoing issue with pilot factories. These were reverted from dual-stack: dual-stack CEs couldn't talk with IPv4-only production pilot factories @CERN, only with development ones. No (more) problems with pilot factories based on Condor.
EdoardoM explains the "IPv6 ready" tick mark in the CERN network DB. Machines that are not "IPv6 ready" will still get a reply to DHCPv6 requests, and *will* get an AAAA record inserted in the DNS in the '*.ipv6.cern.ch' address space. They won't be reachable from the outside though, due to firewall rules blocking inbound connections. "IPv6 ready" machines will instead get a top-level '*.cern.ch' DNS record and be reachable from the outside. It's a subtle difference.
Some discussion takes place as to what may be the correct setting for pilot factories at CERN. The only way to make really sure that a machine has an IPv4 address *only* is to disable the IPv6 stack, but we don't want this to happen as it would be in the way of progress in the IPv6 transition.
DanielT: A similar issue occurred at Queen Mary, but only when a DNS entry was published pointing to a non-routable IPv6 address. FrancescoP: RFC 6724 will prevent a global-scope destination address from being matched to a link-local source: a silent fall-back to IPv4 should occur in that case, and I wonder what is preventing this.
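A minimal Python sketch of the scope-matching preference FrancescoP refers to (RFC 6724 destination address selection, Rule 2 "prefer matching scope"). It assumes a simplified two-level notion of unicast scope and ignores all the other rules; the function names and the example addresses (taken from documentation prefixes) are illustrative and were not part of the discussion.

```python
import ipaddress


def scope(addr: str) -> str:
    """Simplified unicast scope: link-local and loopback addresses are
    treated as link-local scope, everything else as global scope."""
    ip = ipaddress.ip_address(addr)
    if ip.is_link_local or ip.is_loopback:
        return "link-local"
    return "global"


def order_destinations(candidates):
    """Sort (destination, source) pairs so that pairs whose scopes match
    come first. The sort is stable, so ties keep their original order."""
    return sorted(candidates, key=lambda pair: scope(pair[0]) != scope(pair[1]))


# Hypothetical host that has only a link-local IPv6 address but a
# regular IPv4 address, contacting a dual-stack service.
candidates = [
    ("2001:db8::10", "fe80::1"),     # AAAA record, link-local source only
    ("192.0.2.10", "198.51.100.7"),  # A record, ordinary IPv4 source
]
ordered = order_destinations(candidates)
print(ordered[0][0])
```

Under these assumptions the IPv4 destination is tried first, which is the silent fall-back expected in the discussion. A real resolver applies the full RFC 6724 rule set and policy table, so this only illustrates the one rule.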
AlastairDW: we then need to find the right set of people to sort the details and directions of the TCP connections established for the operation of these pilot factories.
DaveK: In short, machines at CERN should either have the IPv6 stack turned off entirely or have IPv6 configured properly (e.g.: without a DHCPv6 client running and issuing DHCP requests they won't be getting any publicly routable IPv6 address).
QMUL (DanielT): Other than the dual-stack CE issue discussed so far, there are plans to move other services (squid, xrootd, etc.) to new hardware, and use this as a chance to enable IPv6. DaveK: should probably start with test instances of these. Will come back on this when making testing/testbed plans later.
Thomas Finnern says hallo, enjoys the online debugging, and wishes a good continuation of the meeting.
PIC (Remote FernandoL): no news.
Next topic: Testing.
First item is the future of the testbed. Tony Wildish passed an info package to Ulf, who agreed to resume operation of the testbed, which is now shut down.
Other ongoing test activities: xrootd. Tiju: gridftp and S3 ports into Ceph could also be candidates.
DaveK: the existing FTS3 transfer mesh mostly negotiates data transfers that use gridftp as an actual transfer protocol. We could in principle mix+match various protocols.
UlfT: Beware: xrootd has an unencrypted control channel that could be hijacked to exploit write access. Wonder how ALICE copes with the possibility of a MITM attack.
In order to test the proper operation of the native replica handling (choice and fall-back) in xrootd, more remote endpoints are needed. Something like the gridftp mesh we had at the very beginning, with the same IPv6 VO access rights. Francesco P. will put together a KVM/QEMU CentOS 7 VM with a canned xrootd instance for the purpose and distribute it.
Ulf T. will revitalize the testbed by adding a richer set of protocols into the FTS3 file transfer mesh.
Status of adding dual-stack to production endpoints. Is anybody else planning to turn on more dual-stack production services in the future or near future?
Nobody, really - but we seem to have run out of steam for testbed discussions.
On the CHEP paper: the paper master is still kept up to date under https://github.com/prelz/hepix_ipv6/tree/master/chep2015_paper
All received comments were addressed.
There are two issues still worth touching on:
1) Review the end of Section 8 ('This implies'... what is 'This'?). Changed to 'Currently this implies'.
2) Beginning of Section 6: it read "all of these services needed configuration changes", but not all of the listed services needed config changes.
After a short discussion on whether it's wise to recommend binding to '::' to achieve dual-stack binding to 'localhost', as suggested in Section 6, the meeting is called to an end.
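For reference, a minimal Python sketch of what binding to '::' means at the socket level, assuming a platform that supports dual-stack sockets. The function name is hypothetical and this is not the configuration of any of the services discussed in the paper. Note that '::' is the wildcard address, so such a socket listens on all interfaces and not only on the loopback one, which is relevant to the question of whether the recommendation is wise.

```python
import socket


def open_dual_stack_listener(port: int = 0):
    """Return a TCP socket listening on '::' that also accepts IPv4
    connections (seen as IPv4-mapped addresses), or None if the platform
    does not support dual-stack sockets. Port 0 lets the OS pick a port."""
    if not socket.has_dualstack_ipv6():
        return None
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    try:
        # Clearing IPV6_V6ONLY is what makes the socket dual-stack.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        sock.bind(("::", port))
        sock.listen()
    except OSError:
        sock.close()
        return None
    return sock


if __name__ == "__main__":
    listener = open_dual_stack_listener()
    if listener is None:
        print("dual-stack sockets not available here")
    else:
        print("listening on", listener.getsockname()[:2])
        listener.close()
```

Whether IPV6_V6ONLY is off by default depends on the operating system and its settings, so setting it explicitly avoids relying on that default.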
IPv6 Working Group meeting - Day 2 at CERN - Wednesday 16th Sep 2015
(minutes by Ulf Tigerstedt)
Starting at 09:30-ish:
- testing:
- try xrootd
- testbed resurrection
- perfSONAR records dual-stack traceroutes, so the route history can be followed.
- continue using -ipv4/-ipv6 hostnames to try to separate IPv4 and IPv6 transfers in FTS3.
- dual-stack mesh is mostly yellow-red. Latency tests have disappeared.
- Fermilab has IPv6, but problems with registering it in GOCDB/OSG.
- BNL is ignoring IPv6.
- SARA/NL-T1 does not have IPv6 yet.
- No news from Russia.
- KISTI: started with IPv6.
- ASGC not a T1 any more?
- RAL: working on it.
- OSG has IPv6 as a priority.
- We need to push IPv6 perfSONAR deployment.
- Should we do a new wiki with current status? Yes, via the ops coordination meeting on Thursday.
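The -ipv4/-ipv6 hostname convention mentioned in the testing notes can be sketched in Python as follows. The exact naming scheme (suffix appended to the first DNS label) and both function names are assumptions made for illustration; the minutes do not spell out how the aliases are formed.

```python
import socket


def family_alias(hostname: str, family: str) -> str:
    """Build a single-family alias, e.g. host.example.org -> host-ipv6.example.org.
    The suffix-on-first-label scheme is an assumed convention."""
    if family not in ("ipv4", "ipv6"):
        raise ValueError("family must be 'ipv4' or 'ipv6'")
    host, dot, domain = hostname.partition(".")
    return f"{host}-{family}{dot}{domain}"


def resolved_families(hostname: str) -> set:
    """Return the address families a name resolves to, as a set of
    'ipv4'/'ipv6' strings. An unresolvable name gives an empty set."""
    names = {socket.AF_INET: "ipv4", socket.AF_INET6: "ipv6"}
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    return {names[info[0]] for info in infos if info[0] in names}


print(family_alias("fts3.example.org", "ipv6"))
```

A check such as `resolved_families(family_alias(name, "ipv6")) == {"ipv6"}` could then confirm that an alias really forces a transfer onto one protocol, provided the alias has only an AAAA record in the DNS.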
11:00:
- Security, following DaveK's slides.
- No NAT, so it will cause some headaches.
- RAs are a problem, since they can be forged.
- RFC 4890 should probably be implemented. CERN and DESY already do so.
- Firewall recommendations: Close everything, allow only what you need.
- thc scan penetration testing.
- Update the knowledge base with new info, mostly regarding hardening networks.
- Should we try to get a half-day at the WLCG workshop? After discussion it was agreed that this is not a viable option.
- Next Vidyo meetings 29.10 16:00 CET and 10.12 16:00 CET
- Next F2F meeting 21-22.1.2016 at CERN
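As an illustration of the firewall points in the security session ("close everything, allow only what you need", combined with RFC 4890), here is a minimal Python sketch of a default-deny check for transit ICMPv6. The table reflects the messages RFC 4890 lists as "must not be dropped" for traffic through a firewall; the function and variable names are hypothetical, and this is a sketch of the policy logic, not a firewall configuration used at any site.

```python
# ICMPv6 type -> set of allowed codes (None means all codes).
# Based on the transit "must not be dropped" list in RFC 4890.
MUST_NOT_DROP = {
    1: None,    # Destination Unreachable, all codes
    2: None,    # Packet Too Big
    3: {0},     # Time Exceeded, code 0 only
    4: {1, 2},  # Parameter Problem, codes 1 and 2 only
    128: None,  # Echo Request
    129: None,  # Echo Response
}


def allow_transit_icmpv6(icmp_type: int, icmp_code: int = 0) -> bool:
    """Default deny: only listed type/code combinations pass."""
    if icmp_type not in MUST_NOT_DROP:
        return False
    codes = MUST_NOT_DROP[icmp_type]
    return codes is None or icmp_code in codes


print(allow_transit_icmpv6(2))    # Packet Too Big
print(allow_transit_icmpv6(134))  # Router Advertisement
```

Router Advertisements (type 134) are link-local traffic and are not in the transit list, so they are rejected here. Traffic addressed to the firewall itself or exchanged on the local link needs a different, larger set of types (neighbour discovery among them), which this sketch does not cover.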