HEPiX IPv6 working group F2F meeting
WIFI access: pre-register your MAC address here (contact person: David Kelsey)
Timings are approximate.
The PIN for Vidyo has been distributed by email.
HEPiX IPv6 working group F2F meeting at CERN
Day 1 - Thursday 21 January 2016
(notes by Francesco Prelz)
Attendees: Terry Froy, Costin Grigoras, Bruno Hoeft, David Kelsey, Edoardo Martelli, David Mitchell, Raja Nandakumar, Kars Ohrenberg, Francesco Prelz, Duncan Rand, Andrea Sciaba', Ulf Tigerstedt, Ramiro Voicu
Attending remotely: Alastair Dewhurst, Dan Traynor, Kashif Hafeez, Michael Steder
Apologies: for today from Fernando Lopez (PIC)
DaveK welcomes new members - Terry Froy (QMUL) and David Mitchell (ESnet rep at CERN).
DaveK reviews the agenda and mentions the IPv6 security recommendations to be presented in Taipei at ISGC2016.
Minutes for the last meeting and actions are reviewed:
1) Action on Dual-stack SAM3 and Nagios monitoring is done.
2) The Security best practices document preparation is ongoing.
Status will be discussed tomorrow.
3) A docker container and instructions to join into an xrootd testbed
were circulated on Dec. 23, so the 'setup' action is considered done.
We now need to make a plan for actual testing. To be discussed in the
testbed slot.
Roundtable updates
Experiments first:
RajaN (LHCb). See uploaded slides.
DIRAC seems to be OK on dual-stack, and tested in reasonable detail.
VOMS reported to be functional, even though the VOMS server has known issues.
DuncanR: did you manage to run jobs on an IPv6-only WN ?
RajaN: see slides. No issues reported with running dual-stack servers, and this should be enough.
One possible issue is with github, which is currently IPv4 only. There is at least one open ticket asking for IPv6. Should we post more ?
CostinG (Alice): No major changes. CERN was set up with a dual-stack VO box. Not much activity on the sites.
UlfT: You are using IPv6 with us.
CostinG: There are few exceptions.
Sites are encouraged to upgrade to an IPv6-enabled version of xrootd.
UlfT: AliROOT uses an ancient, IPv4-only version of xrootd. That should be the starting point for an upgrade.
CostinG: Two distinct uses of xrootd in Alien and the Alice code were disentangled from each other, and this should make the upgrade easier/possible.
DaveK: It looks like there are no major showstoppers. The trouble is that most sites are not IPv6-capable.
CostinG: Yes, it's a very slow process.
UlfT: The CERN CRL distribution host seems to have a AAAA record, but is not reachable over IPv6.
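The symptom UlfT describes (an AAAA record is published, but the host does not answer over IPv6) can be checked with a short script. The following is a minimal sketch using only the Python standard library; the function names, the default port and the timeout are illustrative assumptions and not part of any tool used by the group.

```python
import socket


def ipv6_addresses(host, port=80):
    """Return the IPv6 addresses the resolver gives for host (empty list if none)."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def check_ipv6(host, port=80, timeout=5.0):
    """Return (has_ipv6_address, reachable_over_ipv6) for a TCP service."""
    addresses = ipv6_addresses(host, port)
    for address in addresses:
        try:
            # Connect to the IPv6 literal so that no IPv4 fallback can happen.
            with socket.create_connection((address, port), timeout=timeout):
                return True, True
        except OSError:
            continue  # try the next address, if any
    return bool(addresses), False
```

A result of `(True, False)` for a given host and port would correspond to the situation reported above: an IPv6 address is published, but no TCP connection can be established over IPv6.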
AndreaS (CMS): No real news. Situation: all the main system components were tested. Level of readiness rated at 90%. GlideinWMS was tested extensively in OSG. xrootd was tested, too.
We are lacking an official statement by the management, clear directions to the sites and a complete system validation.
Andrea will try to get some position statement from the experiment computing management.
DaveK: Without a clear strategy there is no push on the sites...
AlastairDW (Atlas): deferred to later
Site reports:
Edoardo (CERN): No burning news.
DaveK: Any evidence of traffic/usage growth?
EdoardoM: Peak of outgoing IPv6 traffic shortly before Christmas and after the LHC shutdown that lasted a couple of weeks. Unfortunately it was noticed too late to be profiled.
CostinG: May have been Alice transferring data into NDGF...
AndreaS: What is the status on worker node address exhaustion ?
EdoardoM: shows "IPv4 depletion at CERN: status and plan" confidential slides.
Usage of IPv4 addresses decreased for static allocation and increased for dynamic allocation (e.g. portable devices). Total assignment for these two is around 100000, ~80% occupancy. Around 30000 could be spared by reorganisation.
For data production machines, both at CERN and Wigner, usage of the IPv4 classes is around 50%, but the utilisation rate is high, and the 17000 remaining addresses at Wigner may be exhausted in 12-24 months. The maximum capacity at Wigner is to be reached in the same timeframe, so this isn't as bad as it looks. The Data Center at Geneva has 72000 addresses left, which should last a few years.
One /18 class will be used for the new CERN wi-fi infrastructure.
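The figures quoted above can be cross-checked with some simple arithmetic. The sketch below uses Python's `ipaddress` module; the prefix 10.0.0.0/18 is only a placeholder (the actual prefix is not given in these notes), and the monthly rates are merely what the quoted 12-24 month estimate implies if consumption is assumed to be constant.

```python
import ipaddress

# Placeholder prefix: only the /18 length comes from the notes.
wifi_block = ipaddress.ip_network("10.0.0.0/18")
wifi_addresses = wifi_block.num_addresses  # 2 ** (32 - 18)

# Static + dynamic allocation: about 100000 addresses at ~80% occupancy.
assigned = 100_000
in_use = round(assigned * 0.80)

# Wigner: 17000 addresses left, expected to last 12 to 24 months.
remaining_wigner = 17_000
fastest_rate = remaining_wigner / 12  # addresses per month if exhausted in 12 months
slowest_rate = remaining_wigner / 24  # addresses per month if exhausted in 24 months

print(wifi_addresses, in_use, round(slowest_rate), round(fastest_rate))
```

Under these assumptions a /18 holds 16384 addresses, and the Wigner estimate corresponds to a consumption of roughly 700 to 1400 addresses per month.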
FrancescoP (INFN): An IPv6 training day was presented by Francesco at CNAF on Jan. 12th.
It was well attended, but the practical part was plagued by some issue with the 2-year-old Cisco access-point used in the meeting room, which would happily let UDP on IPv6 through, but would block TCP connections on IPv6 to and from some sites (apparently at random). Everything was OK on a wired connection on the same VLAN as the access point. IOS release notes mention 'IPv6 support' explicitly from two minor versions later than the one installed on the access point. A firmware update will be tested...
A Docker container for xrootd testing was assembled. Instructions for installing it were circulated.
E-mail exchange statistics at Milan (collected since 2010) show an apparent inversion of trend.
See http://orsone.mi.infn.it/~prelz/ipv6_stats_trend.html for details.
A few domains (including CERN) seem to have dropped IPv6 e-mail transfer, at least for outgoing messages (they seem to be still reachable on IPv6).
BrunoH (KIT): We lost part of our IPv6 peering to CERN. Trying to troubleshoot the problem with EdoardoM.
Thomas Hartmann, one of the most active people on IPv6, left the site for DESY.
Need to bring the firewall configuration up to the current state of the art.
DaveK: What happened to the FTS3 testbed that Thomas was running?
BrunoH: It's still running, but somebody needs to maintain it.
DuncanR (Imperial): No news. Joined LHCONE, but initially with no IPv6 peering. This should have been corrected in early January.
KarsO (DESY): No news. Usage stable.
UlfT (NDGF): Nice to see data traffic flowing.
Needed to troubleshoot routing problem for Alice just before Christmas, and also a routing loop that occurred in Bergen.
More test resources should appear in two weeks, providing a nice environment to provision test VMs.
On the other hand, IPv6 is working fine *for production*.
TerryF (Queen Mary): Still issues with the ATLAS pilot factory talking to the CEs, traced to IPv6 misconfiguration on the pilot factory.
Have a prototype for 'doctored' DNS that will send A records only in response to queries coming from CERN. This was done to deal with client nodes at CERN not properly falling back to IPv4.
AlastairDW: This is a solved problem. The solution needs to be deployed.
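The decision logic behind such a 'doctored' DNS can be sketched as follows. This is not TerryF's prototype and not a DNS server: it only illustrates the idea of dropping AAAA records for selected client networks. The prefixes are documentation ranges standing in for the real ones, and all names are hypothetical.

```python
import ipaddress

# Hypothetical client networks that should receive A records only.
# These are documentation prefixes, not CERN's real address ranges.
A_ONLY_CLIENTS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:1::/48"),
]


def is_a_only_client(client_ip):
    """True if the query source falls inside one of the A-only networks."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address.version == net.version and address in net
        for net in A_ONLY_CLIENTS
    )


def filter_answer(client_ip, records):
    """Drop AAAA records for A-only clients; records are (type, value) tuples."""
    if is_a_only_client(client_ip):
        return [record for record in records if record[0] != "AAAA"]
    return list(records)


answer = [("A", "198.51.100.5"), ("AAAA", "2001:db8:2::5")]
print(filter_answer("192.0.2.10", answer))    # A record only
print(filter_answer("203.0.113.7", answer))   # both records
```

Clients that never see the AAAA record cannot attempt IPv6 at all, which sidesteps the broken fallback at the cost of hiding IPv6 from them entirely.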
TerryF: We are also in the process of refactoring the internal network (renumbering). In the same process the worker nodes will be prepared for IPv6. A NAT (IPv4) solution allocating a port range to every node to allow for easier abuse tracking is being deployed.
Will now look at the range of grid services deployed at Queen Mary and report on the findings. Things may break, but we try to roll back quickly...
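The per-node port range idea mentioned for the IPv4 NAT can be illustrated with a small sketch. The block size and the first port are assumptions chosen for the example, not the values used at Queen Mary; the point is only that a fixed mapping lets a source port seen in an abuse report be traced back to one worker node.

```python
FIRST_PORT = 1024        # hypothetical: leave the well-known ports unused
PORTS_PER_NODE = 1000    # hypothetical block size per worker node
LAST_PORT = 65535


def port_range(node_index):
    """Return the (first, last) source ports reserved for a worker node."""
    first = FIRST_PORT + node_index * PORTS_PER_NODE
    last = first + PORTS_PER_NODE - 1
    if node_index < 0 or last > LAST_PORT:
        raise ValueError(f"no port block available for node {node_index}")
    return first, last


def node_for_port(port):
    """Map a source port seen in an abuse report back to the node index."""
    if not FIRST_PORT <= port <= LAST_PORT:
        raise ValueError(f"port {port} is outside the NAT pool")
    node_index = (port - FIRST_PORT) // PORTS_PER_NODE
    port_range(node_index)  # raises if the port lies in the unused tail of the pool
    return node_index


print(port_range(0), port_range(1), node_for_port(2024))
```

With these example values one public IPv4 address can serve 64 nodes, since 64 blocks of 1000 ports fit between 1024 and 65535.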
We miss Tiju for a RAL report, but likely little news as he's taken some time off. He'll likely do a bit more Squid testing.
Kashif joined the conference later, with no further news.
Plans for testing
UlfT:
Everything works with manual testing of FTS3. But things grind to a halt if Tony's framework for submitting jobs is used. Propose to set up an FTS3 server in Umea for easier access to the logs.
BrunoH will circulate the URLs for log access.
(Done: https://fts3-kit-02-hepix.gridka.de:8449/fts3/ftsmon/#/)
The testbed has currently 5/6 endpoints, all running dcache.
BrunoH proposes to set up another server at NDGF for testing, while keeping the one at KIT.
FrancescoP:
To establish an xrootd testbed, we need to find a few volunteer sites to run the docker container. Exercising a separate xrootd testbed makes sense only in the hope of finding new issues before they hit production, and this can happen if we can explore the vast configuration space of xrootd beyond what is currently used in production.
In order to join the testbed, a machine with public IPv6 address, a host certificate and running docker with IPv6 enabled is needed.
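The first prerequisite (a public IPv6 address) is easy to get wrong when a host only has link-local or otherwise non-routable addresses. A minimal sketch of such a check with Python's `ipaddress` module follows; the function name is hypothetical, and the host certificate and docker requirements are not covered here.

```python
import ipaddress


def is_public_ipv6(address):
    """True if address is a syntactically valid, globally routable IPv6 address."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return ip.version == 6 and ip.is_global


for candidate in ("2001:4860:4860::8888", "fe80::1", "2001:db8::1", "192.0.2.1"):
    print(candidate, is_public_ipv6(candidate))
```

Link-local (`fe80::/10`) and documentation (`2001:db8::/32`) addresses are rejected, as is any IPv4 address.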
Status of LHCOPN and LHCONE peerings
BrunoH reports little change w.r.t. the last report. Seven Tier-1's appear to be reachable on LHCOPN, and eight over LHCONE, but not all of them are running a reachable Perfsonar server. The perfsonar of In2p3 is reachable on LHCONE only.
CNAF seems not to be reachable on LHCOPN.
After a meeting in South Korea a new IPv6 LHCOPN announcement appeared from Thailand. The peering to FZU (Prague) is up, but the perfsonar seems to be not reachable. A new peering also came from U. Michigan @ Ann Arbor, but their perfsonar hasn't been added to the mesh yet.
DaveK: The conclusion is that there's still work to do, even to get to the first step of having a reachable perfsonar service.
DavidM: There may be more than an IPv6 vs. IPv4 issue in getting multiple organisations to join into this kind of testing and linting all the associated issues...
ETF ("Experiment Testing Framework")
Andrea Sciaba' presents the results of running ETF, the successor of the SAM ("Service Availability Monitor") distributed test framework, on IPv6 (see presentation on agenda page).
Connections made by CondorG into CREAM don't honour the condor_config ENABLE_IPV4=False setting, but this is expected: the CREAM client library has no access to condor_config.
Proposal to allow IPv6-only worker nodes
AlastairDW presents his proposal for IPv6-only WNs to be effective in April 2017, as detailed in the document attached to the agenda page.
Various implications of finally *asking* people to enable IPv6 to provide enough accessible IPv6 storage by the proposed April 2017 deadline are mentioned. Some services (proof ?) may have to be reached via proxies.
Should we start EOS testing @CERN ?
BrunoH: Would it make sense to move this to the end of Run2, 2018 ?
AlastairDW: It wouldn't hurt if people objected to the proposed plan and asked for this to be moved to the start of Run3, but it's better to be more aggressive with the dates.
DaveK: Today we could do little even if people came to us with plenty of IPv6-accessible resources/storage, so we need to keep advancing the readiness of the software.
AlastairDW: We don't want a site to think they are a "second-class" site if they can run a limited set of software (the set that can work on IPv6-only sites).
DaveK: What sort of 'economic' incentive can we provide ?
We need to understand whether the proposed deadline is tenable.
Will sleep over the issue and come back to it tomorrow morning...
---- end of Day1 ---
Day 2 - Friday 22 January 2016
(notes by Raja Nandakumar and Dave Kelsey)
Attendees: Terry Froy, Costin Grigoras, Bruno Hoeft, David Kelsey, Edoardo Martelli, David Mitchell, Raja Nandakumar, Kars Ohrenberg, Francesco Prelz, Duncan Rand, Andrea Sciaba', Ulf Tigerstedt, Ramiro Voicu
Attending remotely: Alastair Dewhurst, Fernando Lopez, Kashif Hafeez
Plans for CHEP 2016
HEPiX in Berkeley the week after CHEP - interesting to have a 2-week trip to California?
We have been successful in giving talks in last 3 CHEPs. Do we want to do one more?
-- Also had a written paper in the last two CHEPs. Useful to have a published refereed paper.
-- Useful to submit an abstract as ipv6 work is neither stopped nor completed.
-- No panic as yet as the CHEP call has not gone out, but useful to have an idea of what to do.
-- Make it more of a thrust from the experiment side maybe? Who gives the talk? One of the experiment reps?
---- Make a case for ipv6-only as in the draft proposal from Alastair?
---- Monitoring?
QMUL has a few test ipv6-only WNs. Could try to test some in production for CMS.
-- and maybe even ATLAS. Alastair to be contacted about it.
A Canadian site has been genuinely interested in setting up an ipv6-only site for WLCG.
Experiment data available on dual-stack / visible from ipv6-only machines.
-- How do we measure it?
-- # storage endpoints, fraction of data.
-- For LHCb, the above numbers are 0 - correction : Imperial supports, so > zero
-- For CMS they have some sites, especially in the US
-- Can these numbers be put into a SAM / ETF test? Will make it simpler to get the numbers.
---- ETF will likely be dual stack. In production ~ March 2016
-- A few ways of testing.
---- ping, but some sites can block ping
---- ssh port, but this too can be blocked
---- SAM tests have full knowledge of what possibilities are available.
-- If experiments are okay, sites should not be afraid to turn on dual-stack services.
---- Issues will be only temporary and can be worked around
---- To start with, we should request sites around the table to go dual stack as far as possible.
---- QMUL : Almost everything dual stack, except WNs.
---- PIC : not yet dual-stack, as the dcap client segfaults. ipv6-only WNs not possible due to Maui / batch
system issues. Going to replace batch system with HTCondor - evaluation ongoing.
---- KIT : Evaluating and running in test. Not yet production ready.
---- CERN / AFS : AFS possibly being deprecated and CERN looking at EOS in the long term future.
---- INFN : Useful to have an ipv6 day to make and break things
---- CMS : Production / test dual stack storages in - Wisconsin, Brunel, Nebraska, DESY, QMUL, Purdue, Oxford
---- Oxford : Hampered by their central networking who does not support ipv6 well
---- LHCb : Imperial already is dual-stack. So, can run tests against them - Action, Raja.
Note : Some environment flags can be set to force xroot to read data over ipv6. Test that at Imperial?
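The two metrics proposed above (number of dual-stack storage endpoints and fraction of data visible from IPv6-only machines) could be computed along the following lines once a list of endpoints is available. This is only a sketch: the input format, the field names and the example sites and volumes are invented for illustration and do not describe any experiment's real catalogue.

```python
def ipv6_visibility(endpoints):
    """Summarise dual-stack coverage for a list of storage endpoints.

    Each endpoint is a dict with the hypothetical keys
    'name', 'dual_stack' (bool) and 'data_tb' (data volume in TB).
    """
    total_data = sum(endpoint["data_tb"] for endpoint in endpoints)
    dual_stack = [endpoint for endpoint in endpoints if endpoint["dual_stack"]]
    dual_stack_data = sum(endpoint["data_tb"] for endpoint in dual_stack)
    return {
        "dual_stack_endpoints": len(dual_stack),
        "total_endpoints": len(endpoints),
        "data_fraction": dual_stack_data / total_data if total_data else 0.0,
    }


# Invented example data, not real site figures.
example = [
    {"name": "site-a", "dual_stack": True, "data_tb": 100.0},
    {"name": "site-b", "dual_stack": False, "data_tb": 300.0},
]
print(ipv6_visibility(example))
```

Note that this counts data volume per endpoint; a dataset replicated at both a dual-stack and an IPv4-only site would need a per-file or per-dataset view to be counted correctly.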
ISGC 2016 : ipv6 security and plans
Attendance : Dave K, Dave M, Edoardo and Bruno
-- Lots of advice on ipv6 and security
-- Need a slide on ICMPv6. Make sure we do not block the necessary ICMPv6 messages. Refer to the
relevant RFC?
---- Probably if in doubt, rate-limit, but not block?
---- Use the copy of address management plan from QMUL as an example? Also possibly from Imperial?
-- Add experience regularly to knowledge base
-- Also check up with JaNET about their ipv6 advice.
-- Also need to discuss with EGI security
(RajaN leaves so DaveK takes the notes from now on)
More discussion on AlastairD's draft paper on support for IPv6-only WNs/VMs
DaveK: Is it mainly to support use of IPv6-only opportunistic resources or also to allow sites to pledge IPv6-only CPU? Answer - would like to allow both, but we need to take some of the risk away from the sites doing this. We could offer tolerance of a reduction in the level of availability, e.g. during 2016 as we work towards this. General agreement on this approach. AlastairD will work with the other experiments on the production of the next draft document.
Site survey of IPv6 Readiness
This was last done in the summer of 2014. DaveK will ask sites to check and update their entries in the table.
https://www.gridpp.ac.uk/wiki/2014_IPv6_WLCG_Site_Survey
Dates of future meetings
The following was agreed:
Thursday 25 Feb 2016 - Vidyo - 16:00 - 17:00 CET
Thursday 14 Apr 2016 - Vidyo - 16:00 - 17:00 CEST
Next F2F meeting at CERN - Wed/Thursday 18/19 May 2016 - still to be confirmed.
Possible date for one-day IPv6 workshop at CERN - to be confirmed with Ian Collier (chair of WLCG GDB) - Tuesday 7 June (or also partly on GDB day of Wed 8 June).
Future workshops and publicity
UlfT reports that next week there is a NeIC meeting where he will give an IPv6 tutorial.
We could offer to give more IPv6 training to HEPiX at the upcoming DESY Zeuthen meeting in Berlin (18-22 April).
AlastairD says that the ATLAS site jamboree is too close to organise something but he would like to get IPv6 on the agenda for the ATLAS software week at the end of February.
Review of new actions
- AlastairD. Work on IPv6-only document together with other experiment reps.
- AlastairD. Get one slide input to the ATLAS site jamboree and ask for longer session at ATLAS s/w week.
- AlastairD. Work on testing IPv6-only WN.
- DaveK. See if HEPiX is interested in IPv6 training in April.
- DaveK. Distribute draft IPv6 security guidance for more feedback.
- BrunoH. Update LHCONE IPv6 wiki.
- UlfT. Take on maintenance of checking IGTF CA CRL status re IPv6.
- AndreaS. EOS - can they deploy dual stack on production services?
- AndreaS. ETF - mods to determine IPv6 status of services.
- RajaN, CostinG, AndreaS. Work with Alastair on his document.
- DuncanR. check and update the Dualstack perfSONAR mesh
- EdoardoM. check how best to set the IPv6-ready flag for VMs.
- EdoardoM. check/decide what can be published re IPv4 exhaustion at CERN.