IPv6 Working Group meeting at CERN - Day 1 - 11 Sep 2017

Present: Edoardo, Dave, Duncan, Xavier, Catalin, Andrea, Francesco, Ulf, Raja, Martin, Costin, Bruno, Hervé

(Notes by Raja and Francesco)

Introduction:
Discussion about agenda.
Dave K. to present the status to the GDB on Wednesday and to give a HEPiX talk at KEK.


Discussion about GGUS ticket https://ggus.eu/?mode=ticket_info&ticket_id=129946
--> Possibly need to involve Janet as well, offline, as they cannot access the GGUS system
--> We cannot access the GÉANT ticketing system and they cannot access GGUS
People unanimously happy with the agenda

Actions : None
Minutes : None to review

(Francesco returns and takes over note-taking)

Round-table updates:

CERN (Edoardo M.): There was an action on the AGILE infrastructure team to provide IPv6-ready CentOS 7 VMs with a ready-to-go AAAA DNS record and a corresponding firewall access entry.

A problem was found: the infrastructure reuses MAC addresses, and stale entries in the neighbour discovery table confuse the CERN Brocade routers (which serve as default gateways for the VMs). The current workaround is clearing the ND table. Another possible workaround is to refrain from recycling MAC addresses for at least one day.

A case for this issue is open with Brocade. Details will of course be added to the Knowledge Base.

CERN is also in the process of adding DHCP relay option forwarding to all routers.

Imperial (Duncan R.): Addressing the file transfer issue to SARA (see previous notes).

RAL (Catalin C.): Squid service started two months ago. An XRootD redirector was set up at CMS's request. The FTS, BDII and Frontier services have to be converted to dual-stack: asking whether any bad experiences have been reported.

None are known around the table.

About ARC-CE, Bruno reports problems: ARC will try to use link-local addresses if IPv6 is enabled and will not fall back to IPv4.

The issue has been reported to ARC developers, but they seem to have had no time to address this yet.
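To illustrate the reported failure mode: IPv6 link-local addresses (fe80::/10) are only valid on the local segment, so a service that advertises them to remote peers breaks connectivity. A minimal sketch of the kind of filtering a client should apply before selecting candidate addresses; this is illustrative only, not ARC's actual code, and the function name is made up:

```python
import ipaddress

def routable_candidates(addresses):
    """Drop IPv6 link-local addresses (fe80::/10) from a candidate list.

    Link-local addresses are meaningless off-link, so advertising them
    to remote peers (the behaviour reported for ARC above) fails.
    """
    kept = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if ip.version == 6 and ip.is_link_local:
            continue  # not usable off-link; skip it
        kept.append(addr)
    return kept

# Example: the link-local address is filtered out,
# global IPv6 and IPv4 candidates are kept.
print(routable_candidates(["fe80::1", "2001:db8::10", "192.0.2.7"]))
# → ['2001:db8::10', '192.0.2.7']
```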

About ARGUS: Raul should have experience and know the details.

VO-Boxes: No problems according to Costin.

Firewalls at RAL are still IPv4-only and need an upgrade.

Martin B. comments on what may be the most viable next steps for RAL.

The T1-to-site link uses a separate 10 Gb/s link for IPv6, which can be upgraded to 40 Gb/s when needed.

 

LHCb (Raja N.): Nothing new, apart from the SARA problem (see above). As a consequence, SARA had to address a number of firewall issues.

EOS smoothly went to dual stack for LHCb.

 

KIT (Bruno H.): Deploying new storage system, in principle IPv6-compliant.

And handling ARC-CE issues (see above for details).

 

ALICE (Costin G.): Comments on the page (http://alimonitor.cern.ch/ipv6/) showing the overall reachability status of all ALICE services.

There are just 6 sites with IPv6-reachable storage, and this has not changed for a while. A strategy is needed to compel sites to move.

 

CMS (Andrea S.): Progress on testing IPv6 job submission infrastructure.

Jobs could be successfully submitted to IPv6-only worker nodes at Brunel, but they failed on *dual-stack* worker nodes. Weird.

A new version of Condor will be installed on the testbed to see if it cures this problem.

A CMS-internal survey was conducted: a PDF report and a Twiki link are posted to the meeting Agenda page.

All information reported was collected by contacting the individual sites directly. Easy channels should be made available for sites to get information about IPv6.

Dave K.: There seem to be more sites in 'green' status than for ALICE. "It could be worse".

 

INFN (Francesco P.): No new sites joined IPv6 over the summer. This may need a wake-up call specifically for Tier-2 sites. Still unable to lead this process from Milan: Milan T2 management (ATLAS) doesn't want to risk downtime for this purpose.

May try Turin (ALICE site), which is close enough.

Currently organising an IPv6 session at the INFN security workshop on November 14th.

Plan to set up an INFN-internal induction/training course for 2018.

 

Nordic (Ulf T.): Nothing has happened during the summer (coldest in 50 years). "Everything is working" (TM).

 

- Coffee Break -

 

T0-T1-T2 status:

 

The only T1 centre that does not have IPv6 connectivity is KISTI.

The NREN is IPv6 ready though.

Checking status on the WG page:

http://hepix-ipv6.web.cern.ch/sites-connectivity

Various comments and checks on the state of the table.

100% dual-stack storage [at RAL] expected by December.

Dave K.: Do we need a new column "by end of calendar year 2017"?

Do we know whether Nikhef (or SARA) really went to 100% dual stack by end of July? There needs to be a way of confirming these milestones, possibly from the experiments' viewpoint.

Based on the current status, a list of sites that still need to report on their status will be presented to the GDB on Wednesday.

 

Action on Raja N. to produce a comprehensive status summary table for LHCb.

Actually, all experiments do collect this information and could/should report on IPv6 reachability.

Should a common table (or other) format for reporting this information be agreed upon?

[But ATLAS is currently not represented at the meeting].

Francesco P.: A collection of host names, with role (and size of served storage where applicable), would make it possible to automate the procedure across the board. Automated procedures are the only way to guarantee sustainability.

Andrea S.: Each experiment actually *already* produces such a topology listing, in the form of a 'VO feed' XML file that is publicly available.

      Here's the URL listing for the 4 experiments:

      Alice: http://wlcg-sam-alice.cern.ch/dashboard/request.py/alicesitemap

      ATLAS: http://atlas-agis-api.cern.ch/request/atp/xml/

      CMS: http://cmssst.web.cern.ch/cmssst/vofeed/vofeed.xml

      LHCb: http://lhcb-portal-dirac.cern.ch/topology/lhcb_topology.xml

Francesco P.: I could use these VO feeds as a source of host names to run a DNS check on, rather than the soon-to-be-retired BDII.
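A minimal sketch of the check Francesco describes, assuming the VO feed exposes host names in a 'hostname' attribute (as the CMS vofeed.xml does; the other experiments' feed schemas may differ). `hosts_from_vofeed` and `has_aaaa` are illustrative names, not existing tools:

```python
import socket
import xml.etree.ElementTree as ET

def hosts_from_vofeed(xml_text):
    """Collect the distinct host names found in a VO-feed XML document.

    Assumes services carry a 'hostname' attribute, as in the CMS
    vofeed.xml; other experiments' feeds may use a different schema.
    """
    root = ET.fromstring(xml_text)
    return sorted({el.get("hostname") for el in root.iter() if el.get("hostname")})

def has_aaaa(hostname):
    """True if the name resolves to at least one IPv6 (AAAA) address."""
    try:
        return bool(socket.getaddrinfo(hostname, None, socket.AF_INET6))
    except socket.gaierror:
        return False

# Usage sketch (network access needed for the actual DNS lookups):
#   for host in hosts_from_vofeed(feed_xml):
#       print(host, "AAAA" if has_aaaa(host) else "IPv4-only")
```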

Dave K.: The current site table on the WG Wiki page seems to have been populated in different ways for different sites. Should we change the table and have one row per Storage Element?

Duncan R.: Should we be running SAM tests throughout and collecting the results?

Dave K.: Would be better than inventing something new.

Andrea S.: There is already a test ETF instance that is dual-stack (https://etf-ipv6.cern.ch/etf/check_mk/).

Duncan R.: Could the production instance become dual-stack? This way a test for IPv6 DNS resolution could simply be added and become part of the normal reporting.

Dave K.: Can agree with Marian [Babik] to set up a non-critical test for IPv6.

---- end of Day 1

 

 

IPv6 Working Group meeting at CERN - Day 2 - 12 Sep 2017

(notes taken by Ulf Tigerstedt)

* Fernando gives the site report from PIC: moving on to 100% IPv6, even for worker nodes.

* SARA-Imperial network problem: New data has come forward.

* EOS/US Storage update (Hervé)

   - newest EOS supports IPv6

   - LHCb first in WLCG to be on IPv6-enabled EOS

   - ATLAS/CMS will come later; LHCb and the public-facing instance are done now

   - CVMFS works fine with IPv6

   - ALICE goes IPv6 on 18.9.2017

 

* Monitoring

   - DNS test going into the next release (checking whether an AAAA record exists)

   - LHCb will probably be first out

   - Should there be a

   - Docker issue 25407 describes poor IPv6 support in Docker

* Networking

   - KISTI still missing

   - meeting at KISTI soon + HEPIX to discuss this

   - perfSONAR 4.0.x is still a bit fragile; meshes not in order

   - the perfSONAR system is mostly unusable now, since too many boxes are broken

* Monitoring #2

   - schema for testing, should we do it? Not really.

* Papers:

  - ISGC 2018? Perhaps not.

  - Next CHEP? Yes.

  - HEPiX? Yes.

* Other issues:

  - next meetings:

    - f2f: 11-12 January 2018 (lunch to lunch) (later moved to 25-26 Jan)

    - Vidyo: 26.10 16:00 CERN time (later moved to 9 Nov)

    - Vidyo: 7.12 16:00 CERN time