HEPiX IPv6 working group F2F meeting
DRAFT agenda. Topics may change. Timings are approximate.
Please REGISTER (by 30th May) if you plan to attend the meeting in person at CERN.
Notes - Day 1 - 5 June - taken by Francesco Prelz
Attendees (in person):
Dave Kelsey, Edoardo Martelli, Bruno Hoeft, Andreas Petzold, Andrea Sciaba, Kars Ohrenberg, Martin Bly, Catalin Condurache, Fernando Lopez, Marian Babik, Francesco Prelz, Brian Davies (over Vidyo), Garhan Attebury (over Vidyo).
Agenda is reviewed.
Pending issues/actions:
1) Why are the FNAL and BNL T1s not providing dual-stack storage/not transferring data over IPv6? FNAL claimed that CMS decided that the deadline was December (possibly confusing the T2 deadline with the T1 deadline).
2) Duncan was to follow up on xrootd ticket to instrument xrootd for (IPv6) transfer logging. Would be nice to have it by CHEP. Will follow up on this.
Round-table updates:
CERN (Edoardo): a new firewall for IPv6 was set up, with the same policies
(including bypass rules) as for IPv4. General Internet IPv6 traffic
graphs through the firewall are shown
(URL: https://netstat.cern.ch/monitoring/network-statistics/ext/?p=EXT&q=IPv6&mn=Internet&t=Daily).
KIT (Bruno): Will be presenting a talk later.
RAL (Catalin, Martin): will meet the CMS 'December' deadline. The services were
ported in time, modulo 1 week. The CEPH storage is accessible over IPv6;
the back-ends (e.g. Castor) are still (and always will be) on IPv4.
Orders of magnitude of storage: Atlas 5 PB, CMS 1.5 PB. Castor < 10 PB.
The pair of firewalls on site was upgraded (reported in May). The new
firewalls do IPv6 on ASIC instead of CPU, and this will hopefully solve the
occasional connection drops that were observed.
The transition was choppy, with hardware faults on one of the new machines.
Issues are being tracked to recover full resilience among the firewalls.
100 Gb/s Lambdas for GP internet to land on the site soon, with a plan to
switch over to them in July. This doesn't affect LHCOPN traffic.
Bruno: will you add a feed to LHCONE ?
Catalin: policies within GridPP are shifting.
Martin: probably no one is working on LHCONE tests. No one is pressing
to do this anytime soon.
INFN: tracking the Tier-2 transition process in periodic meetings.
Legnaro was the last large lab entering the game. Frascati still
to procure new networking hardware. Pisa will work on this in the summer.
Will be reporting on the status at the yearly INFN computing workshop
next week. Any message to relay there ?
All identities in the IPv6 VO hosted at CNAF are expired.
We decide to terminate the VO. Dave will see to it.
Why is the adoption rate slowing down ?
Martin: nobody else at RAL
Dave: many home ISPs are providing IPv6, in the 'blissful
unawareness' of their users.
PIC (Fernando): 80% of worker nodes are dual stack.
There's an issue with IHEP Beijing: servers that are connected to
LHCONE can download the CRL via wget over IPv6
(wget http://cagrid.ihep.ac.cn/cacrl.crl), but the 'fetch-crl'
script doesn't work. From CERN the command seems to work.
Edoardo: the LHCONE prefix was filtered on the way to CERN, leading
to asymmetric traffic.
GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=135105
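Editor's sketch (not part of the discussion): when a tool such as 'fetch-crl'
fails while a plain wget succeeds, one first step is to check which address
families the server name resolves to and whether a TCP connection over IPv6
succeeds from the affected host. The Python below is a minimal illustration
under stated assumptions: the helper names (group_by_family, resolve,
can_connect_ipv6) are hypothetical, and the check only covers name resolution
and TCP connect, not the routing asymmetry itself.

```python
import socket


def group_by_family(addrinfo):
    """Group socket.getaddrinfo() results into IPv4 and IPv6 address lists."""
    labels = {socket.AF_INET: "ipv4", socket.AF_INET6: "ipv6"}
    grouped = {"ipv4": [], "ipv6": []}
    for family, _type, _proto, _canon, sockaddr in addrinfo:
        label = labels.get(family)
        # sockaddr[0] is the textual address for both families.
        if label and sockaddr[0] not in grouped[label]:
            grouped[label].append(sockaddr[0])
    return grouped


def resolve(host, port=80):
    """Resolve a host name through the system resolver (needs network access)."""
    return group_by_family(
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    )


def can_connect_ipv6(host, port=80, timeout=5.0):
    """Return True if a TCP connection succeeds to any IPv6 address of host."""
    try:
        addresses = resolve(host, port)["ipv6"]
    except OSError:
        return False  # name resolution failed
    for addr in addresses:
        try:
            with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                sock.connect((addr, port))
                return True
        except OSError:
            continue  # try the next address
    return False
```

Running can_connect_ipv6("cagrid.ihep.ac.cn") from both an LHCONE-connected
server and a host at CERN would show whether the two vantage points differ at
the TCP level. On the command line, wget's -6 (--inet6-only) option forces the
same family for a direct comparison.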
DESY (Kars): Business as usual, nothing to report.
CMS (Andrea): No breaking news. More than half of CMS sites have IPv6
deployed and verified. Info comes from independent site testing
done by Stephan Lammel.
We are missing the other 3 experiments... Sigh.
Marian Babik shows the update on perfsonar and ETF. Slides in Indico:
The configuration interface update is an important feature in version 4.1
(it was a show-stopper).
Dave: expected timetable for production?
Marian: Q3/2018.
A WLCG testbed (mostly in the US: Nebraska, UChicago, Florida, Oklahoma and
others) will be added.
There will be NO SL6 packages for V4.1.
It works much better in CC7 anyway.
The statistics page in the slides is briefly commented on. A few sites
(e.g. INFN T1) show highly asymmetric measurements.
Francesco: Question from Legnaro. When an existing perfsonar server is
brought up in dual stack, is this detected automatically or
should this be communicated to anyone ? IPv6 traffic was
seen, but they'd prefer to be sure.
Martin+Andrea: need to tell Duncan R. (he may be checking the status
of updated tickets). Otherwise perfsonar just does
one test, possibly with DNS-based fallback.
Dave K.: In terms of the CHEP talk (and paper) we should be presenting the
status as reported here.
Marian B. : OK.
Dates for next meetings:
F2F @ CERN. Tue-Wed 18-19 September. Will be finalized at the beginning
of August.
Vidyo: Thursday August 9th, 4PM MET-DST - may be cancelled
Status of the Tier-2s. Andrea posted his e-mail summary on Indico:
https://indico.cern.ch/event/730309/#preview:2662839
along with a link to a twiki page with status and plots:
(https://twiki.cern.ch/twiki/bin/view/LCG/WlcgIpv6#WLCG_Tier_2_IPv6_deployment_stat)
The piecharts don't include OSG sites. 'Done' doesn't just mean that the
site claims they are done, but that the dual-stack storage availability
has been verified by the involved experiment(s).
The Grid 'operation center' for OSG is no more: tickets have to be sent
to individual sites.
Technical issues section:
Bruno H. first shows a 'non-disclosable' scheme of the KIT network structure
and explains the way IPv6 was introduced.
Andreas P. then shows his slides on the dual-stack rollout of dCache:
Dave K. urges Francesco P. to work with Paul (dCache) to better understand what
the issue with dCache 'dual stack' versus 'dual home' hosts may be, in order to
get to a solid piece of advice for either sysadmins or developers.
A written summary of the current understanding of this issue from the
dCache viewpoint could serve as a valid starting point.
Garhan Attebury presents the slides on the IPv6-only exercise at Nebraska:
Day 1&2 notes (6 June) - taken by Martin Bly
IPv6 WG CERN 5-6 June 2018
David Kelsey, Edoardo Martelli, Bruno Hoeft, Catalin Condurache, Kars Ohrenberg, Marian Babik, Andrea Sciaba, Martin Bly, Francesco Prelz, Fernando Lopez Munoz, Brian Davies (remote).
RT Updates
CERN (Edoardo): today – new FW for ipv6 including bypasses, new policy based system, same policy as for ipv4. Previously restricted to 5Gb/s for ipv6 general internet CERN outbound, now peaked at 20Gb/s outgoing, but dropped back.
KIT (Bruno): see later stuff
RAL (Catalin, Martin):
INFN (Francesco): Tracking T2 transitions etc. Pisa and Frascati starting to move?
PIC (Fernando): 80% of WNs in dual stack. Issue with IHEP in China – problem with fetching CRLs over LHCONE but it works over normal wget. GGUS ticket #...
DESY (Kars): no update.
CMS (Andrea): More than half CMS sites have ipv6 installed and verified. TWiki available that has size of storage so can tell average and total storage available via ipv6.
Atlas, LHCb: no reports.
Monitoring: perfSONAR and ETF. Marian Babik (CERN)
News on perfSONAR –
- 4.1 beta scheduled in next few weeks. Introduces psconfig. SLC6 dropped after 4.1 released. Campaign to update instances to CC7, 4.0, 86/207 done so far.
- Geant deployed ipv6 perfSONAR instances on LHCONE at AMS, GVA, LONG, FRA and PAR. Work very well. Grafana dashboards updated to v 5, ipv6 introduced.
- All central services migrated from OSG GOC to AGLT2. Network throughput report: no major incidents reported.
PS dual-stack mesh –
- Meshes reconfigured following discussion at previous F2F.
- Replaced:
- Create dual-stack LHCOPN with both ipv4 and ipv6 for all tests (Done)
- Change all current experiment meshes to contain ipv6 throughput and tracepath
- Create a dedicated ipv6/ipv4 latency mesh only for debugging specific cases
- Retire dual stack mesh.
ETF –
- ETF ipv6 instance, switched to ipv6 only to test for issues. MyProxy is still ipv4.
- Experiment instances running for CMS, LHCb, currently still dual stack, waiting for MyProxy.
- Atlas missing.
- LHCb, CMS results published to SAM3 (QA)
- Aggregate and compute ipv6-only profiles.
- Looking at possible combined profile.
Next Meetings:
F2F: 18-19 Sept 2018 @ CERN tbc wrt CHEP submission dates
Vidyo: 5 July 16:00 CEST, 9 August 16:00 CEST
T2 Status
~30% done and storage verified working. 38% in progress, 32% on hold (usually due to local site not being able to progress).
OSG not tracked, need to send GGUS tickets. For CMS T2s, data from CMS twiki.
Discussed Andrea’s notes on regional status.
KIT Technical issues with dCache etc? (Bruno, Andreas Petzold)
Displayed some slides of the topology @ KIT. Using Policy based routing w/BGP and VRF virtual routing.
Day2.
DPK took attendance:
Minutes:
Tier0/1 LHCOPN/LHCONE status:
All Tier1s peering over OPN, running perfSONAR. What fraction of storage: some at 100% (PIC, NDGF), some (RAL) with everything that can be on ipv6 already on ipv6. Some struggling (KIT) – dCache issues. Are PIC/NDGF the only sites with 100% of disk storage on ipv6? IN2P3 possibly 100%. Should track by VO to get a better idea. Dave K has contacted FNAL and BNL. FNAL have responded, BNL. Action: DPK to contact experiment reps to survey their Tier1s and report back, to construct a weighted average. Tier0 to go in Tier1 table. Need to check on perfSONAR provision (and fix it at RAL). Andrea reports that CMS see all their T1 storage is on ipv6 except FNAL and RAL.
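Editor's sketch (not part of the discussion): one plausible reading of the
weighted average mentioned in the action is to weight each Tier1's fraction of
storage reachable over ipv6 by its storage size. The function name and the
site figures below are hypothetical placeholders, not survey results.

```python
def ipv6_storage_fraction(sites):
    """Storage-weighted fraction of capacity reachable over IPv6.

    sites: iterable of (name, storage_pb, fraction_on_ipv6) tuples, where
    fraction_on_ipv6 is between 0.0 and 1.0.
    """
    sites = list(sites)  # allow generators, since we iterate twice
    total_pb = sum(pb for _name, pb, _frac in sites)
    if total_pb == 0:
        raise ValueError("no storage reported")
    return sum(pb * frac for _name, pb, frac in sites) / total_pb


# Hypothetical numbers for illustration only.
example = [
    ("SiteA", 10.0, 1.0),  # all storage dual stack
    ("SiteB", 5.0, 0.5),   # half of the storage dual stack
    ("SiteC", 5.0, 0.0),   # IPv4 only
]
print(f"{ipv6_storage_fraction(example):.1%}")  # prints 62.5%
```

With these placeholder values the unweighted mean of the three fractions would
be 50%, while the storage-weighted figure is 62.5%, which is why the survey
needs storage sizes as well as fractions.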
LHCONE: some T1s on LHCONE, most if not all peering ipv6 on LHCONE. No NREN issues (except Romania), so should expect T2s to peer over ipv6 too. Some T2s have included LHCONE/ipv6 status in their responses to Andrea’s tickets.
Note that ipv6 monitoring for LHCOPN (at CERN) comes with the new routers in a few months. More complicated to do LHCONE (can do at CERN), NRENs may be able to provide it.
Plans for CHEP paper
Removing IPv6 blockers
Looking at issues that occur that mean ipv6 is not actually working. Bruno: Table of apps that do IPV6 on HEPiX sites – perhaps create and maintain a table of obstacles. Various issues highlighted: infrastructure services, repositories, dual-home/dual-stack (dCache), clouds, docker repos, quad A for vital ancillary services. FNAL turning their FTS back to ipv4 only after a ‘problem’.
Discussion about monitoring to find whether traffic is actually going over ipv6. Most T0-T1 traffic should be ipv6 by now so should see this in the link data soon. Look at T1-T1 ipv6 transfers? Why, if a large fraction of storage is available dual stack, is the recorded ipv6 traffic much lower? Look at records of FTS transfers T1-T1 and T1-T2.
DPK showed stats for FTS transfers from the dashboards, which show ipv4/ipv6 transfers.
Encourage people to look at their own sites to see what can be seen.
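Editor's sketch (not part of the discussion): a site that has its own transfer
records can estimate the ipv6 share of traffic by classifying the remote
address of each transfer. The record layout below (dicts with 'dst_ip' and
'bytes' keys) is an assumption made for illustration and is not the FTS log
format; addresses are from the documentation ranges.

```python
import ipaddress
from collections import Counter


def bytes_by_ip_version(transfers):
    """Sum transferred bytes per IP version.

    transfers: iterable of dicts with 'dst_ip' (str) and 'bytes' (int) keys.
    This record layout is hypothetical.
    """
    totals = Counter()
    for record in transfers:
        addr = ipaddress.ip_address(record["dst_ip"])
        # An IPv4-mapped IPv6 address (::ffff:a.b.c.d) is really IPv4 traffic.
        if addr.version == 6 and addr.ipv4_mapped is not None:
            key = "ipv4"
        else:
            key = f"ipv{addr.version}"
        totals[key] += record["bytes"]
    return totals


def ipv6_share(transfers):
    """Fraction of bytes carried over IPv6 (0.0 if there are no transfers)."""
    totals = bytes_by_ip_version(transfers)
    grand_total = sum(totals.values())
    return totals["ipv6"] / grand_total if grand_total else 0.0


# Hypothetical records for illustration only.
records = [
    {"dst_ip": "192.0.2.1", "bytes": 300},
    {"dst_ip": "2001:db8::1", "bytes": 600},
    {"dst_ip": "::ffff:192.0.2.7", "bytes": 100},
]
print(ipv6_share(records))  # prints 0.6
```

Counting bytes rather than transfers matters here: a few large ipv4 transfers
can dominate the link data even when most transfers already use ipv6.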
CHEP: transition happening, going well,…
Should we turn off ipv4 on LHCONE or LHCOPN or both? Think about it…
AoB:
Thanks to Edoardo for hosting.