HEPiX IPv6 working group F2F meeting

Europe/Zurich
600/R-001 (CERN)

Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
Description

DRAFT agenda. Topics may change. Timings are approximate. 

Please REGISTER (by 30th May) if you plan to attend the meeting in person at CERN.

 

Notes - Day 1 - 5 June - taken by Francesco Prelz

Attendees (in person):

Dave Kelsey, Edoardo Martelli, Bruno Hoeft, Andreas Petzold, Andrea Sciaba, Kars Ohrenberg, Martin Bly, Catalin Condurache, Fernando Lopez, Marian Babik, Francesco Prelz, Brian Davies (over Vidyo), Garhan Attebury (over Vidyo).

Agenda is reviewed.

Pending issues/actions:

1) Why are the FNAL and BNL T1s not providing dual-stack storage or transferring data over IPv6? FNAL claimed that CMS had set the deadline for December (possibly confusing the T2 deadline with the T1 deadline).

2) Duncan was to follow up on xrootd ticket to instrument xrootd for (IPv6) transfer logging. Would be nice to have it by CHEP. Will follow up on this.

 

Round-table updates:

CERN (Edoardo): a new firewall for IPv6 was set up, with the same policies

      (including bypass rules) as for IPv4. General Internet IPv6 traffic

      graphs through the firewall are shown.

      (URL: https://netstat.cern.ch/monitoring/network-statistics/ext/?p=EXT&q=IPv6&mn=Internet&t=Daily).

 

KIT (Bruno): Will be presenting a talk later.

 

RAL (Catalin, Martin): will meet the CMS 'December' deadline. The services were

    ported in time, modulo 1 week. The CEPH storage is accessible from IPv6,

    the back-ends (e.g. Castor) are still (and will always be) on IPv4.

    Orders of magnitude of storage: Atlas 5 PB, CMS 1.5 PB. Castor < 10 PB.

 

    The pair of firewalls on site was upgraded (reported in May). The new

    firewalls do IPv6 on ASIC instead of CPU, and this will hopefully solve the

    occasional connection drops that were observed.

    The transition was choppy, with hardware faults on one of the new machines.

    Issues are being tracked to recover full resilience among the firewalls.

 

    100 Gb/s Lambdas for GP internet to land on the site soon, with a plan to

    switch over to them in July. This doesn't affect LHCOPN traffic.

 

    Bruno: will you add a feed to LHCONE?

    Catalin: policies within GridPP are shifting.

    Martin: probably no one is working on LHCONE tests. Nobody is screaming

            to do this anytime soon.

 

INFN: tracking the Tier-2 transition process in periodic meetings.

      Legnaro was the last large lab entering the game. Frascati still

      to procure new networking hardware. Pisa will work on this in the summer.

      Will be reporting on the status at the yearly INFN computing workshop 

      next week. Any message to relay there?

 

      All identities in the IPv6 VO hosted at CNAF are expired. 

      We decide to terminate the VO. Dave will see to it.

 

      Why is the adoption rate slowing down?

      Martin: nobody else at RAL 

      Dave: many home ISPs are providing IPv6, in the 'blissful

            unawareness' of their users.

 

PIC (Fernando): 80% of worker nodes are in dual stack

      There's an issue with IHEP Beijing - servers that are connected on

      LHCONE can download the CRL via wget over IPv6

      (wget http://cagrid.ihep.ac.cn/cacrl.crl), but the 'fetch-crl'

      script doesn't work. From CERN the command seems to be working.

      Edoardo: the LHCONE prefix was filtered on the way to CERN, leading

               to asymmetric traffic.

      GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=135105
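
      A per-address-family check, in the spirit of the wget versus fetch-crl comparison above, can be sketched in Python. This is only a minimal sketch and not the fetch-crl logic: the function name `can_connect` and the default timeout are illustrative assumptions, and it only tests whether a TCP connection opens over the chosen family. It does not download or validate the CRL.

```python
import socket

def can_connect(host, port, family, timeout=5.0):
    """Return True if a TCP connection to host:port opens over the given family.

    family is socket.AF_INET (IPv4) or socket.AF_INET6 (IPv6).
    """
    try:
        # Resolve only addresses of the requested family (A or AAAA records).
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the host has no address of this family
    for af, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # try the next resolved address, if any
    return False
```

      For example, calling `can_connect("cagrid.ihep.ac.cn", 80, socket.AF_INET6)` and then the same with `socket.AF_INET` from an affected host would show whether only the IPv6 path fails. A one-way filter such as the one Edoardo describes would likely show up as a timeout rather than an immediate refusal.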

 

DESY (Kars): Business as usual, nothing to report.

 

CMS (Andrea): No breaking news. More than half of CMS sites have IPv6

              deployed and verified. Info comes from independent site testing

              done by Stephan Lammel.

 

We are missing the other 3 experiments... Sigh.

 

Marian Babik shows the update on perfsonar and ETF. Slides in Indico:

https://indico.cern.ch/event/730309/contributions/3009400/attachments/1662058/2663161/perfSONAR2FETF_IPv6_2.pdf

 

The configuration interface update is an important feature in version 4.1

(its absence was a show-stopper).

 

Dave: expected timetable for production?

Marian: Q3/2018.

 

A WLCG testbed (mostly in the US: Nebraska, UChicago, Florida, Oklahoma and

others) will be added.

 

There will be NO SL6 packages for V4.1.

It works much better in CC7 anyway.

 

The statistics page in the slides is briefly commented on. A few sites

(e.g. INFN T1) show highly asymmetric measurements. 

 

Francesco: Question from Legnaro. When an existing perfsonar server is

           brought up in dual stack, is this detected automatically or

           should this be communicated to anyone? IPv6 traffic was

           seen, but they'd prefer to be sure.

Martin+Andrea: need to tell Duncan R. (he may be checking the status

           of updated tickets). Otherwise perfsonar just does

           one test, possibly with DNS-based fallback.

 

Dave K.: In terms of the CHEP talk (and paper) we should be presenting the

         status as reported here.

Marian B.: OK.

 

Dates for next meetings:

 

F2F @ CERN. Tue-Wed 18-19 September. Will be finalized at the beginning

                                     of August.

 

Vidyo: Thursday 9 August, 16:00 CEST - may be cancelled

 

Status of the Tier-2s. Andrea posted his e-mail summary on Indico:

https://indico.cern.ch/event/730309/#preview:2662839

along with a link to a twiki page with status and plots:

(https://twiki.cern.ch/twiki/bin/view/LCG/WlcgIpv6#WLCG_Tier_2_IPv6_deployment_stat)

 

The piecharts don't include OSG sites. 'Done' doesn't just mean that the

site claims they are done, but that the dual-stack storage availability

has been verified by the involved experiment(s).

 

The Grid 'operation center' for OSG is no more: tickets have to be sent

to individual sites.

 

Technical issues section:

 

Bruno H. first shows a 'non-disclosable' scheme of the KIT network structure

and explains the way IPv6 was introduced.

Andreas P. then shows his slides on the dual-stack rollout of dCache:

https://indico.cern.ch/event/730309/contributions/3009392/attachments/1661539/2663055/gridka-ipv6-20180605.pdf

 

Dave K. urges Francesco P. to work with Paul (dCache) to better understand what the

issue with dCache 'dual stack' versus 'dual home' hosts may be, in order to get to a

solid piece of advice for either sysadmins or developers.

A written summary of the current understanding of this issue from the

dCache viewpoint could serve as a valid starting point.

 

Garhan Attebury presents the slides on the IPv6-only exercise at Nebraska:

https://indico.cern.ch/event/730309/contributions/3009392/attachments/1661539/2662198/gattebury-HEPiX_IPv6_WG_F2F_June_5-6_2018.pdf

 

Day 1&2 notes (5-6 June) - taken by Martin Bly

IPv6 WG CERN 5-6 June 2018

David Kelsey, Edoardo Martelli, Bruno Hoeft, Catalin Condurache, Kars Ohrenberg, Marian Babik, Andrea Sciaba, Martin Bly, Francesco Prelz, Fernando Lopez Munoz, Brian Davies (remote).

 

RT Updates

CERN (Edoardo): today – new FW for ipv6 including bypasses, new policy based system, same policy as for ipv4. Previously restricted to 5Gb/s for ipv6 general internet CERN outbound, now peaked at 20Gb/s outgoing, but dropped back.

KIT (Bruno): see later stuff

RAL (Catalin, Martin):

INFN (Francesco): Tracking T2 transitions etc. Pisa and Frascati starting to move?

PIC (Fernando): 80% of WNs in dual stack. Issue with IHEP in China – problem with fetching CRLs over LHCONE but it works over normal wget.  GGUS ticket #...

DESY (Kars): no update.

CMS (Andrea): More than half of CMS sites have ipv6 installed and verified. A TWiki is available that lists storage sizes, so the average and total storage available via ipv6 can be worked out.

Atlas, LHCb: no reports.

Monitoring: perfSONAR and ETF. Marian Babik (CERN)

News on perfSONAR –

  • 4.1 beta scheduled in next few weeks. Introduces psconfig. SLC6 dropped after 4.1 released. Campaign to update instances to CC7, 4.0, 86/207 done so far.
  • Geant deployed ipv6 perfSONAR instances on LHCONE at AMS, GVA, LONG, FRA and PAR. Work very well.  Grafana dashboards updated to v 5, ipv6 introduced.
  • All central services migrated from OSG GOC to AGLT2. Network throughput report: no major incidents reported.

PS dual-stack mesh –

  • Meshes reconfigured following discussion at previous F2F.
  • Replaced:
    • Create dual-stack LHCOPN with both ipv4 and ipv6 for all tests (Done)
    • Change all current experiment meshes to contain ipv6 throughput and tracepath
    • Create a dedicated ipv6/ipv4 latency mesh only for debugging specific cases
    • Retire dual stack mesh.

ETF –

  • ETF ipv6 instance, switched to ipv6 only to test for issues. MyProxy is still ipv4.
  • Experiment instances running for CMS, LHCb, currently still dual stack, waiting for MyProxy.
    • Atlas missing.
  • LHCb, CMS results published to SAM3 (QA)
    • Aggregate and compute ipv6-only profiles.
    • Looking at possible combined profile.

Next Meetings:

F2F: 18-19 Sept 2018 @ CERN tbc wrt CHEP submission dates

Vidyo:  5 July 16:00 CEST, 9 August 16:00 CEST

T2 Status

~30% done and storage verified working.  38% in progress, 32% on hold (usually due to local site not being able to progress).

OSG not tracked, need to send GGUS tickets. For CMS T2s, data from CMS twiki.   

Discussed Andrea’s notes on regional status.

KIT Technical issues with dCache etc? (Bruno, Andreas Petzold)

Displayed some slides of the topology @ KIT. Using Policy based routing w/BGP and VRF virtual routing.

 

Day2.

DPK took attendance:

Minutes:

Tier0/1 LHCOPN/LHCONE  status:

All Tier1s peering over OPN, running perfSONAR.  What fraction of storage:  some at 100% (PIC, NDGF), some (RAL) with everything that can be on ipv6 already on ipv6.  Some struggling (KIT) – dCache issues.  Is it that PIC/NDGF are the only sites with 100% of disk storage on ipv6?  IN2P3 possibly 100%.  Should track by VO to get a better idea.  Dave K has contacted FNAL and BNL.  FNAL have responded, BNL.  Action: DPK to contact experiment reps to survey their Tier1s and report back, to construct weighted average.  Tier0 to go in Tier1 table. Need to check on perfSONAR provision (and fix it at RAL).   Andrea reports that CMS see that all their T1 storage is on ipv6 except FNAL and RAL.
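
The weighted average mentioned in the action could be computed along the following lines. This is a minimal sketch: the function name and the `(total_pb, ipv6_pb)` record layout are assumptions, and the site names and capacities in the usage note are placeholders, not survey results.

```python
def weighted_ipv6_fraction(sites):
    """Storage-weighted fraction of capacity reachable over IPv6.

    sites maps a site name to (total_pb, ipv6_pb). The result equals the
    average of the per-site IPv6 fractions, weighted by total capacity.
    """
    total = sum(total_pb for total_pb, _ in sites.values())
    if total == 0:
        raise ValueError("no storage reported")
    on_ipv6 = sum(ipv6_pb for _, ipv6_pb in sites.values())
    return on_ipv6 / total
```

With placeholder input `{"SiteA": (10.0, 10.0), "SiteB": (30.0, 15.0)}` the function returns 0.625, whereas a plain mean of the two per-site fractions (1.0 and 0.5) would give 0.75. Weighting by capacity keeps a small fully converted site from masking a large partially converted one.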

LHCONE: some T1s on LHCONE most if not all peering ipv6 on LHCONE.  No NREN issues (except Romania), so should expect T2s to peer over ipv6 too. Some T2s have included LHCONE/ipv6 status in their responses to Andrea’s tickets.

Note that ipv6 monitoring for LHCOPN (at CERN) comes with the new routers in a few months.  More complicated to do LHCONE (can do at CERN), NRENs may be able to provide it.

Plans for CHEP paper

 

Removing IPv6 blockers

Looking at issues that occur that mean ipv6 is not actually working. Bruno: Table of apps that do ipv6 on HEPiX sites – perhaps create and maintain a table of obstacles.  Various issues highlighted: infrastructure services, repositories, dual-home/dual-stack (dCache), clouds, docker repos, quad-A (AAAA) records for vital ancillary services. FNAL turning their FTS back to ipv4 only after a ‘problem’.

Discussion about monitoring to find whether traffic is actually going over ipv6. Most T0-T1 traffic should be ipv6 by now so should see this in the link data soon.  Look at T1-T1 ipv6 transfers?  Why if large fraction of storage is available dual stack is the recorded ipv6 traffic much lower?  Look at records of FTS transfers T1-T1 and T1-T2.

DPK showed stats for FTS transfers from the dashboards, which show ipv4/ipv6 transfers.

Encourage people to look at their own sites to see what can be seen.
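
One way a site could check its own transfer records for the ipv4/ipv6 split is sketched below. The record format (dicts with a `dst_ip` string and a `bytes` count) is hypothetical and is not the FTS log or dashboard schema; only the Python standard-library `ipaddress` module is used.

```python
import ipaddress
from collections import Counter

def bytes_by_ip_version(transfers):
    """Sum transferred bytes per IP version, keyed "ipv4" / "ipv6".

    transfers: iterable of dicts with hypothetical keys "dst_ip" and "bytes".
    """
    totals = Counter()
    for rec in transfers:
        addr = ipaddress.ip_address(rec["dst_ip"])
        # An IPv4-mapped IPv6 address (::ffff:a.b.c.d) typically means the
        # connection itself ran over IPv4, so count it as ipv4.
        if addr.version == 6 and addr.ipv4_mapped is not None:
            version = 4
        else:
            version = addr.version
        totals[f"ipv{version}"] += rec["bytes"]
    return dict(totals)
```

Counting bytes rather than transfers matches the question raised above about recorded ipv6 traffic volume. Counting transfers instead would only need `+= 1` in place of `+= rec["bytes"]`.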

 

CHEP: transition happening, going well,…

Should we turn off ipv4 on LHCONE or LHCOPN or both? Think about it…

 

AoB:

Thanks to Edoardo for hosting.

 

 

    • 13:30 18:00
      Session 1 31/S-028

      • 14:00
        Introductions, agenda, note takers 10m
      • 14:10
        Review minutes and actions 10m

        Matters arising at previous meetings

      • 14:20
        Roundtable updates 40m
      • 15:00
        Monitoring including perfSONAR & ETF 30m
        Speaker: Marian Babik (CERN)
      • 15:30
        Coffee 30m
      • 16:00
        Tier 2 status 30m

        Analysis of tickets and their responses

        Speaker: Andrea Sciaba (CERN)
      • 16:30
        Current technical issues 1h

        Recent problems at Tier 1 (KIT) - Andreas Petzold TBC
        and Tier 2 (Lincoln, Nebraska) - Garhan Attebury TBC
        Followed by any other current technical issues to discuss

        Speakers: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE)), Garhan Attebury (University of Nebraska Lincoln (US))
  • Wednesday, 6 June
    • 08:45 12:45
      Session 2 600/R-001

      • 08:45
        Review agenda 10m
      • 08:55
        Plans for CHEP2018 paper 50m

        Including work to be done before the conference

      • 09:45
        Tier 0/1, LHCOPN and LHCONE status 30m
      • 10:15
        Coffee 30m
      • 11:15
        Removing IPv6 Blockers 30m

        a) List of known issues.
        b) Analysis of file transfers between bi-lateral dual-stack storage end points.

        Speaker: Alastair Dewhurst (Science and Technology Facilities Council STFC (GB))
      • 11:45
        AOB and next meetings 15m
      • 12:00
        Review decisions and actions 15m