HEPiX IPv6 working group F2F meeting

Europe/Zurich
31/S-023 (CERN)

Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
Description

Timings are approximate. 

Registration is open - Please REGISTER (by 21st Jan) if you plan to attend the meeting in person at CERN.

Please note that we are in different meeting rooms on the first day (31-S-023) and the second day (31-S-028).

 

 

Minutes HEPiX F2F meeting at CERN - Day 1 - 24th January 2019

 

Present: Edoardo, Francesco, Catalin, Andrea, Fernando, Bruno, Kars, David G, Duncan, David K., Marian.

(note taker Duncan Rand)

 

Minutes and matters arising at previous meeting in December

- No news of CHEP paper.

- Incentives for sites to move to IPv6. Nobody at GDB was in favour of changing time-lines.

- Status of FNAL and BNL storage and FTS dual-stack

Andrea presented the slides he gave at the Jan GDB:

https://indico.cern.ch/event/739874/contributions/3278149/attachments/1779648/2894690/IPv6_deployment_update_GDB_20190116.pdf

There was a discussion as to how to incentivise sites to move to IPv6.

Roundtable updates

Edoardo: CERN have implemented IPv6 monitoring of LHCOPN, e.g.

https://netstat.cern.ch/monitoring/network-statistics/ext/?p=LHCOPN&q=LHCOPN&mn=DE-KIT&t=Daily

It is now possible to remove an IPv4 address from the DNS for hosts at CERN to produce an IPv6-only host.

Francesco (INFN): No direct news. Last three sites to be made dual-stack are Pisa, Frascati and Turin. Still to do: dCache multi-homing patches (cf. KIT).

Catalin (RAL): Not much development. Outstanding task to consolidate IPv6 and IPv4 trunks. Catalin queried the relative efficiency of IPv4 and IPv6 transfers (thread raised by Duncan). Issue with the connection of Romanian sites to LHCONE.

Andrea (CMS): No news.

Fernando (PIC): Request to CERN to split LHCOPN v4/v6 traffic. Need to update VLANs.

Bruno (KIT): See https://netstat.cern.ch/monitoring/network-statistics/ext/?p=LHCOPN&q=LHCOPN&mn=DE-KIT&t=Daily

(LHCOPN only). It was noted that some traffic is going over the 20G backup link. ALICE dual-stacking is 'on the move': IPv6 is missing at the server side; the other three are complete.

Kars (DESY): Upgrading WAN from 2x30G to 2x100G within the next three months. Currently ~30% WAN traffic is IPv6. XFEL now running 3 beam lines, producing 0.5 PB a week.

David G (NIKHEF): Everything works, no issues using IPv6. Some problems with the Torque batch system when AAAA records were enabled. WNs have IPv6 but no forward resolution.

Duncan (Imperial): No news apart from various FTS issues already discussed.
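
Several of the updates above depend on which DNS records a host publishes: an IPv6-only host at CERN has an AAAA record but no A record, and the NIKHEF worker nodes have IPv6 addresses but no forward resolution. The following is a minimal sketch of how a site might classify a host name from its forward records. The function names are hypothetical and are not taken from any tool mentioned in the meeting; the result also reflects local resolver configuration (e.g. /etc/hosts), not only the DNS.

```python
import socket


def address_families(hostname, resolver=socket.getaddrinfo):
    """Return the set of address families ('IPv4', 'IPv6') a name resolves to."""
    families = set()
    for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
        try:
            # Ask only for records of one family at a time.
            if resolver(hostname, None, family):
                families.add(label)
        except socket.gaierror:
            pass  # no record of this family, or resolution failed
    return families


def classify(families):
    """Map the set returned by address_families() to a short label."""
    if families == {"IPv4", "IPv6"}:
        return "dual-stack"
    if families == {"IPv6"}:
        return "IPv6-only"
    if families == {"IPv4"}:
        return "IPv4-only"
    return "unresolvable"
```

A call such as classify(address_families("some.host.example")) could be run over a list of storage or worker node names; the resolver argument exists only so the logic can be exercised without network access.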

Monitoring (Marian) 

https://indico.cern.ch/event/762602/contributions/3164163/attachments/1784622/2905026/go

SL6 no longer supported. About 50% on CC7. Sending details to mailing lists. RAL is one of the Tier-1s not updated to the latest version. There is still an issue with the Maddash display, e.g.

http://psmad.opensciencegrid.org/maddash-webui/index.cgi?dashboard=UK%20Mesh%20Config

 

CHEP 2019 paper

Discussion as to what might be the theme of a possible paper.

For Run 3 CERN IT will use two of the LHCb containers at Point 8. For Run 4 it is likely to build a new data centre at Prévessin and is likely to run out of IPv4 addresses - another reason for the WLCG to move to IPv6.

 

Minutes HEPiX F2F meeting at CERN - Day 2 - 25th January 2019

Attendees (around the table): Andrea Manzi (FTS Team lead), A. Sciaba,
  F. Lopez, B. Hoeft, K. Ohrenberg, D. Groep, D. Rand, D. Kelsey, E. Martelli,
  F. Prelz, C. Condurache, M. Bly (remote), Petr Vokac (Prague - remote).

(Notes by Francesco Prelz)

Andrea Manzi (FTS team lead) is introduced: it's now time to make sense
of any monitoring data we have - and possibly trace and squash bugs.

Agenda for the morning is briefly reviewed and agreed on.

Addressing monitoring questions from the agenda:

a) Why does network monitoring over LHCOPN between two dual-stack end points
   show traffic over IPv4?

   Andrea M.: On many FTS endpoints the FTS configuration was set to
              prefer IPv4 before Christmas due to site configuration problems.
              After Christmas a new FTS cluster was installed, 
              and IPv6 preference was restored in the FTS configuration.
              The FTS server configuration allows an IPv6 preference to be
              set *per endpoint*.
   Dave K.: Who has the authority to change the config?
   Andrea M.: The FTS manager (or team), the VO manager (production role
              in the VO) can also change it, but usually they aren't doing it.
   Dave K.: With due understanding of the production system needs, disabling
            IPv6 prevents proper problem diagnosis.
            Is a direct connection to the SE tested when sites are certified
            or is FTS used?
   Andrea S.: CMS tests connections to the SE, both via gridftp and xrootd.
   Bruno H.: On Grafana there's no IPv6 traffic at all to and from BNL.
   Duncan R.: From what I see, however, IPv6 is not working - no traffic.
              Logs from fts307.usatlas.bnl.gov show that the 'PASV' command
              (may actually be either PASV or EPSV) gets an IPv4 response.
   Andrea M.: When a particular SE claims to be dual-stack, there may be a
              pool of machines behind, and some of them may be misconfigured. 
   Duncan R.: Looking at transfers from Triumf to CERN, there is also
              the case of IPv4 PASV responses, while the site should
              be "dual stack". 
   Andrea M.: FTS will retry on IPv4 immediately if IPv6 fails for any
              reason (hits a firewall or so) - and this fallback is not logged.
   Dave K.: Should we be writing a short troubleshooting guide to find the
            many locations where the configuration may be incorrect?
            Presumably the sites forget one item in what could be 
            a systematic checklist.
   Andrea M.: We do investigate further on requests to "just shut down
              FTS on a certain link because all transfers are failing".
              There was another issue that was discovered a while ago.
              Sometimes the DHCPv6-offered IPv6 address is not refreshed.
              CERN issue ?
   Dave K.: How do we improve this situation in a scalable way?
            Twist site admins' arms or rely on FTP experts?
   Andrea M.: First of all, SAM tests should test IPv6 connectivity.
   Duncan R.: It's hard to enumerate and test storage nodes behind SE
              head nodes. SAM testing (compute) worker nodes faces the same
              issues.
   Dave K.: In general, is the FTS success rate 100%, given the ability
            to retry transfers?
   Andrea M.: No: missing files and checksum errors are all
              terminal failures.
   Andrea S.: Can these failures be categorised?
   Andrea M.: Partly - the monitoring can be improved to point more precisely
              to the failing party (source/destination, and where/when).
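
The PASV/EPSV observations above can be checked mechanically when reading transfer logs. A PASV reply (RFC 959) can only carry an IPv4 address, whereas an EPSV reply (RFC 2428) carries only a port, and the data connection then goes to the same address as the control connection. The sketch below assumes plain "227"/"229" reply lines; the function name is hypothetical, real FTS or GridFTP log lines may be formatted differently, and GridFTP-specific commands are not handled.

```python
import re

# 227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)   -- RFC 959
PASV_RE = re.compile(r"\b227 [^()]*\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")
# 229 Entering Extended Passive Mode (|||port|)   -- RFC 2428
EPSV_RE = re.compile(r"\b229 [^()]*\((.)\1\1(\d+)\1\)")


def data_channel_from_reply(reply):
    """Classify a passive-mode reply as (mode, address, port).

    address is None for EPSV, where the data connection reuses the
    address (and hence IP version) of the control connection.
    """
    m = PASV_RE.search(reply)
    if m:
        address = ".".join(m.groups()[:4])
        port = int(m.group(5)) * 256 + int(m.group(6))
        return ("PASV", address, port)
    m = EPSV_RE.search(reply)
    if m:
        return ("EPSV", None, int(m.group(2)))
    return ("unknown", None, None)
```

A "PASV" result for a transfer between two sites that both claim to be dual-stack is the symptom described in the discussion: the data channel for that transfer is necessarily IPv4.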

b) Status of FTS IPv6 efficiency versus IPv4 efficiency
c) Is FTS3 monitoring correctly reporting IPv4 versus IPv6?

   Something that didn't start is automatically reported as IPv4.
   Andrea M.: We are going, as agreed with Duncan R., to add a new field in
              the log, filled with the IP protocol version only when the
              transfer starts, so that Grafana will be able to properly
              filter on the used protocol.
   Dave K.: A useful by-product will be the ability to see the protocol
            string instead of true/false in Grafana.
   Andrea M.: There are three tags in the FTS logs:
              "TRANSFER" points to a failure during the transfer
              "SOURCE" means that the source file is missing
              "DESTINATION" is an existing destination file or checksum mismatch
   Andrea S.: Is the error message logged as well ?
   Andrea M.: Currently not.
   Dave K.: Files on 'devel.cern.ch' seem sometimes not to be working.
   Andrea M.: Will check what they are.
   Andrea M.: Will also check with the monitoring team whether the failure
              reasons can be further filtered to select the ones involving
              file transfer.
   Dave K.: Not counting them as IPv4-only failures will help in our
            search for unexplained asymmetries.
            A useful cross-check: the amount of transferred data for
            'UNDEFINED' state transfer should be zero.
   Dave K.: Is Xrootd transfer monitoring also in your ballpark?
   Andrea M.: It will be once FTS transfers are allowed via xrootd in the
              future. But some development on xrootd and
              implementations of HTTP servers supporting third-party copy
              will be needed.
   Francesco P.: Is xrootd already instrumented to log transfer size and
                 protocol?
   Duncan R.: They reported it should - perhaps with an appropriate plugin.
              We should get an update here from an xrootd developer.
   Andrea M.: Another thing we cannot do is disable IPv6 and force a
              fallback to IPv4 in xrootd and HTTP (WebDAV and the like).
   Duncan R.: There was a big thread on the actual ability to know whether
              IPv4 or IPv6 is used in a WebDAV transfer.
   Andrea M.: Multi-stream transfers between the same pair of nodes
              may also occur on different protocols!
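
The reporting change discussed under b) and c) amounts to keeping never-started transfers out of the IPv4 bucket and then checking that they carried no data. A minimal sketch of that bookkeeping follows. The record fields ("ip_version", "ok", "bytes") are hypothetical and do not reflect the actual FTS log or monitoring schema.

```python
from collections import defaultdict


def summarise(records):
    """Group transfer records by IP version.

    A record whose ip_version is missing or None (the transfer never
    started) is counted as UNDEFINED rather than as IPv4.
    """
    stats = defaultdict(lambda: {"transfers": 0, "succeeded": 0, "bytes": 0})
    for rec in records:
        key = rec.get("ip_version") or "UNDEFINED"
        entry = stats[key]
        entry["transfers"] += 1
        entry["succeeded"] += 1 if rec.get("ok") else 0
        entry["bytes"] += rec.get("bytes", 0)
    return dict(stats)


def efficiency(entry):
    """Fraction of transfers in one bucket that succeeded."""
    return entry["succeeded"] / entry["transfers"]


def undefined_is_clean(stats):
    """Cross-check from the discussion: UNDEFINED transfers moved no data."""
    return stats.get("UNDEFINED", {"bytes": 0})["bytes"] == 0
```

With buckets built this way, the IPv4 and IPv6 efficiencies can be compared without the never-started transfers distorting the IPv4 figure.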

d) What is the status of PIC's investigation of transfers between two dual-stack
   systems?

   Fernando shows his slides:
   (https://indico.cern.ch/event/762602/contributions/3164162/attachments/1783846/2906047/IPv4_on_IPv6v2.pdf)

   Fernando L.: In June 80% of gridftp failures were for ATLAS, 
                now they are much better.
                The CMS failures were due to a problem in the Singularity
                image that CMS uses: IPv6 was disabled for the Gridftp
                GFAL plugin (/etc/gfal2.d/gsiftp_plugin.conf - see slides)
                Note: XROOTD statistics by protocol are obtained from the
                DCACHE billing database.
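
The CMS case above is a configuration flag that is easy to miss inside a container image. A minimal sketch of a check for it is given below. The section name "GRIDFTP PLUGIN" and the key "IPV6" are assumptions about the layout of gsiftp_plugin.conf and should be verified against the slides and the installed file; the sample text is invented for illustration and is not a copy of any real configuration.

```python
import configparser

# Invented sample in the assumed layout; not a copy of a real file.
SAMPLE = """\
[GRIDFTP PLUGIN]
IPV6=false
"""


def ipv6_enabled(text, section="GRIDFTP PLUGIN", key="IPV6"):
    """Return True or False for the IPv6 flag, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_option(section, key):
        return None  # section or key absent: the library default applies
    return parser.getboolean(section, key)
```

Running such a check against the file inside the image, rather than on the host, is what would have exposed the problem described here.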

e) In how many places can the preference for IPv6 (or IPv4) be configured?

Dave K.: We need an active representative from all of the experiments (our
         customers) and an xrootd rep as well.

(coffee break)

Dates for next meetings:

Bruno H.: It may be useful to have a F2F meeting before the CHEP abstract
          submission deadline.
Dave K.: We'll probably have to settle this over e-mail/phone.

Next F2F meeting settled on Thursday-Friday May 2-3, usual times.

Next phone conferences:
Thursday, March  7th, 16:00 CET.
Thursday, April 11th, 16:00 CEST.

HEPiX @ San Diego on the week of March 25th: usually somebody gives a report.
Who is planning to attend? Andrea S. is attending but will be busy - as long
as somebody prepares the slides he can present them.

Dave K.: A few IPv6-only worker nodes here and there would help.
         Queen Mary was running DNS64/NAT64 - don't know if they still do.
Duncan R.: There's another site in Slovenia doing a similar exercise for
           ATLAS.
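
For readers unfamiliar with DNS64/NAT64: an IPv6-only worker node reaches an IPv4-only service through a synthesized IPv6 address that embeds the IPv4 address in a NAT64 prefix. The sketch below uses the well-known prefix 64:ff9b::/96 from RFC 6052. The prefix used at any particular site is not stated in these minutes and may be a network-specific one; the function names are hypothetical.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052; a site may use its own prefix.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract(ipv6, prefix=WELL_KNOWN_PREFIX):
    """Recover the embedded IPv4 address from a synthesized address."""
    addr = ipaddress.IPv6Address(ipv6)
    if prefix.prefixlen != 96 or addr not in prefix:
        raise ValueError("address is not inside the given /96 prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)
```

For example, synthesize("192.0.2.33") gives 64:ff9b::c000:221, which is the address a DNS64 resolver using this prefix would hand to an IPv6-only client for that IPv4-only host.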

The status of IPv6 reachability of IGTF CA CRLs
is accessible at this URL: http://cvmfs-6.ndgf.org/ipv6/overview.php
Dave K.: IPv6 will be in the tender requirements for the next Geant incarnation.

Now going back in the agenda to the Tier-1 status. Bruno H. presents 
the slides at:


https://indico.cern.ch/event/762602/contributions/3164164/attachments/1785258/2906271/Tier-1__IPv6_traffic.pdf

  • Thursday, 24 January
    • 14:00 18:00
      Session 1 31/S-023

      • 14:00
        Introductions, agenda, note takers 10m
      • 14:10
        Review minutes and actions and other urgent topics 10m

        Matters arising at previous meeting (December):

        News on CHEP2018 paper?

        AndreaS offered to give his Jan 2019 GDB talk.

        Incentives for sites to move to IPv6.

        Status of US Tier 1s?

      • 14:20
        Roundtable updates 40m
      • 15:00
        Tier 2 status 30m

        Analysis of tickets and their responses

        Speaker: Andrea Sciaba (CERN)
      • 15:30
        Coffee 30m
      • 16:00
        Monitoring including perfSONAR & ETF 30m
        Speaker: Marian Babik (CERN)
      • 16:30
        Data transfer performance between dual-stack storage end-points 1h

        Discussion

        Can we measure transfer throughput and efficiency in a controlled way? Compare IPv4 versus IPv6 on identical systems and network paths.

  • Friday, 25 January
    • 09:00 13:00
      Session 2 31/S-028

      • 09:30
        Review agenda 5m
      • 09:35
        Data Transfer & network monitoring - Is IPv6 preference working? 55m

        Issues to be addressed include:
        a) Why does network monitoring over LHCOPN between two dual-stack end points show traffic over IPv4?
        b) Status of FTS IPv6 efficiency versus IPv4 efficiency
        c) Is FTS3 monitoring correctly reporting IPv4 versus IPv6?
        d) What is the status of PIC's investigation of transfers between two dual-stack systems?
        e) In how many places can the preference for IPv6 (or IPv4) be configured?

        Speaker: Mr Fernando Lopez Munoz (PIC)
      • 10:30
        Coffee 30m
      • 11:00
        Tier 0/1, LHCOPN and LHCONE status 30m
        Speaker: Bruno Heinrich Hoeft (KIT - Karlsruhe Institute of Technology (DE))
      • 11:30
        Other issues 20m

        IGTF CA CRLs and IPv6.

        Plans for HEPiX - San Diego - March 2019.

        Plans for CHEP2019 - Adelaide - Nov 2019.

        Should we push for dual-stack WNs?

        Should we do more testing of IPv6-only WNs?

      • 11:50
        AOB and next meetings 10m
      • 12:00
        Close meeting 1m