HEPiX IPv6 Working Group F2F meeting - Day 1
CERN, April 10th 2014, 14:00
Notes by Francesco Prelz

Attendance:
In person: Jerome Bernier; Alastair Dewhurst; Marek Elias; Dave Kelsey; Fernando Lopez; Edoardo Martelli; Kars Ohrenberg; Francesco Prelz; Duncan Rand; Michail Salichos; Andrea Sciaba; Ulf Tigerstedt; Ramiro Voicu; Christopher Walker; Tony Wildish
Remotely: Raja Nandakumar

Dave reviews the agenda for today and tomorrow on Indico. No other technical issues to add. People are happy with the agenda.

Minutes review: minor comments.

Roundtable site updates:

CERN: IPv6 deployment is progressing. On April 1st (for real!) DHCPv6 was enabled for all devices at the CERN Computing Centre and at Wigner. One issue: some Windows servers whose off-site access was administratively blocked were nevertheless getting off-site access via IPv6 once they obtained a public IPv6 address; the blocking firewall rule was applied only to hosts tagged as 'IPv6-Ready' in the network database. A temporary workaround was applied to plug this hole. Next step: on May 6th an IPv6 address will be offered to all devices on WiFi or 'portable' sockets. The last step will be extending fixed device outlet access beyond buildings 31, 28 and 600.
DaveK: are training courses for sysadmins being offered?
EdoardoM: No, just a workshop for the IT departments. Support people attended courses.

CMS: An IPv6-accessible BeStMan node and OSG Computing Element nodes are available at the CMS T2 in Nebraska (UNL, care of Brian Bockelman). No production Storage Element yet (ideally they should be joining Tony's transfer mesh).
AlastairD: from the ATLAS perspective a few big test instances would be handier than many small-scale ones.
DaveK: the small-scale, low-level tests have to work before attempts to push the scale up are made.

KIT: FTS3 and dCache are still in the same state. One of the dCache people complains that there are no clients against which IPv6 access can be tested (see the client-side sketch further below). The test environment includes a UI, a worker node with PBS and a CREAM CE. The DNS is now working.

CC-IN2P3 (Lyon): An infrastructure with DNS and test machines is ready. The production DNS will go to IPv6 in a few days. No dual-stack plans.

USLHC: No news.

NDGF: The IPv6 dCache test stand is included in the ATLAS HammerCloud. The trouble is that the only storage available that is large enough for HammerCloud was IPv4-only. Troubleshooting and fixing this mixed configuration introduced some delay, but should be done now. Xrootd via IPv6 was also tested against the dual-stack dCache server, but the new IPv6-enabled xrootd client is not following the protocol (wrong null-termination of certain strings).

FZU: A new testing site (fzu-ipv6) had to be registered in the GOCDB for political reasons, but no ATLAS pilot jobs are coming in yet. Worker nodes have only public IPv6 addresses, plus private-network IPv4 addresses to talk to Torque (running it on IPv6 only failed). In principle the WLCG Management Board should be forgiving with IPv6-related downtimes: the political pressure shouldn't be this extreme. WN to head node connections are IPv6-only. The production DPM pool to head node connection is IPv6-only.

INFN: The IPv6-capable new CNAF router is still switched off; we keep urging/encouraging people on the matter. Our UberFTP IPv6 pull request was eventually merged, at the gentle request of OSG (Brian Bockelman). Still trying to make progress with the high-availability tools (out of the RHEL5 clustering suite) used in Milan.

DESY: Nothing important to report: just waiting for more people to use IPv6.

PIC: No significant news. Will share a few items in the technical issues roundtable.
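Aside on the KIT remark about missing IPv6 test clients: a minimal sketch of what a client-side IPv6 reachability check could look like in Python. The hostname and port below are placeholders, not agreed testbed endpoints; this is only an illustration of the kind of check meant, not a prescribed procedure.

import socket

def check_ipv6_service(host, port, timeout=10):
    """Check that 'host' publishes an AAAA record and accepts TCP connections over IPv6."""
    try:
        # Restrict resolution to IPv6, so only AAAA records are considered.
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror as err:
        print("%s: no usable AAAA record (%s)" % (host, err))
        return False
    for family, socktype, proto, _, sockaddr in infos:
        s = socket.socket(family, socktype, proto)
        s.settimeout(timeout)
        try:
            s.connect(sockaddr)
            print("%s: connected to [%s]:%d over IPv6" % (host, sockaddr[0], sockaddr[1]))
            return True
        except socket.error as err:
            print("%s: connect to %s failed (%s)" % (host, sockaddr, err))
        finally:
            s.close()
    return False

if __name__ == "__main__":
    # Placeholder endpoint; substitute a real testbed SE and port (e.g. 2811 for gridFTP).
    check_ipv6_service("se.example.org", 2811)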
Queen Mary: The College moved to a new Cisco firewall that is showing higher latency and had some teething problems. A StoRM SE was set up on the IPv6 network and the ipv6.hepix.org VO was enabled. The availability of this SE is shown in the BDII. Should decide whether all testbed resources accessible to the ipv6.hepix.org VO should eventually end up being listed in some BDII.

Imperial: The tickets submitted to GOCDB about the IPv6 entries have been closed. Refreshed an old Globus ticket about IPv6 addresses being logged as dotted quads in the gridftp logs. Playing with the FTS3 RESTful interface into DPM and StoRM uncovered a bug in GFAL2 causing wrong/missing file checksums when files are accessed via IPv6.

DaveK: written reports on the technical details of the tests made should be shared with the group.

- coffee, just 1/2 hour late -

Plans for the June 10th pre-GDB IPv6 workshop: Dave covers the (many!) items in the attached list. A generic ATLAS/CMS IPv6 strategy discussion follows. Will come back to this tomorrow for more detailed planning.

Testing updates: The Chicago, GridKa and NDGF sites seem to be getting timeouts and no longer work with point-to-point connections from CERN. Data transfer speeds are under 1 MB/s on the FNAL <-> CERN link (both directions), as well as DESY -> CERN and Caltech -> FNAL. As usual, site people should troubleshoot these issues - why aren't they doing that?

DaveK: Status of SRM and dCache? Are they still not working?
FernandoL: With dCache 2.3, gridFTP is working, SRM-LS and SRM-RM are working, while SRM-CP is not. This was reported to the dCache developers. A developer at DESY will try to reproduce it.
UlfT: Did you try ARC-CP? Try it.
AndreaS: Could the problem be in the client rather than the server?
FernandoL: Maybe.

Status of (HT)Condor: it supports operation of a single-stack pool, either IPv4 or IPv6, mainly because every network endpoint is identified by only one address. This means that on a dual-stack node (provided the host's 'hostname' resolves to IPv6; otherwise a bug prevents the IPv6 address from being advertised correctly) only the IPv6 address is made known to the central collector, and the Condor services will be contactable via IPv6 only. It also means that a genuine dual-stack pool with a dual-stack central manager will effectively be partitioned between the IPv4-reachable and the IPv6-reachable nodes (an illustrative socket-level sketch follows the batch-system notes below). Proper handling of multiple network endpoints for each service is dealt with by the patch described in this ticket: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3982 which is about 50% complete. Alan De Smet is working on it, but he is also the main contact for this year's Condor Week, so little progress is likely until May. FrancescoP volunteered for alpha- or beta-testing of this branch (master-ipv6-mixed-mode in Git) once it becomes usable.

As for the CREAM and Nordugrid GAHPs mentioned at the last F2F meeting, Jaime Frey opened and closed a ticket addressing all the issues we identified: the latest gridftp_client library is now used and the IPv6 options in both gsoap and org.glite.security.gss are enabled: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=4243 These changes should appear in the next development release, 8.1.5.

Torque is understood not to work - many people tried and failed. Does anyone know the state of Slurm?
UlfT: There is nothing in the code supporting IPv6.
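Socket-level aside on the (HT)Condor single-stack limitation above (generic Python, not Condor code): a daemon that advertises only one of its addresses is reachable on that address family only, even on a dual-stack host. By contrast, a single listening socket can accept both families if IPV6_V6ONLY is cleared, assuming an OS that supports IPv4-mapped addresses (e.g. Linux). A minimal sketch:

import socket

def dual_stack_listener(port):
    """Listen for both IPv4 and IPv6 clients with a single wildcard IPv6 socket."""
    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Clear IPV6_V6ONLY so the wildcard bind on '::' also accepts IPv4 clients,
    # which then show up as ::ffff:a.b.c.d mapped addresses.
    s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    s.bind(("::", port))
    s.listen(5)
    return s

if __name__ == "__main__":
    listener = dual_stack_listener(8649)  # arbitrary example port
    while True:
        conn, addr = listener.accept()
        # addr[0] is either a plain IPv6 address or an IPv4-mapped one (::ffff:...).
        print("connection from %s" % addr[0])
        conn.close()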
Checked the latest release notes: some services can be talked to via dCache's 'PASV used as a redirecting tool', while EPSV can't be used that way; we need to close the loop on this issue with the dCache developers. For gridFTP proper, this can probably be worked around by violating RFC 2428, which allows only the port number to be filled in the EPSV reply. Kars will get in touch with the development team; then, if feasible, we will try to move forward with FrancescoP opening a dCache ticket.
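For reference on the PASV/EPSV difference above: a PASV reply carries both a host address and a port, which is what lets dCache redirect the data connection to a different pool node, whereas an RFC 2428 EPSV reply carries only a port, so the client must reconnect to the host it is already talking to. A small Python sketch of the two reply formats (an illustration of the wire format only, not of dCache internals; the addresses and ports are made up):

import re

def parse_pasv(reply):
    """Parse a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply: host AND port."""
    m = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not m:
        raise ValueError("not a PASV reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(x) for x in m.groups())
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2

def parse_epsv(reply):
    """Parse a '229 Entering Extended Passive Mode (|||port|)' reply: port only (RFC 2428)."""
    m = re.search(r"\(\|\|\|(\d+)\|\)", reply)
    if not m:
        raise ValueError("not an EPSV reply: %r" % reply)
    return int(m.group(1))

if __name__ == "__main__":
    # PASV can point the client at a different (IPv4) host, e.g. a dCache pool node:
    print(parse_pasv("227 Entering Passive Mode (192,0,2,10,78,52)"))    # ('192.0.2.10', 20020)
    # EPSV only tells the client which port to use on the host it already contacted:
    print(parse_epsv("229 Entering Extended Passive Mode (|||20020|)"))  # 20020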