Timings are approximate.
Registration is open - Please REGISTER (by 21st Jan) if you plan to attend the meeting in person at CERN.
Please note that we are in different meeting rooms on the first day (31-S-023) and the second day (31-S-028).
Present: Edoardo, Francesco, Catalin, Andrea, Fernando, Bruno, Kars, David G, Duncan, David K., Marian.
(note taker Duncan Rand)
- No news of CHEP paper.
- Incentives for sites to move to IPv6. Nobody at the GDB was in favour of changing the timelines.
- Status of FNAL and BNL storage and FTS dual-stack
There was a discussion as to how to incentivise sites to move to IPv6.
Edoardo: CERN have implemented IPv6 monitoring of LHCOPN, e.g.
https://netstat.cern.ch/monitoring/network-statistics/ext/?p=LHCOPN&q=LHCOPN&mn=DE-KIT&t=Daily
It is now possible to remove an IPv4 address from the DNS for hosts at CERN to produce an IPv6-only host.
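One way to confirm from the client side that a host has become IPv6-only after its IPv4 address is dropped from DNS is to look at which address families the resolver returns. The following is a minimal Python sketch using only the standard `socket` module; the helper names `classify_families` and `host_stack` are hypothetical, and the result reflects what the local resolver (including caches and /etc/hosts) sees, not the CERN DNS configuration directly.

```python
import socket


def classify_families(addrinfos):
    """Classify a list of getaddrinfo()-style 5-tuples as
    'dual-stack', 'ipv6-only', 'ipv4-only' or 'unresolved'."""
    families = {info[0] for info in addrinfos}
    has_v4 = socket.AF_INET in families
    has_v6 = socket.AF_INET6 in families
    if has_v4 and has_v6:
        return "dual-stack"
    if has_v6:
        return "ipv6-only"
    if has_v4:
        return "ipv4-only"
    return "unresolved"


def host_stack(hostname):
    """Resolve both A and AAAA records for hostname (family left unspecified)
    and classify the result. Hypothetical helper for illustration only."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return "unresolved"
    return classify_families(infos)
```

A host whose A record has been removed would be expected to come back as `"ipv6-only"`; the same check also shows whether worker nodes have forward (AAAA) resolution at all.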
Francesco (INFN): No direct news. The last three sites to make dual-stack are Pisa, Frascati and Turin. The dCache multi-homing patches (cf. KIT) are still to be done.
Catalin (RAL): Not much development. Outstanding task to consolidate the IPv6 and IPv4 trunks. Catalin queried the relative efficiency of IPv4 and IPv6 transfers (thread raised by Duncan). Issue with the connection of Romanian sites to LHCONE.
Andrea (CMS): No news.
Fernando (PIC): Request to CERN to split LHCOPN IPv4/IPv6 traffic. Need to update VLANs.
Bruno (KIT): See https://netstat.cern.ch/monitoring/network-statistics/ext/?p=LHCOPN&q=LHCOPN&mn=DE-KIT&t=Daily
(LHCOPN only). It was noted that some traffic is going over the 20G backup link. ALICE dual-stacking is 'on the move' (IPv6 still missing on the server side); the other three experiments are complete.
Kars (DESY): Upgrading WAN from 2x30G to 2x100G within the next three months. Currently ~30% of WAN traffic is IPv6. XFEL is now running 3 beam lines, producing 0.5 PB a week.
David G (NIKHEF): Everything works, no issues using IPv6. Some problems with the Torque batch system when AAAA records were enabled. WNs have IPv6 but no forward DNS resolution.
Duncan (Imperial): No news apart from various FTS issues already discussed.
https://indico.cern.ch/event/762602/contributions/3164163/attachments/1784622/2905026/go
SL6 is no longer supported; about 50% are on CC7. Details are being sent to the mailing lists. RAL is one of the Tier-1s not yet updated to the latest version. There is still an issue with the MaDDash display, e.g.
http://psmad.opensciencegrid.org/maddash-webui/index.cgi?dashboard=UK%20Mesh%20Config
Discussion as to what might be the theme of a possible paper.
For Run 3, CERN IT will use two of the LHCb containers at Point 8. For Run 4, CERN is likely to build a new data centre at Prévessin and is likely to run out of IPv4 addresses: another reason for WLCG to move to IPv6.
Attendees (around the table): Andrea Manzi (FTS Team lead), A. Sciaba,
F. Lopez, B. Hoeft, K. Ohrenberg, D. Groep, D. Rand, D. Kelsey, E. Martelli,
F. Prelz, C. Condurache, M. Bly (remote), Petr Vokac (Prague - remote).
(Notes by Francesco Prelz)
Andrea Manzi (FTS team lead) is introduced: it's now time to make sense
of any monitoring data we have - and possibly trace and squash bugs.
Agenda for the morning is briefly reviewed and agreed on.
a) Why does network monitoring over LHCOPN between two dual-stack end points
show traffic over IPv4?
Andrea M.: On many FTS endpoints the FTS configuration was set to
prefer IPv4 before Christmas due to site configuration problems.
After Christmas a new FTS cluster was installed,
and IPv6 preference was restored in the FTS configuration.
The FTS server configuration allows an IPv6 preference to be set
*per endpoint*.
Dave K.: Who has the authority to change the config?
Andrea M.: The FTS manager (or team). The VO manager (production role
in the VO) can also change it, but usually does not.
Dave K.: With due understanding of the production system needs, disabling
IPv6 prevents proper problem diagnosis.
Is a direct connection to the SE tested when sites are certified
or is FTS used?
Andrea S.: CMS tests connections to the SE, both via gridftp and xrootd.
Bruno H.: On Grafana there's no IPv6 traffic at all to and from BNL.
Duncan R.: From what I see, however, IPv6 is not working: no traffic.
Logs from fts307.usatlas.bnl.gov show that the 'PASV' command
(may actually be either PASV or EPSV) gets an IPv4 response.
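Whether a passive-mode reply can advertise IPv6 at all depends on the command. Per RFC 959, a `PASV` reply (code 227) encodes the data address as six comma-separated numbers, i.e. always an IPv4 address; per RFC 2428, an `EPSV` reply (code 229) carries only a port, and the client reuses the address of the control connection. A minimal Python sketch for classifying such reply lines, assuming the log contains the raw FTP reply text (the helper name `classify_passive_reply` is hypothetical, and real FTS/GridFTP log formats may differ):

```python
import re

# RFC 959 PASV reply: "227 ... (h1,h2,h3,h4,p1,p2)". Always an IPv4 address.
PASV_RE = re.compile(
    r"^227\b.*?(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})"
)
# RFC 2428 EPSV reply: "229 ... (<d><d><d>port<d>)". Port only; the client
# connects to the same address (and family) as the control connection.
EPSV_RE = re.compile(r"^229\b.*?\((.)\1\1(\d+)\1\)")


def classify_passive_reply(line):
    """Return (kind, host, port) for a passive-mode reply line, else None.
    Assumes the raw FTP reply text is available (hypothetical log format)."""
    m = PASV_RE.search(line)
    if m:
        nums = [int(x) for x in m.groups()]
        host = ".".join(str(n) for n in nums[:4])
        port = nums[4] * 256 + nums[5]  # p1 * 256 + p2
        return ("PASV/IPv4", host, port)
    m = EPSV_RE.search(line)
    if m:
        return ("EPSV/same-as-control", None, int(m.group(2)))
    return None
```

A 227 reply seen on an IPv6 control connection is therefore a strong hint that the data channel will use IPv4.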
Andrea M.: When a particular SE claims to be dual-stack, there may be a
pool of machines behind it, and some of them may be misconfigured.
Duncan R.: Looking at transfers from TRIUMF to CERN, there are also
IPv4 PASV responses, although the site should
be "dual stack".
Andrea M.: FTS will retry on IPv4 immediately if IPv6 fails for any
reason (e.g. it hits a firewall), and this fallback is not logged.
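To illustrate why an unlogged fallback hides IPv6 problems, here is a minimal Python sketch of the general pattern (not FTS's actual code): addresses are tried in order of a preferred family, standing in for the per-endpoint preference mentioned above, and every failed attempt is logged. The function names are hypothetical; the `connect` parameter exists only so the logic can be exercised without a network.

```python
import logging
import socket

log = logging.getLogger("ipv6-fallback-sketch")


def order_by_preference(addrinfos, prefer_ipv6=True):
    """Stable-sort getaddrinfo() results so the preferred family comes first."""
    preferred = socket.AF_INET6 if prefer_ipv6 else socket.AF_INET
    return sorted(addrinfos, key=lambda ai: ai[0] != preferred)


def connect_with_fallback(addrinfos, prefer_ipv6=True, connect=None, timeout=5.0):
    """Try each address in preference order and return (socket, family).
    Each failure is logged, so an IPv6 -> IPv4 fallback leaves a trace."""
    if connect is None:
        def connect(ai):
            # ai[4][:2] is (host, port) for both IPv4 and IPv6 sockaddrs.
            return socket.create_connection(ai[4][:2], timeout=timeout)

    last_error = None
    for ai in order_by_preference(addrinfos, prefer_ipv6):
        try:
            sock = connect(ai)
        except OSError as exc:
            family = "IPv6" if ai[0] == socket.AF_INET6 else "IPv4"
            log.warning("%s connect to %s failed: %s", family, ai[4][0], exc)
            last_error = exc
            continue
        return sock, ai[0]
    raise last_error or OSError("no addresses to try")
```

With logging of this kind, a transfer that "succeeded on IPv4" after an IPv6 failure can be told apart from one that was IPv4 from the start.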
Dave K.: Should we write a short troubleshooting guide covering the
many places where the configuration may be incorrect?
Presumably sites forget one item in what could be
a systematic checklist.
Andrea M.: We do investigate further when asked to "just shut down
FTS on a certain link because all transfers are failing".
There was another issue that was discovered a while ago:
sometimes the DHCPv6-offered IPv6 address is not refreshed.
CERN issue?
Dave K.: How do we improve this situation in a scalable way?
Twist site admins' arms or rely on FTP experts?
Andrea M.: First of all, SAM tests should test IPv6 connectivity.
Duncan R.: It's hard to enumerate and test storage nodes behind SE
head nodes. SAM testing (compute) worker nodes faces the same
issues.
Dave K.: In general, is the FTS success rate 100%, given the ability
to retry transfers?
Andrea M.: No: missing files and checksum errors are
terminal failures.
Andrea S.: Can these failures be categorised?
Andrea M.: Partly - the monitoring can be improved to point more precisely
to the failing party (source/destination, and where/when).
b) Status of FTS IPv6 efficiency versus IPv4 efficiency
c) Is FTS3 monitoring correctly reporting IPv4 versus IPv6?
A transfer that never started is automatically reported as IPv4.
Andrea M.: We are going, as agreed with Duncan R., to add a new field in
the log, filled with the IP protocol version only when the
transfer starts, so that Grafana will be able to properly
filter on the used protocol.
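A minimal sketch of how such a field could be used downstream, assuming hypothetical transfer records with an `ipver` key that stays empty until the transfer starts and a final `state` key (the real FTS field names and state values may differ). Transfers with no IP version go into an `UNDEFINED` bucket instead of being counted as IPv4:

```python
from collections import Counter

# Hypothetical record layout:
#   {"ipver": "ipv4" | "ipv6" | None, "state": "FINISHED" | "FAILED"}
# "ipver" is None when the transfer never started.


def tally_by_ip_version(transfers):
    """Count (ip_version, state) pairs, bucketing never-started transfers
    as UNDEFINED rather than defaulting them to IPv4."""
    counts = Counter()
    for t in transfers:
        version = t.get("ipver") or "UNDEFINED"
        counts[(version, t["state"])] += 1
    return counts


def success_rate(counts, version):
    """Fraction of FINISHED transfers for one IP version (None if none seen)."""
    total = sum(n for (v, _state), n in counts.items() if v == version)
    if total == 0:
        return None
    return counts[(version, "FINISHED")] / total
```

Comparing `success_rate(counts, "ipv4")` with `success_rate(counts, "ipv6")` then no longer penalises IPv4 for transfers that never started.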
Dave K.: A useful by-product will be the ability to see the protocol
string instead of true/false in Grafana.
Andrea M.: There are three tags in the FTS logs:
"TRANSFER" points to a failure during the transfer
"SOURCE" means that the source file is missing
"DESTINATION" means an existing destination file or a checksum mismatch
Andrea S.: Is the error message logged as well?
Andrea M.: Currently not.
Dave K.: Files on 'devel.cern.ch' sometimes seem not to work.
Andrea M.: Will check what they are.
Andrea M.: Will also check with the monitoring team whether the failure
reasons can be further filtered to select the ones involving
file transfer.
Dave K.: Not counting them as IPv4-only failures will help in our
search for unexplained asymmetries.
A useful cross-check: the amount of transferred data for
'UNDEFINED' state transfer should be zero.
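The two checks discussed here can be sketched as follows, again on hypothetical record dictionaries (the `tag`, `ipver` and `bytes` keys are assumptions; only the tag names come from the FTS logs). Only failures tagged `TRANSFER` are kept for an IPv4/IPv6 comparison, since `SOURCE` and `DESTINATION` failures point at the storage rather than the network path, and the byte total for transfers without an IP version is computed so it can be checked against zero.

```python
# Hypothetical failure/transfer records; only the tag names come from FTS.


def protocol_failures(failures):
    """Keep failures that occurred during the transfer itself (tag TRANSFER).
    SOURCE (missing source file) and DESTINATION (existing file or checksum
    mismatch) failures say nothing about IPv4 vs IPv6 behaviour."""
    return [f for f in failures if f.get("tag") == "TRANSFER"]


def undefined_bytes(transfers):
    """Total bytes reported by transfers with no recorded IP version.
    If the new log field behaves as intended, this should be 0."""
    return sum(t.get("bytes", 0) for t in transfers if not t.get("ipver"))
```

A non-zero `undefined_bytes` would indicate transfers that moved data but were not tagged with a protocol, i.e. a gap in the monitoring rather than a network problem.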
Dave K.: Is Xrootd transfer monitoring also in your ballpark?
Andrea M.: It will be once FTS transfers via xrootd are allowed in the
future. But some development of xrootd, and implementations of HTTP
servers supporting third-party copy, will be needed.
Francesco P.: Is xrootd already instrumented to log transfer size and
protocol?
Duncan R.: They reported it should, perhaps with an appropriate plugin.
We should get an update here from an xrootd developer.
Andrea M.: Another thing we cannot do is disable IPv6 and force a
fallback to IPv4 in xrootd and HTTP (WebDAV and the like).
Duncan R.: There was a big thread on the actual ability to know whether
IPv4 or IPv6 is used in a WebDAV transfer.
Andrea M.: Multi-stream transfers between the same pair of nodes
may also occur on different protocols!
d) What is the status of PIC's investigation of transfers between two dual-stack
systems?
Fernando shows his slides:
(https://indico.cern.ch/event/762602/contributions/3164162/attachments/1783846/2906047/IPv4_on_IPv6v2.pdf)
Fernando L.: In June, 80% of GridFTP failures were for ATLAS;
the situation is now much better.
The CMS failures were due to a problem in the Singularity
image that CMS uses: IPv6 was disabled for the GridFTP
GFAL plugin (/etc/gfal2.d/gsiftp_plugin.conf, see slides).
Note: xrootd statistics by protocol are obtained from the
dCache billing database.
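Stray settings like the one found in the CMS image can be audited with a short script. A minimal Python sketch using the standard `configparser` module; the section name `GRIDFTP PLUGIN` and key `IPV6` are our assumption about the gfal2 GridFTP plugin file layout and should be checked against the file actually shipped in the image:

```python
import configparser

# ASSUMPTION: section and key names of the gfal2 GridFTP plugin config.
SECTION = "GRIDFTP PLUGIN"
KEY = "IPV6"


def gfal2_gridftp_ipv6_enabled(conf_text):
    """Return True/False for the IPv6 setting in a gsiftp_plugin.conf-style
    text, or None if the key is absent (the plugin default then applies)."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_option(SECTION, KEY):
        return None
    return parser.getboolean(SECTION, KEY)
```

For example, running `gfal2_gridftp_ipv6_enabled(open("/etc/gfal2.d/gsiftp_plugin.conf").read())` inside the container would return `False` if IPv6 has been switched off there under the assumed key.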
e) In how many places can the preference for IPv6 (or IPv4) be configured?
Dave K.: We need an active representative from all of the experiments (our
customers) and an xrootd rep as well.
(coffee break)
Bruno H.: It may be useful to have a F2F meeting before the CHEP abstract
submission deadline.
Dave K.: We'll probably have to settle this over e-mail/phone.
Next F2F meeting settled on Thursday-Friday May 2-3, usual times.
Next phone conferences:
Thursday, March 7th, 16:00 CET.
Thursday, April 11th, 16:00 CEST.
HEPiX in San Diego, week of March 25th: usually somebody gives a report.
Who is planning to attend? Andrea S. is attending but will be busy - as long
as somebody prepares the slides he can present them.
Dave K.: A few IPv6-only worker nodes here and there would help.
Queen Mary was running DNS64/NAT64 - don't know if they still do.
Duncan R.: There's another site in Slovenia doing a similar exercise for
ATLAS.
The status of IPv6 reachability of IGTF CA CRLs
is available at this URL: http://cvmfs-6.ndgf.org/ipv6/overview.php
Dave K.: IPv6 will be in the tender requirements for the next GÉANT incarnation.