Meeting Notes for WLCG Network and Transfer Metrics WG
============================================================

September 8, 2014 from 3 PM to 4:15 PM CERN time

Attending: Alessandra Forti, Duncan Rand, Shawn McKee, Frederique Le Flour Chollet, Joel Closier, Jason Zurawski, Saul Youssef, Vincent Garonne, Frederic Schaer, Jorge Alberto Diaz Cruz, Alessandro de Salvo, Jose Flix Molina, Kaushik De, Stefan Roiser
at CERN: Julia Andreeva, John Shade, Michail Salichos, Tony Wildish, Andrea Sciaba, Alessandro Di Girolamo, Alberto Aimar, Felix Lee, Hassen Riahi, Marian Babik

Indico: https://indico.cern.ch/event/336520/

The meeting purpose was to:
- Provide an overview of the current status in network and transfer metrics
- Discuss organizational aspects of the working group (communication, task tracking, meetings schedule)
- Propose and discuss a list of topics and tasks as well as their priorities, and plan the follow-up meetings

Executive summary:
The kick-off meeting took place on the 8th of September (agenda at https://indico.cern.ch/event/336520/). The meeting had very good participation, including the experiments, the ESNet Science Engagement Group (perfSONAR development team), Panda, PhEDEx, FTS, FAX as well as the majority of the perfSONAR regional contacts. An initial overview of the current status in network and transfer metrics was presented, and a list of topics and tasks to work on in the short term was proposed. Very good feedback was received and we agreed on the topics to discuss at the follow-up meetings.
More details can be found on the WG Twiki at https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics

List of actions:
ALL (urgent): Please vote on your preferred dates for the next meetings:
  Metrics area meeting focusing on use cases and review of the transfer systems (T1.1, T1.2):
    13-17th October http://doodle.com/xvwdvysdrdzap8wh
  Meetings focusing strictly on perfSONAR operations (T2.1):
    29 Sept - 3 October http://doodle.com/e6epkkqmdx6ka3r7
    20 Oct - 24 October http://doodle.com/qydib32fkv48er2r
  NOTE: We'd like to encourage ALL mesh leaders to participate in the perfSONAR operations meetings.
ALL: Send comments and suggestions on the proposed list of topics/tasks and on the way the WG will be organized.
ALL: Volunteer to lead tasks in the metrics area (T1s).
Julia: Send a list of topics concerning xRootD tasks to the WG. To be discussed with WLCG OPS Coordination.
Marian: Set up the WG JIRA and report to WLCG OPS Coordination every 2 weeks on the status of ongoing tasks.
Marian, Shawn: Prepare an abstract for CHEP2015 (deadline Oct 15th).

Minutes:
An initial overview of the WG (mandate, objectives, team and basic organization) was presented and received no comments. Afterwards, the Network Monitoring Status was presented by Shawn.

Questions:
Ale: Expressed concerns about the current organization of the meshes (regional-based), which right now has a strong focus on the ATLAS tiering structure. In some cases this produces artificial links, e.g. Beijing is part of the French mesh.
Shawn: We're currently working on a WLCG perfSONAR configuration interface that will make it possible to fine-tune both test parameters and meshes according to the requirements (to be in production soon). Current coverage and its optimization will be discussed in the metrics area.
Tony: Asked if the purpose of grouping the sites into meshes is to control and organize the tests (and optimize their frequencies).
Shawn: Yes.
John: Commented that there is a third firewall issue: running 100G throughput tests through a firewall, thus "flooding" the firewall.
Shawn: This item will be added to the list of topics for the perfSONAR area operations discussion. Currently there are no 100G perfSONAR instances, but there are some 40G ones. The generic solution is that sites should set up their deployments following the Science DMZ model (see https://fasterdata.es.net/science-dmz/ ).
Andrea: Asked about detection tools that could help with the firewall issues (e.g. exposing port status, service information).
Shawn: We currently have an OMD-based instance monitoring the entire perfSONAR network, i.e. versions deployed, availability of measurement archives, network reachability, etc. This interface is currently password protected, but we're working to expose it to all sites. There is also a perfSONAR dashboard that shows measurements (http://maddash.aglt2.org/maddash-webui/index.cgi); most of the "orange" boxes (unknown/missing data) are very likely caused by firewall issues.
Ilija: Asked if it would be possible to fix traceroute tests when they're blocked.
Shawn: We can ask, but this is difficult to do for all sites. Some institutions block ICMP packets at their border. We should follow up as part of the perfSONAR subgroup to see if we can at least fix some of the problems.
Kaushik: Commented that blocking traceroute might be a state-wide security policy, which is not going to change.
Shawn: Yes, that is the difficult part. In many cases having the traceroute up to the border of the site is sufficient, but not optimal. We potentially lose important information about the end of the path we are testing.

A presentation of the initial/draft status of the Transfer Metrics was given by Marian.

Ale: Commented about making sure things are not correlated so they stay useful.
In the current monitoring, we do have overlaps that can impact each other; it's important that we try to streamline this in order to avoid confusing results. In addition, Ale said it would be great if we could differentiate between network, protocol and storage metrics/measurements.

Finally, a review of the proposed topics/tasks was presented by Marian.

Kaushik on Topics/Tasks: Requirements gathering should be iterative. Don't just pick some requirements at the start and never look at them again (T2.2 should reflect this). Also suggested that it would be good to have the perfSONAR team on board for discussions (the remaining tasks will require a more general discussion and direct feedback between the perfSONAR team, the experiments and the transfer systems).
Michael: Asked if there will be a central repository for the perfSONAR information and how this information can be used in the context of FTS.
Marian: We're working on establishing a WLCG perfSONAR datastore that will be operated by OSG and will expose all network monitoring data. OSG is currently testing one of the candidates for scalability (https://twiki.opensciencegrid.org/bin/view/Production/OSGNetworkDatastorePlan). In addition, we'll likely need a service that provides information on which perfSONAR instances can supply relevant networking information for a given storage element.
Julia: Commented on the need to discuss how to incorporate XRootD (FAX/AAA) deployment issues into the WG.
Marian: Suggested that Julia send a list of concrete topics to the WG, so we can discuss them and follow up with WLCG OPS Coordination.