WLCG throughput call

Europe/Zurich
31/S-023 (CERN)

Description
0) Agenda Review/Update, News
1) perfSONAR status
 - Issues to report for 3.5.1
 - WLCG deployment/operations status
2) WLCG network throughput SU tickets
3) Focus Topic: Re-organisation of the WLCG-wide meshes
4) Round-table about throughput, network, monitoring, data transfer and new issues to track
5) AOB and next meeting

Attended:

Shawn, Duncan, Frederique, Marian

Fernando, Frederic, Jason, Bruno (excused)

 

Agenda:

0) Agenda Review/Update, News

- News: 

 - perfSONAR 4.0 (formerly 3.6) RC planned end of June, final release sometime in Sept/Oct 

 - Next week's NA throughput call will have Andy Lake presenting the major features introduced in perfSONAR 4.0 

 - One of the major changes is the new configuration management, which was presented some time ago and discussed at the meeting. We’re looking for volunteers willing to test the new web interface (one of the supported use cases is organising campus/site testing)

 

1) perfSONAR status

- Issues to report for 3.5.1 

  - Marian filed a bug on the web interface, which shows regular testing as not running even though the daemon is running fine 

  - An EGI SVG advisory on iperf3 was broadcast. Please check that you get iperf3 from the Internet2 repo, since EPEL still has the old version. Sites on auto-updates should have received the fixed iperf3 on Jun 09

 

- WLCG deployment/operations status 

  - We took the opportunity to look at the status of the UK sites in detail and discussed issues seen

 

2) WLCG network throughput SU tickets 

GGUS:119820 ASGC - resolved, waiting for the implementation of the recommendations

GGUS:121687 RAL consistent loss - waiting for an upgrade of the RAL router

GGUS:121905 BNL to SARA - SARA perfSONARs were fixed. Marian reported that the issue didn't disappear and he will present another report on it (the ticket was updated in the meantime: consistent loss is seen both inbound and outbound, but a lot more outbound from SARA; it was suggested we wait until SARA moves to the new data centre before investigating further)

Grid output retrieval failing: Victoria - Prague - asymmetric paths and MTU step down issues - resolved

Possible network issue between McGill and BU - gridftp transfers timing out - resolved (was an issue with storage)

 

3) Focus Topic: Re-organisation of the meshes

 

- Marian presented a proposal to introduce 3 bandwidth and 3 latency meshes instead of the existing WLCG all-latency and all-bandwidth meshes. The main benefits are a better reflection of the production traffic, smaller meshes, and the potential to increase the frequency of bandwidth testing. The full proposal is listed at http://etf.cern.ch/perfsonar_meshes2.txt; it’s auto-generated by a Python script and takes the following into account:

- Each mesh is mapped to the corresponding T1/T2 structure of an experiment

- Sonars are selected from a pool containing all registered sonars; the main selection criterion is to get the best coverage for a given experiment

- Utilisation is computed for the proposed meshes (showing number of hosts to test and amount of throughput testing that will be performed)
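
To illustrate why smaller meshes leave room for more frequent bandwidth testing, here is a minimal sketch of the arithmetic for a full mesh. It is not the script behind the proposal: the function name full_mesh_load, the 30-second test duration and the host counts are all assumptions chosen for illustration, and it assumes every directed host pair is tested once per cycle.

```python
def full_mesh_load(n_hosts, test_seconds=30):
    """Return (directed_pairs, busy_seconds_per_host) for one full-mesh cycle.

    Assumptions (not from the proposal): every directed pair is tested once
    per cycle and each throughput test lasts test_seconds.
    """
    if n_hosts < 2:
        return 0, 0
    # In a full mesh every host tests towards every other host.
    directed_pairs = n_hosts * (n_hosts - 1)
    # Each host is the source of n-1 tests and the destination of n-1 tests.
    busy_seconds_per_host = 2 * (n_hosts - 1) * test_seconds
    return directed_pairs, busy_seconds_per_host


# Hypothetical sizes: one large mesh versus three smaller ones.
single_mesh = full_mesh_load(90)
split_meshes = [full_mesh_load(n) for n in (30, 30, 30)]

print("single mesh:", single_mesh)
print("split meshes:", split_meshes)
```

Because the number of tests grows roughly with the square of the mesh size, splitting one mesh into several smaller ones reduces both the total number of tests and the time each host spends testing per cycle.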

 

The proposal has 6 parts: 

1. Topology check (missing sonars for major T1s/T2s)

2. Initial meshes as selected from a pool (listing all existing sonars at a given site)

3. Filtered meshes (removing test sonars and sonars not working correctly) - there are a few errors, such as the ASGC and SARA sonars, which are working fine and will be included in the meshes

4. Utilisation Status for filtered meshes (only base meshes are considered)

5. Utilisation Status for filtered meshes (including OPN, LHC1 and Dual-Stack meshes)

6. Global Mesh (sonars that were not added to any mesh - test sonars, sonars not working fine, etc) 
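
The selection steps listed above (parts 1, 2, 3 and 6) could be sketched roughly as follows. This is an illustrative outline only, not the actual script: the Sonar class, its fields, the build_meshes function, and all host, site and mesh names are hypothetical, and the topology check is assumed here to mean "sites with no usable sonar".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sonar:
    # Hypothetical record for one registered perfSONAR host.
    host: str
    site: str
    is_test: bool = False
    is_working: bool = True


def build_meshes(pool, experiment_sites):
    """pool: list of Sonar; experiment_sites: mesh name -> set of site names."""
    # Part 2: initial meshes, all sonars in the pool at the experiment's sites.
    initial = {
        name: [s for s in pool if s.site in sites]
        for name, sites in experiment_sites.items()
    }
    # Part 3: filtered meshes, dropping test sonars and sonars not working.
    filtered = {
        name: [s for s in members if not s.is_test and s.is_working]
        for name, members in initial.items()
    }
    # Part 6: global mesh, sonars that ended up in no filtered mesh.
    assigned = {s.host for members in filtered.values() for s in members}
    global_mesh = [s for s in pool if s.host not in assigned]
    # Part 1 (assumed meaning): sites left without any usable sonar.
    missing = {
        name: sorted(sites - {s.site for s in filtered[name]})
        for name, sites in experiment_sites.items()
    }
    return filtered, global_mesh, missing


# Hypothetical pool and experiment topology.
pool = [
    Sonar("ps1.site-a.example", "SITE-A"),
    Sonar("ps-test.site-a.example", "SITE-A", is_test=True),
    Sonar("ps1.site-b.example", "SITE-B", is_working=False),
    Sonar("ps1.site-c.example", "SITE-C"),
]
experiment_sites = {"EXP1": {"SITE-A", "SITE-B"}, "EXP2": {"SITE-C", "SITE-D"}}

filtered, global_mesh, missing = build_meshes(pool, experiment_sites)
```

Sonars that are wrongly flagged as not working (as noted for ASGC and SARA) would need their flags corrected in the pool before the filtering step.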

 

Duncan expressed interest in getting latencies/loss tested for dual stack and a need to better reflect IPv6 production traffic in the meshes. It was agreed that this is indeed very important and we should follow up on this.

 

Unless there are objections, the plan is to implement the proposal as soon as possible, preferably before end of June and then review the status in September.

 

4) Round-table about throughput, network, monitoring, data transfer and new issues to track

- NTR

 

5) AOB and next meeting

- NA throughput next week

- Next WLCG throughput mid August (or mid Sept)

- pre-GDB on network will be on 13th of December

- LHCONE/LHCOPN will be held in Sept
