Network and Transfer Metrics WG

Europe/Zurich
28/R-015 (CERN)

Marian Babik (CERN), Shawn McKee (University of Michigan ATLAS Group)
Description

The meeting date/time follows a fixed schedule agreed at the last meeting (https://indico.cern.ch/event/354593/).

Details on the Network and Transfer Metrics WG are available at our Twiki.  

Network and Transfer Metrics WG (8th April 2015) Minutes

Attended: Hung-Te Lee, Ian, Henryk, Frederique, Alessandro DiGi, Bruno, Stefan S, Marian (Excused: Tony, Jason)

Agenda/slides presented at https://indico.cern.ch/event/382622/

We're still missing input to the use case document; please provide it ASAP (https://docs.google.com/document/d/1ceiNlTUJCwSuOuvbEHZnZp0XkWkwdkPQTQic0VbH1mc/edit).

A draft of the CHEP presentation is attached in Indico; please send your comments and suggestions directly to Shawn.

Next meetings: 6 May, 3 June, 8 July and 2 Sept, all at 4 PM CEST.

1) perfSONAR status

The changes to the mesh configuration were reviewed. Release 3.4.2 brought significant improvements over 3.4.1: LHCOPN and LHCONE are now consistently delivering all metrics. Initial results on data completeness (data measured by the network) were shown. The plan is to restart the full-mesh LHC Latency ramp-up, starting with the top-k sites. News on security and infrastructure monitoring was presented. Please register in OIM to use our configuration interface at https://oim.grid.iu.edu/oim/meshconfig, and contact me or Shawn if you have any issues.
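Data completeness is not defined precisely in these minutes. One common way to quantify it, assumed here purely for illustration, is the fraction of scheduled measurements for a source/destination pair that actually end up stored. The function names and the fixed test interval below are hypothetical, not part of the perfSONAR or datastore tooling.

```python
from datetime import timedelta


def completeness(num_received: int, window: timedelta, test_interval: timedelta) -> float:
    """Fraction of expected measurements actually stored for one link.

    Assumption (hypothetical): one test is scheduled every `test_interval`
    during `window`, so the expected count is window // test_interval.
    """
    expected = window // test_interval  # timedelta // timedelta -> int
    if expected == 0:
        return 0.0
    # Cap at 1.0 in case extra (e.g. on-demand) tests were also stored.
    return min(num_received / expected, 1.0)


def mesh_completeness(received: dict, window: timedelta, test_interval: timedelta) -> float:
    """Average completeness over all links in a mesh.

    `received` maps a (source, destination) pair to the number of stored results.
    """
    if not received:
        return 0.0
    scores = [completeness(n, window, test_interval) for n in received.values()]
    return sum(scores) / len(scores)
```

For example, a link tested hourly that stored 20 results over one day would score 20/24, roughly 0.83.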

Current deployment status was discussed (http://grid-monitoring.cern.ch/perfsonar_report.txt):

WLCG perfSONAR service status report on 2015-04-08 04:02:24.048580

- Active perfSONAR instances: 233
- Registered/monitored perfSONAR instances: 259
- perfSONAR-PS versions deployed:
  - 3.4.1: 33
  - 3.4.2: 172
  - Unknown: 26
- Incorrectly configured (failing >4 metrics): 26
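Regional contacts who want to track these counts over time could extract them from the report text. The sketch below assumes the report uses simple "label: count" lines as shown above; the actual layout of perfsonar_report.txt may differ, and `parse_report` is a hypothetical helper.

```python
import re

# Matches lines such as "- Active perfSONAR instances: 233" or "  - 3.4.1: 33".
# Assumption: every count line ends with ": <integer>".
_COUNT_LINE = re.compile(r"^\s*[-–]?\s*(.+?)\s*:\s*(\d+)\s*$")


def parse_report(text: str) -> dict[str, int]:
    """Return a mapping from each report label to its integer count."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        match = _COUNT_LINE.match(line)
        if match:  # header and section lines without a trailing count are skipped
            counts[match.group(1)] = int(match.group(2))
    return counts
```

Version labels such as "3.4.1" become keys alongside the summary labels, so sites still running the older release can be counted with `parse_report(text).get("3.4.1", 0)`.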

Please check the status of sonars still on 3.4.1 in your region (they have either run out of disk space or have auto-updates disabled, which poses a potential security problem and is not recommended). The only sonar on 3.4.2 that doesn’t seem to work correctly is the GRIDKA Latency node; Bruno said he will try to reinstall it after CHEP.

Frederique commented that the psmad dashboard shows incomplete data. Marian explained that psmad is connected to the OSG datastore and both are only testing (pilot) instances, so they may not work all the time. For direct results (coming from the local MAs), please check maddash.aglt2.org.

2) Network Incidents Follow up

This was discussed at the WLCG operations coordination meeting, where it was agreed to start now and introduce modifications later, once we gain more experience (details in the slides).

3) Datastore/esmond status 

Validation work is ongoing; we are working on metrics to check the accuracy and coverage/completeness of the data collection.

4) Pilot projects 

Henryk reported on progress in esmond2mq: parallelisation has significantly improved performance, but there are still issues with missing raw data in the datastore. This will be followed up with Shawn and Jorge Batista to check for potential issues in querying esmond.
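The general idea behind parallelising an I/O-bound collector like esmond2mq can be sketched with a thread pool that issues many datastore queries concurrently and records which ones came back empty, so missing raw data can be followed up. This is not the esmond2mq code: `fetch` is a hypothetical stand-in for a query against the esmond REST API, and publishing to the message queue is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fetch_all(
    keys: Iterable[str],
    fetch: Callable[[str], list],
    max_workers: int = 8,
) -> tuple[dict[str, list], list[str]]:
    """Fetch raw data for many measurement keys concurrently.

    Returns (results, missing): keys whose fetch returned no rows are
    collected in `missing` instead of being silently dropped.
    """
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip() pairs each key with its rows.
        rows_per_key = list(pool.map(fetch, key_list))

    results: dict[str, list] = {}
    missing: list[str] = []
    for key, rows in zip(key_list, rows_per_key):
        if rows:
            results[key] = rows
        else:
            missing.append(key)
    return results, missing
```

Threads suit this case because each worker spends most of its time waiting on the network, not on the CPU.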

Ian commented that it would be best if the publishers could run directly on the perfSONAR hosts; he would be willing to test whether esmond2mq can be run locally.

Marian presented early work on a proximity/topology service (proximity.cern.ch), which started with site-based mappings and GeoIP. The initial goal is to fetch the active SEs from FTS and map them to perfSONARs. The plan is to test different algorithms (site mapping, traceroutes, GeoIP) and to evaluate whether existing tools could be used for this purpose. After the meeting, Ian shared a link to the Shoal project, which is used to geolocate the nearest Squids (http://shoal.heprc.uvic.ca/).
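A minimal sketch of combining two of the mapping strategies mentioned above: prefer a perfSONAR registered at the same site as the SE, and otherwise fall back to the geographically nearest one using great-circle (haversine) distance on GeoIP-style coordinates. All hostnames, site names and coordinates are hypothetical; this does not describe how proximity.cern.ch is implemented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_sonar(se_coords: tuple, sonar_coords: dict) -> tuple:
    """Return (hostname, distance_km) of the closest perfSONAR to the SE."""
    lat, lon = se_coords
    return min(
        ((host, haversine_km(lat, lon, *coords)) for host, coords in sonar_coords.items()),
        key=lambda pair: pair[1],
    )


def map_se_to_sonar(se_site: str, se_coords: tuple, site_sonars: dict, sonar_coords: dict) -> str:
    """Site-based mapping first, GeoIP-style nearest-distance fallback second."""
    if site_sonars.get(se_site):
        return site_sonars[se_site][0]
    host, _ = nearest_sonar(se_coords, sonar_coords)
    return host
```

Distance on a map is only a rough proxy for network proximity, which is why traceroute-based mappings are also on the list of algorithms to evaluate.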

Next meeting will be focused on FTS performance project (May 6th 4 PM CEST, https://indico.cern.ch/event/382623/).
