Demo of WLCGMON

Name: Demo of WLCGMON
Start: 2013-11-08T14:00:00+01:00
End: 2013-11-08T15:35:00+01:00
Location: CERN

Friday 8 Nov 2013, 14:00 → 15:35 Europe/Zurich

31/3-004 - IT Amphitheatre (CERN)

Description

Demo of the current status of the WLCGMON prototype

WLCG Monitoring Consolidation 8-Nov-2013

Present: Nicolo, Pablo, Julia, Stefan, AleDG, Maarten, Lionel, Luca, Costin, Marian, Jacobo, Eddie, Ivan, Valentina, Pepe, David (Crooks)
Apologies: Pedro, David (Tuckett)

Minutes: Ale DiGGi

_____________________________________________
Pablo introduction
Thanks for the feedback on the report. We are now applying the changes, both to the content and to the format. Once the changes are ready, we'll circulate the report again.

Jacobo presentation:
Service and Site status through SSB aggregation
- motivation: simplify and reduce efforts
- goal: use SSB virtual metric to compute site status…
- to match the present SAM features we will have some POEM like metric definition
- simplify algorithms: different flavours with the same tests can be simplified. The flavour name is arbitrary.
- FQAN support: each metric + FQAN is independent
- topology: VO feed as only source, not limited to services in GOCDB/OIM, it should be stored in SSB
- current limitations: downtimes are not yet taken into account.
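
A minimal Python sketch of how a site status could be derived from per-endpoint metric results, along the lines listed above. Everything in it is an assumption for illustration, not the SSB implementation: the status ordering, the aggregation rule (worst metric per endpoint, best endpoint per flavour, worst flavour per site) and all metric, FQAN and flavour names are hypothetical.

```python
from collections import defaultdict
from enum import IntEnum


class Status(IntEnum):
    # Assumed ordering from best to worst; the real ordering may differ.
    OK = 0
    WARNING = 1
    UNKNOWN = 2
    CRITICAL = 3


def endpoint_status(results):
    """Worst status among the results of one service endpoint.

    `results` maps (metric_name, fqan) to a Status, so that each
    metric + FQAN pair is treated as an independent input.
    """
    return max(results.values(), default=Status.UNKNOWN)


def site_status(endpoints):
    """Aggregate endpoints into one site status.

    `endpoints` is a list of (flavour, results) pairs. Within a flavour
    the best endpoint wins; across flavours the worst flavour wins.
    """
    by_flavour = defaultdict(list)
    for flavour, results in endpoints:
        by_flavour[flavour].append(endpoint_status(results))
    if not by_flavour:
        return Status.UNKNOWN
    return max(min(statuses) for statuses in by_flavour.values())


# Hypothetical data: one storage endpoint and two compute endpoints.
example_endpoints = [
    ("SRM", {("put", "/vo/Role=production"): Status.OK,
             ("get", "/vo/Role=production"): Status.OK}),
    ("CE", {("job-submit", "/vo/Role=pilot"): Status.CRITICAL}),
    ("CE", {("job-submit", "/vo/Role=pilot"): Status.OK,
            ("job-submit", "/vo/Role=production"): Status.WARNING}),
]
example_site = site_status(example_endpoints)
```

Because the flavour is only a grouping key, its name can be arbitrary, and flavours that run the same tests can share one grouping.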

- metric validity: SAM validity is currently 24h. A shorter validity brings the status closer to real time, but too short a validity could be a problem. We are working on this now. Validity will be defined per metric.
Nicolo: real CMS production jobs often wait a long time in the queue.
Pablo: all the SRM metrics are quite regular. For the CEs and WNs we have on average 3 hours of validity.
ADG: validity and alerts can also be split: maybe keep the same validity value, but send an alert if no results arrive within e.g. 50% of the validity.
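
The split between validity and alerting suggested above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the three state labels and the default alert fraction of 50% are hypothetical, and the per-metric validity values would come from the metric definitions.

```python
from datetime import datetime, timedelta


def freshness(last_result, now, validity, alert_fraction=0.5):
    """Classify a metric by the age of its latest result.

    Returns "expired" once the result is older than `validity`,
    "alert" once it is older than `alert_fraction` of the validity,
    and "fresh" otherwise.
    """
    age = now - last_result
    if age > validity:
        return "expired"
    if age > validity * alert_fraction:
        return "alert"
    return "fresh"


# Hypothetical example with a 24h validity.
now = datetime(2013, 11, 8, 14, 0)
day = timedelta(hours=24)
state_recent = freshness(now - timedelta(hours=1), now, day)
state_late = freshness(now - timedelta(hours=13), now, day)
state_stale = freshness(now - timedelta(hours=25), now, day)
```

With this split the status shown for the site stays valid for the full period, while a missing result is flagged well before the value expires.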

- upcoming challenges:
* filters (e.g. results for Tier1).
* availability/reliability on top of any SSB metric.
ADG: This is already done for ATLAS.
Pablo: Yes, that's why we are not too worried about it
* Validation, this will take time too.
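
As an illustration of availability/reliability computed on top of a status metric, here is a small Python sketch. It assumes the commonly used definitions (availability = up time / (total time - unknown time); reliability = up time / (total time - unknown time - scheduled downtime)) and equal-length time bins. The status labels, including "SCHED_DOWN", are hypothetical, and downtimes are not yet part of the prototype.

```python
def availability_reliability(samples):
    """Return (availability, reliability) for a list of status labels.

    Each sample covers one time bin of equal length. A value of None
    means the ratio is undefined (no known or no expected-up time).
    """
    total = len(samples)
    up = samples.count("OK")
    unknown = samples.count("UNKNOWN")
    scheduled = samples.count("SCHED_DOWN")

    known = total - unknown
    availability = up / known if known else None

    expected_up = known - scheduled
    reliability = up / expected_up if expected_up else None
    return availability, reliability


# Hypothetical day of hourly bins.
day_samples = (["OK"] * 18 + ["CRITICAL"] * 2
               + ["SCHED_DOWN"] * 2 + ["UNKNOWN"] * 2)
avail, rel = availability_reliability(day_samples)
```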

- summary (slide 12)

----> demo
1st wlcg-mon/dashboard/request.py/siteview
Global view with the experiment CRITICAL profiles. The profiles are virtual metrics; if you click on one you see the specific metrics behind it.
This is done at the site level.
If you select e.g. the CMS_CRITICAL_tests profile you will see the various services contributing to the profile.
The key is the service endpoint, together with the service flavour.
You can filter and search the view (the filter is case insensitive).
If you click on a column header you expand the values.
Everything is available through the API, for both virtual metrics and real metrics.
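
The case-insensitive filter on the view can be pictured with the following Python sketch. It is not the dashboard code: the row layout, the field names and the host names are hypothetical, and the rows are assumed to be already retrieved (the API calls themselves are not shown here).

```python
def filter_rows(rows, text):
    """Keep the rows in which any field contains `text`, ignoring case."""
    needle = text.casefold()
    return [
        row for row in rows
        if any(needle in str(value).casefold() for value in row.values())
    ]


# Hypothetical rows keyed by service endpoint and flavour.
rows = [
    {"site": "SITE-A", "endpoint": "srm.example.org", "flavour": "SRM"},
    {"site": "SITE-A", "endpoint": "ce01.example.org", "flavour": "CE"},
    {"site": "SITE-B", "endpoint": "ce02.example.net", "flavour": "CE"},
]
matches = filter_rows(rows, "srm")
```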

Stefan: in the view of CMS_CRITICAL_tests, is it "you" putting the columns or me?
Pablo: The experiments will define/aggregate their own views; the SSB team will create the default.
- main view will have the GOCDB/OIM site name
- it will be one instance.

Julia: one instance for all the experiments/SAM everything?
Pablo: we will still maintain the other SSB instances
- in the WLCG transfers dashboard we have a hierarchical structure; we would like to have the same structure here, without duplication of data.

- can we start using that?
YES.
- downtimes integration will be similar to what we have done with ATLAS.

----> DEMO over.
Pablo: Next meeting in 2 weeks from now, Friday the 22nd morning (since the afternoon is already scheduled for other meetings).
Pablo: 3 topics in mind: going through the report, input from the experiments regarding this prototype, and input from Operations on how this will affect the sites (if Pepe agrees).
Pepe: OK.

Done.

- 1
  
  Review of version 1.0 of the report
  
  Speaker: Pablo Saiz (CERN)
- 2
  
  Demo of WLCGMON
  
  See http://wlcg-mon.cern.ch
  
  Speaker: Jacobo Tarragon Cros (CERN)
  
  Slides
- 3
  
  Discussion
- 4
  
  Next meeting
