System Analysis Working Group meeting

2/1-034 (CERN)



Julia Andreeva
The main topic is what site availability means from the VO perspective: how the experiments use the SAM framework for experiment-specific tests, how the results of SAM tests can be visualized (various options), and what is missing.
Stefano: What should be taken into account when deciding the status of a site:
- results of SAM tests
- availability of the site calculated from SAM tests
- number of running/pending jobs
- success rate
- published SW versions

Piotr objected to the idea that everyone can use the SAM programmatic interface (in particular for historical queries, which are heavy) to calculate site availability on their own. Rather, we need to think about how to provide the necessary flexibility in SAM for calculating availability.

Julia and Roberto: LHCb would like to be able to run a job that publishes to SAM the aggregated results of sanity checks done via normal Dirac jobs over a certain time range. From SAM the results would then be published back to the local site fabric monitoring.

James: Why publish them back to SAM? We should be careful not to overload SAM by publishing more and more tests. He suggested publishing directly from Dirac, but using the same publishing mechanism as SAM.

How would site admins like to be informed of how their site is behaving from the point of view of the VOs they are serving?

Stefano: So far 4 CMS sites have been asked and all the answers are different:
- San Diego: mail every hour
- Brunel: GridView
- Taiwan: published back to Nagios
- FNAL: run tests locally

Julia: Not always sufficient.

James: Local tests should improve the time granularity, complementing the tests running remotely.

Andrea:
- local versions of the CMS tests may be run by sites, but they should not replace the official tests, just complement them

Stefano said that CMS might be interested in trying the prototype developed by the Grid Service Monitoring WG for publishing results back to local fabric monitoring. Julia sent him a link to the twiki page with the prototype description.
James: For the workshop it would be nice if people came up with suggestions for how the UI showing site and service status from the VO perspective should look.

Andrea:
- the CMS SRM tests will be fed to the SAM team for integration
- the FNAL problem with availability should be fixed ASAP in one way or another, probably with a hack in GridView; otherwise it must be made clear to the MB that the FNAL availability is wrong
- the FCR functionality to exclude SEs is broken because it makes sense only for the classic SEs
- the effect of critical tests on CE exclusion from the BDII via FCR should be decoupled from their effect on the availability calculation

Piotr: We can have different kinds of aggregated metrics, combined with logical 'and', which can trigger different actions, not just exclusion from the BDII.

Should the algorithm for calculating SAM availability be the same for all VOs and different use cases?

Andrea: The algorithm should be the same for all VOs, but there should be flexibility in defining critical tests.

Piotr: It might be useful to make the criticality of a test depend on the tier or another attribute of the site.

Alessandro: The granularity of information in SAM should go beyond the host, for example down to pools for storage-related tests.

Max showed the GridMap prototype. It needs to be shown to people in the experiments, so that they can think about use cases. One of the issues in adapting the tool to the needs of different experiments is how to get the experiment topology. Some effort is required to establish a common way to publish the experiment topology, so that it can be used by any application as well as by the experiment. On the other hand, this information should be kept up to date by the experiment people.
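The points raised by Andrea and Piotr in the discussion (one algorithm for all VOs, VO-defined critical tests combined with logical 'and', and criticality depending on the site tier) could be sketched roughly as follows. This is only an illustration and not the SAM implementation: the test names, the `CRITICAL_TESTS` table and the functions `critical_tests`, `site_ok` and `availability` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical critical-test sets per VO and per tier (illustration only).
CRITICAL_TESTS: Dict[str, Dict[str, Set[str]]] = {
    "cms": {
        "default": {"job-submit", "sw-install"},
        "T1": {"job-submit", "sw-install", "srm-put"},
    },
    "lhcb": {
        "default": {"job-submit"},
    },
}


def critical_tests(vo: str, tier: str) -> Set[str]:
    """Return the critical tests for a VO, falling back to its default set."""
    per_vo = CRITICAL_TESTS[vo]
    return per_vo.get(tier, per_vo["default"])


def site_ok(results: Dict[str, bool], vo: str, tier: str) -> bool:
    """Logical 'and' over the critical tests; a missing result counts as a failure."""
    return all(results.get(test, False) for test in critical_tests(vo, tier))


def availability(samples: List[Dict[str, bool]], vo: str, tier: str) -> float:
    """Fraction of sampling intervals in which all critical tests passed."""
    if not samples:
        return 0.0
    return sum(site_ok(s, vo, tier) for s in samples) / len(samples)


# Two sampling intervals for one site (made-up results).
samples = [
    {"job-submit": True, "sw-install": True, "srm-put": False},
    {"job-submit": True, "sw-install": True, "srm-put": True},
]
print(availability(samples, "cms", "T2"))  # srm-put is not critical here
print(availability(samples, "cms", "T1"))  # srm-put is critical here
```

The algorithm is identical for every VO; only the lookup table changes. The same `site_ok` result could drive actions other than the availability number, which is one way to keep BDII exclusion and availability calculation decoupled.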
    • 10:00 10:20
      Site availability from the CMS perspective 20m
      Speaker: Stefano Belforte
    • 10:20 10:40
      Using SAM framework for the CMS specific tests 20m
      Speaker: Andrea Sciaba
    • 10:40 11:00
      GridMap visualization for the results of SAM tests. Other use cases? 20m
      Link to the GridMap prototype:
      Speaker: Max Boehm
    • 11:00 12:00
      Discussion 1h
      We take the CMS experiment as an example. What is site availability from the point of view of the other experiments? How is the SAM framework used by other experiments? Are the experiments happy with the currently provided UI? What are the use cases for a GridMap view in the Experiment Dashboard and/or other monitoring systems? How should the results of SAM tests, or site availability calculated according to the experiment's policy, be communicated to the sites (existing practices, suggestions for the future)?