local: Eddie, Luca, Pablo, Maarten, Andrea, Nicolo', Alessandro
remote: Pepe, David, Alessandra, Salvatore

JIRA actions
------------
Some are past their deadline.
31: waiting for ATLAS to switch
27: Condor-G, OK for ATLAS but issues for CMS
Luca: on CMS we see errors that we need to investigate.
Maarten: as long as we don't understand the CMS issues, they might affect ATLAS.
Maarten: new person?
Pablo: yes, a summer student, working on the new reports.

Status Board monitoring on iPad (David)
---------------------------------------
Motivation: already using an iPad for work. Reuse/represent monitoring data.
Panic released an app called Status Board for iPad (2013).
Various widgets can be configured; the pro ones are more interesting for us.
The look is great, you just need to provide the data.
It displays:
1) status indicators (=> Naemon)
2) high priority site metrics (=> Graphite)
Key site indicators: up/down, ATLAS SUM
Flags: aggregate all instances of a service.
Livestatus: can be used to retrieve Nagios data and convert it to an HTML table. Very fast to set up and convenient.
Future: use the simplest method to pull in data and use JSON for more widgets. One can also build it in a web page with Graphite.

Maarten: a read-only app?
David: yes, you just display.
Maarten: do you consult just a public web server?
David: yes.
Maarten: what if the stuff that I want to display must not be public?
David: need to investigate how.
Alessandro: how do you collect the metrics? Where do you aggregate the information?
David: we have a set of scripts, we do the aggregation in Graphite.
Pablo: for the indicators, does this come from your Nagios?
David: yes, they are all Nagios checks.
Luca: the point is that you have a cron job that reads from Graphite and writes a JSON file.
Alessandro: you need an HTTP server to host your data files.
Maarten: you could have a catch-all button to show that something is wrong without exposing in detail what is wrong, for security reasons.
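
The cron job Luca describes and the catch-all flag Maarten suggests could look roughly like the sketch below. It is not David's actual script. It assumes the input has the shape returned by the Graphite render API with format=json (a list of series, each with a "target" and [value, timestamp] datapoints) and that the widget accepts a graph JSON with "datasequences"; the metric name and all function names are hypothetical.

```python
import json
from datetime import datetime, timezone


def graphite_to_statusboard(series, title):
    """Convert Graphite render-style JSON into a Status Board style graph dict.

    The output layout (graph / datasequences / datapoints) is an assumption
    about what the widget expects, not something stated in the minutes.
    """
    sequences = []
    for s in series:
        points = [
            {
                "title": datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M"),
                "value": value,
            }
            for value, ts in s["datapoints"]
            if value is not None  # Graphite reports gaps in the data as null
        ]
        sequences.append({"title": s["target"], "datapoints": points})
    return {"graph": {"title": title, "datasequences": sequences}}


def catch_all_flag(nagios_states):
    """Collapse Nagios state codes (0 means OK) into one flag with no details."""
    return "OK" if all(state == 0 for state in nagios_states) else "PROBLEM"


def write_widget_file(path, payload):
    """Write the payload where the HTTP server can serve it to the app."""
    with open(path, "w") as fh:
        json.dump(payload, fh)


# Hypothetical sample, shaped like a Graphite render response.
sample = [
    {
        "target": "site.jobs.running",
        "datapoints": [[120, 0], [None, 60], [135, 120]],
    }
]
widget = graphite_to_statusboard(sample, "Running jobs")
flag = catch_all_flag([0, 0, 2])
```

Publishing only the output of catch_all_flag would show that something is wrong without revealing which check failed.
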
Pablo: what if another site wants to do the same?
David: pull data from your source.
Maarten: how much does it cost?
David: about 10 euros.
Maarten: it might be interesting for others to see if there is something similar for Android, the concept is very interesting.
Alessandro: for ATLAS it was difficult to keep the scripts updated. It's not really a problem of showing but of getting it right. The CSV or HTML are similar to the SSB data. Can we think about using Graphite instead of our own internal plotting libraries?
Luca: Graphite is a frontend on top of Carbon. It also offers a lot of math and statistical tools, not only visualisation, and it is strictly bound to the data format.
Luca: the visualisation part of Graphite is actually poor, the strengths are in the other aspects of the framework.
Pepe: does the app give alerts?
David: we use the Nagios notifications, but for now the app is purely passive, there are no notifications in the app.

Recomputations in SAM3 (Pablo)
------------------------------
We want to do it by combining the original availability with the additional data, like we do for the downtimes.
We also plan to allow visualising the values before the recomputation.
Maarten: it's important to be able to undo it in case somebody makes a mistake.
Alessandro: the cleanest solution is to have a new metric. (Agreed by everybody)
Who does the corrections? We do for ops, but what about the experiment metrics? It can be the experiments. (Agreed by everybody)
Alessandro: in SAM3 we can have the PDF report sent to sites on the 1st of the month. Sites have 10 days to ask for a recomputation, and if no action is taken on the 10th, when is the report recomputed?
Maarten: I would have versioned reports and the latest one is the one that counts. Normally at most one additional version of the report is produced, and it should be announced.
Pablo: it will be shown at the MB instead of the current one.
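
The points agreed above (correction kept as a separate metric, original value still visible, undo possible, versioned reports where the latest one counts) can be illustrated with the sketch below. It is not the SAM3 design: the class names, fields and site name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SiteAvailability:
    site: str
    original: float                     # value as first computed, never overwritten
    correction: Optional[float] = None  # separate "recomputed" metric, if any

    def effective(self) -> float:
        """Value that counts: the correction if present, else the original."""
        return self.correction if self.correction is not None else self.original


@dataclass
class MonthlyReport:
    versions: List[Dict[str, float]] = field(default_factory=list)

    def publish(self, records: List[SiteAvailability]) -> int:
        """Snapshot the effective values as a new version and return its number."""
        self.versions.append({r.site: r.effective() for r in records})
        return len(self.versions)

    def latest(self) -> Dict[str, float]:
        """The latest version is the one that counts."""
        return self.versions[-1]


# Hypothetical walk-through: version 1 is sent out, a recomputation is
# accepted, version 2 replaces it, and the original stays available.
record = SiteAvailability(site="SITE_A", original=0.80)
report = MonthlyReport()
first_version = report.publish([record])
record.correction = 0.95
second_version = report.publish([record])
```

Undoing a mistaken correction amounts to setting correction back to None and publishing another version, since the original value was never modified.
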
To be understood: how long an experiment would have to make the correction before the report is updated.

Alessandro: ATLAS would like to publish the state of a site, not of a service. This can be made visible to the site if put in a profile. When can it be done?
Pablo: even now. We'll sort it out together.
ATLAS would like to start doing it: ATLAS site usability for analysis and production.

Next meeting: talk about xrootd monitoring, WLCG transfer monitoring.
When: in 3 weeks from now, the 25th, at 14:00, here.
Nicolo': please invite Shawn and Marian.