WLCG monitoring consolidation
Minutes, Friday 02 August 2013 - Ivan Dzhunov

Participants
Local: Lionel, Paloma, Jacobo, Maarten, Julia, Pablo, Ivan, Alex, Robert, Pedro, Ale
Remote: n/a

Pablo: Marian sent a questionnaire to the experiments covering the topics relevant to the evolution of the SAM probes and their framework - https://twiki.cern.ch/twiki/bin/view/LCG/SAMUsage
Input received from 2 experiments so far.

Architecture reviews:
=====================================================================================================================
Transport review - Robert
https://twiki.cern.ch/twiki/bin/view/LCG/TransportReview
Comments:
Lionel: You listed only 2 technologies, HTTP and messaging, but a DB is also used: SQLite for config changes in SAM.
Robert: I don't think it is an SQLite DB.
Maarten: It is either a communication mechanism or a hack.
Pablo: Doesn't it impact other layers (storage)?
Lionel: It is the responsibility of the developers, not the MSG team.
In the end it was agreed that this is not an issue, because it is the same as the transport inserting into a DB.
Jacobo: Marian and I created a micro framework for consumers/publishers that does not appear in the list.
Robert: I should add it.
Pedro: Isn't the list of consumers and producers too short? We should have many more common transport entries for the entire framework.
Maarten: This is an overview of the SAM case; we can have many more entries if we consider a global view.
=====================================================================================================================
Storage review - Julia
https://twiki.cern.ch/twiki/bin/view/LCG/StorageAggregationReview
Comments:
Maarten: There is a large set of choices for new technologies; can't we learn from Agile Infrastructure?
Pedro: In my presentation I shared my experience; I haven't done the whole homework yet or evaluated every solution.
Julia: HDFS, Hadoop and Elasticsearch are being considered seriously. We can't evaluate everything (no manpower), but we can benefit from the Agile Infrastructure (AI) experience.
Pedro: Shouldn't we first simplify the existing storage and then evaluate different technologies?
Julia: We are trying to parallelize the effort: move processing out of the DB, evaluate in Oracle, and then see if we can put the data structure into different solutions and evaluate those as well.
Maarten: Why evaluate in Oracle when we want to run away from it?
Julia: We know it well and can do it quickly.
=====================================================================================================================
Visualisation review - Jacobo
https://twiki.cern.ch/twiki/bin/view/LCG/VisualisationReview
Comments:
Maarten: Once consolidation has been applied in the other layers, we can say how long it will take to change the visualisation.
Alex: Why is Highcharts considered a strong candidate? Don't you foresee any licensing problem in the future? It is currently open source, but it is a commercial project.
Jacobo: We would have to go to a legacy version of Highcharts. It is similar to what happened with Google Charts, which will disappear, so we have to move away from it.
Pedro: If we have 20 dashboards and the idea is to reduce the effort, couldn't half of them be covered by a solution that works out of the box and requires only configuration, such as Kibana?
Julia: Our users are used to the dashboards; also, the users are not computer scientists but physicists.
Alex: In the past the image was created server side; now we get the data from an API. Providing the data for an image server side is semantically not part of the visualisation.
Pedro: Can't we learn from existing technologies like Kibana?
Maarten: What about a caching layer between the data and the visualisation?
Pedro, Maarten and Jacobo agreed that the trend is towards larger client-side applications.
Lionel: Would it make sense to have an abstraction layer between the data and the image generation (something like flot.js, d3.js)? It would reduce the cost of moving to another plotting library.
Jacobo: An abstraction layer consists of calling one method or another. It makes sense, but currently I don't see the need for it.
Agreed:
- should be kept in mind, but can be decided case by case
- the layer could be an imitation of what the library gives you
=====================================================================================================================
Documentation review - Ivan
https://twiki.cern.ch/twiki/bin/view/LCG/DocumentationReview
Comments:
Alex: What is the advantage of Confluence?
Jacobo: Using TWiki is painful. Confluence is much more user friendly and has nice features.
Lionel's corrections: It depends on the department's strategy, not on the group's. Migration will take much more than a few weeks.
=====================================================================================================================
Deployment review - Paloma
https://twiki.cern.ch/twiki/bin/view/LCG/DeploymentReview
Comments:
Maarten: Migration to Puppet could be done in steps: Quattor for managing the packages but not for configuration. A hybrid solution could work for a while.
Pedro: One year ago a summer student worked on SAM installations. Puppet and YAIM can work together.
Alex: Is there a Koji/Bamboo discussion for the build system?
Pedro: Bamboo is a continuous integration system, where the builds are made by Koji.
Pablo: No, builds can be done in Bamboo itself.
Is Koji used in Bamboo? To be reviewed/taken offline.
=====================================================================================================================
Recurrent task review - Alex
https://twiki.cern.ch/twiki/bin/view/LCG/RecurrentTaskReview
Comments:
Maarten: Are we talking only about scheduling (when to do what), or also about competition between tasks? Should a task wait for its turn, are there priorities, etc., as in a batch system?
Alex: No, only when to run what; no priority, no waiting.
Pedro: If other technologies are selected we don't have to think about it; these features may come out of the box.
Maarten: Various components need to be controlled; this should be configurable to avoid clashes.
=====================================================================================================================
Pablo: Next meeting in 2 weeks; the focus will be on site/service monitoring - cut/merge things.