WLCG Operations and Support Tools TEG - Kick-off meeting

Participants: Joel Closier, David Collados, Ian Collier, Pablo Fernandez, Tiziana Ferrari, Josep Flix, Maria Girone, Peter Gronbech, Oliver Gutsche, S. Purdie, Rob Quick, Stefan Roiser, Andrea Sciabà, Jeff Templon, ?

TEG objectives
--------------

Maria started by introducing the mandate of the TEG. There are two main deliverables: a document describing the current situation (to be ready in about one month) and a strategy document setting out the plans for the next two years (which will require face-to-face workshops). She then described the various areas of work. The examples given are not meant to be exhaustive, and a more complete list of topics and tools will be needed.

On monitoring, it was suggested to cover monitoring information that can be used to take corrective actions automatically. There are several specific examples, but the approach might be generalized. Josep asked whether the TEG should give recommendations to standardize site monitoring. Simone added that different sites have different custom monitoring for the same service (e.g. FTS). Maria said that this is a topic for the strategy document. Jeff mentioned that in the past it was common for different experiments to monitor the same services in different ways, causing unnecessary load. Joel added that the quality of monitoring varies significantly from one Tier-1 to another. It was agreed that the main source of problems is that middleware services do not provide good monitoring in the first place. CernVMFS should be an opportunity to avoid this problem.

It was determined that software management for tools like ROOT or Geant4 is not in the scope of the TEG, as they are fully embedded in the experiment software. On the other hand, AFS usage and software deployment policies should be covered.
Simone pointed out the importance of a good understanding of the current middleware and service deployment policies (including the role of preproduction or pilot services). There was a discussion about the role of WLCG as a middleware provider; services provided by WLCG are simpler to deploy, as they have no dependencies on external projects. Joel remarked that today it takes far too long from the time a bug is reported to the time a fix is deployed worldwide. This will have to be discussed, and it also involves communication within WLCG and with external providers. Stefan stressed the importance of having a quick deployment process at all sites, not only at Tier-1s. ALICE does have a procedure to deploy new versions automatically, but this is made easier by the fact that it only acts on the VOBoxes, and it would be difficult to have one for other services. Several procedures for handling operations work but could be improved. Past experience (for example, the migration from SL4 to SL5) should teach us how to avoid repeating some mistakes. One problem is the lack of upgrade paths, which requires more coordination. Experiments' feedback on computing procedures would be useful and should be a topic for the workshop. As an example, Joel said that concentrating site interventions in a few larger downtimes is much better than having several short ones.

Work organisation
-----------------

Maria illustrated a way to proceed in preparing the first deliverable. By this Thursday, everybody should send his/her view on 1) what works well, 2) what needs more effort and 3) what the three biggest problems are, in his/her areas of expertise. Subgroups should be created, with two editors each, to write the corresponding sections of the deliverable. Concern was raised about the aggressive timescale; however, it is not under the TEG's control. Any request for an extension should be motivated by real needs.
The subgroups would be:

- monitoring
- support tools
- operational requirements for middleware
- application software management, deployment and configuration
- middleware distribution management, deployment and configuration

Maria proposed holding a face-to-face workshop on 12-13/12, just before the GDB. The final reports from the TEG are due on February 7. The next meeting will be next Monday at 14:00 CET. We will aim for weekly meetings, given the compressed timescale. Volunteers to be editors or to participate in subgroups should write to Maria and Jeff.