WLCG operations coordination kick-off meeting

Agenda

Participants

Local: I. Fisk, M. Girone, A. Sciabà, M. Barroso, D. Collados, S. Roiser, A. Beche, J. Closier, M. Cattaneo, S. Campana, P. Love, E. Dafonte, A. Valassi, M. Litmaath, M. Dimou, S. Gowdy, I. Ueda, A. Lossent, A. Di Girolamo, M. Guijarro, N. Magini

Remote: M. Sgaravatto, M. Zielinski, C. Grandi, A. Lahiff, A. Sansum, C. Condurache, M. Jouvin, D. Bonacorsi, C. Cioffi, A. Cavalli, S. De Witt, J. Coles, R. Vernet, S. Mc Kee, T. Ferrari, A. Forti, C. Wissing.

Chair: M. Girone

Secretary: A. Sciabà

Introduction (Maria G.)

Maria presents the working group. The focus of the activities will be on coordination with limited additional effort from WLCG. The goal is to have Computing as a Service by the end of the long shutdown. It will be based also on contributions from site people. It will interact with the other working groups.

Communication channels (Andrea S.)

Andrea presents the communication channels. The most crucial point is how to reach sites via email, or in other words how to populate a mailing list such that every site is included.

Michel says that the problem is to have all sites, and sites might prefer to give mailing lists.

Simone points out that ATLAS has cloud support mailing lists, but these are not always effective because of the delegation layer in the middle.

Michel proposes to extract the contacts from the GOCDB (we never managed to have a list of T2 contacts).

According to Alessandra the real problem is that sites do not always feel involved and we must be able to involve them.

Ikuo warns about the risk of having the same people receiving multiple copies of the same email and Maarten about the risk of people getting too many emails about VOs they do not support.

Ian says that, for example, CMS could ask its sites to subscribe to wlcg-operations@cern.ch.

Michel says that we could ask sites to subscribe and after one month check if somebody is missing.

Maria D. points out that since 2008 GGUS notifications to the sites have used the contact emails extracted from GOCDB/OIM, and this usually works fine.

Maria G. suggests starting with lists from the experiments and, in parallel, collecting emails from GOCDB/OIM.
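The approach discussed above (merge the experiment lists with the GOCDB/OIM contacts, avoid sending duplicates, and check which sites are still missing) could be sketched roughly as follows. This is only an illustration: the function name, the data layout and the example addresses are assumptions, and no real GOCDB or OIM interface is used.

```python
def merge_contacts(experiment_lists, registry_contacts):
    """Merge per-experiment site contacts with registry contacts.

    experiment_lists: dict mapping experiment name -> {site name: email}
    registry_contacts: dict mapping site name -> email
                       (hypothetical export from GOCDB/OIM)
    Returns (recipients, missing_sites).
    """
    recipients = set()
    covered_sites = set()

    # Collect addresses from the experiments, normalised so that the
    # same address listed by two experiments is only counted once.
    for contacts in experiment_lists.values():
        for site, email in contacts.items():
            recipients.add(email.strip().lower())
            covered_sites.add(site)

    # Add the registry contacts in the same normalised form.
    for email in registry_contacts.values():
        recipients.add(email.strip().lower())

    # Sites known to the registry that no experiment list mentions.
    missing_sites = sorted(set(registry_contacts) - covered_sites)
    return sorted(recipients), missing_sites


# Hypothetical example data, for illustration only.
experiments = {
    "ATLAS": {"SITE-A": "Admin@site-a.example.org"},
    "CMS": {"SITE-A": "admin@site-a.example.org",
            "SITE-B": "ops@site-b.example.org"},
}
registry = {"SITE-A": "admin@site-a.example.org",
            "SITE-C": "grid@site-c.example.org"}

recipients, missing_sites = merge_contacts(experiments, registry)
```

Normalising addresses before merging addresses Ikuo's concern about duplicate copies; it does not address Maarten's concern about sites receiving mail for VOs they do not support, which would need per-VO lists.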

Daniele asks what the scope of the experiment reports in the fortnightly meeting is, and Andrea confirms that it is somewhere in between the daily operations reports and the quarterly GDB reports. Maria G. adds that the focus should be on plans because issues will still be tracked by Maria D. as in the old T1SCM.

CVMFS task force (Stefan)

Stefan presents the plans of the CVMFS task force. The primary goal is to help sites that still need to deploy CVMFS. He favours frequent reports and a direct contact with sites whenever practical. SAM tests should be used to verify the correctness of the CVMFS setup. Michel says that the TF should also test new features, and Stefan agrees. Michel and Christoph express the desire to have sites involved in the testing.

gLExec task force (Maarten)

Maarten presents the gLExec TF. The goal is to have it working at (almost) all sites. Currently ATLAS, CMS, LHCb and OPS run gLExec SAM tests. The CMS ones are the most realistic. For now it is not terribly urgent (only CMS is pushing for it) but we should try to speed up progress, to be better prepared in case it suddenly does become very urgent. The experiments need to be directly involved in the TF, while it is not clear if sites are needed at this time. We should also involve OSG and EGI to have the deployment done via their channels.
Tiziana volunteers to be in the task force.

PerfSONAR task force (Simone)

Simone describes the main challenges, namely how to configure a large N x N matrix of channels, how to test them without causing congestion and how to visualise them. He plans to start from sites particularly important for the experiments. pS should also be registered as a service in the GOCDB. We need to consolidate the configuration instructions, test a new Internet2 development for mesh configuration and extend the pS-PS dashboard. It will be a long-term effort that also requires some development.
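To illustrate the scale of the N x N problem and one possible way of avoiding congestion, the sketch below spreads all site pairs over time slots so that each site takes part in at most one test per slot (a round-robin tournament schedule). This is an assumption made for illustration only; it is not how perfSONAR or the Internet2 mesh configuration actually schedules tests, and the site names are placeholders.

```python
def round_robin_slots(sites):
    """Split all unordered site pairs into time slots.

    Within a slot every site appears in at most one pair, so no site
    runs two tests at the same time.
    """
    ring = list(sites)
    if len(ring) % 2:
        ring.append(None)  # placeholder: its partner idles in that slot
    n = len(ring)
    slots = []
    for _ in range(n - 1):
        # Pair the i-th entry from the front with the i-th from the back.
        pairs = [(ring[i], ring[n - 1 - i]) for i in range(n // 2)]
        slots.append([p for p in pairs if None not in p])
        # Keep the first entry fixed and rotate the others by one place.
        ring = [ring[0]] + [ring[-1]] + ring[1:-1]
    return slots


# Placeholder site names, for illustration only.
for number, slot in enumerate(round_robin_slots(["S1", "S2", "S3", "S4"]), 1):
    print("slot", number, slot)
```

For N sites there are N(N-1)/2 pairs, so the number of slots grows linearly with N while the number of tests grows quadratically, which is why starting from a subset of particularly important sites is attractive.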

Tiziana asks if this is going to be pS-PS or pS-MDM. Simone and Shawn say that they are compatible so either of them can be used (pending some internal checks still to be done). The Tier-0 and the Tier-2 sites use pS-PS.

Alessandro Di G. advocates the introduction of service flavour in GOCDB to distinguish pS-PS and pS-MDM without having to register two different service types. This will not be needed if it is confirmed that the two flavours are compatible.

Tiziana asks for the creation of a pS support unit in GGUS to provide support to sites.

Daniele asks what is the boundary between our WG and Michael Ernst's networking WG. Shawn says that we focus on deployment issues, they focus on development.

Alessandra volunteers for the TF.

Tracking tools task force (Maria D.)

Maria presents the TF, where experts on the various tools should be present. The idea is to have internal meetings, like the one that has been running for years with the GGUS developers, and to come to the general meeting to propose timelines for important changes and get feedback from the community.

There is a discussion about which ticketing tools should be used now. For example, Ian points out that for the recent CMS security challenge GGUS was not exclusively used. Maria explains that this was because it was not considered "closed" enough to discuss security matters. OSG Footprints is fully interfaced with GGUS, while EGI RT is deliberately not integrated, as Tiziana explains (it is used for incidents, not as a generic support channel).

Tiziana says that the EGI security team is discussing with OSG to decide how to deal with security incidents across the infrastructures.

Middleware deployment task force (Maarten)

Maarten starts by saying that EGI sites in WLCG are not in optimal shape concerning middleware releases and that it is urgent to move away from middleware that is already unsupported or will become unsupported soon. For now we should steadily upgrade services and clients to EMI 1 or, better, EMI 2. At some point next year EMI 3 should be tried. OSG seems to be taking care of similar matters themselves. Another topic will be the move to SL6, which has already happened at a few sites and will become a hot topic. There is also the issue of improving documentation, logging, error messages, etc. Sites, experiments and infrastructure projects should all be involved.

Maria D. asks if we should keep using the TEG twikis or create new ones. Everybody favours a fresh start.

XrootD task force (Ian)

ATLAS and CMS are rolling out federated data infrastructures based on XrootD. The existing prototypes are in reasonable shape but we need to have production-quality services. The task force will organize the rollout of such services, similarly to what was done in the past for SRM, Frontier or CREAM. We will also need SAM tests.

Simone asks if filename translation issues will be covered. Ian answers that exactly what to do will be decided in the TF, but the topics covered should be of common interest, whereas this one seems to be rather ATLAS-specific.

Alessandro Di G. points out that there are two types of services to be monitored and tested: xrootd servers and xrootd redirectors.

Shaun volunteers for the task force.

Squid monitoring task force (Simone)

Currently squid servers are monitored for both ATLAS and CMS using a tool developed at FNAL, which works well. WLCG should make this tool more official and define more formally how it should be run. This does not exclude monitoring via SAM. Dave has agreed to lead this task force.

FTS 3 task force (Nicolò)

FTS 3 is currently at the prototype stage. The task force should verify the FTS 2-like functionality and the integration with the experiment systems, interact with the sites to improve the installation procedures, etc. In the longer term, we should define and test the final deployment topology based on the new features (e.g. running a single global server vs. the classic model of servers at Tier-1 sites, dynamic configuration, etc.). For communication we are using the FTS 3 mailing list.

SHA-2 task force (Maarten)

We should prepare all the middleware to use RFC proxies with SHA-2 hashes, and also the experiments must make sure that they are ready for the change. The closest interaction will be with the developers (for bug fixes etc.) and the infrastructure projects (to ensure the right middleware versions get deployed on time), while sites do not need to participate explicitly (they should report deployment issues).

Tiziana announces that soon there will be a full compliance matrix from the developers and that services will be tested for SHA-2 compliance. SAM tests may also help, in particular for VO-specific services. At any time we should precisely know where we are with respect to compliance. We should put together all the relevant bodies and authorities to avoid duplication of work.

Maarten says that OSG will take care of this on their own. Tiziana mentions that Rob Quick said that SHA-2 is not an immediate concern right now and they have a test environment they can use.

Alessandro Di G. says that the experiments should do their testing independently of the site being in OSG or EGI. Tiziana hopes that OSG people will be in the task force.

WMS decommissioning task force (Maria G.)

It should discuss how and when to decommission the gLite WMS service in the WLCG context. The first step is to exactly understand what the experiments need.

Ian says that in CMS the only real dependency is for the SAM tests and asks what we would really gain from shutting down the service. In any case CMS will naturally replace the WMS-based SAM probes with probes using pilots, as they are more realistic (similar to real jobs).

Maarten says that in his opinion we should not give this too high a priority and should just reduce the dependencies on the WMS (which, by the way, is not doing so badly nowadays). He thinks it will happen naturally.

Alessandra asks whether LHCb really needs to submit via WMS. Stefan explains that they may also use direct CE submission, and do so at some sites, but he is slightly concerned about possible scalability issues if moving to direct submission for all sites. Anyway, as this is not urgent, a decision does not need to be taken very soon.

Andrea asks if the direct CREAM submission SAM tests can be used; Maarten explains that currently they cannot run anything on the WN.

Alessandro Di G. proposes using an ATLAS pilot factory to submit the SAM tests for CMS as well. It remains to be understood whether this would be realistic for CMS (possibly, given that submission is still via Condor).

Conclusions

It is decided that we will have the fortnightly meetings on the second and fourth Thursday of each month, starting from October 11th.
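For planning purposes, the meeting dates implied by this rule can be computed as in the small sketch below. The year 2012 is taken from the date of these minutes; the function name is only illustrative.

```python
import datetime


def second_and_fourth_thursdays(year, month):
    """Return the second and fourth Thursday of the given month."""
    first_day = datetime.date(year, month, 1)
    # date.weekday() counts Monday as 0, so Thursday is 3.
    days_to_thursday = (3 - first_day.weekday()) % 7
    first_thursday = first_day + datetime.timedelta(days=days_to_thursday)
    return [first_thursday + datetime.timedelta(weeks=1),
            first_thursday + datetime.timedelta(weeks=3)]


# Meeting dates for the last three months of 2012.
for month in (10, 11, 12):
    for day in second_and_fourth_thursdays(2012, month):
        print(day.isoformat())
```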

-- AndreaSciaba - 26-Sep-2012

Topic revision: r4 - 2012-09-27 - MaartenLitmaath
 