60-6-015 (CERN)



John Gordon (STFC-RAL)
WLCG Grid Deployment Board monthly meeting

11th February GDB – minutes/discussion notes
GDB agenda:
Since last meeting – the Chamonix LHC workshop. Conclusion is that we will be taking data in 2012. There was also an LHC OPN T2 workshop at CERN on 13th January which discussed proposals for Tier-1 to ALL Tier-2s connectivity.
In April the GDB will be on 6th to avoid the EGI-UF 11th-15th April.  Will cancel in July due to WLCG workshop.
The March GDB will be in Lyon. There is a registration link on the agenda page. Aiming for a 9am start – John will survey opinion during the week.
Also upcoming: LHCOPN meeting 10th-11th Feb in Lyon; dCache workshop 16th-17th March; ISGC Taipei 21st-25th March; EGI User Forum 11th-14th April; Spring HEPiX 2nd-6th May; EMI all hands meeting 31st May-2nd June; WLCG workshop 11th-13th July and EGI TF 19th-23rd September.
CREAM – repeated request to deploy CREAM in January. Correct availability calculation will be available by the end of March.
Currently gstat reports 167 sites supporting CREAM. Need to cross check with WLCG list of sites.
Claudio: There are sites with CREAM but support just one VO such as ALICE. It is not by default that they support all experiments.
JG: But at least the site has deployment experience. We do need to check the mapping to VOs.
ACCOUNTING: gLite-MON had an end-of-support deadline of the end of 2010. Sites should have migrated to gLite-MON. The end of February deadline is final, at which point sites that have not migrated will drop from the monthly T2 accounting report.
For the March GDB – is there any use of an experiment operations session?
ARGUS (Christoph Witzig)
JG: Who uses the different APIs? C vs Java
CW: C used by glexec and some of the services now in pipeline including EPN and the prototype for ARC. Java is used by dCache and CREAM – to the best of my knowledge.
Deployment guidelines were covered because gLite 3.2 is available now while the EMI-1 release will be available from April.
JG: Will discuss this afternoon – when will EGI certify the EMI release.
ML:  The CREAM 1.7 may be a bit later than March – because of a request for 1.6.4.
Massimo Sgaravatto: It is still the current plan to release in March.
Markus S: Slide 11. The word “upgrade”. In the first EMI release we get 1.3. How do you upgrade?
As the 1.3 is not released yet I do not know.
MS: EMI put packages in EPEL so sites need to reinstall and that is not an upgrade – it is a reinstall. If there are no dependencies outside RH5 then fine – but those services are rare.
JG: On the data management side, is it only the blacklisting that is used?
CW: Just blacklisting.
MJ: We now have the function to deploy ARGUS in Quattor.
JG: To summarise – sites using SCAS can still do ID change. The thing gained is blacklisting.
RW: At the moment CERN runs a production ARGUS server. When there is a security incident we put the DN in the blacklist. Currently there are no incidents. Few people using it right now but hope is that people will subscribe to this service. Sites or countries can run their own ARGUS servers and set their own priority.
JG: Is there any other software that uses that blacklist?
RW: We can work on the format if needed, but I am not aware of any.
CW: The format is XML. You give the interval and the place to get the policies from, and you can import from more than one. Site operators and operators of the global banning can work through the command line.
JG: There is not a standard for this like there is for email spam?
??: Blacklisting is xml – DN encoding is not normalised. Does ARGUS deal with this where the email attribute is used?
CW: There have in the past been issues with DN formats. If a certificate has a standardised DN then fine.
DK: Is this not standardised by IGTF? If follows the
??: Not the string encoder… for the email.
CW: To our knowledge ARGUS handles the DNs correctly. If not then a bug needs to be fixed or the DN is non-compliant.
ML: Internally ARGUS chooses a format. When dealing with ARGUS servers there is no problem. But if this is exported then it needs to be read – this is not fully resolved, so email addresses should be avoided if possible. Most CAs have moved away from them.
Robot certs from CERN CA contain email addresses.
ML: Have spoken to CERN IT. Hope to get rid of this by May.
DK: Open grid forum suggest not using email addresses.
JG: Is this monitored?
DK: Not actively, except for a new CA. If there are problems we should follow up.
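The DN-encoding concern raised above can be illustrated with a minimal sketch, assuming slash-separated DN strings in which the e-mail attribute may appear under different names. The alias list and the function names are hypothetical and chosen for illustration only; this is not how ARGUS normalises DNs.

```python
# Aliases for the e-mail attribute in slash-separated DN strings.
# This alias list is an assumption for illustration, not an ARGUS table.
EMAIL_ALIASES = {"emailaddress", "email", "e"}


def normalise_dn(dn):
    """Return a canonical tuple of (attribute, value) pairs for a slash-separated DN."""
    parts = []
    for rdn in dn.strip().split("/"):
        if not rdn:
            continue
        if "=" not in rdn:
            # A "/" inside a value (e.g. CN=host/node.example.org): glue it back.
            if parts:
                attr, value = parts[-1]
                parts[-1] = (attr, value + "/" + rdn)
            continue
        attr, value = rdn.split("=", 1)
        attr = attr.strip().lower()
        if attr in EMAIL_ALIASES:
            attr = "emailaddress"
        parts.append((attr, value.strip()))
    return tuple(parts)


def same_subject(dn_a, dn_b):
    """Compare two DN strings after normalising attribute names."""
    return normalise_dn(dn_a) == normalise_dn(dn_b)
```

With this sketch, a blacklist entry written with `Email=` would match a certificate subject rendered with `emailAddress=`, which is the kind of mismatch the discussion warns about when a blacklist is exported between tools.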
MUPJ – gLexec update (Maarten Litmaath)
The situation on Sunday 6th is shown in slide 2.  Not so bad. All the sites that should have this running have had it running (at some point). There has been some “decay” and problems need to be fixed.
SL: Did you inform ASGC of the observed problem?
ML: Not yet. Will follow up closely – by end of week test failures will lead to a GGUS ticket from me.
JG: How many found by glexec tests?
ML: All except PIC.
JG: So if sites looking at Nagios tests they would have picked up on these problems?
ML: Yes. Difficult to make critical since only at T1s right now. Some T2s have deployed. This is a thing for WLCG. Can not yet require that the ops glexec tests are expected to pass for the whole grid.
The host for the Nagios ops tests has changed.
IB: In the discussion yesterday – to push now we would ask T1 and T0 to have this in place by the end of March.
ML: All the problems mentioned could be fixed by next week.
IB: For the T2s the target date is the end of June.
ML: SAM targets all CEs. Not all CEs will support this but we hope the majority will. So the Nagios pages will show the status for ops and others that have it configured (e.g. LHCb). Sites will be able to see how they are doing.
IB: For T2s can we make the list per country?
JG: What is involved in getting the SAM test deployed for the T2s?
ML: It is already there. Any CE that publishes..
ML: Any country could include this…
… it is in the probe that goes to nagios for all the NGIs it is just not configured. Then we could put this into the ROC critical path to automate ticket creation.
ML: Yes but only
IB: Will EGI support us in getting this deployed?
TF: Yes we will do this.
ML: May need to speak with SAM developers about general issue where some sites should satisfy more probes than others.
Davide: Based on publishing the capability they can be tested. If you publish in GOCDB then they will be included.
CG: Can we have a recipe for installing and testing
ML: You mean… for CMS you should imitate what LHCb have done and copy the glexec test … the variable issue needs to be fixed, perhaps in the next release. For now one would need to be intelligent on finding glexec and treat the 3 use cases in turn. Will converge on a single env variable. Need to discuss with OSG as they want variable prefixed OSG_
CG: There will be several CMS people at CERN next week and that would be a good opportunity to discuss the way forward.
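A minimal sketch of the "treat the 3 use cases in turn" lookup that ML describes, pending convergence on a single variable. The environment variable names, the `sbin/glexec` layout and the function name are assumptions for illustration; none of them are stated in the discussion.

```python
import os

# Candidate environment variables, tried in order. The names and path layouts
# are assumptions for illustration; check the middleware documentation.
CANDIDATES = [
    ("GLEXEC_LOCATION", ("sbin", "glexec")),   # assumed to be an install prefix
    ("OSG_GLEXEC_LOCATION", ()),               # assumed to be the executable itself
    ("GLITE_LOCATION", ("sbin", "glexec")),    # assumed to be an install prefix
]


def find_glexec(env=None, is_executable=None):
    """Return the first usable glexec path, or None if none is found."""
    env = os.environ if env is None else env
    if is_executable is None:
        # Default check: the path exists as a file and is executable.
        is_executable = lambda p: os.path.isfile(p) and os.access(p, os.X_OK)
    for var, suffix in CANDIDATES:
        base = env.get(var)
        if not base:
            continue
        path = os.path.join(base, *suffix)
        if is_executable(path):
            return path
    return None
```

A pilot framework could call `find_glexec()` once at start-up and fall back to running the payload without identity switching if it returns `None`; whether that fallback is acceptable is a site policy question, not something this sketch decides.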
JG: Did you say this capability is published in GOCDB?
Davide: I don’t think it is currently but it could be.
Graeme: We have talked about glexec for a while now. It has been a long and painful process. It is there to provide traceability and some insulation of proxies from one another. After 1 year of running we have shown that our framework allows user tracing – SC4. Stealing proxy is not a … is it time to rethink this… perhaps linking pilot job frameworks directly into ARGUS and removing those who do not have X.509. Sites are going to be busy. We should re-examine what the frameworks are and how they work as they are bringing deployment pain.
IB: Agree that we should look again but not that we stop what we are doing now – otherwise progress is non-existent.
RW: You had one user for pilot – if you had multiple users running payloads you would not be able to trace it.
GS: We were able to provide the payload code.
IB: If the frameworks could provide the traceability then that may satisfy the requirement on traceability. It has to be reliable. We went through arguments before and issue of just logging was not acceptable to sites.
JG: Just to remind you – ML did a survey of sites and the majority said they wanted ID changes.
MS: I’m surprised at the slow progress. Only seems impossible in Europe as in USA it is already deployed. Need to figure out why this is.
JT: Graeme  says that we do not have it working but not entirely fair since nobody looking at it – set it up but not monitored since not used. Second, if we move away from ID switching you accept the fact that the VO is one person – so if there is a problem you ban the whole VO. So if you reopen the discussion don’t forget that.
GS: We turn off the user ourselves.
RW: If you can identify the user. Surprised you would consider this option to turn off the whole VO.
DK: It may be that other users are compromised so unless you know what happened the whole VO has to remain off.
ML: Support what was said by RW and DK. If you run 8 or 10 jobs at the same time you’ll have a hard time identifying who was the user. You are glossing over the difficulty. Not saying this is a perfect system – has bad aspects that make it difficult. Nobody has come up with something better and with the use of VMs for jobs may be able to relax requirements. Till then better to keep working on it so that if it is needed urgently we can do it quickly – like backups, they are never used but the investment needs to be there.
MS: We are still having the wrong discussion. Glexec systems have worked in the US for many years now. There is experience in operations and we should have a deployment discussion using their experience. They don’t run with SCAS/ARGUS but glexec on the WN is the same component, and we seem to have a problem (and a prioritisation problem).
JG: The reluctance seems to be rolling out anything new.
MJ: As a site, we have a will to deploy it… but we need a clear route, not saying SCAS then ARGUS… when we looked at the tests of sites that deployed it there appeared to be some configuration complexity. We are at a point where we can do it and the message should demonstrate that the service is ready for prime time deployment.
RW: The point was that it was demonstrated in the US.  In the US they have stronger traceability requirements so they deployed it.
MJ: We need to make the sites confident that it is not a difficult service to run. In the US they do not use ARGUS or SCAS.
MS: In the extreme we could switch to GUMS!
JG: Will return to this in the March meeting.
Short term improvements to the information system: a status report (Flavia Donno)
Slides were presented at the MB.
Proposal to deploy a well-managed set of top-level BDIIs.
Consolidating IS attributes – usage document in draft. Framework for IS data quality meter in place and tickets issued accordingly (fixes deal mainly with format).
Summary:
Cleaning-up/consolidating IS used attributes: experiments, middleware, monitoring and accounting
– Document in preparation.
– Framework in place to measure data quality.
Deployment of “static”/semi-dynamic top level BDIIs.
– Code available in 1 week for test.
– VOs acceptance tests possible in 2 weeks, if plan accepted.
Deployment of a well managed set of top level BDIIs.
– Good response from T1s.
– Need to review requirements based on deployment of [semi-]static top level BDII.
– Need to refine deployment plan and failover strategy.
Luca: Strategy for this … are you proposing we go for a branch of the software regardless of whether EGI fits with this plan? I hope we are able to convince the whole community to adopt this approach.  For Italy, the top-level BDII is not managed by our Tier-1 but by the country.
IB: What is the difference in the code?
FD: To allow for the local cache to survive for longer (than 10 minutes).  If fresh information comes then it will be published as fresh information.
IB: It is a change but not a code branch?
Lawrence: It is introducing a configuration parameter – we have to set a policy on this parameter. The delete feature only affects when entries are removed. If a site is not contactable for a certain period of time then entries can be removed. We can configure the delay to smooth out dynamic changes where things are added and removed quickly which gives an impression of instability.
FD: Will go into WLCG deployment but not EGI to start with. If you deploy with this delay, it is a different behavior of the service to now and we do not know the impact on other VOs. At this point it is a WLCG decision to go with this deployment.
IB: Went this route because experiments see problems. If we do nothing they’ll stop using it altogether. We may have to have a WLCG top-level BDII instance which is separate from EGI but it is not a code branch.
JT: Side comment – the NL Tier-1 … well WLCG was happy with what we offered. The proposal for WLCG specific BDII – semi-static seems to imply dynamic information not important to WLCG so useful to have a statement on that.  Also, how will WLCG jobs find the right top-level BDII!?
IB: The document being prepared by Flavia covers the information of interest issue.
CG: There are still experiments using the WMS for which the dynamic information is useful.
IB: Then it would be useful to give Flavia the information for the document.
JG: Need to guard against a new problem where a service requires the dynamic information.
FD: Did receive information about how CRAB uses the WMS. Status=unknown  response works for that.
JG: Plan needs to address the issues of which top-level BDIIs sites point at.
JT: The problem is that the sites do not point at a BDII it is the job. It is not a site config issue.
FD: At the moment it is the VO who points to a BDII via an env variable.
ML: Site provides a value but the VO can override it. Only required during tests – hope is that after that everybody can use it.
JG: Thought plan had limited number of  better supported BDIIs.
FD: Can use round-robin approach.
ML: For WLCG we would like a reliable information system. That is somewhat orthogonal to having more static information in that information system. To test whether the new config parameter has an undesirable effect we will need to deploy in parallel.
JG: We need a plan!
JT: Please circulate it to sites as well as experiments and developers.
FD: Of course.
Roberto: Talking about a major issue. Why can this not be part of the normal EMI process? Whether activated or not is a deployment issue.
LF: This code will be in EMI. First is a code/software issue. EGI and WLCG need to worry about the deployment issue.
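The delayed-deletion behaviour discussed in this session (cached entries survive for longer than 10 minutes, fresh information always replaces them, and entries are only removed after a site has been uncontactable for a configured period) can be sketched as a toy cache. The class and method names are hypothetical and the retention value used in the usage note is an assumption; this is not the BDII code.

```python
class CachedView:
    """Toy cache: keep the last known entry for a source until it has been
    unreachable for longer than retention_s seconds."""

    def __init__(self, retention_s):
        self.retention_s = retention_s
        self._entries = {}  # source -> (data, time of last successful refresh)

    def refresh(self, source, data, now):
        """Record fresh data; fresh information always replaces the cached copy."""
        self._entries[source] = (data, now)

    def expire(self, now):
        """Drop entries whose source has been silent for longer than the retention period."""
        stale = [s for s, (_, seen) in self._entries.items()
                 if now - seen > self.retention_s]
        for s in stale:
            del self._entries[s]
        return stale

    def get(self, source):
        """Return the cached data for a source, or None if it has been removed."""
        entry = self._entries.get(source)
        return None if entry is None else entry[0]
```

For example, with `CachedView(retention_s=3600)` a site that misses a few refresh cycles stays visible with its last published values, while a site silent for more than an hour is dropped. Choosing that retention value is the policy decision Lawrence refers to.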
Installed Capacity (John Gordon)
Superficially results look okay but we need to get in and validate the figures: looking at
Luca: There is an error for CNAF. Too much shown.
JT: Are the shares shown?
JG: No. That is
Gstat only shows today’s information? Looks like the information is now giving a history so the T0 figure shown is out-of-date on the slide.
Tier-2s also publishing but not  correct for the shares.
IB: Lawrence was working on a UI for people to upload pledges directly. You could also upload the installed capacities.
JG: Then end up publishing pledge and installed as same number.
IB: May be okay – it is to show the funding agencies. We can’t poll every site.
JG: The advantage is that if you install new resources then the shares get automatically reflected.
Action: Can sites please publish their shares.
If sites have free for all then fair enough. If you purchase specifically for one VO then that ought to be reflected in the publishing.
MJ: The share is a CE attribute. If a VO has access to several queues it is alright to publish the same share for each queue?
SB: The reason it is a CE attribute is because it would not fit elsewhere. In YAIM it is only used once.
JG: If all CEs publish the same installed capacity it does not multiple count?
PG: If you do not set up sub-clusters then it can lead to a problem.
Mauro Morandin – if figures important and shown at high level then important that these figures are correct. Long standing issue at INFN-T2. The wrong conclusions may be drawn. Would it be useful if once per year there is a reality check?
IB: In the RRB we have never yet reported on T2 installed capacity. Email requests to so many sites will never work so we need an automatic route and this is the way we came up with. Open to better ways of doing this.
JG: Useful to compare pledges with accounting and then the installed capacity and accounting…. Can then look automatically for discrepancies.
Hope is to get the automatic collection working. Validating a snapshot regularly would be useful.
PG: In the UK we keep reports that are quarterly and compare with gstat.
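The automatic cross-check suggested in this session can be sketched as follows, assuming each published record is a (site, sub-cluster, capacity) triple and that capacity is counted once per sub-cluster even when several CEs publish it. The record layout, the function names and the 10% tolerance are assumptions for illustration, not the gstat implementation.

```python
def installed_capacity(records):
    """Sum capacity once per (site, sub-cluster), even if several CEs publish it."""
    seen = {}
    for site, subcluster, capacity in records:
        seen[(site, subcluster)] = capacity
    totals = {}
    for (site, _), capacity in seen.items():
        totals[site] = totals.get(site, 0) + capacity
    return totals


def discrepancies(pledged, installed, tolerance=0.10):
    """Return sites whose installed capacity differs from the pledge by more than tolerance."""
    flagged = {}
    for site, pledge in pledged.items():
        have = installed.get(site, 0)
        if pledge and abs(have - pledge) / pledge > tolerance:
            flagged[site] = (pledge, have)
    return flagged
```

The same comparison could be run between installed capacity and accounting figures; flagged sites would then be candidates for the periodic manual check proposed above.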
IB:  Slide 7. The last bullet – “no upgrade path between major releases”. Unless the services can be deployed in parallel it is likely this will not happen for 2 years.
Francesco: Aimed at changes and they need to be introduced at some point – the repositories EPEL instead of DAG. Packaging policies as standard.
JG: Historically some services have been deployed in parallel. I’m more concerned  about the statement for ALL major releases.
Francesco: EMI-2 there should be an upgrade path from EMI-1 of the same service. Introducing backward compatible releases is important.
PG: Will there still be a WN tarball installation – it has been useful for shared clusters where can not install in /usr.
F: We have not discussed that deeply.  We can take into account this request.
MS: Related to use of EPEL as a repository. The idea is that the software goes into EPEL or we have a parallel repository that looks like EPEL.
F: If packages in EPEL then we will use them. Some things from EMI are already going into EPEL such as VOMS and LFC. Of course if there are external dependencies we’ll need to distribute those. Once we have everything in EPEL then we’ll just populate EPEL.
MS: If EPEL has a new version then sites upgrading one component will update everything. Sites will not want to do that.
F: Would like a specific example
ML: On production system we will have to tell sites not to use EPEL. Particular dependencies will have to be validated by EGI first so there will be a new repository.
F: gLite was not certified for EPEL.
ML: Preview of components that get into the production repositories… will have to tell WLCG sites not to use the repo
Alberto: EMI will package
ML: Who will do the integration tests?
Alberto: EMI has the responsibility. You have internal certification.
MS: Who puts it into EPEL? EGI? It is your goal and it will take effort. Up until now I heard that EMI will put things into the repository.
IB: EGI does not guarantee to take the whole of the EMI release.
Alberto: We are establishing the process to avoid the problems that we have previously seen. We need specific examples.
IB: I do not understand the process – development – release and integration.
Simon Lin (SL): For Morris, the support for WebDAV is only for dCache?
M: Yes that is the case for EMI-1.
EGI middleware support (Tiziana Ferrari)
Refers to the UMD- Unified Middleware Distribution.
JT: Appreciate the analogy of a kitchen. The idea is that you can replace one item with another. But our kitchen is specific – for example the dishwasher only works with 3-pronged forks and anything else breaks it. Who oversees this sort of issue? Often the specification misses details like this …
TF: As explained by the EMI people. They are responsible to release certified software. We rely on their notes to highlight dependencies.
Mario: In the old process, where you can get an issue is when you do the staged rollout. Here you have all the other components in production. To catch an integration issue is in the staged rollout.
JG: That does not meet the scenario where EMI has two components to deploy and they both test fine but they do not work if deployed together.
Mario: May happen if two components in staged rollout at the same time.  It is a matter of probability as to whether things are caught – encourage more sites to be in staged rollout.
Martin Gasthuber (MG): Touched on later in the presentation.
JG: What do you mean by UMD update frequency?
TF: Relates to the release updates. The UMD has to have the same update frequency.
JG: Perhaps I missed that in the EMI presentation. There is no EMI-1.1?
MS: Do you have any tool to get more early adopters involved?
TF: Need to increase the number of sites.
MS: If there is a carrot then you may get more sites to support staged rollout.
Mario D: Some components have many adopters but many do not. Perhaps need 2 per component as a minimum. It also depends what the sites have. Popular things are APEL, CREAM and DPM. But for FTS there is only 1 and for VOMS 2 – fewer sites actually deploy them. So sites will only adopt things that are of interest/importance to them. It is not too bad for gLite 3.2. Also some components are easier to adopt/test than others.
MJ: At GRIF we have been an early adopter for 3 years. We have found it difficult to participate in every stage for many components. We can not commit to it all the time as we can not dedicate people to it. There need to be 2 or 3 adopters.
MD: Last week had email about CREAM and urgency for LHC experiments and after request had 4 new sites testing.
JG: Perhaps WLCG can encourage sites for components that it feels are important.
IB: Do you repackage what EMI supply?
TF: The component after staged rollout goes into the EGI repository.
IB: So there could be a WLCG repository for components in the EGI repository?
TF: Yes.
IB: EMI is almost removing globus? There are no globus components if you remove gsi?
Alberto: There are still dependencies – myproxy, ….
IB: Is the IGE globus distribution different from the, say, US one? I’m worried about compatibility across the Atlantic.
JG: Traditionally VDT added components.
Alberto: Yes and IGE do the same. Currently talking with them.
JG: TF you mention 3 months to certify. Looking back at EDG etc., certification became a blockage.
EMI-1 release candidate will be available at the end of February. We’ll use a stable version of that
Stephen: What is the minimum time between a developer committing a patch and getting it to production?
TF: Depends on the criticality of fix. If urgent then could be a couple of days if already certified by EMI.
Christina: Bug fixes that are revisions,  …
TF: Need to have a better plan here. What EGI needs to commit depends on the EMI release frequency.
Alberto: As far as EMI are concerned. We are putting in a SLA – depending on the criticality of the bugs then we will make a commitment to make fixes (per product team) in a certain time.
MD: Happy to take the SLA discussion offline. But the sites hang at the end of the process so we do need to collaborate.
Alberto: There was a remark about it not being clear who is doing the certifying.
MS: Does UMD provide a repository for the material?
TF: Yes. Operated by EGI.
The trust anchor is the packaging of CA certificates.
IB: Clearer but concerned about the length of the process given that the certification steps are repeated and I’m not convinced that the integration will be done. So my concern is that the process
Alberto: We need to monitor this and understand if it does take more or less time. The current EMI-EGI process is a consequence of the changes introduced at the end of EGEE-3. The process is being made leaner. We are not duplicating anything – it should be faster and more reliable.
Mario: LFC has been in staged rollout since November.
MS: Wishful thinking – we rely on volunteer effort to get things out at the last step. If they do not have time they just do not do it. Before this last step you have professionals.
TF: We have partners in EGI that are funded to do some staged rollout.  The NGIs that get effort for this task.
PG: It depends on the services. At Oxford we are an early adopter of CREAM and we have many instances, but we would not do this for say DPM.
JG: But there is also an issue about exposing users of the new release.
MJ: So there is also a question about what we mean by staged rollout.
WLCG Middleware Support (Markus)
TF: Comment on ARC. Will undergo staged rollout using 4 sites.
IB: Why do you imply that the site would have to upgrade everything!?
MS: Because the support for gLite 3.2 ends.
Alberto: The goal is to have full support for SL6 and Debian by the time EMI-2 is out. The aim is to have a snapshot available in April.
EMI-1 on SL5 is not very attractive for gLite sites
– If you are already on SL5 (as most are)
– Given the relatively small functional differences
• What can be done:
• 1) Extend gLite 3.2 support beyond EMI-2
– Maybe with WN on SL6???
– This will reduce the pressure at the end of EMI-1
• 2) Release parts of EMI-1 on SL6 early
– Clients, re-invention of the UI/WN???
– To make it more attractive and smooth out the transition.
• 3) Force sites to install EMI-1
– Not likely to work
• For 1 and 2 extra effort is required.
IB: Is the lesson from the last 6 years – option 3 is unlikely to work.  Sites only update/upgrade if there is a problem or there is a new platform. So should we not base the strategy around that?
MS: Francesco mentioned that they may port some components to SL6 early anyway – and this is attractive if there is some gain with SL6.
ML: It used to be that the push for the new OS was constrained by the LHC experiments. That may delay things.
IB: This is the message – to get things deployed requires a driver like the OS.
JG: Three players (EGI/EMI – sites – VOs/experiments) – need two of them to support it before making a push.
IB: EMI-1 comes before the conditions are right for SL6. This means the middleware providers need to take this into account.
MJ: None of the approaches are viable for me. Supporting wider than WLCG the NGI can not stay with gLite 3.2. SL6 being the carrot is not feasible because the experiments will take time. Difficulty is not backward incompatibility but a change from DAG to EPEL.  Is there an alternative where we could have a transition repository? Issue is with dependencies.
MS: For some components even the name changes.
ML: You can’t just junk the old rpms because there will be things left behind. That’s why you want to reinstall.
MJ: WNs are not a problem. More an issue with CE or SE where there is a database.
ML: Only the storage elements have a need to keep state information. We should not try to rescue gLite 3.2 indefinitely. We should cooperate and put across our concerns for the different node types. We will have to see if some services are on EMI-1….
JG: We do not want gLite 3.2 and EMI-1 to be incompatible.
Alberto: Generally new functionality will not be back ported to gLite 3.2. There is a policy that new functionality will only be added in EMI-1. If there is a requirement then it must be communicated on a formal level (via the WLCG MB).
TF: EGI would like to put a framework around requirements gathering and approach EMI with priorities.
Alberto: New platforms. We will support them as they become available. From gLite 3.2 to EMI-1 is not recommended as an upgrade for the reasons Maarten mentioned. But we’ll need to say service by service.
Francesco: There is an EMI document on change management.
On open questions – EGI/NGI vs WLCG needs.
TF: Priorities – the users are the driver.
IB: Not sure you can have a general policy on this
JG: Timelines – answered
There will be an EGI repository.
Home for WLCG middleware not part of UMD – EGI will volunteer to host this in the repository.
Middleware update (Maria Alandes Pradillo)
Webpages update:
For staged rollout:
JG: General message, CREAM should be there for all your VOs. ATLAS will use it. LHCb have been testing direct submission but use WMS anyway. ALICE use CREAM. Anyway the general advice is to move to CREAM CEs in production
Oliver Keeble (OK): On SL4 we found a reappearance of the VOMS memory leak. Need to decide if it is worth going around the loop again.
ML: Even if not fixed, it still does give us features we need.
PG: There is a CONDOR site in the UK.
Massimo: The batch systems are all being done on a best effort and none are moving things into EMI.
JG: I thought the workflow had been moved into the EMI product teams.
Alberto: I contacted the partners for batch system support and the replies have been unsatisfactory. Effort is there in EMI but it is difficult to convince the partners of the priority.
IB: Response from NIKHEF was that this is best effort but they need it so should be good.
DG: We committed to security support but have never signed up for anything to do with batch systems.
IB: There is a best effort commitment from Jeff for Torque.
JT: The issue is that things are changing towards EMI 1.
IB: The thing Maria is pointing out here for SL5 (new Torque 2.3.13) what is the status?
JT: I’ll find out what is meant by “missing configuration changes in YAIM” etc.
The SLC 5.6 upgrade issue (Helge Meinhard)
IB: Can you also add the certification mailing list to the improved communication route.
GS: I like the idea of keeping the process lightweight. It may be useful to have a small amount of lxbatch (as well as lxplus) available for testing.
HM: Agreed.
Meeting closed at 16:55
EVO chat:
[09:05:53] Massimo Sgaravatto yes
[09:06:00] Gonzalo Merino yes
[09:15:50] IN2P3-LAL4 joined
[09:15:51] Matt Hodges joined
[09:15:51] Paul Millar joined
[09:15:51] Patrick Fuhrmann joined
[09:15:52] CERN cern-60-6 joined
[09:15:52] Jeff Templon joined
[09:15:52] Ron Trompert joined
[09:15:54] Gonzalo Merino joined
[09:15:56] Claudio Grandi joined
[09:15:56] Ulrich Schwickerath joined
[09:15:57] Massimo Sgaravatto joined
[09:15:57] Michel Jouvin joined
[09:15:57] Oxana Smirnova joined
[09:19:34] Yannick Patois joined
[09:20:31] Yannick Patois left
[09:24:52] Tiziana Ferrari joined
[09:24:57] Patrick Fuhrmann Could you please move the speaker camera up a bit
[09:26:38] Jeremy Coles Hi Patrick - I asked John about this just now. He was looking for the remote control. It may not be fixed till later!
[09:26:41] Andrew Elwell joined
[09:27:07] Mario David joined
[09:27:43] John Gordon joined
[09:29:51] Richard Gokieli joined
[09:30:26] Stephen Burke joined
[09:32:13] Andrew Elwell left
[09:32:39] Patrick Fuhrmann thanks
[09:33:15] Andrew Elwell joined
[09:40:53] Josep Flix joined
[09:49:18] Stephen Burke so talk sitting down?!
[09:51:01] Jakub Moscicki joined
[09:51:14] Andrew Elwell Stephen - you and your radical ideas...
[09:51:58] Stephen Burke A headless speaker looks fairly radical too 
[09:52:12] Patrick Fuhrmann right
[09:54:53] Richard Hellier joined
[09:54:53] Richard Hellier left
[10:00:25] Stephen Burke The UK CA still has email addresses in host certs
[10:00:34] Oxana Smirnova TERENA too
[10:00:49] Oxana Smirnova even in personal ones
[10:03:33] Stephen Burke it seems a bit ironic that it doesn't work at NIKHEF ...
[10:06:34] Oxana Smirnova seems to be easy to break, too
[10:09:02] John Gordon Are host certificates relevant here?
[10:10:22] Alvaro Fernandez joined
[10:10:23] Stephen Burke Relevant for what? glexec should be authorising on a user/robot cert
[10:11:13] John Gordon That is what I meant. UK email in host certs not relevant for glexec.
[10:12:24] Andrew Sansum joined
[10:13:39] Derek Ross joined
[10:23:50] Phone Bridge joined
[10:27:27] Alberto Di Meglio joined
[10:30:28] Andrew Sansum left
[10:33:08] Andrew Sansum joined
[10:38:33] Yannick Patois joined
[10:40:34] Jeff Templon Romain is just saying that in the US there is a much stronger requirement from management
[10:41:37] Yannick Patois left
[10:41:49] Romain Wartel joined
[10:42:13] Alberto Di Meglio left
[10:42:21] Romain Wartel left
[10:46:49] Alberto Di Meglio joined
[11:03:58] Phone Bridge left
[11:05:03] Phone Bridge joined
[11:05:28] Stephen Burke It just means that it isn't useful to have very dynamic information kept in the cache for a long time
[11:06:23] Stephen Burke the dynamic info will still update normally as long as it's there
[11:08:46] Mario David at least for some of the cms workflows, there is a thing called siteDB, where you hardwire what CEs should be used at the sites, no info system needed there
[11:09:45] Mario David also atlas , has aDB where the ce queues are hardwired, no info sys needed there also
[11:10:34] John White joined
[11:16:12] John White left
[11:17:11] Phone Bridge left
[11:17:12] John White joined
[11:21:15] Christoph Grab joined
[11:26:52] John White left
[11:31:57] Stephen Burke The design is OK - as long as what the sites configure is correct!
[11:34:19] Richard Hellier left
[11:34:20] Jeff Templon left
[11:34:29] IN2P3-LAL4 left
[11:34:32] Christoph Grab left
[11:34:34] Josep Flix left
[11:34:42] CERN cern-60-6 start again at 1400 CET
[11:35:25] Claudio Grandi left
[11:37:24] Gonzalo Merino left
[11:39:38] Alberto Di Meglio left
[09:45:03] John Gordon The cameras at CERN are preprogrammed to point at seated speakers when their mikes are on, as you have probably noticed when Maarten and I spoke.
[09:45:26] John Gordon I expect they can be retrained but I don't know how to.
[12:57:33] Jeff Templon joined
[12:57:50] Jamie Shiers joined
[12:57:52] Richard Hellier joined
[12:58:01] IN2P3-LAL4 joined
[12:59:25] Alvaro Fernandez left
[13:00:06] Patrick Fuhrmann left
[13:01:49] Patrick Fuhrmann joined
[13:02:19] Claudio Grandi joined
[13:04:44] Mario David left
[13:05:01] Mario David joined
[13:08:58] Matt Hodges left
[13:09:04] Matt Hodges joined
[13:24:02] Romain Wartel joined
[13:24:23] Andrea Ceccanti joined
[13:25:55] Matt Hodges left
[13:27:23] Richard Hellier left
[13:29:58] Jason Lander joined
[13:30:47] Alvaro Fernandez joined
[13:30:50] Alvaro Fernandez left
[13:33:32] bob jones joined
[13:33:35] Massimo Sgaravatto protect the repo
[13:33:50] Alvaro Fernandez joined
[13:34:20] Steve Traylen joined
[13:34:28] Mario David in EPEL there should not be a newer version than the one distributed by emi
[13:34:40] Michel Drescher joined
[13:35:00] Massimo Sgaravatto even if there is, isn't it enough to protect the emi repo ?
[13:36:28] Mario David but it's the developers which put things into epel!!!
[13:36:44] Mario David so they can control what goes there, isn't it?
[13:36:53] Steve Traylen You just have to be clearer about which is the release.
[13:36:58] Stephen Burke but do the developers know what the consequences will be?
[13:37:47] Jeff Templon unless everyone moves to quattor 
[13:38:19] Mario David at least in the end we will have to protect emi or egi repositories
[13:38:31] Jeff Templon don't forget the mike markus
[13:38:45] Mario David marcus, turn on the mic
[13:41:14] Stephen Burke It seems everyone is happy with GLUE 2 
[13:44:00] Mario David I hope so, but more importantly, if all start using it properly, and open ggus tickets if something is wrong at any site
[13:45:19] Stephen Burke So what happens if an EMI major release is rejected?!
[13:49:47] Martin Gasthuber joined
[13:50:29] Pete Gronbech joined
[13:51:53] Pete Gronbech left
[13:52:08] Lorenzo Dini joined
[13:53:53] Alessandra Forti joined
[14:01:44] Martin Gasthuber left
[14:04:11] Lorenzo Dini left
[14:04:36] Richard Hellier joined
[14:05:06] Lorenzo Dini joined
[14:05:25] Oxana Smirnova Will security fixes also be delayed by 3 months in EGI validation?
[14:07:55] Stephen Burke name the releases after the EA sites 
[14:08:44] Steve Traylen Don't give Word documents to be filled in to people doing EA releases.
[14:13:40] Romain Wartel left
[14:16:44] Steve Traylen IGE have added patches from VDT to their distribution
[14:16:46] Steve Traylen already.
[14:16:57] Steve Traylen Hey over here.
[14:18:18] Jeremy Coles Hi Steve - do you have a mic?
[14:19:13] Stephen Burke So what will be the minimum time between a developer committing a fix and it getting to production?
[14:19:17] Jeremy Coles If not I can ask John to refer to the chat window.
[14:21:20] Mario David Oxana: for a security fix, it's possible that it will not go through the whole process (case by case basis) so that it can be released very fast
[14:21:55] Mario David Steve T, there is also an odt template 
[14:22:08] Mario David in the doc server same ID
[14:26:12] Paul Millar left
[14:26:23] Paul Millar joined
[14:27:05] Mario David sure
[14:28:00] owen synge joined
[14:29:46] Alessandra Forti new cream CE doesn't have changes of big impact
[14:30:05] Alessandra Forti dpm 1.8.0 is more of a problem to try it out in production
[14:33:32] Mario David Alessandra is correct, it depends on the service,
[14:33:47] Stephen Burke But in the end *someone* has to be the first!
[14:34:09] Mario David that is exactly the point.
[14:34:11] Stephen Burke (Like taking a new drug for the first trial ...)
[14:34:26] Mario David and the first ones are the ones most interested in any given service
[14:34:30] Alessandra Forti who wants to die first? 
[14:35:05] Mario David what about updated lhcb wns to the latest minor version sl5.6???
[14:35:20] Mario David it's exactly the same thing!!
[14:35:27] Stephen Burke That's a later talk 
[14:35:48] Mario David sure, but it can happen (I mean...)
[14:36:11] Mario David that's why there are a whole bunch of EA with very good experience
[14:36:20] Mario David that can do rollback if need be
[14:36:42] Stephen Burke But e.g. a DPM with a schema upgrade may be very hard to roll back ...
[14:37:23] Mario David I know, that one is the most difficult one, but the basic schema and functionality has already been tested (certified/verified)
[14:37:37] Martin Gasthuber joined
[14:38:27] Mario David together with the lfc,
[14:38:57] owen synge Fortunately everyone here has been around long enough to know we have discussed this before, and changed the process more than once. The simple answer is no solution is perfect and the effort comes from sites who have priorities different from the GDB, maybe the issue is the speed of upgrades not the process
[14:39:20] Mario David but there are ways of doing it, like backing up the OS image, at least you won't lose everything if something goes really bad
[14:39:56] Christian Bernardt joined
[14:44:19] Josep Flix joined
[14:44:30] Stephen Burke glite releases seem to work at least as well since EGEE ended as they did before 
[14:45:09] owen synge As far as I know these are EGEE releases with the legacy release process
[14:45:31] owen synge As far as I know there have been no releases as yet through the EMI 1 process
[14:45:49] owen synge and as far as I know EMI one is not for release
[14:45:54] Stephen Burke yes, I'm just saying that you don't seem to need any of the project management, just the people doing the work!
[14:45:55] owen synge sorry EMI 0
[14:47:50] owen synge EMI 0 is not for release, and EMI 1 is not yet released
[14:58:33] Stephen Burke Experience suggests that some sites will still be on glite 3.2 in 2013 regardless of what WLCG says ...
[14:59:46] Mario David aren't there still tags for 3.0.0 in the info system, and the classicSE??
[15:00:00] owen synge A long standing question is why are developers working on anything other than the next OS we will need? Why are developers not working on SL6 even if it is alpha?
[15:02:07] Stephen Burke Because then their software won't work on sl5 ...
[15:03:18] Michel Jouvin May I make a comment...???
[15:04:39] Mario David left
[15:04:46] Mario David joined
[15:08:06] Mario David left
[15:08:15] Mario David joined
[15:08:58] Mario David agree with Maarten,
[15:09:10] Alessandra Forti I agree too
[15:09:44] Stephen Burke Installing a CREAM CE is also easy - but how long has it taken to get them deployed at all sites?!
[15:09:55] Alessandra Forti again?
[15:10:27] Alessandra Forti the cream CE is not yet part of the availability
[15:10:43] Alessandra Forti I'd have moved december last year if I could have
[15:10:54] Alessandra Forti moved completely
[15:11:11] Stephen Burke I don't mean completely, just having CREAM in parallel
[15:11:24] Stephen Burke I think the GDB originally asked sites to do that about 2 years ago
[15:11:25] Alessandra Forti nobody ever gave a deadline
[15:11:54] Alessandra Forti and experiments didn't want to move for a long time
[15:12:09] Stephen Burke and that will all be different for EMI 1?
[15:12:24] Alessandra Forti probably not
[15:12:46] Alessandra Forti how do you say in the UK? lessons have been learned?
[15:12:50] Jason Lander left
[15:12:51] Alessandra Forti hopefully
[15:15:55] Michel Drescher left
[15:17:24] Mario David always mic please
[15:21:52] Andrea Ceccanti left
[15:27:57] Paul Millar left
[15:30:13] Massimo Sgaravatto If I will be authorized 
[15:32:15] Mario David I have advised EAs to enable specially wlcg VO in their creams
[15:35:05] Mario David but no one wants it??
[15:35:33] Richard Gokieli left
[15:35:34] Jason Lander joined
[15:35:57] Richard Gokieli joined
[15:36:06] Mario David Maarten : if this goes to production THERE will be one first to move?
[15:36:24] Mario David I suppose, if not, then nobody is interested
[15:45:54] Steve Traylen The config changes are path changes.
[15:47:26] Lorenzo Dini left
[15:55:16] Ulrich Schwickerath There are actually quite a lot of nodes in lxbatch running preprod version.
[15:55:36] Ron Trompert left
[15:55:46] Richard Hellier left
[15:55:51] Alessandra Forti bye
[15:55:53] Jeff Templon left
[15:55:55] Alessandra Forti left
[15:55:55] Claudio Grandi left
[15:55:58] Josep Flix left
[15:55:59] Jason Lander left
[15:56:00] Massimo Sgaravatto left
[15:56:02] Mario David left
[15:56:38] Alvaro Fernandez left
[15:57:23] Derek Ross left
[15:58:03] Steve Traylen left
[15:58:25] Richard Gokieli left
[16:01:13] IN2P3-LAL4 left
[16:01:19] Michel Jouvin left
[16:01:45] Ulrich Schwickerath left
[16:04:13] CERN cern-60-6 left
[16:06:58] Oxana Smirnova left
    • 10:00 12:10
      Convener: Dr John Gordon (STFC-RAL)
      • 10:00
        Introduction 30m
        Speaker: Dr John Gordon (STFC-RAL)
      • 10:30
        Argus 30m
        Speaker: Christoph Witzig (Eidgenossische Technische Hochschule Zurich/ETH (ETH))
      • 11:00
        MUPJ 20m
        Speaker: Maarten Litmaath (CERN)
      • 11:20
        The WLCG Information Service 30m
        Speaker: Dr Flavia Donno (CERN)
      • 11:50
        Installed Capacity Reports 20m
    • 12:10 14:00
      Lunch 1h 50m
    • 14:00 17:00
      Convener: Dr John Gordon (STFC-RAL)
      • 14:00
        The European Middleware Initiative 30m
        Speakers: Morris Riedel (FZ Juelich), Francesco Giacomini
      • 14:30
        EGI Middleware Support 30m
        Speaker: Dr Tiziana Ferrari (EGI)
      • 15:00
        WLCG Middleware Support 30m
      • 15:30
        Middleware Update 20m
        Speaker: Maria Alandes Pradillo (Unknown)
      • 15:50
        The SLC 5.6 upgrade issue 15m
        Speaker: Dr Helge Meinhard (CERN-IT)