TMB - 2009-08-05
Present: Steven Newhouse, Antonio Retico, Oliver Keeble,
Andrea Sciaba', Francesco Giacomini, Frank Harris, Massimo Lamanna
Phone: Vangelis Floros, Andrei Tsaregorodtsev, Guenter Grein,
Dennis Van Dok
Minutes of last meeting
=======================
No comments. Approved.
Task Review
===========
Task #9916 (Problem with auto-publishing of space reservation into Info System)
Oliver: a solution is available; it will be included in an upcoming DPM patch.
Task #9915 (More details needed on firewall configuration issue)
The last comment mentions a document with a list of open
ports. Vangelis will pass this reference to Gergely and see if it
satisfies his needs.
Task #8953 (Clarify the restriction mechanism for multiple service
instances to individual VOs)
Need a clarification from someone in the AuthZ Service
group. Francesco will contact Christoph and/or Chad.
Task #8326 (Error codes for the command line interfaces)
Task #7932 (Lack of APIs for various middleware services/components)
Task #6711 (Discuss standards for error messages)
Task #6712 (Handling of bugs regarding error messages)
Francesco will provide a statement on all these tasks, based on the
work planned in the second year of EGEE-III.
Task #7938 (Portal access to the infrastructure)
Task #6901 (Storage semantics issues for EGEE (beyond HEP))
Vangelis has posted a comment with use cases.
Task #6652 (Check if the WNRWG will make recommendations about how to
characterise a subcluster)
An info provider is still needed.
Steven: can Laurence do it?
Oliver: not necessarily, info providers are maintained by several
people; but in general there is no documentation on what is mandatory
and should be enforced.
Task #6649 (Local scratch space and shared storage in Glue)
Oliver will ask Laurence to have a look at it.
Task #5952 (Python bindings of the LCG UI)
Oliver to check the status.
MPI - the next phase
====================
Oliver: a patch (https://savannah.cern.ch/patch/?3092) has been opened
where all the outstanding MPI-related bugs will be attached. The
bugs mainly concern configuration issues. It is estimated that it
will take ~2 months to fix, plus certification; a partner for the
certification has already been identified.
The MPI libraries and utilities will be rebuilt and made available.
Dennis: it is also important to be able to pass additional parameters
down to the batch system and to specify additional attributes in the
JDL.
Frank mentions the documents circulated by Vangelis: the comments
from NA4 partners on the proposal by the MPI WG and the experience on
using MPI on the current EGEE infrastructure.
Vangelis: what are the priorities?
Steven: first the fixes mentioned by Oliver, then prototyping the
proposed changes to the JDL
Francesco: what about publishing the interconnect type of a site? who
does that and when?
Dennis will check if this specification already exists
Francesco, Oliver, Frank, Vangelis and other interested people from
NA4 should be included in the MPI mailing list
Steven: what about the SAM tests?
Oliver: they need to be revived, then we enable them after the MPI
fixes are available
Antonio: after the changes are available we should start a pilot
service, involving the interested VOs; we could then tune both the
installation and the SAM tests.
Steven: there is a two-hour session on MPI at EGEE'09, on Tuesday
morning; it will cover updates on the MPI WG, on the patch, on which
sites to involve in the pilot, and on how things are evolving from the
applications' point of view, e.g. for JDL changes. Relevant people
from JRA1 and SA3 should be there.
Oliver: it would be useful to have a presentation on how to enable MPI
on a site, but this may well go into another SA1 session, where
sites are more represented.
Steven: if communities find problems they should raise GGUS tickets,
e.g. for wrong published tags
Bug Classification
==================
Integrating GGUS and Savannah
=============================
Francesco presents the key points of the proposal on "Problem
Management and Change Management in gLite", i.e. how to handle "bugs".
There is agreement to apply immediately the parts concerning the
classification based on Severity and Priority, with their consequences
in terms of release management.
Further discussion is needed on the following points:
- Should submission to Savannah be restricted only to gLite people?
This would prevent people without a GGUS account from interacting
with gLite maintainers and is not considered a good move for the
moment. On the other hand, if problems found by users in production
are not all registered in GGUS, it becomes very difficult to compute
meaningful user-oriented metrics, which are strongly requested by the
project reviewers.
For the moment everybody is still allowed to submit directly to
Savannah, but if the submitter is a user, he/she is requested to
also submit a GGUS ticket, to be linked with the Savannah bug.
Francesco and Antonio to provide an estimate of bugs submitted by
users directly into Savannah.
- For the high-priority fixes, consider their impact on the staged
rollout process.
- There is no user representative in the EMT, which is the body that
should decide on the priority of changes. This for the moment
doesn't seem to be a big problem, because bug submitters are usually
aware of the impact of that bug on the affected users. Moreover SA1
is present at the meeting and to some extent can provide an
infrastructure (i.e. user) perspective.
- What is the relation with GSVG, managing security vulnerabilities?
The proposal will be extended to cover also that.
- The priority of a patch should be related to the priority of the
attached bugs.
- The proposal should cover rollback of unsuccessful changes.
- The proposal does not currently cover the interaction between GGUS
and Savannah. There is agreement that the way the state of the GGUS
ticket changes based on state changes of the corresponding Savannah
bug needs to be reviewed. In particular a GGUS ticket cannot be
closed until the Savannah bug has been closed.
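
The closing rule in the last point can be illustrated with a minimal
sketch. It is not part of the proposal and does not reflect the actual
GGUS or Savannah interfaces: the class names, the state strings and the
functions can_close and close_ticket are all hypothetical, chosen only
to show the constraint that a linked Savannah bug must be closed before
the GGUS ticket is.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical state names: the real GGUS and Savannah state sets
# are not listed in these minutes.
OPEN = "open"
CLOSED = "closed"


@dataclass
class SavannahBug:
    bug_id: int
    state: str = OPEN


@dataclass
class GgusTicket:
    ticket_id: int
    state: str = OPEN
    linked_bug: Optional[SavannahBug] = None


def can_close(ticket: GgusTicket) -> bool:
    """A ticket may be closed if it has no linked bug or the bug is closed."""
    bug = ticket.linked_bug
    return bug is None or bug.state == CLOSED


def close_ticket(ticket: GgusTicket) -> None:
    """Close the ticket, refusing while the linked Savannah bug is still open."""
    if not can_close(ticket):
        raise ValueError(
            f"GGUS ticket {ticket.ticket_id} cannot be closed: "
            f"Savannah bug {ticket.linked_bug.bug_id} is still open"
        )
    ticket.state = CLOSED
```

In this sketch a ticket without a linked bug can be closed freely;
whether that should hold in practice is one of the points the review
would have to settle.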
AOB
===
Francesco: what is the support to be provided for services/components
on SLC4? Three options:
1. SLC4 is not supported any more;
2. SLC4 is in pure maintenance mode, only critical fixes are
applied;
3. SLC4 is fully supported
Each option has certain practical consequences.
Francesco and Oliver to prepare a proposal.