Minutes of the LCG SC2 meeting, 1/4/05

Present: Jean-Jacques Blaising, German Cancio (secretary), Wisla Carena, Matthias Kasemann (chair), Pere Mato, Eric Lançon, Gerhard Raven, Les Robertson, Jim Shank (via VRVS)
Apologies: Marcel Kunze, Albert De Roeck
Absent: Federico Ruggieri, Lothar Bauerdick, Tony Doyle

Applications Area (AA) Internal Review Closeout

Organisational matters

News from the PEB

Discussion points

AOB

Applications Area (AA) Internal Review Closeout

The LCG SC2 meeting was combined with the closeout of the LCG Applications Area Internal Review (link to agenda page).

Introduction (Jean-Jacques)

Jean-Jacques recalls that the mandate of the AA internal review is: a) to examine the progress that has been made since the last review, b) to examine the adoption of the recommendations and the preparation of the work program for the second phase of the LCG project, c) to examine the overall coherence of the software, and d) to identify real and potential problems and risks and to make recommendations on the evolution.
The experiments are satisfied with the progress in the AA. The proposed integration of the ROOT and SEAL projects is particularly welcomed.
Most of the recommendations made during the last review have been implemented or are part of the proposed plan, which is considered technically reasonable.
Jean-Jacques recommends that the SC2, and in particular the assigned godparents, play an active role by closely checking the progress of the Applications Area, not only at the time of the Quarterly Reports but also in between.
He also proposes holding extended AF meetings on a regular basis, which should include experiment and Grid service providers, in order to establish better communication and working relationships (this was discussed by the SC2 at a later stage, see below).
A written report of the AA internal review will be available by mid-April.

Simulation (Vakho Tsulaia)

The size of the GENSER distribution is considered too big; it is recommended to consider more granular packaging and distribution options.
Concerns were expressed regarding the support level given to HEPMC, in particular in the areas of persistency and translators.
Another concern is the planned decrease in manpower in the Physics Validation area (from 2.3 to 0.8 FTE). It is suggested that LCG should try to add manpower if GEANT4 considers this to be a CERN task.
GEANT4, which has proven its level of maturity, has become the main simulation engine for LHCb, ATLAS and CMS. ALICE is encouraged to clarify its doubts concerning hadronic physics.
It is recommended that SPI tools be used for the distribution of FLUKA, as is already done for other AA software.
Simulation framework:

The experiments are showing no interest in having a common generic simulation framework. Should more than one experiment express interest, VMC remains an option.
Further development of GDML is encouraged.
GEANT4 Python interface: The documentation should be improved. Also, an exchange of experiences with experiments that are building similar solutions (like ATLAS) is suggested.

SPI (Marco Cattaneo)

Marco underlines the impressive progress since the last review, with a widespread adoption of SPI tools by experiments and projects. The build system is no longer an issue.
Most recommendations have been implemented. In particular, there are already visible benefits of having a central librarian in place.
A recurring observation is that the tools developed for SPI should be packaged for general use.
The Doxygen/LXR documentation should be produced automatically as part of the release procedure; also, cross-referencing between projects should be possible.
Savannah:

It is recommended to set up a user forum, for example in the form of a mailing list similar to root-talk.
Tools for bulk submission and retrieval would ease migration and preparation of reports and statistics.
The proliferation of additional systems for bug and task tracking (Bugzilla, ROOT bug DB) is a concern. Experiments need a coherent system that eases cross-referencing and migration of bugs between projects. It is recommended to converge on Savannah and not to dedicate CERN/LCG resources to the maintenance of alternative systems.

The procedures for selecting packages, platforms and compilers and for defining their lifetime should be documented. This includes documenting the corresponding support commitments. Not only the AF but also other LCG areas should be involved in the decision-making process.
Build and distribution:

Even though the choice of build tools is no longer an issue, a clear statement of strategy is required. This includes the clarification of the role of SCRAM.
Package dependencies should be minimized by distinguishing between build, test and runtime dependencies.
The LIM (experiment librarians) meeting should be used for discussing and defining which distributions are needed. The needs of LCG deployment should be addressed as well.
The AA should be represented in Linux and compiler certification discussions.

An impressive suite of tools is now in place. Their adoption by the experiments is encouraged. Clear QA procedures to be followed by projects should be defined, as well as ways to encourage compliance.
SPI should adopt a coordinator’s role for the evaluation and selection of external tools, which includes making recommendations to the AA and AF community. In order to minimize duplication, this coordination role could be extended to include non-QA tools like profilers or XML parsers.

Even though training is not one of SPI’s current responsibilities, it is felt that it should become one. The very successful Python course should be continued.

POOL (Predrag Buncic)

The progress since the last review has been excellent. Most recommendations have been implemented. POOL is deployed and used in DCs by three experiments; around 400 TB of data has been stored.
The impact of the merger of SEAL and ROOT on POOL is of concern, since it will generate additional workload that must be taken into account in the planning.
The documentation has been greatly improved, but the User’s Guide still contains missing, wrong or obsolete, and garbled items.
Despite a great effort in bug fixing, a small number of persistent bugs remain. The release process should be brought in line with that of the rest of the AA.
Error handling and reporting need to be improved. In particular, errors must be propagated to end users with clear indications of which components failed.
With regard to POOL collections, a lack of clear requirements from the experiments was noted. CMS is using POOL implicit collections, but other experiments may require new functionality in the future. In order to anticipate the required effort, deadlines for the submission of new user requirements should be proposed.
File Catalogues: POOL will have to work with different FC back-ends as selected by sites and VOs. These FCs should implement the POOL APIs. In order to differentiate between POOL and FC back-end performance and problems, a reference benchmark for the FCs should be defined.
COOL: Experiments interested in COOL are invited to commit more manpower in order to ensure the survival of the project. So far, two experiments have committed to using COOL; CMS is considering its usage.
In order to comply with security aspects in POOL, end-to-end solutions should be taken into account; POOL should not be the weakest point in the chain. The impact of security on performance is of concern. Precise user requirements are needed in order to define an appropriate solution. It is suggested to check user requirements and solutions developed in the Grid community and other applications like PROOF.

ROOT and SEAL (Gerhard Raven)

Major progress has been made in the SEAL area. Since the last review, there has been a widespread adoption by the experiments.
In terms of project organization, all experiments welcome the proposed merger of ROOT and SEAL activities. The experiments should set the schedule and priorities in their role as stakeholders. LCG manpower should concentrate on high priority items (like dictionary, Mathlib), and the AF should supervise the process.
The merger should preserve the best of both projects. The architectural strengths of SEAL, like its component model, should be preserved. It is not sufficient to limit it to “adding missing features to ROOT”.
A light-weight packaging with minimized dependencies is considered crucial. Applications should be able to select core components without having to take the entire framework.
Basic classes and components should be decoupled where appropriate (e.g. removing inheritance from TObject in ROOT-CORE). Also, because of the differences in the plug-in architectures of SEAL and ROOT, proposed changes will need to be carefully weighed against their impact on existing experiment schemes.
There is broad agreement on a common dictionary; more detailed planning will be defined in a workshop in May. The integration of Mathlib is the most advanced.
The proposed schedule for SEAL and ROOT migration, which aims for common and duplication-free libraries in January 2006, is supported.

The role of CLHEP and possibilities for its replacement were discussed. CLHEP is an external component and its evolution is not controlled by the AA. However, CLHEP is being used by GEANT4 and the experiments. The replacement costs need to be evaluated, and possible migration strategies need to be defined.

Organisational matters

The previous minutes (link) were circulated and will be considered approved if no comments are received by 6/4/05.
Next meeting (June 3, Agenda page):

The next meeting will focus on the review of the Q1/05 LCG Status Report, which is due by the end of April. To prepare for the meeting, all SC2 members should review the quarterly status report and come up with concerns and questions, especially on their assigned godparent sections. Questions and concerns should be sent to the SC2 list prior to the meeting (by Wednesday May 18), so that the LCG project can prepare meaningful answers by the time of the meeting.
A phone conference is proposed for Wednesday May 18, 17:00h.
Pere requests that the Applications Area, having already been subject to an internal review and an SC2 focus meeting, be excluded from the next Status Report review. The SC2 agrees with this proposal.

News from the PEB

Service Challenges: An improved plan for the Service Challenges was presented (link) and discussed. It defines not only the requirements on capacity and throughput, but also which of the major sites will be joining and on what dates.
Service Challenge 2 (SC2) is currently running. Its goal is to perform sustained disk-to-disk (SRM) transfers to seven Tier-1 sites at an aggregate target rate of 500 MB/s for 10 days. Apart from hardware problems at CERN, which have been dealt with, the SC is running very smoothly and is reaching peaks of 700-800 MB/s. Fermilab and FZK are providing large capacities. Compared with SC1, a significant improvement has been achieved in terms of networking and site organization. Les points out that the WAN speed record in June last year was around 6.4 Gb/s. This is very similar to the rate achieved now, sending data from real file systems to real file systems at a steady rate.
SC3 will include disk-to-tape transfer tests from CERN/T0 to T1 sites and running experiment jobs. A number of T2s will be involved as well. From September on, a service phase will be started and the experiments will get involved by carrying out tests, in order to validate their computing models. SC3 will represent a big increase in complexity over SC2.
Replying to a question by Matthias, Les reports that a draft of the LCG TDR is scheduled for April 11 and will contain a general LCG section, and experiment-specific sections.
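As a rough cross-check of the Service Challenge 2 transfer figures quoted above, the unit conversions can be sketched as follows. This is an illustrative calculation only, assuming decimal units (1 MB = 10^6 bytes, 1 Gb = 10^9 bits); the function names are hypothetical and not part of any LCG tool.

```python
# Illustrative unit conversions for the Service Challenge 2 figures.
# Assumption: decimal (SI) units, i.e. 1 MB = 10**6 bytes and 1 Gb = 10**9 bits.

SECONDS_PER_DAY = 24 * 60 * 60


def gbit_per_s_to_mbyte_per_s(rate_gbit_per_s):
    """Convert a rate in Gb/s to MB/s (8 bits per byte, 1000 Mb per Gb)."""
    return rate_gbit_per_s * 1000 / 8


def sustained_volume_tb(rate_mbyte_per_s, days):
    """Total volume in TB moved at a constant rate in MB/s over a number of days."""
    total_mbyte = rate_mbyte_per_s * days * SECONDS_PER_DAY
    return total_mbyte / 1_000_000


if __name__ == "__main__":
    # WAN speed record quoted in the minutes, expressed in MB/s.
    print(gbit_per_s_to_mbyte_per_s(6.4))
    # Volume implied by the 500 MB/s target sustained for 10 days, in TB.
    print(sustained_volume_tb(500, 10))
```

Under these assumptions, 6.4 Gb/s corresponds to about 800 MB/s, which is consistent with the observed peaks of 700-800 MB/s, and a sustained 500 MB/s over 10 days amounts to roughly 432 TB.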

Discussion points

Should the SC2 interact more with the experiments and how?

Even though some SC2 members belong to experiments, the SC2 committee as such does not interact directly with the experiments. The experiment computing coordinators were invited only to the first SC2 meeting after the reorganization. Matthias points out that the ALICE and LHCb sections are missing from the LCG Q4/04 status report, which makes the follow-up difficult, in particular for SC2 members not based at CERN. The SC2 would like the experiment contributions to be included in the LCG status reports. Based on this input, the SC2 godparents may contact experiments with questions and requests for clarification. Matthias will contact the experiment coordinators in this regard.

How can the SC2 help the reorganized Applications Area?

Pere considers it helpful that a review took place right at the beginning of his mandate as AA manager. This review provides him with valuable input and recommendations that he can now discuss with the experiments inside the Architect’s Forum. According to Pere, the AF is the right body for such discussions, in particular if a clear strategy has been put in place. For items on which no agreement is found inside the AF, there is a defined escalation procedure to the PEB, where the experiment coordinators are represented. However, this escalation is rarely needed and should be avoided. An incentive for collaboration within the AF is that LCG resources are common to all experiments and need to be shared. It was also pointed out that LCG-funded resources should be devoted to activities related to LHC computing.

Is a forum needed for client-to-Grid-service-provider communication?

Should the AF be extended for dedicated meetings? In principle, ARDA was set up with the idea of providing such a forum. Ways have to be found to integrate ARDA better into the main activities of the experiments. Alternatively, the successful Baseline Services Working Group (BSWG) might be continued as an ad-hoc working group. However, the BSWG was formed with a set of topics defined in advance, so there might be a mismatch in the expertise of the WG members as the subjects evolve.
On the one hand, general-purpose forums tend to grow too large, and their discussions may become less focused and may not lead to decisions. On the other hand, restricted forums like the AF or BSWG represent only a limited range of expertise. Pere suggests creating a software development activity targeted at physics analysis and at solving concrete problems. The possibility of restructuring ARDA for this purpose is discussed. It is suggested that ARDA could move towards more focused products, with emphasis on testing, and away from independent and experiment-specific activities. However, any change to the ARDA work plan would need prior discussion and agreement with the experiments.
PROOF: A Program of Work needs to be defined for PROOF, including a specification of the required environment for significant testing. Within the AA, PROOF would be best placed outside the ROOT/SEAL CORE activities.

Are there changes in the AA staffing estimates?

Pere reports that the current manpower figures are essentially unchanged from Torre’s original planning. Pere will revisit the planning, but he does not expect the overall numbers to change. In manpower terms, the LCG contribution should be considered a core around which the experiments contribute. However, manpower contributions below a given threshold (e.g. below 20%) should not be counted, since they are insufficient for productive work.

AOB

None.