Present: German Cancio (secretary), Wisla Carena, Matthias Kasemann (chair),
Eric Lançon, Gerhard Raven (via VRVS), Les Robertson, Jim Shank (via VRVS)
Apologies: Jean-Jacques Blaising, Marcel Kunze, Albert de Roeck
Organisational matters
Report from the last POB meeting
News from the PEB
LCG Status Report Review (Q2 2005)
1. Applications Area (Jean-Jacques, Albert, Gerhard)
2. CERN Fabric Area
3. Grid Deployment Area
4. Middleware Area
AOB
- The previous minutes (link)
were approved.
- The POB meeting where SC2 will report will
take place on Monday September 19.
- The next SC2 meetings are on
Friday October 14 (Middleware Area review focus), Friday November 25 (Q3
Status Report review) and Friday December 16 (Fabric Area review).
- Matthias’
slides presented to the POB are found here.
- The new LCG phase II project structure
was discussed at the last POB meeting on June 20. However, the structure and
setup of the phase II review bodies have not yet been decided. The work
of the SC2 was defined only for phase I; when phase I ends at the end
of December 2005, the SC2 will cease to exist.
- The proposal (endorsed by the SC2) to
include status reports on US and Nordic grid infrastructure projects in
the LCG quarterly reports was accepted. It is preferable that this item is
followed up by the LCG project management and not the SC2 itself.
- A large number of jobs (at least
7000-8000) are now regularly running on the EGEE grid.
- The experiment-led task forces, aimed
at focusing effort on the effective and reliable use of the EGEE grid,
are starting now. Separate task forces with specific mandates have been
set up for each experiment: CMS (focus on the delivery of their Computing
System by means of Service and Data Challenges; led by Stefano Belforte);
ATLAS (optimization of the Workload Management System, integration of
VOMS; led by Laura Perini); ALICE (integration of the ALICE Distributed
Computing Environment; led by Federico Carminati). LHCb already has a
similar structure, including weekly meetings with the EIS team. The primary
aim of the task forces is to focus on operational rather than planning
aspects. Planning issues such as site staffing levels are discussed in the
Phase-II planning meetings.
- Accounting: Most of the Tier-1 sites
are now providing consistent accounting information; suspected mismatches
between announced resources and usage are being addressed.
- Resource planning: At the next Phase-II
planning meeting, the resource tables for the CRRB in October will be
discussed. The resource situation for Tier-2 sites is improving
significantly.
- OSG-EGEE integration: An important step
forward has been achieved by operating one EGEE site with OSG
software and one OSG site with gLite. Les attended an OSG council meeting
in July; OSG is looking forward to actively collaborating with EGEE. The
EGEE-2 proposal will also contain formal statements on integration with
other Grid projects.
- SC3 throughput phase: A sustained throughput of
500 MB/s has been achieved, which is unfortunately still
below the goal (1 GB/s aggregate data rate out of CERN). The tests have
been stopped in order to concentrate on the observed reliability and
throughput issues.
- Significant progress has been achieved
at CNAF in addressing CASTOR-related problems; the reliability is now
very good. However, performance still needs to be improved.
- In the case of dCache, some observed reliability
problems may be due to configuration issues on new sites. A workshop
dedicated to this will be organized next week. It also seems that dCache
has been optimized for SRMCopy but not for GridFTP, which is the protocol
used in the current FTS version. Even though FTS has now been extended to
support SRMCopy, this new functionality has not yet been deployed.
- New performance tests will be carried
out in October.
- Unlike the SC3 throughput phase,
the service phase will run on the standard production
infrastructure. This will increase reliability. Also, the aggregate data
rate will be approximately 200 MB/s and will therefore be lower than
during the throughput phase.
- CASTOR2 migration: The experiment
migration is not progressing in accordance with the planned schedule. The
new client software is already deployed but most experiments have not yet validated
their frameworks on CASTOR2. The resulting delays may have an impact on
the service phase of SC3. The originally planned schedule foresees that
LHC experiments should have their data migrated by the end of January
2006, starting first with experiment production groups and then followed
by LHC and general users.
- A pre-GDB meeting has been organized,
which will focus on the current status of SC3 and on the planning and
preparation for SC4.
- Manpower and budget: The recruitment
process has been completed, and LCG vacancies at CERN have been filled. The requirements defined in the TDR are still beyond the means of
the currently available budget. However, the CSO has announced to the LHCC that CERN will provide the full
set of service functionality. The budget situation will have to be re-evaluated
on a year-by-year basis.
- Purchasing plans: A paper will be presented to the Finance
Committee in September, requesting permission to rent tape
equipment from two vendors until March 2007. Results from evaluating this
equipment will be reported to the FC in September 2006. The plans for full
acquisitions will then be based on the experience with the rented
equipment.
- Gerhard praises the progress made in
the Applications Area (AA). However, he points out the lack of agreement
with the experiments on the AA work plan, in particular regarding the ROOT/SEAL
merger. The experiments are concerned about the level of support for the
existing SEAL package. It is requested that SEAL remain supported until it
can be replaced inside the experiments' frameworks by the new merged
library. Moreover, ATLAS, CMS and LHCb would like the merged
library to avoid introducing new dependencies on ROOT.
- Achieving a detailed work plan was the
next major high-priority AA milestone. The SC2 considers its completion a key
factor for the success of the AA re-organization.
- John Harvey reported on behalf of Pere
Mato that a special AF meeting dedicated to work plan discussions took
place on July 21. In this meeting, it was agreed to continue support for
SEAL until the experiments are in a position to migrate their code. During
that period, the ROOT/CORE team will be in charge of supporting SEAL. The
work plan will be further discussed at the next AF meeting and finalized
in time for the next Quarterly Report. Matthias will contact Pere for a
summary of the planning status.
- Wisla summarizes the situation in the
CERN Fabric Area as very satisfactory. The new CASTOR version was
delivered in June, and a migration plan has been agreed.
- As Bernd explains, the LTO evaluation
milestone (1.2.1.3.1) is completed for LTO-2 drives only, since reference LTO-3
drives became available only in Q2 2005. LTO-3 drives will also be evaluated,
but there is no explicit milestone for that activity.
- Concerning milestone 1.2.1.1.4 (full
functional CASTOR release available), Wisla insists that a release
number be quoted for the new CASTOR system (including development versions)
for future reference. This request is endorsed by the SC2. Bernd has agreed to
include it in the next report.
- With regard to the missing milestone
concerning the interface between CASTOR and the experiments, Bernd has
recognized its importance and proposes to set the requested milestone by
the end of the year. This new milestone will include the definition and
implementation of CDR set-up requirements for the different experiments.
The SC2 supports the addition of this milestone.
- Regarding milestone 1.2.1.2.9 (ALICE DC
at 1 GB/s), Bernd has agreed to carry out functional network
tests between CASTOR2 and ALICE DAQ by the end of Q4 2005.
See also the ‘News from the PEB’ section on
SC3 activities and status.
- Because the Godfathers were unavailable over the
summer, no review report is available from them.
- Eric points out that there are no
milestones for the 3D project (Distributed Deployment of Databases). He
will contact Ian Bird on this matter.
- Jim highlights the high quality of the
Middleware Area contribution to the Quarterly Report.
- Jim acknowledges that the four
experiments are making progress with their ARDA prototypes. However, the
usage of gLite components by the experiments within the ARDA prototypes
needs to be increased, in order to profit from common functionality and
long-term maintenance and support. This is endorsed by the SC2 and will be
followed up at the next SC2 meeting in October (Middleware Area focus
meeting).
- The significant differences between the
experiments' job success rates need to be better understood. The end-to-end
job success rate is a very important metric for the experiments; however, a
more detailed failure analysis is required and should be reported.
- Bug reporting: In addition to the
existing charts with statistical information on status, subsystem and
nature of bugs, the SC2 recommends adding a severity classification as
well as information on problem resolution schedules.