LCG Management Board

Date/Time:

Tuesday 12 June 2007 16:00-17:00 - Phone Meeting

Agenda:

http://indico.cern.ch/conferenceDisplay.py?confId=13799

Members:

http://lcg.web.cern.ch/LCG/Boards/MB/mb-members.html

 

(Version 1 14.6.2007)

Participants:

A.Aimar (notes), D.Barberis, N.Brook, F.Carminati, L.Dell’Agnello, X.Espinal, I.Fisk, S.Foffano, J.Gordon, F.Hernandez, M.Lamanna, H.Marten, P.McBride, L.Robertson (chair), R.Tafirout, J.Templon

Action List

https://twiki.cern.ch/twiki/bin/view/LCG/MbActionList

Mailing List Archive:

https://mmm.cern.ch/public/archive-list/w/worldwide-lcg-management-board/

Next Meeting:

Tuesday 19 June 2007 16:00-17:00 - Phone Meeting

1.    Minutes and Matters arising (Minutes) 

 

1.1      Minutes of Previous Meeting

F.Hernandez asked for the following change to the minutes of the 5 June 2007 meeting.

 

In section "4. Accounting Grid and Non-Grid Submitted Jobs" it is stated:

"F.Hernandez stated that for the sites is important to distinguish gird vs. non-grid jobs.
But this is not possible to store in the GOC database. Actually IN2P3 is not sending the
monthly report but a record for each job. But for the non-grid jobs they cannot provide the
VO information and the DN because these are modifiable only via the APEL data collector."

This was not what he meant; he proposed to replace it with:

"F.Hernandez stated that currently CC-IN2P3 is sending to the APEL repository the accounting
information for both grid and non-grid jobs. However, the accounting web portal doesn't allow
you to distinguish between those 2 usages of the site because in the APEL database schema there
is no field for storing the type of job. As CC-IN2P3 has its own batch system it is not using APEL
directly. The BQS accounting database is queried to build the job records to be sent to the
APEL repository (through RGMA) in the appropriate schema. Records for non-grid jobs sent
to the APEL repository contain the VO name but not the DN of the submitter because it is
an attribute relevant for grid jobs only."
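
To illustrate the record-building step described above, here is a minimal sketch (in Python) of how local batch-accounting rows can be mapped to APEL-style job records, carrying the VO name for every job but a submitter DN only for grid jobs. The field names and the build_apel_record helper are illustrative assumptions, not the actual APEL schema or CC-IN2P3 code.

# Hypothetical sketch of the record-building step described above:
# accounting rows from a local batch system (BQS at CC-IN2P3) are mapped
# into APEL-style job records. Field names are illustrative only.

def build_apel_record(bqs_row):
    """Map one batch-accounting row to an APEL-style job record."""
    record = {
        "Site": "IN2P3-CC",                       # site name (assumed label)
        "LocalJobId": bqs_row["job_id"],
        "VO": bqs_row["vo"],                      # VO name is known for all jobs
        "WallDuration": bqs_row["wall_seconds"],
        "CpuDuration": bqs_row["cpu_seconds"],
    }
    # The submitter DN is a grid-only attribute: non-grid jobs have none,
    # which is why their records carry the VO name but no DN.
    if bqs_row.get("grid_dn"):
        record["UserDN"] = bqs_row["grid_dn"]
    return record

jobs = [
    {"job_id": "1001", "vo": "atlas", "wall_seconds": 3600,
     "cpu_seconds": 3400, "grid_dn": "/O=GRID-FR/CN=Some User"},  # grid job
    {"job_id": "1002", "vo": "lhcb", "wall_seconds": 1800,
     "cpu_seconds": 1700},                        # non-grid job: no DN available
]
for job in jobs:
    print(build_apel_record(job))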

 

The MB accepted the change and the minutes were approved.

 

2.    Action List Review (List of actions)

Actions that are late are highlighted in RED.

 

  • 29 May 2007 - A.Aimar and L.Robertson circulate follow-up milestones on the VOs Top 5 Issues to the MB.

Done during this meeting.

 

3.    Sites Current Capacity and Procurement Plans (more information) - Roundtable Sites

The purpose was to review where sites stand with respect to the 2007 pledges.

 

L.Robertson had attached the status of the capacity installed at the sites (taken from Harry’s table) together with the 2007 MoU pledges. See more information.

 

TRIUMF

TRIUMF has selected its supplier and will make a major purchase in the summer.

By mid-August it will have available 1400 kSI2k of CPU, 720 TB of disk and 560 TB of tape.

These totals also include non-LCG resources; only what is in the TRIUMF 2007 pledge will be assigned to the LCG.

 

IN2P3

IN2P3 will honour its pledges by 1 July 2007, except for disk, where the pledged capacity will not be met.

The missing disk capacity will be installed by the end of September.

 

More details received via email.

 

1) CPU:
Installed capacity: 1171 kSI2k
Pledge for 2007: 1286 kSI2k
The remaining 115 kSI2k will be provided on July 1st as agreed.

2) Disk
Installed capacity: 404 TB
Pledge for 2007: 729 TB
The remaining 325 TB will be provided in two phases:
a) 200 TB on demand (from July 1st to the end of September) according to the experiments' needs
b) 125 TB by end of September, when the purchased hardware (which is expected to be delivered
   by the end of July) will be installed and ready for operation.

3) MSS
Installed capacity: 510 TB
Pledge for 2007: 745 TB
The remaining 235 TB will be delivered on demand from July 1st, as agreed.

 

INFN/CNAF

CNAF is installing the CPU purchased with the last tender, but most of these resources are for non-LCG use.

Most capacity is already available; only the disk capacity will be late, and should be ready by the end of the summer.

 

More details received by email.

 

During the second half of 2007 the CNAF infrastructure will be upgraded
to support more electric power and to increase the redundancy of the air-conditioning system.
Pending this upgrade there will be limited room for expansion (besides
the new 1500 KSI2k being installed now): about 200 TB of net disk space
and a new tape library.

1) CPU:
Installed capacity: 1300 KSI2k
Pledge for 2007: 1300 KSI2k

2) disk:
Installed capacity: 400 TB
Pledge for 2007: 500 TB
Comment: a tender for an additional 200 TB is ongoing (part of this disk space will be for
the LHC). We foresee having this additional capacity in place by the end of this summer.

3) Tape:
Installed capacity: 500 TB
Pledge for 2007: 650 TB
Comment: a tender is ongoing; the new tape space will be available at the end of July.

 

FZK

The CPU and tape have been provided since April 2007. For disk, FZK is actually providing 315 TB (not the value indicated in the table)
and will inform H.Renshall of this mismatch. The disk for July is already at FZK and is being installed and configured;
FZK should therefore match the 2007 pledges by July.

 

SARA/NIKHEF

The disk values in the table seem out of date; as of April the disk capacity is 270 TB, a figure that should have been sent to H.Renshall.

The tendering process is taking longer than expected and will continue until the autumn.

 

Probably no new installations will be done before the end of 2007.

ATLAS pointed out that it needs more disk. SARA will try to meet this request by moving resources from non-LCG activities to the LCG.

 

For tape, SARA has a central pool that manages all tapes and will always buy what is needed.
SARA can therefore be seen as already matching the 2007 tape pledge.

 

L.Robertson noted that the delay is quite worrying and reminded sites that they will also have to fulfil the 2008 pledges by April 2008.

 

L.Robertson asked J.Templon to send details on the dates by when the resources will be available.

 

NDGF

NDGF was not represented; the information below was received by email.

 

CPU
---
MoU Numbers: 688 kSI2k
Installed: 595 kSI2k
The remaining 93 kSI2k will be part of a Finnish cluster of 512 CPUs already installed at CSC in
Helsinki. It is expected to be made available to ALICE during this week.

Disk
----
MoU Numbers: 385 TB
Installed: 140 TB
A further 100 TB will be installed at the Swedish Tier-1 sites during the summer, and the
remaining storage will be installed during Q3 and Q4. At the current consumption rate
this will suffice.

D.Barberis commented that ATLAS is in fact limited by the resources available, and not the other way round as stated by NDGF.
ATLAS will contact NDGF to clarify the issue.


MSS/Tape
--------
MoU Numbers: 273 TB
Installed: 112 TB
The remaining tape storage will be installed in Sweden only when needed. Currently only
a minimal amount of data is on tape.

 

PIC

The installed CPU is about 600 kSI2k, but only 450 kSI2k are assigned to the LCG for now; the full 600 kSI2k will be assigned in the next weeks.

A disk capacity of 100 TB is available in the racks, but PIC is currently busy installing dCache 1.7. Another ~70 TB is arriving in a few days.

Tapes are on site but will be allocated only when required by the experiments.

PIC will honour the 2007 pledges.

 

More details received via email.

 

- CPU:

-------------

Installed: 600 kSI2k (450 appearing for the LHC)

2007 Pledge: 501 kSI2k

-> Plan: 150 kSI2k were reinstalled with SL4 for non-HEP applications; in the next days all WNs will be installed with SL4, recovering a total of 600 kSI2k.

 

- Disk:

--------------

Installed: 86 TB

2007 Pledge: 218 TB

-> Plan (in two steps):

 A) 100 TB extra are already in the racks; dCache 1.7 is being installed and
    should enter production before the 1st of July.

 B) A new purchase of ~70 TB is arriving in the next days, with the same hardware as A),
    so deployment should be straightforward.

 

- Tape:

----------------

Installed: 167 TB

2007 Pledge: 243 TB

-> Plan: ~200 TB ready to be allocated as the experiments' needs arise.

 

ASGC

Not represented at the MB meeting. Information received via email.

 

- CPU:
-------------
Current capacity: 640 kSI2k
Pledge for 2007: 1770 kSI2k
The plan is to install 1130 kSI2k (Xeon 5150 processors: 2.66 GHz, dual core, 2.69 kSI2k each), with
reinstallation of SLC4 based on the experiments' schedule and the deployment release milestones.
The new facilities are planned for delivery by mid-August.

- Disk:
--------------
Current capacity: 360 TB
Pledge for 2007: 900 TB
-> The 540 TB shortfall will be covered by expanding the storage system in two phases. The first
phase will install 80 TB (gross; close to 72 TB net with RAID 6, see the sketch below) by the
end of August. The second phase is planned to meet the target number and is expected to be
completed by mid-Q4.

- Tape:
----------------
Current capacity: 280 TB and 4 LTO-3 tape drives
Pledge for 2007: 800 TB
-> The plan is to install 520 TB (1300 LTO-3 cartridges) with another 4 LTO-3 tape drives to
provide dedicated throughput for tape migration. The new resources are planned for delivery by mid-Q3.
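
The 80 TB gross versus ~72 TB net figure quoted for the first disk phase follows from RAID 6 reserving two disks' worth of parity per array group, so net capacity is (n-2)/n of gross for groups of n disks. Below is a minimal sketch of the arithmetic; the 20-disk group size is an assumption chosen to reproduce the ~10% overhead ASGC quotes.

# Gross-to-net disk capacity under RAID 6, which reserves two disks' worth
# of parity per group. The group size of 20 is an assumption chosen to
# reproduce ASGC's quoted overhead (80 TB gross -> ~72 TB net).

def raid6_net_capacity(gross_tb, disks_per_group=20):
    """Usable RAID 6 capacity: (n - 2) / n of the gross space."""
    n = disks_per_group
    return gross_tb * (n - 2) / n

print(raid6_net_capacity(80))  # 72.0 TB net, matching the figure above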

 

RAL

The 2007 procurement is still at the supplier-selection stage; there will be no new capacity until October or November 2007.
RAL can tune the balance of resources provided to LHC and non-LHC users, and this balance will be changed to increase the
resources available to the LHC VOs.

 

BNL

Not present at the meeting, but M.Ernst had sent the information via email.

 

CPU

---

BNL has taken delivery of 120 nodes (4 cores/node, 2 GB memory/core, 4.5 TB disk/node)
last week. This will add ~1 MSI2k to the existing processing power of 1.6 MSI2k at the
end of this week. The total of 2.6 MSI2k will almost meet the pledged compute power for
ATLAS of 2.7 MSI2k in 2007.

 

Disk

----

~400 TB (net) are currently being installed as the first phase of the procurements to be made
in 2007. This will make ~900 TB available to ATLAS two weeks from now. A purchase order
for 700 TB (net) of storage capacity left BNL last week, with delivery expected by
the end of June. The capacity available to ATLAS by the end of 2006, combined with the two
procurement phases in 2007, adds up to a total storage capacity of ~1.5 PB, available to ATLAS
in late July, which meets the 2007 pledge.

 

MSS/Tape

-------- 

At the US ATLAS Tier-1 center there is a dedicated STK 8500 robot with 10,000 slots and
currently 10 LTO-3 drives installed. The foreseen ~1 PB of tape media is already available
to ATLAS. Another 10 LTO-4 drives will be added to the library in Q4 2007.

 

FNAL

FNAL is already above the LCG pledges but below the pledges agreed with CMS (3.9 MSI2k); it will meet those by the end of July.

Disk resources stand at 700 TB for CMS, but 1 PB is arriving within two weeks and in August; FNAL will reach 1.5 PB by the end of the summer.

Tape capacity is at 1 PB; the 3 PB pledge is available as free slots in the robots. FNAL will wait as long as possible before purchasing,
in order to buy newer kinds of tape and fulfil the pledges agreed with CMS.

 

 

J.Templon asked the experiments what their plans are for June-August, because there is no request for resources in Harry’s table.

D.Barberis and F.Carminati replied that for 1 July 2007 their experiments expect the MoU pledges to be honoured, no matter what is
specified in Harry’s tables. Sites should try to reach the 2007 pledges as soon as possible.

 

H.Marten noted that Harry’s tables for Q3 and Q4 are missing and should be added to the LCG Planning page.

{This was done on 13 June.}

 

 

4.    Experiments New Schedule and Requirements - Roundtable Experiments

 

 

L.Robertson reminded the Experiments that an updated version of their requirements is due to be published in July, forming the basis for 2008 commitments by funding agencies at the October meeting of the C-RRB. He asked the experiments for the status of the update.

 

ALICE

ALICE is revising its requirements and will align with the date set by the other experiments; one month from now seems reasonable.

 

ATLAS

ATLAS will simply remove the 2007 technical run and keep all the rest the same (cosmic runs, data taking without beam, etc.). The changes will therefore be negligible. The revised requirements will be discussed and formally agreed during the ATLAS week at the end of June.

 

CMS

CMS agreed with ATLAS: no major change in the required resources is expected. More information will be provided by early July.

 

LHCb

The cancellation of the engineering runs will not change LHCb’s requirements much.

 

 

L.Robertson reminded the experiments that their spokespersons should communicate to the LCG Overview Board that the requirements are basically unchanged, to avoid any misunderstanding about the urgency of meeting the 1 April 2008 date for the availability of the full 2008 resources.

 

 

5.    Proposal of New High Level Milestones for 2007 (document) - A.Aimar

 

 

 

A.Aimar presented the table below and asked for feedback and comments to be sent to the MB mailing list before the next MB meeting. Newly introduced milestones are shown in blue.

 

 

11.06.2007

WLCG High Level Milestones - 2007

Status colour legend: Done (green); Late < 1 month (orange); Late > 1 month (red).

Unless a responsible party is listed in brackets, each milestone is tracked per site: ASGC, CC IN2P3, CERN, FZK GridKa, INFN CNAF, NDGF, PIC, RAL, SARA NIKHEF, TRIUMF, BNL, FNAL. Site-specific target dates are noted in brackets where they differ.

24x7 Support

  • WLCG-07-01 (Feb 2007) - 24x7 Support Definition: definition of the levels of support and rules to follow, depending on the issue/alarm. [FZK GridKa: Sep 2007; NDGF: Jun 2007]
  • WLCG-07-02 (Apr 2007) - 24x7 Support Tested: support and operation scenarios tested via realistic alarms and situations.
  • WLCG-07-03 (Jun 2007) - 24x7 Support in Operations: the sites provide 24x7 support to users as standard operations.

VOBoxes Support

  • WLCG-07-04 (Apr 2007) - VOBoxes SLA Defined: sites propose and agree with the VOs the level of support (upgrade, backup, restore, etc.) of VOBoxes. [NDGF: Jun 2007]
  • WLCG-07-05 (May 2007) - VOBoxes SLA Implemented: VOBoxes service implemented at the site.
  • (Jul 2007) - VOBoxes Support Accepted by the Experiments: VOBoxes support level agreed by the experiments; tracked per site, with one row per experiment (ALICE, ATLAS, CMS, LHCb).

VOMS Job Priorities

  • (Jun 2007) - New VOMS YAIM Release and Documentation: VOMS release and deployment; documentation on how to configure VOMS for sites not using YAIM. [EGEE-SA1]

Milestones below suspended until there is a YAIM Installation Package (and Documentation for Sites not using YAIM) - 15.06.2006.

  • WLCG-07-06 (Apr 2007) - Job Priorities Available at Site: mapping of the job priorities onto the batch software of the site completed and information published.
  • WLCG-07-07 (Jun 2007) - Job Priorities of the VOs Implemented at Site: configuration and maintenance of the job priorities as defined by the VOs; job priorities in use by the VOs.

Accounting

  • WLCG-07-08 (Mar 2007) - Accounting Data Published in the APEL Repository: the site is publishing the accounting data in APEL; monthly reports extracted from the APEL repository.

3D Services

  • WLCG-07-09 (Mar 2007) - 3D Oracle Service in Production: Oracle service in production and certified by the experiments. [NDGF: Jun 2007; FNAL: Squid/Frontier]
  • WLCG-07-10 (May 2007) - 3D Conditions DB in Production: conditions DB in operation for ATLAS, CMS and LHCb; tested by the experiments. [FNAL: Squid/Frontier]

Procurement

  • WLCG-07-16 (Jul 2007) - Procurement of 2007 MoU Pledges: to fulfil the agreement that all sites procure their 2007 MoU pledges by July 2007.
  • WLCG-07-17 (Apr 2008) - Procurement of 2008 MoU Pledges: to fulfil the agreement that all sites procure their MoU pledges by April of every year.

FTS 2.0

  • WLCG-07-18 (Jun 2007) - FTS 2.0 Tested and Accepted by the Experiments: in production at CERN and tested and accepted by each experiment. [ALICE, ATLAS, CMS, LHCb]
  • WLCG-07-19 (Jun 2007) - Multi-VO Tests Executed and Tested by the Experiments: scheduled at CERN for the last week of June. [ALICE, ATLAS, CMS, LHCb]
  • WLCG-07-20 (Sep 2007) - FTS 2.0 Deployed in Production: installed and in production at each Tier-1 site.

BDII

  • WLCG-07-21 (Jun 2007) - BDII Guidelines Available: on how to install the BDII on a separate node. [EGEE-SA1]
  • WLCG-07-22 (Jun 2007) - Top-Level BDII Installed at the Site: for each Tier-1 site.
  • WLCG-07-23 (Jul 2007) - Run the CE Info Provider on the Site-Level BDII: for each Tier-1 site.

glexec

  • WLCG-07-24 (Jul 2007) - Decision on Usage of glexec and Guidelines to Follow Ready. [GDB]

Site Reliability - June 2007

  • WLCG-07-12 (Jun 2007) - Site Reliability above 91%: considering each Tier-0 and Tier-1 site.
  • WLCG-07-13 (Jun 2007) - Average of Best 8 Sites above 93%: eight sites should reach a reliability above 93%.

MSS Main Storage Systems

  • WLCG-07-25 (Jun 2007) - CASTOR 2.1.3 in Production at CERN: MSS system supporting SRM 2.2 deployed in production at the site. [CERN Tier-0]
  • WLCG-07-26 (Jul 2007) - CASTOR 2.1.3 Tested and Accepted by the Experiments: experiments tested CASTOR at some of the CASTOR sites. [ALICE, ATLAS, CMS, LHCb]
  • WLCG-07-27 (Jul 2007) - dCache 1.8 Tested and Accepted by the Experiments: experiments tested dCache at some of the dCache sites. [ALICE, ATLAS, CMS, LHCb]
  • WLCG-07-28 (Sep 2007) - Demonstrated Tier-0 Performance: demonstration that the highest throughput (ATLAS 2008) can be reached. [CERN Tier-0]
  • WLCG-07-29 (Sep 2007) - CASTOR 2.1.3/dCache in Production at the Tier-1 Sites: MSS system supporting SRM 2.2 deployed in production at the site.
  • WLCG-07-30 (Dec 2007) - SRM Implementations with HEP MoU Features: with the full features agreed in the HEP MoU (srmCopy, etc.). [CASTOR, dCache, DPM]

SL4 Migration

  • WLCG-07-11 (deployment date + 30 days) - SL4 Operational at Site (for WN and UI nodes): this has to happen within 30 days after the release from GD. Replaced by the individual milestones for the middleware components below.

WN and UI

  • WLCG-07-31 (Jun 2007) - WN Installed in Production at the Tier-1 Sites: WN on SLC4 installed at each Tier-1 site, with the configuration needed to use SL4 or SL3 nodes.
  • WLCG-07-32 (Jun 2007) - UI Certification and Installation on the PPS Systems. [EGEE-SA1/PPS]
  • WLCG-07-33 (+4 weeks) - UI Tested and Accepted by the Experiments. [ALICE, ATLAS, CMS, LHCb]
  • WLCG-07-34 (+4 weeks) - UI Installed in Production at the Tier-1 Sites.

gLite CE

  • WLCG-07-35 (Sep 2007) - gLite CE Development Completed and Component Released. [EGEE-JRA1]
  • WLCG-07-36 (+4 weeks) - gLite CE Certification and Installation on the PPS Systems. [EGEE-PPS]
  • WLCG-07-37 (+4 weeks) - gLite CE Tested and Accepted by the Experiments. [ALICE, ATLAS, CMS, LHCb]
  • WLCG-07-38 (+4 weeks) - gLite CE Installed in Production at the Tier-1 Sites.

SAM VO-Specific Tests

  • WLCG-07-39 (Sep 2007) - VO-Specific SAM Tests in Place: with results included every month in the Site Availability Reports. [ALICE, ATLAS, CMS, LHCb]

CAF CERN Analysis Facility

  • WLCG-07-40 (Oct 2008) - Experiments Provide the Test Setup for the CAF. [ALICE, ATLAS, CMS, LHCb]

Xrootd

  • WLCG-07-41 (Jul 2008) - xrootd Interfaces Tested and Accepted by ALICE. [ALICE]

Site Reliability - Dec 2007

  • WLCG-07-14 (Dec 2007) - Site Reliability above 93%: considering each Tier-0 and Tier-1 site.
  • WLCG-07-15 (Dec 2007) - Average of Best 8 Sites above 95%: eight sites should reach a reliability above 95%.
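
The "average of best 8 sites" figures in WLCG-07-13 and WLCG-07-15 are, as the milestone names suggest, presumably the mean monthly reliability of the eight best-scoring Tier-0/Tier-1 sites. A minimal sketch of that computation follows; the reliability numbers below are made up for illustration, the real ones coming from the monthly site availability reports.

# Sketch of the "average of best 8 sites" metric in WLCG-07-13/15.
# The per-site reliability values below are invented for illustration.

reliability = {
    "CERN": 0.98, "ASGC": 0.89, "CC-IN2P3": 0.94, "FZK-GridKa": 0.92,
    "INFN-CNAF": 0.90, "NDGF": 0.88, "PIC": 0.95, "RAL": 0.93,
    "SARA-NIKHEF": 0.91, "TRIUMF": 0.96, "BNL": 0.94, "FNAL": 0.95,
}

best8 = sorted(reliability.values(), reverse=True)[:8]
average = sum(best8) / len(best8)
print(f"Average of best 8 sites: {average:.1%}")  # compare with the 93% / 95% targets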

 

 

 

6.    AOB 

 

 

S.Foffano raised the issue that the Megatable needs to be updated (the Tier-1 Tape 0 disk cache values in particular). The only input received so far is from ALICE. A meeting will take place on Thursday 14 June, and the experiments are expected to send their information before then. A reminder will be sent to the participants of the Megatable meetings.

 

 

7.    Summary of New Actions

 

 

 

 

The full Action List, with current and past items, will be on this wiki page before the next MB meeting.