____________________________________________________
____________________________________________________
Planning for the May run of CCRC'08
____________________________________________________
____________________________________________________
The LHC Status and Schedule (Jos Engelen)
____________________________________________________

The experiments have to be closed by June. Cooldown starts in one week in some sectors and will be under way across the whole machine by the end of June (it takes 9 weeks to cool down and stabilize). The schedule still has to be confirmed at the end of May. The plan is to commission for 5 TeV. This needs one month to check the machine; then the beam will start. Two months of beam commissioning will follow, and then a pilot run of a few weeks.

Questions and answers:
**********************
Kors: Once the detector runs, is there a standard running cycle?
Jos: No final decision yet. Probably around 3 weeks of running, then 2 weeks of maintenance.
Question: When is the first lead beam scheduled?
Jos: Not in 2008. If everything goes according to plan, in 2009.
Jamie: Any wishes or hopes for the CCRC?
Jos: Good luck! CCRC is a very important exercise.
Dario: What is the minimum length of the winter shutdown?
Jos: It will be determined by the completion of the runs, and agreed between the machine and the experiments. For instance, CMS will need to install new parts. It will probably be around 3.5 months.

____________________________________________________
Resource / Service Review - What can we count on for 2008 production? (Harry Renshall)
____________________________________________________

The numbers presented could be slightly modified after the presentations that the experiments will give. All of the 2008 CPU pledges will be in place by 1 April. For disk and tape, some sites will still have to catch up. The May run of CCRC will be with 55% of the resources.

Individual reports from all the Tier-1s:
o) ASGC: all ready for 1 April
o) CCIN2P3: CPU ready for May, tape for April, and 50% of disk missing
o) CERN: CPU and tape ready; disk partly for May, partly for June
o) FZK/GridKa: CMS part all ready; ALICE part will be delivered in October (ALICE agreed to that)
o) INFN/CNAF: CPU and storage for mid-May
o) NDGF: CPU for April; disk and tape delivered on demand
o) NL-T1: 2007 pledge in April, 2008 pledge in November
o) PIC: CPU in April, storage from June to October
o) RAL: full pledge on 1 April
o) TRIUMF: full pledge ready
o) US-ATLAS: CPU and tape in April, disk during the summer
o) US-CMS: CPU by May; storage waiting on the transition to LTO4

To sum up, the CPU is all delivered. Storage is the most critical part.

Questions and answers:
**********************
Ian: Almost all the sites had problems delivering the pledged storage, so there are reasons to be worried. We can discard the data of this May, but not of next year!
Harry: The message has already been passed loud and clear to the sites.
Q: Would it be possible to redistribute resources?
Harry: Not that simple. If the storage is redistributed, then the CPU also has to be redistributed to balance it.
Q: And the redistribution of storage only works if you delete the first copy.
Q: What about the Tier-2s?
Harry: Didn't look into that. Hopefully, each T1 will look after its T2s. In most regions, the T2s are in a good state.
Q: What is the problem that the sites are having? Is it due to the need to buy the cheapest computers?
Wolfgang: It is better to order from several companies, at least two if not three, so that nobody depends on a single source.
____________________________________________________
LHCb Status, Plans and Requirements for May run and beyond (Nick Brook)
____________________________________________________

Tasks for the CCRC:
o) Data from the pit to T0 (~70 MB/s, 84 TB in total)
o) Distribution T0 -> T1 (~35 MB/s)
o) Reconstruction at T0 and T1
o) Stripping of data at T0 and T1
o) Distribution of DSTs to all other centres (each site needs 16 TB to store all the ESD)

At the same time, fake analysis will run. Number of jobs: ~52k reconstruction and ~17.5k stripping. Data access is still one of the big things to test. There will be a new version of DIRAC.

Questions and answers:
**********************
Templon: How many input and output files do you have in your analysis?
Nick: One input, one output.
Templon: Data access is still a problem. At some point you were using gridftp. Are you working on other options?
Nick: We don't use gridftp anymore.
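As a rough cross-check of the figures above, a minimal back-of-the-envelope sketch in Python; the number of effective data-taking days is derived here, not quoted in the presentation:

    # Back-of-the-envelope check of the LHCb CCRC'08 figures quoted above.
    # Assumption: a sustained pit -> T0 transfer with no dead time.
    rate_mb_s = 70      # pit -> T0 rate (MB/s)
    total_tb = 84       # total volume for the run (TB)

    seconds = total_tb * 1e6 / rate_mb_s   # 1 TB = 1e6 MB
    days = seconds / 86400.0
    print("84 TB at 70 MB/s is about %.1f days of continuous running" % days)
    # ~14 days, i.e. roughly half of a month-long challenge window.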
____________________________________________________
ATLAS Status, Plans and Requirements for May run and beyond (Kors Bos)
____________________________________________________

There is another WLCG ATLAS meeting going on, so the things presented by Kors could be modified by the outcome of that meeting. The metrics and milestones for the CCRC are also being defined at the moment.

Plan for the May run:
1. T0 processing and data distribution
2. T1 data re-processing
3. T2 simulation production
4. T0/1/2 physics group analysis
5. T0/1/2/3 end-user analysis

It usually takes 3 days to tune the detector.

Questions and answers:
**********************
Q: Who defines which percentage of data goes to each T2?
Kors: It is up to the cloud to decide.
Q: Are you going to test pinning the data?
Kors: No, we are not.
Q: Is the conditions data going to be used in the same way as during data taking?
Kors: This will be discussed tomorrow.
Q: The numbers in slide 20 don't match!
Kors: Sorry, there is a mistake in those numbers.
Q: For the T2 sites, it would be nice if the requests from ATLAS were implemented in all the major tools (DPM, dCache and CASTOR).
Kors: This will also be discussed tomorrow. It is quite clear what we want, although it is not clear how to implement it.

____________________________________________________
CMS Status, Plans and Requirements for May run and beyond (Daniele Bonacorsi)
____________________________________________________

Main activities:
o) Tier-0 workflows: Cessy -> CERN transfer tests, Tier-0 processing, CAF workflows
o) Distributed data transfer tests: T0->T1, T1->T1, T1->T2, T2->T1
o) T1 workflows: reprocessing, skimming
o) T2 workflows: MC production, analysis
o) Monitoring (metrics still to be established)

During May, CMS will run the CCRC and iCSA08. The iCSA08 goals are related to the commissioning and the physics schedule. For CCRC, the goals are also to include all the sites that have to be ready, and to prove that the analysis load can be sustained. Most of the infrastructure is already in place, although a couple of minor things will be finalized by the end of April. The communication flow has to be designed with care.

Questions and answers:
**********************
Gordon: Could you please tell us the disk and CPU requirements?
Daniele: The numbers are not ready. Hopefully they will be ready by the end of this week.

____________________________________________________
ALICE Status, Plans and Requirements for May run and beyond (Latchezar Betev)
____________________________________________________

Tasks for the CCRC:
o) Registration of data in CASTOR2 (T0) and on the Grid
o) Replication T0 -> T1
o) Conditions data gathering and publication on the Grid
o) Quasi-online reconstruction (special emphasis)
o) Pass 1 at T0
o) Pass 2 at the T1s
o) Replication of ESDs to CAF/T2s
o) Quality control
o) MC production and user analysis at CAF/T2s

The ALICE commissioning exercise starts on 18 May; for the last two weeks of CCRC, ALICE will run with data from this exercise. For the first two weeks, ALICE can participate in the T0-T1 transfers.

Questions and answers:
**********************
Q: In the February exercise, ALICE wrote 28 TB to SARA. What do we have to do with that data?
Latchezar: The data was sent to an SE that can be recycled. That data is not needed.
Templon: Do you have any zero-suppression online?
Latchezar: There is some. However, most people prefer to write more data.
Harry: So the data transferred between the 5th and the 18th of May doesn't have to be kept. Is that correct?
Latchezar: That's correct.

____________________________________________________
Baseline Middleware Versions - what is in production (Oliver Keeble)
____________________________________________________

New software to be used:
o) LCG CE: released in gLite 3.1 Update 20
o) FTS (T0): released in gLite 3.0 Update 42
o) FTS (T1): released in gLite 3.0 Update 41
o) GFAL/lcg_utils: released in gLite 3.1 Update 20
o) DPM 1.6.7-4: released in gLite 3.1 Update 18

Questions and answers:
**********************
Q: What happens on non-Scientific Linux systems? At the moment, there are some ignorable error messages on SUSE. Some centres can't choose the OS, so this is an important problem.
Oliver: We are aware of that, and a serious effort is being made to solve the problem.
Q: Is there a mailing list where this effort can be followed?
Oliver: We have the bugs in Savannah for Debian.
Q: What about the VO boxes and the resource broker?
Oliver: There are no recent updates for those services.
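Since FTS at both the T0 and the T1s is part of this baseline, a minimal sketch of driving a single-file T0 -> T1 transfer from Python may help illustrate how the transfer tests are exercised. The endpoint and SURLs are invented placeholders, and the exact CLI options should be checked against the gLite release notes:

    import subprocess

    # Hypothetical FTS endpoint and SURLs; placeholders, not real services.
    FTS = "https://fts.example-t1.org:8443/glite-data-transfer-fts/services/FileTransfer"
    SRC = "srm://srm.cern.ch/castor/cern.ch/grid/lhcb/ccrc08/run001/file001.raw"
    DST = "srm://srm.example-t1.org/pnfs/example-t1.org/data/lhcb/ccrc08/file001.raw"

    # glite-transfer-submit prints the ID of the new transfer job on stdout.
    job_id = subprocess.check_output(
        ["glite-transfer-submit", "-s", FTS, SRC, DST]).decode().strip()

    # Poll the job until FTS reports a terminal state (Done, Failed, ...).
    state = subprocess.check_output(
        ["glite-transfer-status", "-s", FTS, job_id]).decode().strip()
    print("job %s: %s" % (job_id, state))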
____________________________________________________
Baseline storage-ware versions - what is in production (Flavia Donno)
____________________________________________________

o) CASTOR: SRM v1.3-20, backend 2.1.6-12
o) dCache: 1.8.0-15
o) DPM: 1.6.7-4
o) StoRM: 1.3.20

Although there could be some last-minute changes.

Questions and answers:
**********************
MarioDavid: Only StoRM gives a static and dynamic solution for space tokens. If you want to configure space tokens elsewhere, there are no dynamic plugins.
Flavia: There are some solutions, although they are not in these releases.
Michel: For DPM, use the latest release.
Q: Encourage the maintainers to put all the releases in the same place.
Templon: What is the timeline for the GSI-enabled xrootd?
Patrick: Not for the CCRC branch; 6 weeks from now.
Q: What about dCache and SRM v2?
Flavia: With dCache 1.8.0-14, there were several problems.

____________________________________________________
Follow-up on Action Items from April CCRC'08 F2F (Jamie Shiers)
____________________________________________________

o) Common convention for operator alarm mailing lists: for the T0, vo-operator-alarm@cern.ch. Other sites should implement something similar (maybe vo-site-operator?); a minimal sketch of this naming scheme is given at the end of these minutes.
Templon: 1 out of the 11 T1s is not happy with the solution.
Jamie: We could test these lists over the next weeks. Ask the sites at the operations meetings to report any problems.
Q: On the 4th of March, there was a problem with the ATLAS alarm mailing list. It had the wrong permissions, and anybody could send alarms.
Jamie: Only members will be able to post.
Q: It is important that the experiment members see who is working on the problems.

o) Negotiation:
Templon: Sites should know what they are doing. There is no good model right now.
Gordon: This was discussed at the GDB. Experiments should announce what they are doing to the GDB.
(no comments from the experiments)

o) Intervention notifications:
Jamie: Sites should announce whenever they are down. At the moment, there is an enormous flood of emails. Does anybody feel this is a problem? If you use only the GOC DB, the experiments don't know. At the moment, the downtime information is stored in the GOC DB, and a notification is sent manually. It would be good if the GOC DB could generate the alarms or notifications.
Comment: There are so many of those emails that they are automatically deleted. It would be nice to create a simple web page with the downtime information.
James: That's already in the GridMap (although it might not be obvious). It would be good to send this information to the end users.

o) Dashboards:
Jamie: Information about where to look. Sites don't know which page they have to look at. Julia will give the answer on Friday.
Q: There is too much monitoring software!
James: There is too much visualization, but all of it uses the same information. We don't want everybody calculating different availability algorithms. Make sure that data is gathered only once.
James: We are working on reducing the number of sources of information. For instance: RGMA, MonALISA (dropped by OSG), and the SAM transport layer has been replaced.
Q: OSG didn't drop MonALISA.

o) How do sites figure out what's going on? Related to the dashboards.
Templon: If there's something wrong, the experiments should tell the sites.
James: The WLCG portal still needs to be prototyped. GridPP is different from the LCG one.

o) Support issues: discussed yesterday. Are there any topics?

____________________________________________________
Post-mortem workshop
____________________________________________________

Jamie: The agenda is already there. It is divided by experiment, site and services, and of course storage. Please let us know if we have to spend more or less time on these topics.
Jamie: The final items may be defined at a later date (but please, don't do it 5 minutes before the workshop as usual). The post-mortem workshop should be kept to only two days (Thursday and Friday). Don't schedule any post-mortem discussion in the GDB on Wednesday.

____________________________________________________
Templon: How are we going to measure the challenge and the response times? What is the agreement for May (GGUS, elog)?
Jamie: This will be discussed during the daily meetings.
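As referenced under the operator-alarm action item above, a minimal sketch of the proposed list-naming convention. Only the T0 form appears in the minutes; the per-site variant and the example site and domain are assumptions, one possible reading of the "vo-site-operator" suggestion:

    # Sketch of the operator-alarm mailing list convention discussed above.
    # Only the T0 form (vo-operator-alarm@cern.ch) was agreed; the per-site
    # form and the example site name/domain below are assumptions.

    VOS = ["alice", "atlas", "cms", "lhcb"]

    def t0_alarm_list(vo):
        # Agreed convention for the Tier-0 lists.
        return "%s-operator-alarm@cern.ch" % vo

    def site_alarm_list(vo, site, domain):
        # One reading of the "maybe vo-site-operator?" proposal.
        return "%s-%s-operator-alarm@%s" % (vo, site, domain)

    for vo in VOS:
        print(t0_alarm_list(vo), site_alarm_list(vo, "ral", "example.ac.uk"))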