____________________________________________________
____________________________________________________
Planning for the May run of CCRC'08
____________________________________________________
____________________________________________________
The LHC Status and Schedule (Jos Engelen)
____________________________________________________

The experiments have to be closed by June. Cooldown starts in one week in some sectors and will be under way across the whole machine by the end of June (it takes 9 weeks to cool down and stabilize). The schedule still has to be confirmed at the end of May. The plan is to commission for 5 TeV. This needs one month to check the machine; then the beam will start. Two months of beam commissioning will follow, and then a pilot run of a few weeks.

Questions and answers:
**********************
Kors: Once the detector runs, is there a standard running cycle?
Jos: No final decision yet. Probably around 3 weeks of running, then 2 weeks of maintenance.
Question: When is the first lead beam scheduled?
Jos: Not in 2008. If everything goes according to plan, in 2009.
Jamie: Any wishes or hopes for the CCRC?
Jos: Good luck! CCRC is a very important exercise.
Dario: What is the minimum length of the winter shutdown?
Jos: It will be determined by the completion of the runs, and agreed between the machine and the experiments. For instance, CMS will need to install new parts. It will probably be around 3.5 months.

____________________________________________________
Resource / Service Review - What can we count on for 2008 production? (Harry Renshall)
____________________________________________________

The numbers presented could be slightly modified after the presentations that the experiments will give. All of the 2008 CPU pledges will be in place by 1 April. For disk and tape, some sites will still have to catch up. The May run of CCRC will be with 55% of the resources.

Individual reports from all the Tier-1s:
o) ASGC: all ready for 1 April
o) CCIN2P3: CPU ready for May, tape for April, and 50% of disk missing
o) CERN: CPU and tape ready; disk partly for May, partly for June
o) FZK/GridKa: CMS part all ready; ALICE part will be delivered in October (ALICE agreed to that)
o) INFN/CNAF: CPU and storage for mid-May
o) NDGF: CPU for April; disk and tape delivered on demand
o) NL-T1: 2007 pledge in April, 2008 pledge in November
o) PIC: CPU in April, storage from June to October
o) RAL: full pledge on 1 April
o) TRIUMF: full pledge ready
o) US-ATLAS: CPU and tape in April, disk during the summer
o) US-CMS: CPU by May; storage waiting on the transition to LTO4

To sum up, the CPU is all delivered. Storage is the most critical part.

Questions and answers:
**********************
Ian: Almost all the sites had problems delivering the pledged storage, so there are reasons to be worried. We can discard the data of this May, but not of next year!
Harry: The message has already been passed loud and clear to the sites.
Q: Would it be possible to redistribute resources?
Harry: Not that simple. If the storage is redistributed, then the CPU also has to be redistributed to balance it.
Q: And the redistribution of storage only works if you delete the first copy.
Q: What about the Tier-2s?
Harry: Didn't look into that. Hopefully, each T1 will look after its T2s. In most regions, the T2s are in a good state.
Q: What is the problem that the sites are having? Is it due to the need to buy the cheapest computers?
Wolfgang: It is better to order from several companies, at least two if not three, so that nobody depends on a single source.
____________________________________________________
LHCb Status, Plans and Requirements for May run and beyond (Nick Brook)
____________________________________________________

Tasks for the CCRC:
o) Data from the pit to T0 (~70 MB/s, 84 TB in total)
o) Distribution T0 -> T1 (~35 MB/s)
o) Reconstruction at T0 and T1
o) Stripping of data at T0 and T1
o) Distribution of DSTs to all other centres (each site needs 16 TB to store all the ESD)

At the same time, fake analysis will run. Number of jobs: ~52k reconstruction and ~17.5k stripping. Data access is still one of the big things to test. There will be a new version of DIRAC.

Questions and answers:
**********************
Templon: How many input and output files do you have in your analysis?
Nick: One input, one output.
Templon: Data access is still a problem. At some point you were using gridftp. Are you working on other options?
Nick: We don't use gridftp anymore.
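As a rough cross-check of the figures above, a minimal back-of-the-envelope sketch in Python; the number of effective data-taking days is derived here, not quoted in the presentation:

    # Back-of-the-envelope check of the LHCb CCRC'08 figures quoted above.
    # Assumption: a sustained pit -> T0 transfer with no dead time.
    rate_mb_s = 70      # pit -> T0 rate (MB/s)
    total_tb = 84       # total volume for the run (TB)

    seconds = total_tb * 1e6 / rate_mb_s   # 1 TB = 1e6 MB
    days = seconds / 86400.0
    print("84 TB at 70 MB/s is about %.1f days of continuous running" % days)
    # ~14 days, i.e. roughly half of a month-long challenge window.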
____________________________________________________
ATLAS Status, Plans and Requirements for May run and beyond (Kors Bos)
____________________________________________________

There is another WLCG ATLAS meeting going on, so the things presented by Kors could be modified by the outcome of that meeting. The metrics and milestones for the CCRC are also being defined at the moment.

Plan for the May run:
1. T0 processing and data distribution
2. T1 data re-processing
3. T2 simulation production
4. T0/1/2 physics group analysis
5. T0/1/2/3 end-user analysis

It usually takes 3 days to tune the detector.

Questions and answers:
**********************
Q: Who defines which percentage of data goes to each T2?
Kors: It is up to the cloud to decide.
Q: Are you going to test pinning the data?
Kors: No, we are not.
Q: Is the conditions data going to be used in the same way as during data taking?
Kors: This will be discussed tomorrow.
Q: The numbers in slide 20 don't match!
Kors: Sorry, there is a mistake in those numbers.
Q: For the T2 sites, it would be nice if the requests from ATLAS were implemented in all the major tools (DPM, dCache and CASTOR).
Kors: This will also be discussed tomorrow. It is quite clear what we want, although it is not clear how to implement it.

____________________________________________________
CMS Status, Plans and Requirements for May run and beyond (Daniele Bonacorsi)
____________________________________________________

Main activities:
o) Tier-0 workflows: Cessy -> CERN transfer tests, Tier-0 processing, CAF workflows
o) Distributed data transfer tests: T0->T1, T1->T1, T1->T2, T2->T1
o) T1 workflows: reprocessing, skimming
o) T2 workflows: MC production, analysis
o) Monitoring (metrics still to be established)

During May, CMS will run the CCRC and iCSA08. The iCSA08 goals are related to the commissioning and the physics schedule. For CCRC, the goals are also to include all the sites that have to be ready, and to prove that the analysis load can be sustained. Most of the infrastructure is already in place, although a couple of minor things will be finalized by the end of April. The communication flow has to be designed with care.

Questions and answers:
**********************
Gordon: Could you please tell us the disk and CPU requirements?
Daniele: The numbers are not ready. Hopefully they will be ready by the end of this week.

____________________________________________________
ALICE Status, Plans and Requirements for May run and beyond (Latchezar Betev)
____________________________________________________

Tasks for the CCRC:
o) Registration of data in CASTOR2 (T0) and on the Grid
o) Replication T0 -> T1
o) Conditions data gathering and publication on the Grid
o) Quasi-online reconstruction (special emphasis)
o) Pass 1 at T0
o) Pass 2 at the T1s
o) Replication of ESDs to CAF/T2s
o) Quality control
o) MC production and user analysis at CAF/T2s

The ALICE commissioning exercise starts on 18 May; for the last two weeks of CCRC, ALICE will run with data from this exercise. For the first two weeks, ALICE can participate in the T0-T1 transfers.

Questions and answers:
**********************
Q: In the February exercise, ALICE wrote 28 TB to SARA. What do we have to do with that data?
Latchezar: The data was sent to an SE that can be recycled. That data is not needed.
Templon: Do you have any zero-suppression online?
Latchezar: There is some. However, most people prefer to write more data.
Harry: So the data transferred between the 5th and the 18th of May doesn't have to be kept. Is that correct?
Latchezar: That's correct.

____________________________________________________
Baseline Middleware Versions - what is in production (Oliver Keeble)
____________________________________________________

New software to be used:
o) LCG CE: released in gLite 3.1 Update 20
o) FTS (T0): released in gLite 3.0 Update 42
o) FTS (T1): released in gLite 3.0 Update 41
o) GFAL/lcg_utils: released in gLite 3.1 Update 20
o) DPM 1.6.7-4: released in gLite 3.1 Update 18

Questions and answers:
**********************
Q: What happens on non-Scientific Linux systems? At the moment, there are some ignorable error messages on SUSE. Some centres can't choose the OS, so this is an important problem.
Oliver: We are aware of that, and a serious effort is being made to solve the problem.
Q: Is there a mailing list where this effort can be followed?
Oliver: We have the bugs in Savannah for Debian.
Q: What about the VO boxes and the resource broker?
Oliver: There are no recent updates for those services.
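Since FTS at both the T0 and the T1s is part of this baseline, a minimal sketch of driving a single-file T0 -> T1 transfer from Python may help illustrate how the transfer tests are exercised. The endpoint and SURLs are invented placeholders, and the exact CLI options should be checked against the gLite release notes:

    import subprocess

    # Hypothetical FTS endpoint and SURLs; placeholders, not real services.
    FTS = "https://fts.example-t1.org:8443/glite-data-transfer-fts/services/FileTransfer"
    SRC = "srm://srm.cern.ch/castor/cern.ch/grid/lhcb/ccrc08/run001/file001.raw"
    DST = "srm://srm.example-t1.org/pnfs/example-t1.org/data/lhcb/ccrc08/file001.raw"

    # glite-transfer-submit prints the ID of the new transfer job on stdout.
    job_id = subprocess.check_output(
        ["glite-transfer-submit", "-s", FTS, SRC, DST]).decode().strip()

    # Poll the job until FTS reports a terminal state (Done, Failed, ...).
    state = subprocess.check_output(
        ["glite-transfer-status", "-s", FTS, job_id]).decode().strip()
    print("job %s: %s" % (job_id, state))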
____________________________________________________
Baseline storage-ware versions - what is in production (Flavia Donno)
____________________________________________________

o) CASTOR: SRM v1.3-20, backend 2.1.6-12
o) dCache: 1.8.0-15
o) DPM: 1.6.7-4
o) StoRM: 1.3.20

Although there could be some last-minute changes.

Questions and answers:
**********************
MarioDavid: Only StoRM gives a static and dynamic solution for space tokens. If you want to configure space tokens elsewhere, there are no dynamic plugins.
Flavia: There are some solutions, although they are not in these releases.
Michel: For DPM, use the latest release.
Q: Encourage the maintainers to put all the releases in the same place.
Templon: What is the timeline for the GSI-enabled xrootd?
Patrick: Not for the CCRC branch; 6 weeks from now.
Q: What about dCache and SRM v2?
Flavia: With dCache 1.8.0-14, there were several problems.

____________________________________________________
Follow-up on Action Items from April CCRC'08 F2F (Jamie Shiers)
____________________________________________________

o) Common convention for operator alarm mailing lists: for the T0, vo-operator-alarm@cern.ch. Other sites should implement something similar (maybe vo-site-operator?); a minimal sketch of this naming scheme is given at the end of these minutes.
Templon: 1 out of the 11 T1s is not happy with the solution.
Jamie: We could test these lists over the next weeks. Ask the sites at the operations meetings to report any problems.
Q: On the 4th of March, there was a problem with the ATLAS alarm mailing list. It had the wrong permissions, and anybody could send alarms.
Jamie: Only members will be able to post.
Q: It is important that the experiment members see who is working on the problems.

o) Negotiation:
Templon: Sites should know what they are doing. There is no good model right now.
Gordon: This was discussed at the GDB. Experiments should announce what they are doing to the GDB.
(no comments from the experiments)

o) Intervention notifications:
Jamie: Sites should announce whenever they are down. At the moment, there is an enormous flood of emails. Does anybody feel this is a problem? If you use only the GOC DB, the experiments don't know. At the moment, the downtime information is stored in the GOC DB, and a notification is sent manually. It would be good if the GOC DB could generate the alarms or notifications.
Comment: There are so many of those emails that they are automatically deleted. It would be nice to create a simple web page with the downtime information.
James: That's already in the GridMap (although it might not be obvious). It would be good to send this information to the end users.

o) Dashboards:
Jamie: Information about where to look. Sites don't know which page they have to look at. Julia will give the answer on Friday.
Q: There is too much monitoring software!
James: There is too much visualization, but all of it uses the same information. We don't want everybody calculating different availability algorithms. Make sure that data is gathered only once.
James: We are working on reducing the number of sources of information. For instance: RGMA, MonALISA (dropped by OSG), and the SAM transport layer has been replaced.
Q: OSG didn't drop MonALISA.

o) How do sites figure out what's going on? Related to the dashboards.
Templon: If there's something wrong, the experiments should tell the sites.
James: The WLCG portal still needs to be prototyped. GridPP is different from the LCG one.

o) Support issues: discussed yesterday. Are there any topics?

____________________________________________________
Post-mortem workshop
____________________________________________________

Jamie: The agenda is already there. It is divided by experiment, site and services, and of course storage. Please let us know if we have to spend more or less time on these topics.
Jamie: The final items may be defined at a later date (but please, don't do it 5 minutes before the workshop as usual). The post-mortem workshop should be kept to only two days (Thursday and Friday). Don't schedule any post-mortem discussion in the GDB on Wednesday.

____________________________________________________
Templon: How are we going to measure the challenge and the response times? What is the agreement for May (GGUS, elog)?
Jamie: This will be discussed during the daily meetings.
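As referenced under the operator-alarm action item above, a minimal sketch of the proposed list-naming convention. Only the T0 form appears in the minutes; the per-site variant and the example site and domain are assumptions, one possible reading of the "vo-site-operator" suggestion:

    # Sketch of the operator-alarm mailing list convention discussed above.
    # Only the T0 form (vo-operator-alarm@cern.ch) was agreed; the per-site
    # form and the example site name/domain below are assumptions.

    VOS = ["alice", "atlas", "cms", "lhcb"]

    def t0_alarm_list(vo):
        # Agreed convention for the Tier-0 lists.
        return "%s-operator-alarm@cern.ch" % vo

    def site_alarm_list(vo, site, domain):
        # One reading of the "maybe vo-site-operator?" proposal.
        return "%s-%s-operator-alarm@%s" % (vo, site, domain)

    for vo in VOS:
        print(t0_alarm_list(vo), site_alarm_list(vo, "ral", "example.ac.uk"))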