CCRC'08 planning conference call

Europe/Zurich
513 R-070 (CERN)

Description
Mailing list: wlcg-ccrc08@cern.ch

Web archive: here

To join the call, do one of the following:

  • Dial +41227676000 and enter access code 0121632, or
  • To have the system call you, click here
(Leader code is 0111659).
Summary of CCRC08 planning conference call of Jan 28 2008

The meeting was chaired by J.Shiers with notes taken by H.Renshall. The agenda with attached material is at http://indico.cern.ch/conferenceDisplay.py?confId=26923 and the associated Twiki is at https://twiki.cern.ch/twiki/bin/view/LCG/WLCGCommonComputingReadinessChallenges

Representatives Present: Tier0 (M.C-S), WLCG (JS, HR, GM), ALICE (PM, LB), ATLAS (SC, KB), LHCb (RS), CMS (AS, DB), TRIUMF (RT), NL-T1 (RT), PIC (GM), BNL (ME), IN2P3 (FH)

Papers on current service issues and possible service interventions are also attached to the agenda.

Minutes of the previous meeting: (HR) No comments.

CCRC'08 Calendar: (JS) The CCRC'08 Wiki now includes a tabular calendar of major activities (thanks to P.Mendez). We invite the experiments to send us their additions, such as other significant activities, software release dates, etc. Items should be high level but relevant. The calendar currently only runs to June but will be extended to cover the whole year. JS asked whether it should also have a site view, but the consensus was to keep it to experiment views only.

Metrics: (JS) There are still some gaps in the metrics that the MB wants filled. The DM group has now proposed metrics for the conditions databases (attached to the agenda), and the experiments and Tier 1 sites are invited to comment and to consider establishing metrics for their own database services. Metrics for the conditions service, akin to those defined initially for DBL3 and later extended to HEPDB in the LEP era, should also be established (conditions data must be made available within one hour on site and within a few hours at external sites; data integrity metrics, etc.). JS would like comments on this from the experiments' reprocessing experts.

Known Issues and Workarounds: GM raised a new issue of Tier 2 sites migrating to SRM v2, which in the US was correlated with an OSG rollout. ATLAS would like as many Tier 2 sites as possible on SRM v2; the FTS team thinks about 100 already are but will check. Among the service issues, JS said that the second SRM point, dCache not choosing the correct pool for a bring-online operation, will affect reprocessing and may affect data export by filling the wrong pools. If we get this far in the February functional tests we will have learned a lot, but it must be resolved for the May run. JS has also added to the agenda and the CCRC'08 Twiki an associated service interventions page listing the known intervention plans at CERN during February. He asked the sites and experiments what other concerns and plans they have. LB pointed out that some storage solutions proposed in the service issues, e.g. CASTOR and DPM upgrades, do not have a matching intervention proposed. JS agreed these should be added to the interventions page.

AOB: The Feb 5 pre-GDB face-to-face meeting will review the remaining service, site and experiment concerns and issues.

H.Renshall IT/Grid Support
    • 17:00-17:05
      Minutes of the previous meeting 5m
      Summary of CCRC08 planning conference call of Jan 21 2008

      The meeting was chaired by J.Shiers with notes taken by H.Renshall. The agenda with attached material is at http://indico.cern.ch/conferenceDisplay.py?confId=26870 and the associated Twiki is at https://twiki.cern.ch/twiki/bin/view/LCG/WLCGCommonComputingReadinessChallenges

      Representatives Present: Tier0 (M.C-S), WLCG (JS, HR, MG, AA), GD (AdM), ALICE (PM, LB), ATLAS (AdG), LHCb (RS), CNAF (LdA), FZK (AH, DR), IN2P3 (FH), RAL (AS), NL-T1 (MvdS), PIC (GM), CMS (DB joined later)

      Tier0 DB Resource Allocation for February: presented by M.Girone (slides attached to the agenda). The DB team has asked the experiments what changes (e.g. in DB volumes or server priorities) they will need at T0 and T1 sites for CCRC'08, and when they expect the associated workloads to become significant. So far they have a reply from LHCb. R.Santinelli asked what happened to streams replication when the recent LHCb bulk LFC changes were made; he had heard that the PIC LFC crashed. MG said there were 10 million changes and they will analyse this and then provide some recipes. She will check with PIC (later reported to be an out-of-memory problem caused by an over-large commit, now fixed).

      Status of SRM v2.2: J.Shiers reported that, as far as we know, all T1 sites have upgraded to this level. dCache sites should get version 1.8.0-12 from dcache.org (later information from P.Fuhrmann: also take patches 1 and 2).

      Status of client tools and services: A.di Meglio reported some changes to the baseline versions linked from the CCRC'08 Twiki, namely that the DPM versions are now 1.6.7-1 for SLC4 and 1.6.7-2 for SLC3. He expected a new LFC to be certified today and a new gfal/lcg-utils to become available today.

      Baseline Storage Services: J.Shiers said they intend to document known problems, and any workarounds, in the baseline versions that will be used in February. The intention is to freeze the baseline versions next week so that the experiments can start to ramp up their activities. NL-T1 reported problems with the gPLAZMA component in getting the dCache space manager working. JS confirmed that next week is intended to be a baseline software stability week, allowing the experiments to complete their setup for the 4 February start. He proposed organising a site-by-site readiness review of the Tier 1 sites. M.Coelho dos Santos, for CASTOR/CERN, reported that CASTOR version 2.1.6 is in test and that the CMS production CASTOR instance is to be upgraded this week. Any upgrade for the other three experiments would, however, take place in the 'stability' week. Running mixed CASTOR versions at a site is not a problem, and Tier 1 sites can stay on CASTOR 2.1.4, as CNAF and RAL have already announced they will.

      Update from Experiments: There was no report from CMS (a CMS week is being held in Lyon and DB joined later). L.Betev reported for ALICE that they are looking at directory structures to deploy their CCRC'08 space requirements and are in contact with all of their sites. They were not sure of the availability of the CASTOR xrootd plugin at CNAF. M.CdS said that there is a plugin for both the 2.1.4 and 2.1.6 versions of CASTOR and that a version with fixes for ALICE is under test. He recommended that CNAF install the current plugin, since a later upgrade is then easy.

      Schedule and Metrics: J.Shiers gave an overview of his paper (attached to the agenda). The schedule for February is very tight; March includes the Easter holiday, and in April there is a WLCG Collaboration workshop at CERN. The intention is to use the April face-to-face CCRC'08 meeting to decide on the software releases to be used in May, with a target deployment date of 15 April. For the February run there are three sets of metrics: the scaling factors for the experiment functional blocks, which the experiments are expected to monitor and report on; the problem-resolution targets for the experiments' critical services (note that scalability, e.g. of streams replication, has no target); and the MoU site and service availability targets. The latter two will be followed up at the daily and weekly CCRC meetings. JS was worried that some services have no February targets, leaving only the May run in which to attempt to reach them. He asked the experiments to provide complete metrics for everything they want to test in their functional blocks by next Monday's meeting. We will continue the Tuesday to Friday daily meetings in February but merge the weekly CCRC'08 planning meeting into the EGEE operations meeting (16.00 on Mondays). We encourage the Tier 1 sites to join the daily meeting (usually less than 15 minutes) and will resume this weekly CCRC'08 planning conference call in March.

      AOB: For CMS, D.Bonacorsi said that the information systems were still not publishing enough information on site storage spaces to meet CMS needs. JS promised to raise this at the MB tomorrow. L.Betev, for ALICE, confirmed they had circulated all their Tier 1 space requirements and agreed to link this information into the CCRC'08 Twiki. For PIC, G.Merino reported they were having problems deploying new CPU capacity in blade servers. They would very much like to know the number of batch job slots they are expected to provide per experiment, as LHCb has already documented. JS will bring this to the MB. Finally, JS announced that the baseline software versions list will be updated on Wednesday and that the last weekly CCRC'08 planning meeting before the February run will be held next Monday, 28 January.
    • 17:05-17:15
      CCRC'08 Calendar 10m
      Calendar (wiki page)
    • 17:15-17:35
      The Metric - additional input 20m
      Following the MB discussion on the attached paper, it was agreed that it is too risky not to test against pre-agreed metrics for all aspects of the February run and that we should identify areas where metrics are missing.

      In addition to the DB service metrics, and in the absence of any update from the experiments, the DBL3 / HEPDB metrics will be used, i.e. calibration data:

      • must be available online within one hour on 'the offline computer'
      • must be available within a few hours at other sites
      Paper
      Slides
    • 17:35-17:55
      Final Readiness Check for the February run of CCRC'08 20m
    • 17:55-18:15
      Known Issues & Workarounds 20m
      KnownServiceIssues
      Possible CCRC'08 Service Interventions
    • 18:15-18:20
      AOB 5m