LCG Service Coordination Meeting

Europe/Zurich
28/S-029 (CERN)

Jamie Shiers
Description
LCGSCM

Internal meeting for host laboratory LCG service providers focussing on issues on a timescale of ~1 week to ~1 month.

Mailing list: lcg-service-coordination-meeting@cern.ch

Minutes
    • 10:00 10:20
      Outstanding Issues & Actions 20m
      • Reinstallation of service nodes - target end 2006
        Should be 'history' by now?
      • Experiment servers / services: status
        See CMS task force of Sep 20 and this wiki
      • CASTOR interventions - update on scope & schedule
        This should cover not only the Western Digital interventions, but also the many other things coming along:

        • Pending moves to new h/w of the various CASTOR components
        • Upgrade of CASTOR s/w components
        • etc.

        All these things have to be scheduled in / around the experiments' FDR preparations and on-going productions.

      • Cleaning of databases prior to multi-VO tests
        On several occasions, the databases behind various services have needed to be "cleaned" to prevent performance degradation. This has applied to CASTOR DBs, FTS, dCache etc. Have all the necessary cleaning operations been performed at CERN and at outside sites in readiness for the multi-VO throughput tests starting March 26th? Should these operations be performed regularly? Automatically?
      • Service Resilience to "glitches" (place-holder - no action expected until next round of updates?)
        How robust are the current services to short-term "glitches"? (Longer-term outages are a separate but needed discussion...)

        Examples include:

        • Power glitch - all machines protected for up to 10 minutes (is this the definition of a "glitch", i.e. a fault or defect in a system or machine?)
        • Rolling upgrade / loss of contact to DB backend
        • OS patch / upgrade
        • m/w patch / upgrade

        More major interruptions - such as loss of a network switch / non-rolling DB upgrade - also need to be considered and, where possible, protected against.

        H/W upgrades / moves -> rolling where possible?

      • Oracle Critical Patch Update
        Critical Patch Updates - see Release Schedule

        Starting 2005, Critical Patch Updates are the primary means of releasing security fixes for Oracle products. They are released on the Tuesday closest to the 15th day of January, April, July and October. The next four dates are:

        • 17 April 2007
        • 17 July 2007
        • (16 October 2007)
        • 15 January 2008
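
The date rule quoted above ("the Tuesday closest to the 15th day of January, April, July and October") can be checked with a short calculation. The following is a minimal sketch in Python, assuming only that rule as worded in this agenda; the function names and the reference date of 1 March 2007 are illustrative assumptions, and actual release dates should always be taken from Oracle's published schedule.

```python
from datetime import date, timedelta

# Months named in the rule quoted in the agenda.
CPU_MONTHS = (1, 4, 7, 10)
TUESDAY = 1  # date.weekday() numbering: Monday is 0


def closest_tuesday_to_15th(year, month):
    """Return the Tuesday closest to the 15th of the given month."""
    fifteenth = date(year, month, 15)
    # Signed distance in days (-3..+3) from the 15th to the nearest Tuesday.
    # A week has 7 days, so there is never a tie.
    offset = (TUESDAY - fifteenth.weekday() + 3) % 7 - 3
    return fifteenth + timedelta(days=offset)


def next_cpu_dates(after, count=4):
    """List the next `count` rule-derived dates strictly after `after`."""
    found = []
    year = after.year
    while len(found) < count:
        for month in CPU_MONTHS:
            candidate = closest_tuesday_to_15th(year, month)
            if candidate > after and len(found) < count:
                found.append(candidate)
        year += 1
    return found


if __name__ == "__main__":
    # Assumed reference date (hypothetical): shortly before the April 2007 update.
    for d in next_cpu_dates(date(2007, 3, 1)):
        print(d.strftime("%d %B %Y"))
```

With the assumed reference date, the sketch reproduces the four dates listed above, including 16 October 2007.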
      • Other changes in the pipeline
      • Service preparations for Full Dress Rehearsals
        DB Service Status
        FTS 2 status / plans
    • 10:20 10:40
      LCG Service Review 20m
      • <a href="https://twiki.cern.ch/twiki/bin/view/LCG/LcgScmStatusDeploy">Certification/Pre-production</a>
        Highlights:
      • LFC/DPM 1.6.4-2 and FTS 2.0 are both in certification.
      • In the PPS, bugs involving GridFTP segfaulting have been found in the native SL4 WN.
      • In the PPS the upgrade path from the interim SL4 WN to the native SL4 WN has been found not to work. This is probably a show-stopper for production.
      • Highlights of bugs currently in PPS:
        1115 New version of lcg-info with support for VOViews, sites and services
        1101 GFAL 1.9.0-2/lcg_utils 1.5.1-1
      Speakers: Nick Thackray, Oliver Keeble
  • <a href="https://twiki.cern.ch/twiki/bin/view/LCG/LcgScmStatusMLR">Monitoring, Logging & Reporting</a>
    Speakers: Ian Neilson, James Casey
  • Core Grid Services
    Speakers: Gavin McCance, Jan van Eldik, Thorsten Kleinwort, Ulrich Schwickerath
  • Fabric & Infrastructure Services
    Speakers: Maria Dimou, Maria Girone, Remi Mollon
  • Experiment Issues
    wiki page

    The primary purpose of this item is to bring up any (hopefully rare) issues that are in danger of becoming "hot" if not given attention.

    By definition, this must be used sparingly (i.e. not for every single problem seen by an experiment) and experience has shown that these issues should be raised in advance, e.g. the day before (or even earlier if possible...)

    Speaker: EIS Team
  • 10:40 10:45
    Any Other Business 5m
    • WLCG RAC Weekly Report 15m
      Speaker: Miguel Anjo