Deployment team

Europe/Zurich
EVO - GridPP Deployment team meeting

Jeremy Coles
Description
- This is the weekly DTEAM meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +41 22 76 71400. The phone bridge ID is 353506 with code: 4880.
Minutes
    • 11:00 11:20
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO
      - LHCb
      - CMS
        -- T2 impacts of limiting non-production jobs at RAL T1
      - ATLAS
      - Other
    • 11:20 11:35
      ROC update 15m
      ROC update
      ***************
      QMUL: Urgent issue over unresolved R-GMA problem. COD is moving towards suspension of the site.

      The Pre-Production Service is going to be formally split into two classes of service:
      (a) The Middleware Quality Services, focused on deployment and release testing, closer to certification.
      (b) The Middleware Preview Services, focused on presenting new versions of clients and services to end-users in preview.
      This idea was introduced at the EGEEII->EGEEIII transition meeting (http://tinyurl.com/6afws9). We are being asked whether any production sites are potentially interested in some particular pilot activities (http://tinyurl.com/5rmeck).

      WLCG update
      *****************
      GDB next Wednesday: http://indico.cern.ch/conferenceDisplay.py?confId=20230
      Check who is the T2 rep this month. Where is the summary from May?

      TCG = TMB news
      *********************

      Ticket status
      ***************
      https://gus.fzk.de/download/escalationreports/roc/html/20080602_EscalationReport_ROCs.html
    • 11:35 11:45
      Site issues 10m
      - Input on availability: http://www.gridpp.ac.uk/wiki/SAM_availability:_October_2007_-_May_2008
        -- SouthGrid - not started
        -- London - 1 site started
        -- NorthGrid - 1 site started
        -- ScotGrid - just Durham left
      - The need to follow up on comments in the site reports
        -- Regular slot at this meeting?
        Examples from this week:
        Brunel: "There have also been multiple instances of transfers failing as the pool node disk is full (on several pool nodes!) - isn't the head node supposed to select a target with sufficient space?"
        "Re-started CE but >50 calicesgm jobs started at once and filled up instantly home directories. This happened despite still being in downtime. Had to kill all jobs, clean-up homedirs. Re-ran YAIM before bringing the CE again online"
      - Are the regional reports checked by the T2Cs?
      - Would a site-by-site summary every (other) week be useful?
    • 11:45 11:55
      CCRC - site lessons 10m
      - What has been learnt site-side?
      - Areas that need to be addressed
      - Performance tuning
      - Availability of sites during upgrades
      - Stability of sites
      - Do sites using certain m/w or supporting a given VO see more problems?
      - Did sites with Nagios perform better?
      - What next?
    • 11:55 12:05
      Actions review 10m
    • 12:05 12:10
      AOB 5m
      - There is a draft of a revised operational procedures manual for EGEE sites (http://tinyurl.com/6q3wyx). The updates can be viewed here: http://tinyurl.com/5rz774.
      - Latest on the UI front!