Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description
- This is the biweekly ops & sites meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 78425 with code: 4880.
Apologies: Matt, Alessandra, Andrew, Raja, Santanu
    • 11:00 11:20
      Meetings & updates 20m
      - ROD team update
        QMUL, RHUL and ECDF had a few Nagios failures. At RHUL, these were due to the reinstallation of a few WNs; at QMUL, they were due to one of the WNs black-holing. RALPP had a few intermittent failures, likely due to a repeat of https://savannah.cern.ch/bugs/index.php?78721.
        -- Team rota updates
        -- Applying for support role in NGI_UK
      - Nagios status
      - Tier-1 update
      - Security update
        -- T2 issues
           Please recheck your site publishing here: http://wlcg-rebus.cern.ch/apps/capacities/sites/. This is to enable some checks on usage and for 'other' VO capacity figures.
        -- General notes.
      - Tickets
        Direct link: http://tinyurl.com/3jjnvca
        Indirect link (if the direct link is not working): https://ggus.eu/ws/ticket_search.php (select support unit 'ROC_UK/Ireland' and Creation Date 'Any'), or paste https://ggus.eu/ws/ticket_info.php?ticket= and append a ticket number to the URL.
        -> No new tickets to cover this week. Any updates to be discussed?
    • 11:20 11:40
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO
      - LHCb
        No major issues at either the Tier-1 or the Tier-2s.
        Total LHCb for 2011:
        · Luminosity: 745 pb-1 (delivered), 669 pb-1 (recorded)
        · Beginning and end of the week marred by power cuts, which caused various compressor failures.
        RAL issues:
        · 3 files lost from a bad tape; recovered from other sites.
        · Disk-server gdss416 went briefly offline on 18 August. No files were lost and the disk-server came back after fsck.
        Tier 2:
        · No Tier-2 issues found. Not much MC production running on the grid.
      - CMS
      - ATLAS
      - Other
      - Experiment blacklisted sites
      - Experiment known events affecting job slot requirements
      - Site performance/accounting issues
      - Metrics review
      ATLAS-report
    • 11:40 12:00
      Site issues 20m
      - Current activities at each site
        Cambridge: upgraded 5 of the disk-servers to SL5 and is having some post-upgrade issues. Work is in progress.
    • 12:00 12:01
      AOB 1m