Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description
- This is the biweekly ops & sites meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 78425 with code: 4880.
Apologies: Wahid, Alessandra, Mark S, Andrew W [ScotGrid as a whole!]
Minutes
    • 11:00 11:20
      Meetings & updates 20m
      - ROD team update
      - Nagios status
      - NGI
        -- We need to set the UK dashboard to read UKI and NGI_UK sites to avoid a break in monitoring
        -- From JG: "ask sites to change their BDII to NGI"
      - Tier-1 update
      - Security update
        Mail today about IP lists.
      - T2 issues
        WLCG T2 results are now available for July: http://gvdev.cern.ch/GRIDVIEW/downloads/Reports/201107/wlcg/WLCG_TIER2_ACE_Jul2011.pdf. These repeat the EGI ones circulated last week: https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics.
        John asked “Is there anything in common among the sites in the 89-94% reliability range that has stopped them being 100?”
      - Any conclusions on "How do other people renew host certificates?"
      - General notes
        "A glite 3.1 UI will not work with an EMI WMS. If you have one lying around, now it really might be time to upgrade." DB
        Email addresses in UK CA certificates: http://nationalgridservice.blogspot.com/2011/08/on-email-address-in-host-certificates.html
      - Tickets
        Direct link: http://tinyurl.com/3jjnvca
        If that is not working, indirect link: https://ggus.eu/ws/ticket_search.php (select support unit 'ROC_UK/Ireland' and Creation Date 'Any'), or paste https://ggus.eu/ws/ticket_info.php?ticket= and append a ticket number to the URL.
        73182: IC-HEP. hone jobs cancel on one CE. Looks like missing env var.
        73080: Brunel. biomed. SE transient errors seen in 'closed' Nagios. Seen again?
        72959: RHUL. biomed. SE transient errors. SE problem noticed?
        72903: Region. Configure Nagios for NGI. Progress?
        72359: RAL myproxy for T2K. Cross-ref 72358.
        72358: T2K myproxy... escalated to FTS developers. Current comment from ML: "AFAIK the FTS VOMS proxy renewal code is broken since a few years. The only workaround is for the client to delegate a proxy explicitly with glite-delegation-init, let transfers use that proxy, and keep refreshing it, e.g. in a cron job. That is what the LHC experiments do."
        72161: IC-HEP. T2K. 3TB spacetoken created. Waiting for user to test.
        72160: Oxford. T2K spacetoken. User gave some indication of space needs!
        72156: QMUL. T2K spacetoken. Waiting for user since 2nd August.
        71640: Cambridge. Biomed. Files are being migrated off the SE.
        68865: UCL-HEP. Retirement of SL4 and 32bit DPM Head nodes and Servers. On hold.
        68859: Durham. Retirement of SL4 and 32bit DPM Head nodes and Servers. On hold.
        68858: Glasgow. Retirement of SL4 and 32bit DPM Head nodes and Servers. On hold.
        68853: RAL T1. Master ticket. Brian reviewing recommended versions.
        68077: RAL T1. Mandatory WLCG InstalledOnlineCapacity not published. Expect test version this month.
        64995: RAL T1. No GlueSACapability defined for WLCG Storage Areas. Should have something you can test this month (August).
        57746: Cambridge. Status not updated for job submitted to... ML discussing offline.
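The indirect ticket link described above (the ticket_info.php URL with a ticket number appended) can be built in a few lines. This is only a minimal sketch: the function name ggus_ticket_url and the constant GGUS_TICKET_INFO are hypothetical names introduced for illustration, and the only thing taken from the notes is the base URL and its "ticket" query parameter.

```python
from urllib.parse import urlencode

# Base URL as quoted in the meeting notes; the constant name is illustrative.
GGUS_TICKET_INFO = "https://ggus.eu/ws/ticket_info.php"


def ggus_ticket_url(ticket_id: int) -> str:
    """Return the ticket_info.php URL for one GGUS ticket number."""
    if ticket_id <= 0:
        raise ValueError("ticket number must be a positive integer")
    # urlencode produces the "ticket=<number>" query string.
    return f"{GGUS_TICKET_INFO}?{urlencode({'ticket': ticket_id})}"


if __name__ == "__main__":
    # A few of the ticket numbers reviewed in this meeting.
    for ticket in (73182, 72903, 72358):
        print(ggus_ticket_url(ticket))
```

Looping over the ticket numbers from the list gives one direct link per ticket, which may be handy if the tinyurl search link stops working.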
    • 11:20 11:40
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO
      - LHCb
      - CMS
      - ATLAS
      - Other
      - Experiment blacklisted sites
      - Experiment known events affecting job slot requirements
      - Site performance/accounting issues
      - Metrics review
      ATLAS-squids
      Slides
    • 11:40 11:50
      CVMFS 10m
      - Directions (do we need to capture the email thread from last week into documentation!?)
      - What is required: "if you want the latest installation and enable condb as well you have to follow this http://northgrid-tech.blogspot.com/2011/07/cvmfs-installation.html if follow that it will work, but be aware that as I said name of the machines and paths will change again soon."
    • 11:50 11:55
      WLCG middleware baseline 5m
      The current view: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions
    • 11:55 12:00
      CREAM and glexec status 5m
      - Which sites do not have CREAM
      - Which sites (except those needing the relocatable install) do not have glexec
      - Any progress on the relocatable glexec install?
      Birmingham: "CREAM and glexec are up and running. For CVMFS, we've ordered a new beefy server that we're going to shift several of our VMs over to and that will free up a couple of machines, one of which we're going to dedicate to CVMFS. Hopefully, this should happen within a few weeks!"
    • 12:00 12:01
      AOB 1m
      Andrew Elwell's request for feedback on machine management. Please complete the survey: http://bit.ly/onGUI9 "Figures for no. of physical (or virtual) boxes managed -- the 'no of machines' question relates to how many OS instances they manage".