Deployment team

Europe/Zurich
EVO - GridPP Deployment team meeting

Jeremy Coles
Description
- This is the weekly DTEAM meeting
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +41 22 76 71400. The phone bridge ID is ?????? with code: 4880.
Minutes
    • 11:00 11:20
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO
      - LHCb
      -- Pilot job role status
      - CMS
      - ATLAS
      -- Token deployment
      - Other
    • 11:20 11:30
      ROC update 10m
      ROC update
      ***************
      SA1 coordination meeting takes place today: http://indico.cern.ch/conferenceDisplay.py?confId=39104. It concerns the EU review recommendations; QR issues and partner reviews.
      From this week's ops meeting: Nothing new to report

      WLCG update
      *****************
      No GDB this week. Nothing new from the MB.

      Ticket status
      ***************
      https://gus.fzk.de/download/escalationreports/roc/html/20080811_EscalationReport_ROCs.html
      A request to stop the Footprints reminders has been sent. Are GGUS reminders arriving?
    • 11:30 11:40
      Security incident 10m
      - From report to PMB: "A security problem was discovered last week and is still ongoing. The details of the compromise and the status of the follow up are still internal to the security teams. By way of summary, rootkits have been found installed on machines within three (sites and) countries. In some cases root has been compromised and ssh keys taken. There have been many ssh connections attempted subsequent to the keys being compromised. The earliest known breach happened at a UK site and a user's ssh key was taken and used to access their central account. As a precaution his grid certificates were revoked. Investigations at most sites are ongoing and compromised machines rebuilt. This incident has highlighted several areas of the process to be improved - for example what the user should do once their accounts are suspended, how grid security interfaces with site CERTs and group admins. Information flow between grid contacts and services is also to be looked at more closely."
      - Raises questions about local procedures
      - Raises questions about user actions/limitations (why is A still not able to work)
      - Have all sites responded that they have carried out checks?
    • 11:40 11:50
      CE stability 10m
      - Feedback yesterday suggests many CE-related problems across GridPP sites
      - GridMap http://gridmap.cern.ch/gm/ once again looks worst for UKI
      -- ECDF - CE crashed
      -- RAL-PPD - redundant CE suffered load problems
      -- QMUL - CE unstable
      -- Brunel - WN-CE connectivity issue over weekend (Paul asks about current GridMap status - site is ok)
      -- IC-HEP - CE lcg-CA rpms out of date
      -- MAN - ce02 unstable for quite some time
      - What tickets, if any, have been raised?
      - Are the underlying issues similar or specific to each site?
      - What can we do to improve the situation?
    • 11:50 11:55
      Job issues 5m
      1) Instances of incorrect VO mapping
      - recent TB-SUPPORT mails suggest biomed users mapped as dteam
      - any other instances? JG mentioned site accounting showing work for unsupported VOs!
      - how can this be prevented?
      2) Biomed jobs also hint that multithreaded jobs are becoming an issue
      - what is the impact?
      - what can be done to prevent it?
      - What is going to be our (KG) strategy?
    • 11:55 12:00
      imense job data 5m
      - KH has provided some data on camont submissions over the last month (about 4000 jobs)
      - Graphs (nb. these are stacked histograms) are here: http://www.hep.phy.cam.ac.uk/~harrison/imense/performance/
      - What can we learn from them?
      -- Shows improvement with WMS over RB - odd tail effects
      -- Indicates performance differences across clusters
      -- Shows similar connectivity performance
    • 12:00 12:05
      AOB 5m
      1) UKQCD "regarding setting up a UKQCD VO within GridPP. I am to be the UKQCD VO manager. We are already part of a larger VO (the ILDG) which is administered from DESY. Ideally we would "mirror" that information in the gridPP VO. How should we proceed?"
    • 12:05 12:25
      Actions review 20m
      http://www.gridpp.ac.uk/wiki/Deployment_Team_Action_items