Operations team & sites

Europe/London
EVO - GridPP Operations team & sites meeting

Jeremy Coles
Description
- This is the biweekly ops-team & sites meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 77907 with code: 4880.
Apologies: Alessandra, Matt
Minutes
    • 11:00 11:20
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO:
      - LHCb: http://lhcbweb.pic.es/DIRAC/LHCb-Production/visitor/jobs/SiteSummary/display
      - CMS
      - ATLAS: procedure for testing new CEs at sites.
      - Other
      - Experiment blacklisted sites
      - Experiment known events affecting job slot requirements
      - Site performance issues
      - Site downtimes
      ATLAS-report
    • 11:20 11:30
      Meetings & updates 10m
      - ROD team status (any points to raise to sites or issues to follow up?)
      - Tier-1 update: What do most of those attending this meeting want from the Tier-1 report? Is it useful and needed? Is anything missing, and would a written-only report do?
      - Operational security: checking results at https://pakiti.egi.eu
      - This week's GDB (https://indico.cern.ch/conferenceDisplay.py?confId=106644):
        - The WLCG information system
        - HEPiX
        - EOS
        - Current middleware issues
        - The EMI-1 release
        - WLCG middleware support
        - HEPiX virtualisation working group
          - Experiment positions
          - WLCG position summary
      - Escalated tickets:
        https://gus.fzk.de/download/escalationreports/roc/html/20110509_EscalationReport_ROCs.html
        https://gus.fzk.de/ws/ticket_info.php?ticket=
        - 57746
        - 64995 - RAL T1. No GlueSACapability defined for WLCG Storage Areas (hold)
        - 65700 - Durham. lcg-cr error on se01.dur.scotgrid.ac.uk (hold)
        - 65991 - Encrypted DN not passed to accounting data (reopened from 46024)
        - 66564 - WLCG sites not publishing the GlueCECapability shared.
        - 67365 - WLCG sites not publishing the GlueCECapability Share. Can this now be closed?
        - 68077 - RAL-LCG2. Mandatory WLCG InstalledOnlineCapacity not published. Needs a T1 update.
        - 67945 - Mandatory WLCG GlueSAReservedOnlineSize value not published. QMUL only now. Waiting on StoRM? (Both Bristol and QMUL were declared fixed, so I closed it - Daniela)
        - 67954 - Relates to the 67945 issue. QMUL? No, this is waiting for a comment on Bristol - Daniela
      - GGUS open tickets: http://tinyurl.com/6z6uq5v
      EGI-April-availability
    • 11:30 11:35
      Security 5m
      - The next round of Security Service Challenges starts soon.
        - SSC5 involves one site from each Tier-2 (it is EGI-wide). Job throughput is not expected to be affected; the challenge is partly about checking communication and community support. The sites selected are Glasgow, Lancaster and Cambridge. Awaiting the LondonGrid "random" selection.
        - SSC4 will involve all sites.
      - To prepare for both challenges, site admins are encouraged to review the response procedures and other material at https://www.gridpp.ac.uk/security/ssc/index.html. This will ensure that you know what to do and remind you of the conclusions (areas to improve) from previous challenges.
    • 11:35 11:45
      glexec & ARGUS status 10m
      Snapshot taken from http://www.gridpp.ac.uk/wiki/Site_status_and_plans for yesterday's PMB:
      - Deployed: 5 sites (26%)
      - In progress: 6 sites (32%)
      - To start: 5 sites (26%)
      - No plans: 3 sites (16%)
      A tarball install is wanted/needed by 6 sites (for primary or shared clusters).
      The individual site positions are:
      1. Brunel: glexec to be deployed in April.
      2. Imperial: Testing a hacked tarball glexec install with SGE. Waiting for the EMI-1 ARGUS release.
      3. QMUL: Not yet deployed. Needs a version compatible with the tarball WN install.
      4. RHUL: Installing ARGUS now. Plan to install glexec by 20th May.
      5. UCL: Waiting for a tarball installation (will deploy on HEP and Legion).
      6. Lancaster: Had an earlier test installation. Tarball install wanted.
      7. Liverpool: ARGUS running. glexec on a test node. Waiting for a request to roll out fully.
      8. Manchester: ARGUS installed. Working on glexec.
      9. Sheffield: Will deploy in June.
      10. Durham: glexec not needed for supported VOs. Could be deployed.
      11. ECDF: No firm objections, but no current plan to deploy. SGE.
      12. Glasgow: SCAS and glexec deployed since November. ARGUS planned for May. Will put into production upon request.
      13. Birmingham: Deployed and in testing on the local cluster. Tarball needed for the shared cluster.
      14. Bristol: Waiting for better-resourced sites to deploy first.
      15. Cambridge: Reviewing compatibility with the Condor batch system.
      16. EFDA-JET: Not a major analysis site. Low priority.
      17. Oxford: Deployed across WNs with an ARGUS server back end.
      18. RALPP: Installed and working.
      19. RAL Tier-1: Installed and working.
      - What issues have you been having?
      - Feedback from sites that have deployed, plus pointers to wiki entries that will help other sites.
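      The snapshot percentages are shares of the 19 sites listed in the site positions. The minimal Python sketch below reproduces them from the category counts. It assumes rounding to the nearest whole percent, which the snapshot does not state.

```python
# Site counts per glexec/ARGUS status, copied from the snapshot.
status_counts = {
    "Deployed": 5,
    "In progress": 6,
    "To start": 5,
    "No plans": 3,
}

# Total number of sites covered by the snapshot (matches the 19 site positions).
total_sites = sum(status_counts.values())

# Share of sites per category. Rounding to the nearest whole percent is an
# assumption; the original snapshot does not say how it rounded.
percentages = {
    status: round(100 * count / total_sites)
    for status, count in status_counts.items()
}

if __name__ == "__main__":
    print(f"Total sites: {total_sites}")
    for status, pct in percentages.items():
        print(f"{status}: {status_counts[status]} sites ({pct}%)")
```

      Running the sketch gives 19 sites and 26%, 32%, 26% and 16%, which agrees with the snapshot and sums to 100%.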
    • 11:45 11:50
      AOB 5m