Core-ops tasks

Europe/London
EVO - GridPP Operations team meeting

Description
- This is the start of a biweekly meeting to review ops core tasks.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 126179 with code: 4880.

Apologies: Mark M; Mingchao M
    • 11:00 11:15
      Documentation 15m
      https://www.gridpp.ac.uk/wiki/Documentation
      F2F:
      - States & updates
      - Create schematic of our document life-cycle
      - Checks for broken links (on web pages) - automate?
      - Apply 80:20 rule. What documentation is useful? Standards/survey
      - Highlight duplication
      - Open dialogue with EGI documentation group
      - Assign review task areas to members of core-ops group
      - Tagging pages
      - Templates
      17th Feb:
      - schematic
      - VOMS data key document
      - Content not in twiki but will be
      - Basic operations absent
      - Workflow of job & how info system works
      - Cheat sheet
      Speaker: Andrew/Steve
    • 11:15 11:30
      Monitoring 15m
      https://www.gridpp.ac.uk/wiki/Monitoring
      F2F:
      - Review monitoring tools (what is needed and what is useful)
      - Provide feedback on WLCG dashboards
      - Recommendations for internal monitoring (local Nagios vs Ganglia) and pull together best practices
      - Better define monitoring and regional tools
      - Devise plans for, and advise sites on, perfSONAR (LHCONE)
      - GridMon resurrection and best use
      - Site dashboards (à la Glasgow example)
      Comments:
      1) What is the state of GridMon?
      2) Do most people understand LHCONE aims?
      3) DRI purchases will need tuning
      4) What feedback has been given on dashboards?
      5) We need more guidance to sites on areas like http://dashb-siteview.cern.ch/templates/siteview/index.html.
      17th Feb:
      - Scope how best to do things (input from TEGs)
      - Put up pages for local tools (internal monitoring wiki)
      - Observation: too many pages in this area
      - Place most useful information behind site status dashboard
      - Issue with contradictory results (provide examples)
      - Current monitoring often gives little information about what is wrong if a site is failing
      - Is there a document listing the tests sites are expected to pass?
      Speaker: David
    • 11:30 11:45
      Accounting 15m
      https://www.gridpp.ac.uk/wiki/Accounting
      F2F:
      - Tracking APEL issues
      - Rules of publishing
      - Recommendations on gLite cluster and configuration
      - Cross-checking site publishing
      - SL metrics vs HS06 APEL
      - Keeping a record of accounting issues (see also the wiki page)
      Comments:
      1) Storage accounting is a hot topic in EGI
      17th Feb:
      - GridPP metrics need revision
      - Capture reasons in wiki for others (simulation under fast/full labelled with same tag etc. but has large impact on simulation weights)
      - SL6 tests & benchmarking
      Speaker: Alessandra/Rob
    • 11:45 11:55
      Security 10m
      - Next SSC end of March (pending pilot and tools test)
      - Self audit and best practice
      - glexec directions
      - Policies review
      - Contribution to security TEG
      - Default s/w configurations (are they secure? MySQL, for example, needs further locking down)
      - UIs best practice
      - Review of open ports required
      - Support incident follow-up
    • 11:55 11:56
      AOB 1m
      - Next group review Friday 16th March at 11am.