Core-ops tasks

Europe/London
EVO - GridPP Operations team meeting

Description
- This is a meeting to review the ops core tasks.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the Janet(UK) Community area.
- Direct EVO link: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=vsvivIeueMIMIeauavItas
- The phone bridge number is +44 131 474 4520 (CERN number +41 22 76 71400). The phone bridge ID is 669 7242 with code: 4880.
Apologies: Andrew M
    • 11:00 11:10
      Documentation 10m
      https://www.gridpp.ac.uk/wiki/Documentation
      https://www.gridpp.ac.uk/php/KeyDocs.php
      Issues:
      - Team changes have left some documents without owners
      - Some documents are not being updated even with reminders
      - VO approvals information not propagating
      - Blogs being updated infrequently
      Concerns:
      1) Grid interoperation. [+ Now a combination of David and Raul]. Docs still need attention.
      2) Monitoring tools: (+) November 2013.
      3) EGI early adopters list. Points to EGI page https://www.egi.eu/earlyAdopters/table but that table is inaccurate and was last updated 22 March 2012 (e.g. Santanu and Stuart remain). ... but EMI3 page updated
      4) HEPSPEC - probably updated?
      5) Accounting - (+) links updated.
      6) Regional tools: updated in August.
      Blogs: http://planet.gridpp.ac.uk
      - Tier-1 - August 2013
      - http://gridpp-ops.blogspot.co.uk - hmm, 2009
      - http://gridpp-storage.blogspot.co.uk - (+) October 2013
      - http://londongrid.blogspot.co.uk - June 2013
      - http://nationalgridservice.blogspot.co.uk - October 2012 (SHA2)
      - http://northgrid-tech.blogspot.co.uk - May 2013
      - http://scotgrid.blogspot.co.uk - (+) 14th October
      - http://southgrid.blogspot.co.uk - March 2012 (hyperthreading)
      Any more advances on impact areas?
    • 11:10 11:20
      Monitoring 10m
      https://www.gridpp.ac.uk/wiki/Monitoring
      - Is graphite being adopted?
      - Investigate more active alert options
      - Write up and share scripts (e.g. related to temperature - monitoring differences between nodes/motherboards...)
      - Check out Site Nagios
      - Involvement with WLCG group ... contribute to https://twiki.cern.ch/twiki/bin/view/LCG/WLCGMonitoringConsolidation. Several meetings during August and October. October had a final report - https://indico.cern.ch/conferenceDisplay.py?confId=276905
      - Should/could this task evolve to coordinate the activities around Puppet? Still an open question. HEPiX pushed Puppet work forward, but how are we integrating with that?
    • 11:20 11:30
      Staged rollout 10m
      https://www.gridpp.ac.uk/wiki/Staged_rollout
      - SL6 WNs (+) now almost complete.
      - EMI-3 testing: https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3? (are the current contributions sufficient?)
      - (+) involvement in middleware readiness work.
    • 11:30 11:40
      Core services 10m
      https://www.gridpp.ac.uk/wiki/Core_Grid_services
      - perfSONAR - http://perfsonar.racf.bnl.gov:8080/exda/?page=25&cloudName=UK
      -- Which sites need to catch up?
      -- Rolling out latest mesh/version
      -- Following up on issues... are we making good use of perfSONAR?
      - VOMS - https://voms.gridpp.ac.uk:8443/vomses/
      -- Rolling out network of servers, now at ticketing sites stage. What is the plan next?
      -- Administering VOs documents
      (-) Did not make much progress with:
      1) Disabled: monitoring.ngs.ac.uk & applications.ngs.ac.uk & ngs.ac.uk
      2) All users expired: ukmhd.ac.uk (69 days); eresearch.ac.uk (454 days); ukqcd.vo.gridpp.ac.uk (89 days); ralpp (454 days); minos.vo.gridpp.ac.uk (454 days) & constellation.stfc.ac.uk (454 days).
      3) Several VOs are one person.
      4) We need to remove obsolete VOs: e.g. supernemo and minos.
      - Testing (Perhaps a separate core task to replace accounting?)
      -- IPv6 (Glasgow, Oxford, IC...)
      -- WLCG early adopter sites (T1 & Brunel)
    • 11:40 11:50
      Wider VOs 10m
      https://www.gridpp.ac.uk/wiki/Wider_VO_issues
      - More communities (hyperk). Document lessons learned?
      - Test DIRAC server running ... trying to get an update!
      - Push wider WebDAV usage? https://www.gridpp.ac.uk/wiki/WebDAV#Federated_storage_support
      - No 'quick start' documentation
      - Future service requirements (e.g. interest in cloud interfaces/resources)
    • 11:50 12:00
      Regional tools 10m
      - Nagios (status while also testing SHA-2)
      - VO Nagios (more instances?)
      - DIRAC
    • 12:00 12:10
      Interoperation 10m
      https://www.gridpp.ac.uk/wiki/Grid_interoperation
      - Representation at EGI ops meetings
      - (DC cloud discussions expanded)
      - Use of NGI services (e.g. CA and certwizard)
    • 12:10 12:15
      Security 5m
      https://www.gridpp.ac.uk/wiki/Security
      - Discussions happen in dedicated meeting
      - Team approach needed and will continue
      Issues/concerns:
      - Some areas not getting attention, such as reviewing cloud approaches
      - Helping with glexec in WN tarball
      - Sites not always picking up on pakiti warnings
    • 12:15 12:20
      Accounting 5m
      https://www.gridpp.ac.uk/wiki/Accounting
      - Broaden area to 'New technologies and impacts'? (e.g. use of whole node or cloud scheduling, impacts of many-core...) -> testing?
      - Mainly HS06 updates and Steve's metrics page: http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html. This needs more regular reviewing.
    • 12:20 12:22
      Ticket follow-up 2m
      https://www.gridpp.ac.uk/wiki/Ticket_follow-up
      - Only obvious issue is that some sites are slow to follow up on certain tickets.
    • 12:22 12:25
      Summary/conclusions 3m
      - Focus for next month
    • 12:25 12:40
      Priorities - for review/discussion 15m
      - Actions page still not as effective as we'd like.
      - Bulletin updates not being shared around... more useful if the updates come from those most closely involved. Monday update request?
      - glexec and ARGUS enablement at sites (happening now)
      -- Status https://twiki.cern.ch/twiki/bin/view/LCG/GlexecDeploymentTracking
      -- Observations from SL6 upgrade work
      Ops coordination update for later: https://indico.cern.ch/getFile.py/access?contribId=3&resId=1&materialId=slides&confId=280057
      - ROD
      -- Team number (still) being addressed
      - WN tarball
      -- CVMFS done. But glexec a pain.
      - SHA-2 in progress (needs reviewing)
      Dec 1: some CAs may make SHA-2 the default for new certs
      EMI-3/Middleware
      - Steady progress. Good Brunel involvement.
      - Do we need more here, and what about SHA-2 testing?
      - Other middleware/baseline concerns? Batch integration, for example?
      - Hardware review/purchasing. Would a dedicated GridPP discussion meeting help?
      - Push on site data publishing
      -- (http://gstat2.grid.sinica.edu.tw/gstat/summary/EGI_NGI/NGI_UK/) and glue2 validator
      --> Clearly incomplete with 2 sites missing: Sheffield and RAL T1.
    • 12:40 12:41
      AOB 1m
      - Tentative next meeting date Thursday 12th December