Core-ops tasks

Europe/London
EVO - GridPP Operations team meeting

Description
- This is the start of a biweekly meeting to review ops core tasks.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 577 9376 with code: 4880.
Apologies: Mark, Kashif
    • 1
      Documentation
      https://www.gridpp.ac.uk/wiki/Documentation
      Target for this meeting:
      - First draft of page on 'stale' pages
      +2 Updating VO admin guide to VOMS...
      +2 Keydocs (https://www.gridpp.ac.uk/php/KeyDocs.php) all assigned and reviewed
      Target for next meeting:
      - Investigate EGI templates for procedures (e.g. removing storage)
      - Glue 2.0 review?
    • 2
      Monitoring
      https://www.gridpp.ac.uk/wiki/Monitoring
      Target for this meeting:
      - Monitoring links ranked (expected by 29th)
      - Ops meeting special topic talk on monitoring pages (early July - 12th?)
      Target for next meeting:
      - Ranked links in web page
      - Review of monitoring data quality in top 10?
    • 3
      Accounting
      https://www.gridpp.ac.uk/wiki/Accounting
      SL 7th June: "I believe what we agreed was that 3% (or some other fraction to be agreed) of the total disk budget would be divided up amongst the providers of disk to 'others'. Unless someone tells me how to monitor the provision automatically we agreed to use quarterly reports to find out who was providing what. As to whether 'other' includes LHC experiments you are not part of - we didn't discuss this. My feeling is that we shouldn't give credit for this as the model is to provide production (CPU) to other LHC experiments but not analysis (Disk). If a non-ATLAS site wants to give ATLAS some disk then it's up to them or maybe ATLAS should pay if they really want it. But as I said this hasn't been discussed properly yet."
      Target for this meeting:
      - 26th June ops meeting: review HEPSPEC06 figures and accounting metrics page results for recent months
      Update: Figures for review at 3rd July meeting
      Target for next meeting:
      - Again check the position for other disk
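As a worked illustration of the proposal quoted above, the sketch below divides 3% of a disk budget among sites in proportion to the disk each reports providing to 'others' in its quarterly report. The pro-rata rule, the site names, the function name and all figures are assumptions made for illustration; the quote only says the fraction would be "divided up amongst the providers", and the fraction itself is still to be agreed.

```python
def share_other_disk_budget(total_budget, provided_tb, fraction=0.03):
    """Split fraction * total_budget pro rata by disk provided to 'others'.

    provided_tb maps site name -> TB taken from the quarterly report.
    Pro-rata division is an assumed rule, not an agreed policy.
    """
    pool = total_budget * fraction
    total_tb = sum(provided_tb.values())
    if total_tb == 0:
        # Nobody provided disk to 'others', so nothing is distributed.
        return {site: 0.0 for site in provided_tb}
    return {site: pool * tb / total_tb for site, tb in provided_tb.items()}


if __name__ == "__main__":
    # Hypothetical figures: budget in arbitrary units, disk in TB.
    shares = share_other_disk_budget(
        1000.0, {"Site-A": 30, "Site-B": 10, "Site-C": 0}
    )
    for site, amount in shares.items():
        print(f"{site}: {amount:.2f}")
```

Changing `fraction` shows how the split moves if a figure other than 3% is agreed.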
    • 4
      Staged rollout
      https://www.gridpp.ac.uk/wiki/Staged_rollout
      Target for this meeting:
      - With EMI-2 available for SR we should have some feedback by the next meeting on the main components
      - Another target will be to have a list of concerns regarding gLite vs EMI services that we offer.
      Target for next meeting:
      - View on gLite 3.1 services still in production
      - Transition roadmap?
    • 5
      Ticket follow-up
      https://www.gridpp.ac.uk/wiki/Ticket_follow-up
      - Ticket tracking working well
      Target for this meeting:
      +1 Starting to take more interest in UK submitted tickets
      +1 Review of stalled tickets
      +1 Including stakeholders on ticket - what is the process to "involve" other support units?
      Target for next meeting:
    • 6
      Core services
      https://www.gridpp.ac.uk/wiki/Core_Grid_services
      - The main focus is currently perfSONAR
      Target for this meeting:
      - Target for end of June? Another 4 sites active on the dashboard?
      - Survey of DRI deployment issues / follow-up needed
      - What do we want to measure (matrix of tests)?
      - Documents list for process for rate capping perfSONAR-PS boxes so that 10 Gig connected sites don't accidentally flood 1 Gig connected sites.
      Update from Mark (June meeting):
      - Updated wikis - priority is now getting everyone installed.
      - As many sites are bringing on 10 Gig connections we need to look at some basic tests such as iperf or tcpnut file transfers; I prefer iperf.
      - The aim is to gain a rough understanding of the capabilities of each connection. This should be recorded in the GridPP wiki. It will need a new page.
      - Fire up perfSONAR between the sites at a basic rate of 1 Gig to start with for a couple of weeks, once we have installed as many sites as we think is reasonable; I would say 7 - 9 would do.
      - This should give a basic understanding of the inherent latency and bandwidth limitations between sites.
      - The tests should be staged between RAL and the tier-2s and then the tier-2s to tier-2s.
      - When this is done we can then test the 10 Gig links. I am wary of bursts of 10 Gig traffic on the network at Glasgow but if other sites want to run at this line rate for bandwidth testing then that is a local decision.
      - All positive and negative results should again be recorded on a new wiki page.
      - From here we can build up an idea of how well all the connected sites are performing.
      - Then as we add more sites we have a foundation to build against.
      - After we are happy that the UK cloud is working correctly we can look at testing intra cloud as discussed
      Target for next meeting:
      - Template for installation config (for example how long tests run)
      - Agreement on site setup (bandwidth and latency on one machine?)
      - Understanding of communities to be joined
      - Setup 'standard' T2 site to US test
      - Agreement on restricting the tests
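The staged plan in this item (RAL to each tier-2 first, then tier-2 to tier-2, starting at 1 Gig and capped so that a 10 Gig site does not flood a 1 Gig site) can be sketched as a small test matrix. This is only an illustration: the tier-2 names and all link speeds are placeholders, the function name is hypothetical, and capping each test at the slower end's link speed is one possible rule, not an agreed policy.

```python
from itertools import combinations

# Hypothetical link speeds in Gbit/s; placeholder values, not measurements.
LINK_SPEED_GBPS = {
    "RAL": 10,
    "T2-A": 10,
    "T2-B": 1,
    "T2-C": 10,
}


def staged_test_matrix(link_speeds, tier1="RAL", initial_cap_gbps=1):
    """Return a list of (stage, source, destination, rate_gbps) tuples.

    Stage 1 pairs the Tier-1 with every Tier-2; stage 2 pairs the Tier-2s
    with each other. Each rate is capped at the slower link of the pair
    and at initial_cap_gbps (an assumed rule).
    """
    tier2s = sorted(site for site in link_speeds if site != tier1)
    pairs = [(1, tier1, t2) for t2 in tier2s]
    pairs += [(2, a, b) for a, b in combinations(tier2s, 2)]
    matrix = []
    for stage, src, dst in pairs:
        rate = min(link_speeds[src], link_speeds[dst], initial_cap_gbps)
        matrix.append((stage, src, dst, rate))
    return matrix


if __name__ == "__main__":
    for stage, src, dst, rate in staged_test_matrix(LINK_SPEED_GBPS):
        print(f"stage {stage}: {src} -> {dst} at {rate} Gbit/s")
```

Raising `initial_cap_gbps` to 10 models the later 10 Gig phase: pairs where both ends have 10 Gig links run at 10, while any pair involving a 1 Gig site stays at 1.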
    • 7
      Wider VOs
      https://www.gridpp.ac.uk/wiki/Wider_VO_issues
      Target for this meeting:
      - Survey current issues
      - VOs supported: http://pprc.qmul.ac.uk/~walker/votable.html
      - Summary of status of problems outstanding (such as proxy renewal/myproxy).
      - Documents available to VOs
      Target for next meeting:
      - Volunteer VO using Tier-1 CVMFS
      - Webpage summary of git vs other methods for s/w updates (VO reference doc)
      - Neuro VO enabled in WMS and LFC
      - Longer-term: steps to follow for best practice
    • 8
      Regional tools
      Target for this meeting:
      - To have successfully run tests using the Lancaster server
      - Smaller VO testing - waiting for the new SAM Nagios that enables new profile management.
      Target for next meeting:
      - VOMS strategy and work plan defined
      Other:
      - There is a current discussion about VOMS and what happens in the event that NGS do not get further funding.
    • 9
      Interoperation
      https://www.gridpp.ac.uk/wiki/Grid_interoperation
      Target for this meeting:
      - latest EMI/EGI plans
      - experiences from SARonNGS
      - other areas of interest from NGS such as the certwizard (what is it + how to use)
      Target for next meeting:
      - Milestones for next 6 months?
      - Does DPM community support fall here?
    • 10
      Security
      https://www.gridpp.ac.uk/wiki/Security
      Target for this meeting:
      - Linda is building up the role descriptions
      - We need to prepare for SSC5 and 6.
      Target for next meeting:
    • 11
      Discussion/Other Areas
      - Contributions to the WLCG Operations Coordination Team (https://indico.cern.ch/materialDisplay.py?contribId=9&materialId=slides&confId=155070)
      - glexec and ARGUS enablement of sites
      - CA-TAG meeting (to have covered September 'gap' concerns)
    • 12
      AOB
      Next review Wednesday 5th September at 10:45.