Core-ops tasks

Europe/London
EVO - GridPP Operations team meeting

Description
- This is the start of a biweekly meeting to review ops core tasks.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 550 8646 with code: 4880.

Apologies: David, Mark, Chris
    • 10:45 10:55
      Documentation 10m
      https://www.gridpp.ac.uk/wiki/Documentation
      - Recent focus: VO information
      Target for this meeting:
      - Updating VO admin guide to VOMS...
      - Keydocs (https://www.gridpp.ac.uk/php/KeyDocs.php) all assigned and reviewed
      Target for next meeting:
    • 10:55 11:00
      Monitoring 5m
      https://www.gridpp.ac.uk/wiki/Monitoring
      Target for this meeting:
      - Monitoring links ranked (expected by 29th)
      - Ops meeting special topic talk on monitoring pages (early July - 12th?)
      Update from David: "The ranking is underway; I intend to be finished by the end of the week."
      Target for next meeting:
    • 11:00 11:05
      Accounting 5m
      https://www.gridpp.ac.uk/wiki/Accounting
      Target for this meeting:
      - Statement on disk accounting (requested by JC). SL, 7th June: "I believe what we agreed was that 3% (or some other fraction to be agreed) of the total disk budget would be divided up amongst the providers of disk to 'others'. Unless someone tells me how to monitor the provision automatically, we agreed to use quarterly reports to find out who was providing what. As to whether 'other' includes LHC experiments you are not part of, we didn't discuss this. My feeling is that we shouldn't give credit for this, as the model is to provide production (CPU) to other LHC experiments but not analysis (disk). If a non-ATLAS site wants to give ATLAS some disk then it's up to them, or maybe ATLAS should pay if they really want it. But as I said, this hasn't been discussed properly yet."
      - 26th June ops meeting: review HEPSPEC06 figures and accounting metrics page results for recent months
      Update: Figures for review at 3rd July meeting
      Target for next meeting:
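To make the disk accounting statement concrete, here is a minimal sketch of how a 3% pool might be split. The statement only says the pool is "divided up amongst the providers"; splitting it pro rata to the disk each site reports in its quarterly report is an assumption, as are the function name, the site names and the figures.

```python
def share_disk_credit(total_disk_budget, reported_disk_tb, fraction=0.03):
    """Split a fraction of the disk budget among sites providing disk to 'others'.

    Assumption: the pool is shared pro rata to the disk (TB) each site
    reported in its quarterly report. The agenda does not fix the rule.
    """
    pool = total_disk_budget * fraction
    total_reported = sum(reported_disk_tb.values())
    if total_reported == 0:
        # Nobody provided disk to 'others', so nothing is credited.
        return {site: 0.0 for site in reported_disk_tb}
    return {
        site: pool * tb / total_reported
        for site, tb in reported_disk_tb.items()
    }


# Hypothetical figures: a budget of 1000 units and two providing sites.
credit = share_disk_credit(1000.0, {"SITE-A": 30, "SITE-B": 10})
```

With these made-up inputs, SITE-A reports three times as much disk as SITE-B and so receives three quarters of the pool.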
    • 11:05 11:10
      Staged rollout 5m
      https://www.gridpp.ac.uk/wiki/Staged_rollout
      Target for this meeting:
      - With EMI-2 available for SR, we should have some feedback on the main components by the next meeting
      - Another target will be to have a list of concerns regarding the gLite vs EMI services that we offer
      Target for next meeting:
    • 11:10 11:15
      Ticket follow-up 5m
      https://www.gridpp.ac.uk/wiki/Ticket_follow-up
      - Ticket tracking working well
      Target for this meeting:
      - Starting to take more interest in UK submitted tickets
      - Review of stalled tickets
      - Including stakeholders on ticket: what is the process to "involve" other support units?
      Target for next meeting:
    • 11:15 11:20
      Core services 5m
      https://www.gridpp.ac.uk/wiki/Core_Grid_services
      - The main focus is currently perfSONAR
      Target for this meeting:
      - Target for end of June? Another 4 sites active on the dashboard?
      - Survey of DRI deployment issues / follow-up needed
      - What do we want to measure (matrix of tests)?
      Update from Mark:
      I have updated the following parts of the wiki:
      - https://www.gridpp.ac.uk/wiki/Protected_Site_networking#UKI-SCOTGRID-GLASGOW
      - https://www.gridpp.ac.uk/wiki/PerfSonarInstall#PerfSonar_Documentation_Links (added documentation links)
      - https://www.gridpp.ac.uk/wiki/Core_Grid_services
      For the Core Services task the priority is now getting everyone installed. This is difficult because time is short at all the sites. However, what we need to test and monitor is fairly straightforward. As many sites are bringing on 10 Gig connections, we need to look at some basic tests, such as iperf or tcpnut file transfers (I prefer iperf), to gain a rough understanding of the capabilities of each connection. This should be recorded in the GridPP wiki. It will need a new page.
      Then, once we have installed as many sites as we think is reasonable (I would say 7 - 9 would do), we should fire up perfSONAR between the sites at a basic rate of 1 Gig to start with, for a couple of weeks. This will give us a basic understanding of the inherent latency and bandwidth limitations between sites. The tests should be staged: first between RAL and the tier-2s, and then tier-2 to tier-2. When this is done we can test the 10 Gig links. I am wary of bursts of 10 Gig traffic on the network at Glasgow, but if other sites want to run at this line rate for bandwidth testing then that is a local decision. All positive and negative results should again be recorded on a new wiki page. From here we can build up an idea of how well all the connected sites are performing. Then, as we add more sites, we have a foundation to build against.
      After we are happy that the UK cloud is working correctly, we can look at testing intra-cloud as discussed before. I will send you a separate email which documents the process for rate capping perfSONAR-PS boxes so that 10 Gig connected sites don't accidentally flood 1 Gig connected sites.
      Target for next meeting:
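As a starting point for the "matrix of tests" question, the staged plan in Mark's update (RAL to each tier-2 first, then tier-2 to tier-2) can be enumerated with a short script. This is only a sketch: the function name and the placeholder tier-2 names are hypothetical, and it lists site pairs rather than configuring perfSONAR or iperf in any way.

```python
from itertools import combinations


def staged_test_pairs(tier1, tier2_sites):
    """Return the site pairs for each stage of the test plan.

    Stage 1: the tier-1 (RAL) against every tier-2.
    Stage 2: every unordered pair of tier-2 sites.
    """
    stage1 = [(tier1, site) for site in tier2_sites]
    stage2 = list(combinations(tier2_sites, 2))
    return stage1, stage2


# Placeholder site list; only the Glasgow name appears in the wiki links above.
tier2_sites = ["UKI-SCOTGRID-GLASGOW", "TIER2-B", "TIER2-C"]
stage1, stage2 = staged_test_pairs("RAL", tier2_sites)

for stage, pairs in (("Stage 1", stage1), ("Stage 2", stage2)):
    for src, dst in pairs:
        print(f"{stage}: {src} <-> {dst}")
```

With n tier-2 sites this gives n pairs in the first stage and n(n-1)/2 in the second, so 8 installed sites would mean 8 and 28 pairs respectively. That may help when deciding how long the initial 1 Gig run needs to be.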
    • 11:20 11:25
      Wider VOs 5m
      https://www.gridpp.ac.uk/wiki/Wider_VO_issues
      Target for this meeting:
      - Survey current issues
      - VOs supported: http://pprc.qmul.ac.uk/~walker/votable.html
      - Summary of status of problems outstanding (such as proxy renewal/myproxy)
      - Documents available to VOs
      Update from Chris: "I will probably have to send my apologies ... there's not actually much to discuss, I'm afraid."
      Target for next meeting:
      - Longer-term: steps to follow for best practice
    • 11:25 11:30
      Regional tools 5m
      - Backup hardware is now at Lancaster
      Target for this meeting:
      - To have successfully run tests using the Lancaster server
      - Smaller VO testing: waiting for the new SAM Nagios that enables new profile management
      Target for next meeting:
      Other:
      - There is a current discussion about VOMS and what happens in the event that NGS do not get further funding.
    • 11:30 11:35
      Interoperation 5m
      https://www.gridpp.ac.uk/wiki/Grid_interoperation
      Target for this meeting:
      - Latest EMI/EGI plans
      - Experiences from SARonNGS
      - Other areas of interest from NGS, such as the certwizard (what is it + how to use)
      Target for next meeting:
    • 11:35 11:40
      Security 5m
      https://www.gridpp.ac.uk/wiki/Security
      Target for this meeting:
      - Linda is building up the role descriptions
      - We need to prepare for SSC5 and 6
      Target for next meeting:
    • 11:40 11:45
      Discussion/Other Areas 5m
    • 11:45 11:46
      AOB 1m
      Next review Wednesday 18th July at 10:45.