Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description
- This is the biweekly ops & sites meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 77907 with code: 4880.
Apologies: Mark M, Kashif, Catalin
Minutes
    • 11:00 11:20
      Meetings & updates 20m
      - ROD team update
        A few alarms at different sites, but all were fixed in time. Oxford is facing intermittent network problems: we have installed new Dell network switches and are having some trouble with them. gridppnagios was off the network for almost 2 hours on Sunday night because of a University-wide network issue.
      - Nagios status
        1. gridppnagios was updated to the latest release, 11.2. WMSs are now properly tested by Nagios: https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_WMS&style=overview Nagios sends jobs through the WMS to GoodCE, the list of CEs currently passing the Nagios tests. This reduces the chance of false alarms for the WMS test. The test frequency is 5 minutes.
        2. Removed the org.ggus test, which was not very popular, from the BDII.
        3. After the last update, Steve Traylen pointed out that UKI T1 availability is showing as zero at https://sls.cern.ch/sls/?view=lcg . I opened a ticket (https://ggus.eu/ws/ticket_info.php?ticket=72115 ). It turned out that SLS was using the old GridView algorithm. In any case, SLS is a CERN-internal monitoring system and UK T1 availability/reliability is not affected.
        4. Another interesting incident, which I have already reported to the ROD mailing list: ROC_Asia/Pacific had misconfigured their ROC Nagios, and it started testing UKI sites and sending alarms to the Dashboard (https://ggus.eu/ws/ticket_info.php?ticket=72304 ). Apparently availability/reliability should not be affected, but it is something to keep in mind when the new reports come.
        5. On 17th July Nagios was offline for around 2 hours, between 18:00 and 20:00, due to a University-wide network problem.
      - Tier-1 update
      - Security update
      -- T2 issues
      --- Accounting: QMUL (early July)
      -- General notes. The creation of NGI_UK has started. We are currently setting up GOCDB entries and cross-checking lists and management requirements.
      - Ticket status: http://tinyurl.com/3uo5get
    • 11:20 11:40
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO
      - LHCb
      - CMS
      - ATLAS
      - Other
      - Experiment blacklisted sites
      - Experiment known events affecting job slot requirements
      - Site performance/accounting issues
      - Metrics review
      Atlas Report
    • 11:40 11:50
      Overview of WLCG workshop 10m
      The agenda: https://indico.desy.de/conferenceOtherViews.py?view=standard&confId=4019
      The summary talk - attached
      JC notes - attached
      JC notes
      Summary slides
    • 11:50 11:55
      EGI service operations security policy (draft) 5m
      V1 draft: https://documents.egi.eu/document/669
      Main changes:
      1. Generalise the policy to include *all* Services, not just those run by a Resource Centre (Site). This includes services run by third parties, such as VOs, and virtual services as well as real services.
      2. Exclude items which are operational in nature and not related to security (please note that we have retained statements about IPR, liability and dispute handling because these are not yet included in other policy documents).
      3. Change terms to the more appropriate ones now used in EGI, e.g. "Site" becomes "Resource Centre", "Grid" becomes "Infrastructure".
      For those wishing to see more details of the SPG discussions on this policy revision, please see: https://wiki.egi.eu/wiki/Talk:SPG:Drafts:Operations_Policy
    • 11:55 12:00
      Actions 5m
      - http://www.gridpp.ac.uk/wiki/Deployment_Team_Action_items
    • 12:00 12:01
      AOB 1m
      - Please aid Tier-2 reps in completing the Tier-2 reports. Due this week.
      - Please register for GridPP27 at CERN in September: http://www.gridpp.ac.uk/gridpp27/. The deadline is 12th August but booking earlier is advised. If you have a topic for the joint PMB-ops team meeting, please let Jeremy know.
      - Note the message today about the Tier-2 accounting periods. Together the periods will use metrics that are continuous.