Operations team & Sites
Tuesday 19 July 2011, 11:00
11:00
Meetings & updates
11:00 - 11:20
- ROD team update
  A few alarms were raised at different sites, but all were fixed in time. Oxford is seeing intermittent network problems following the installation of new Dell network switches. gridppnagios was off the network for almost 2 hours on Sunday night because of a University-wide network issue.
- Nagios status
  1. gridppnagios was updated to the latest release, 11.2. WMSs are now properly tested by Nagios: https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_WMS&style=overview
     Nagios sends jobs through the WMS to GoodCE, which is the list of CEs currently passing the Nagios tests. This reduces the chance of false alarms for the WMS test. The test frequency is 5 min.
  2. Removed the test org.ggus from the BDII, as it was not very popular.
  3. Steve Traylen pointed out a problem after the last update: UKI T1 availability is showing zero at https://sls.cern.ch/sls/?view=lcg . I opened a ticket (https://ggus.eu/ws/ticket_info.php?ticket=72115 ). It turned out that SLS was using the old GridView algorithm. In any case, SLS is a CERN-internal monitoring system and UK T1 availability/reliability is not affected.
  4. Another interesting incident, which I have already reported to the ROD mailing list: ROC_Asia/pacific had misconfigured their ROC Nagios, and it started testing UKI sites and sending alarms to the Dashboard ( https://ggus.eu/ws/ticket_info.php?ticket=72304 ). Availability/reliability should apparently not be affected, but it is something to keep in mind when the new report comes out.
  5. On 17th July Nagios was offline for around 2 hours, between 18:00 and 20:00, due to a University-wide network problem.
- Tier-1 update
- Security update
-- T2 issues
--- Accounting: QMUL (early July)
-- General notes. The creation of NGI_UK has started. We are currently in the process of setting up GOCDB entries and cross-checking lists and management requirements.
- Ticket status: http://tinyurl.com/3uo5get
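As a rough illustration of the GoodCE idea mentioned under Nagios status, the sketch below filters a list of CEs down to those currently passing their tests and picks one as the target for a WMS probe job. It is not the actual Nagios probe code: the `CEStatus` type, the function names and the `example.org` hostnames are all hypothetical, and random selection among passing CEs is an assumption made only for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CEStatus:
    name: str      # CE hostname (hypothetical values in the usage below)
    passing: bool  # True if the CE currently passes its Nagios tests


def good_ces(statuses):
    """Return the sorted names of CEs currently passing (the 'GoodCE' list)."""
    return sorted(s.name for s in statuses if s.passing)


def pick_wms_target(statuses, rng=None):
    """Pick a passing CE for a WMS probe job, or None if no CE is passing.

    Choosing only from passing CEs means a WMS test failure is less likely
    to be caused by a broken CE rather than by the WMS itself.
    """
    candidates = good_ces(statuses)
    if not candidates:
        return None
    rng = rng or random.Random()
    return rng.choice(candidates)


# Usage with made-up hostnames:
statuses = [
    CEStatus("ce01.example.org", True),
    CEStatus("ce02.example.org", False),
    CEStatus("ce03.example.org", True),
]
target = pick_wms_target(statuses)
```

With this input the failing `ce02.example.org` is never selected, so a failed WMS test points more clearly at the WMS.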
11:20
Experiment problems/issues
11:20 - 11:40
Review of weekly issues by experiment/VO
- LHCb
- CMS
- ATLAS
- Other
- Experiment blacklisted sites
- Experiment known events affecting job slot requirements
- Site performance/accounting issues
- Metrics review
11:40
Overview of WLCG workshop
11:40 - 11:50
The agenda: https://indico.desy.de/conferenceOtherViews.py?view=standard&confId=4019
The summary talk - attached
JC notes - attached
11:50
EGI service operations security policy (draft)
11:50 - 11:55
V1 draft: https://documents.egi.eu/document/669
Main changes:
1. Generalise the policy to include *all* Services, not just those run by a Resource Centre (Site). This includes services run by third parties, such as VOs, and virtual services as well as real services.
2. Exclude items which are operational in nature and not related to security (please note here that we have retained statements about IPR, liability and dispute handling because these are not yet included in other policy documents).
3. Change terms to more appropriate ones now used in the EGI, e.g. "Site" becomes "Resource Centre", "Grid" becomes "Infrastructure".
For those of you wishing to see more details of the SPG discussions on this policy revision, please see: https://wiki.egi.eu/wiki/Talk:SPG:Drafts:Operations_Policy
11:55
Actions
11:55 - 12:00
- http://www.gridpp.ac.uk/wiki/Deployment_Team_Action_items
12:00
AOB
12:00 - 12:01
- Please help the Tier-2 reps complete the Tier-2 reports, which are due this week.
- Please register for GridPP27 at CERN in September: http://www.gridpp.ac.uk/gridpp27/. The deadline is 12th August, but booking earlier is advised. If you have a topic for the joint PMB-ops team meeting, please let Jeremy know.
- Note the message sent today about the Tier-2 accounting periods. Together the periods will use metrics that are continuous.