Operations team & Sites

Name: Operations team & Sites
Start: 2012-07-24T11:00:00+01:00
End: 2012-07-24T12:16:00+01:00
Location: EVO - GridPP Operations team meeting

Tuesday 24 Jul 2012, 11:00 → 12:16 Europe/London

Description

- This is the biweekly ops & sites meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 14 0782 with code: 4880.

Apologies: Kashif, Mark, Sam, David, Gareth R, Andrew M

- 11:00 → 11:20
  
  Experiment problems/issues 20m
  
  Review of weekly issues by experiment/VO
  - LHCb
  - CMS
  - ATLAS
    - UKI-NORTHGRID-GLASGOW: problems with the air conditioning during the weekend. The site was put in downtime and the system set the panda queues and storage offline.
    - Job Recovery: recovery has now been tested at both RAL and Lancaster and has caused no problems. Sites have to create a suitable space on the WN and monitor it with tmpwatch to delete data that have become too old. To avoid hardcoding the path in panda schedconfig, they can set an environment variable pointing at the recovery path. When this is done they should contact cloud support so that recovery can be enabled in schedconfig. More information is in Alaistair's email and slides: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1207&L=TB-SUPPORT&F=&S=&P=107128
    - Memory leaks and how to control them: I wrote a post on how to set the limits without killing everything off. The most important one is the limit on vmem, because Torque will kill jobs that exceed the allocated vmem. The limit on mem is enforced by Torque only when the job arrives with memory requirements, not when the job exceeds them; to keep memory in check you need to set pmem. http://northgrid-tech.blogspot.co.uk/2012/07/atlas-jobs-with-memory-leaks-containment.html
  - Other
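The Job Recovery set-up above (a dedicated area on the WN whose path comes from an environment variable and which is pruned of stale data the way tmpwatch would) can be sketched in Python. This is a minimal illustration under assumptions, not the ATLAS or tmpwatch implementation: the variable name `SITE_RECOVERY_DIR`, the fallback path `/scratch/recovery` and any age threshold you pass in are hypothetical. Like tmpwatch's default, it judges age by last access time; unlike tmpwatch, it leaves empty directories in place.

```python
import os
import stat
import time
from pathlib import Path

# Hypothetical name: the minutes do not say which variable sites should use.
RECOVERY_DIR_VAR = "SITE_RECOVERY_DIR"
# Hypothetical fallback, used only when the variable is unset.
DEFAULT_RECOVERY_DIR = "/scratch/recovery"


def recovery_dir(env=None):
    """Return the recovery path from the environment instead of hardcoding it."""
    env = os.environ if env is None else env
    return Path(env.get(RECOVERY_DIR_VAR, DEFAULT_RECOVERY_DIR))


def purge_stale_files(root, max_age_hours, now=None):
    """Delete regular files under root not accessed for max_age_hours.

    Age is measured from the last access time (tmpwatch's default).
    Symlinks and other non-regular files are skipped.
    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.lstat()
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_atime < cutoff:
                path.unlink()
                removed.append(path)
    return removed
```

A site cron job could then call `purge_stale_files(recovery_dir(), 48)`, with the threshold (48 hours here, purely illustrative) chosen to match how long recoverable output should be kept.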
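The vmem and pmem limits discussed under memory leaks are typically applied per queue with `qmgr`. The Python sketch below only builds the command strings so they can be reviewed before running; the queue name and sizes are hypothetical placeholders rather than the values from the blog post, and using the `resources_default`/`resources_max` queue attributes is an assumption about how a site might apply the limits, so check the post and the Torque documentation before use.

```python
def torque_size(mb):
    """Format a size given in MB with Torque's size suffixes (4096 -> '4gb')."""
    if mb <= 0:
        raise ValueError("size must be positive")
    return f"{mb // 1024}gb" if mb % 1024 == 0 else f"{mb}mb"


def memory_limit_commands(queue, vmem_mb, pmem_mb):
    """Build qmgr commands that set per-queue vmem and pmem limits.

    vmem: jobs whose total virtual memory exceeds it are killed by Torque.
    pmem: per-process limit; the minutes note that the plain mem limit
    does not stop a job that exceeds it, so pmem is set instead.
    """
    settings = {
        "resources_default.vmem": torque_size(vmem_mb),
        "resources_max.vmem": torque_size(vmem_mb),
        "resources_default.pmem": torque_size(pmem_mb),
    }
    return [
        f'qmgr -c "set queue {queue} {attr} = {value}"'
        for attr, value in settings.items()
    ]


if __name__ == "__main__":
    # Hypothetical queue name and sizes, for illustration only.
    for cmd in memory_limit_commands("atlas", vmem_mb=4096, pmem_mb=2048):
        print(cmd)
```

Generating the commands rather than executing them keeps the sketch side-effect free; an admin can paste the reviewed lines into a shell on the Torque server.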
- 11:20 → 11:40
  
  Meetings & updates 20m
  
  With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
  - Tier-1 status
  - Accounting
  - Documentation
  - Interoperation
  - Monitoring
  - On-duty
  - Rollout
  - Security
  - Services
  - Tickets
  - Tools
  - VOs
  - Site updates
- 11:40 → 12:00
  
  Obsolete GridPP wiki/web pages 20m
  
  * For wiki examples see https://www.gridpp.ac.uk/wiki/Stale_documents#Stale_Documents
- 12:00 → 12:05
  
  Actions 5m
  
  - To be completed: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items
  - Completed: https://www.gridpp.ac.uk/wiki/Operations_Team_Completed_Actions
- 12:05 → 12:06
  
  AOB 1m
