
Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description
- This is the biweekly ops & sites meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 14 0782 with code: 4880.
Apologies: Kashif, Mark, Sam, David, Gareth R, Andrew M
Minutes
    • 11:00 11:20
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO
      - LHCb
      - CMS
      - ATLAS
        UKI-NORTHGRID-GLASGOW: problems with the aircon during the weekend. The site was put in downtime and the system set the panda queues and storage offline.
        Job Recovery: recovery has now been tested at both RAL and Lancaster and has caused no problems. Sites have to create a suitable space on the WN and monitor it with tmpwatch to delete data that have become too old. To avoid hardcoding the path in panda schedconfig, they can set an env var pointing at the recovery path. When this is done, they should contact cloud support so that recovery can be enabled in schedconfig. More information is in Alaistair's email and slides: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1207&L=TB-SUPPORT&F=&S=&P=107128
        Memory leaks and how to control them: I wrote a post on how to set the limits without killing everything off. The most important one is the limit on vmem, because torque will kill jobs that exceed their allocated vmem. The limit on mem is enforced by torque only if the job arrives with memory requirements, but not if it exceeds them; to keep the memory in check you need to set pmem. http://northgrid-tech.blogspot.co.uk/2012/07/atlas-jobs-with-memory-leaks-containment.html
      - Other
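      The job recovery housekeeping described under ATLAS (a recovery area on the WN, located through an env var and purged of data that have become too old) could be sketched roughly as below. This is only an illustration, not the tmpwatch setup sites actually use: the variable name JOB_RECOVERY_DIR, the function names and the 7-day default are all assumptions, and the sketch checks file modification time, which may differ from the timestamp a given tmpwatch configuration checks.

```python
import os
import time
from pathlib import Path

# Hypothetical variable name: the minutes do not say which name sites use.
RECOVERY_ENV_VAR = "JOB_RECOVERY_DIR"


def recovery_dir_from_env():
    """Return the recovery path from the environment, or None if unset."""
    value = os.environ.get(RECOVERY_ENV_VAR)
    return Path(value) if value else None


def purge_old_files(root, max_age_days=7, now=None):
    """Delete regular files under root not modified for max_age_days.

    Returns the sorted list of removed paths as strings.
    The 7-day default is illustrative, not a value from the minutes.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Materialise the listing first so deletions do not disturb the walk.
    for path in list(Path(root).rglob("*")):
        # Leave directories and symlinks alone; only plain files are purged.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(str(path))
    return sorted(removed)
```

      A cron job could call recovery_dir_from_env() and, if it returns a path, pass it to purge_old_files(). Reading the path from the environment keeps it out of any central configuration, which is the point made above about schedconfig.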
    • 11:20 11:40
      Meetings & updates 20m
      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
      - Tier-1 status
      - Accounting
      - Documentation
      - Interoperation
      - Monitoring
      - On-duty
      - Rollout
      - Security
      - Services
      - Tickets
      - Tools
      - VOs
      - Site updates
    • 11:40 12:00
      Obsolete GridPP wiki/web pages 20m
      * For wiki examples see https://www.gridpp.ac.uk/wiki/Stale_documents#Stale_Documents
    • 12:00 12:05
      Actions 5m
      To be completed: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items
      Completed: https://www.gridpp.ac.uk/wiki/Operations_Team_Completed_Actions
    • 12:05 12:06
      AOB 1m