Operations team & Sites
Tuesday 28 June 2011 - 11:00
Meetings & updates
11:00 - 11:20
- EGI updates
  UPDATE 30 for gLite 3.2 is now ready for production use. The priority of the update is Normal. The highlights of the update are:
  - New version of glite-BDII_top
  - New version of glite-CREAM
  - New version of glite-LB
  - New version of glite-SGE_utils
  All details of the update can be found at: http://glite.cern.ch/R3.2/sl5_x86_64/updates/30/
- ROD team update
- Nagios status
  -- Note Steve Lloyd's email about his SAM pages. Who uses them?
  -- Problem yesterday, "WN-RepCr SAM test failing across UK": "The problem with gridppnagios has been fixed and sites which have failed this test should be OK within an hour. A little detail about the problem: we are changing network switches at the Oxford site and made sure that gridppnagios and storage-monit.physics.ox.ac.uk, which is used as primary storage for replication, should not be affected, but I missed the point that without the site BDII, jobs on the WN would not be able to locate the storage-monit machine. I am also using heplnx204.pp.rl.ac.uk as a backup replication storage server, but unfortunately it started failing for some other reason. I removed both storage servers, added two new SEs and have checked that it is working."
- Tier-1 update
- Security update
- WLCG update
  -- A new WLCG Technology Evolution Work Group is being formed with Markus Schulz and Jeff Templon as chairs: “The overall goal is to ensure the long term support of the LHC community use cases, taking into account experiments, sites, and operational needs. Reducing where possible complexity and manpower needs for users, sites and developers. Improving functionality and performance where needed…. to define the vision for evolution according to the WLCG collaboration, and secondly to coordinate work being done… The group will cover topics such as: Security Model, Job Management, Virtualization, Data Management, Data Access, Information and Service Discovery etc. To get started we ask the Computing Coordinators to nominate for their experiments a permanent member and deputy. We will try to identify suitable site delegates, security and operations watchdogs.”
  -- T2 issues: please check the site data under "Tier-2" at http://wlcg-rebus.cern.ch/apps/topology/
    - Specific question for Peter/Durham: is 1920 logical CPUs correct?
    - Several sites are still publishing "EGEE".
  -- General notes
    - GGUS summary of open UKI tickets: http://tinyurl.com/6a93yme
    - 10 red tickets (5 on hold)
Experiment problems/issues
11:20 - 11:40
Review of weekly issues by experiment/VO:
- LHCb
- CMS
- ATLAS
- Other
- Experiment blacklisted sites
- Experiment known events affecting job slot requirements
- Site performance/accounting issues
- Metrics review
Open discussion
11:40 - 11:55
Some areas that could be covered:
- glexec issues (https://gridppnagios.physics.ox.ac.uk/myegi/history/) [click the simple/advanced filter, then select glexec from the profile tab]. Today it shows RHUL, Liverpool, Brunel and Oxford. Last week we had 9 sites!?
- perfSONAR work (http://tinyurl.com/6a7dshg)
- Topics we want explored at the WLCG workshop in July (https://indico.desy.de/conferenceTimeTable.py?confId=4019#all)
Actions
11:55 - 12:00
- http://www.gridpp.ac.uk/wiki/Deployment_Team_Action_items
AOB
12:00 - 12:01