Europe/London
EVO - GridPP Deployment team meeting

Jeremy Coles
Description
- This is the weekly DTEAM meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 1826702 with code: 4880.
Minutes
    • 11:00 11:20
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO
      - LHCb
      - CMS
      - ATLAS
      - Other
      - Experiment blacklisted sites
      - Experiment site performance
      -> Expt. reps please remind us of the *best* monitoring/status pages to check for your experiment.
    • 11:20 11:30
      ROC update 10m
      ROC update
      ***************
      - Update from operations on-duty

      Tier-1 update
      **************

      From the EGEE ops meeting
      ****************************
      - No recent ops meeting and it looks like the regular Monday meeting will soon end
      - One current topic being discussed is removing support for some gLite 3.1 services.

      ---------------------------------------------------------------
      | Service Name        | Released on gLite 3.2  | Candidate to |
      |                     |                        | stop 3.1?    |
      ---------------------------------------------------------------
      | glite-BDII          | 29.06.09 (6 updates)   | YES          |
      | glite-CREAM         | 07.01.10 (3 updates)   | YES          |
      | glite-GLEXEC_wn     | 07.01.10 (2 updates)   | YES          |
      | glite-LB            | Only in staged rollout | NO           |
      | glite_LFC_mysql     | 27.07.09 (5 updates)   | YES          |
      | glite_LFC_oracle    | 08.02.10 (2 updates)   | YES          |
      | glite_LSF_utils     | 24.03.10 (1 update)    | NO           |
      | glite-MPI_utils     | 08.02.10 (1 update)    | NO           |
      | glite-SCAS          | 07.01.10 (2 updates)   | YES          |
      | glite-SE_dpm_disk   | 08.02.10 (0 updates)   | NO           |
      | glite-SE_dpm_mysql  | 27.07.09 (5 updates)   | YES          |
      | glite-TORQUE_client | 12.03.09 (6 updates)   | YES          |
      | glite-TORQUE_server | 07.01.09 (2 updates)   | YES          |
      | glite-TORQUE_utils  | 07.01.09 (2 updates)   | YES          |
      | glite-UI            | 15.06.09 (7 updates)   | YES          |
      | glite-VOBOX         | 13.10.09 (4 updates)   | YES          |
      | glite-WN            | 12.03.09 (7 updates)   | YES          |
      ---------------------------------------------------------------

      Stephen B noticed:
      That would seem to end support for SL4 WNs - I'm surprised if EGEE in general is ready for that, although it may be OK for us. At a quick count I get 234 SL4 subclusters vs 311 for SL5 ...

      | glite-UI | 15.06.09 (7 updates) | YES
      The UI may be hard to control and again I'm surprised if everyone is ready to lose SL4 - even CERN still has SL4 as the default on lxplus!
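      A count like the SL4 versus SL5 subcluster tally quoted above can be reproduced from BDII output. The following is only a minimal sketch: it assumes the subcluster entries have already been dumped to LDIF text (for example with ldapsearch), that entries are separated by blank lines with no folded or base64-encoded lines, and that the OS is published in the GLUE 1.3 attributes GlueHostOperatingSystemName and GlueHostOperatingSystemRelease. The sample entries and host names are hypothetical and are not real site data.

```python
from collections import Counter

# Hypothetical LDIF excerpt (invented hosts and sites), shaped like BDII output.
SAMPLE_LDIF = """\
dn: GlueSubClusterUniqueID=ce01.example.ac.uk,GlueClusterUniqueID=ce01.example.ac.uk,mds-vo-name=EXAMPLE-A,o=grid
GlueHostOperatingSystemName: ScientificSL
GlueHostOperatingSystemRelease: 4.8

dn: GlueSubClusterUniqueID=ce02.example.ac.uk,GlueClusterUniqueID=ce02.example.ac.uk,mds-vo-name=EXAMPLE-B,o=grid
GlueHostOperatingSystemName: ScientificSL
GlueHostOperatingSystemRelease: 5.4

dn: GlueSubClusterUniqueID=ce03.example.ac.uk,GlueClusterUniqueID=ce03.example.ac.uk,mds-vo-name=EXAMPLE-C,o=grid
GlueHostOperatingSystemName: ScientificCERNSLC
GlueHostOperatingSystemRelease: 5.3

dn: GlueSubClusterUniqueID=ce04.example.ac.uk,GlueClusterUniqueID=ce04.example.ac.uk,mds-vo-name=EXAMPLE-D,o=grid
GlueHostOperatingSystemName: CentOS
GlueHostOperatingSystemRelease: 5.4
"""


def count_sl_major_releases(ldif_text):
    """Count subcluster entries per Scientific Linux major release."""
    counts = Counter()
    # Assumption: one blank line separates entries, no folded lines.
    for entry in ldif_text.strip().split("\n\n"):
        attrs = {}
        for line in entry.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                attrs[key.strip()] = value.strip()
        os_name = attrs.get("GlueHostOperatingSystemName", "")
        release = attrs.get("GlueHostOperatingSystemRelease", "")
        # Sites publish the OS name in several spellings; match loosely.
        if "scientific" in os_name.lower() and release:
            counts["SL" + release.split(".")[0]] += 1
    return counts


if __name__ == "__main__":
    for label, n in sorted(count_sl_major_releases(SAMPLE_LDIF).items()):
        print(label, n)
```

      The loose match on "scientific" is a deliberate simplification, so SLC and SL are counted together; a real tally would need to agree on how to treat the different published OS names.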
      Derek noted:
      | glite-BDII | 29.06.09 (6 updates) | YES |
      Our experience with the gLite 3.2 BDII running as a top BDII is that it is not as good as the 3.1 version - it suffers from frequent performance degradation and needs to be restarted to fix it. On restart it takes some time for the BDII to be able to respond to queries correctly, as it has to regather the information from all the site BDIIs.

      And from John Gordon in the last hour:
      "The proposal is to stop support, not to ban them, at least until a security exposure comes along. Since 3.2/SL5 32 bit is not supported, this is the biggest issue. Does it hit any UK sites?"

      WLCG update
      *****************
      - If you are interested in WLCG T1 performance and issues then read Jamie's slides here: http://indico.cern.ch/conferenceDisplay.py?confId=81986.
      - The next GDB is on 12th May: http://indico.cern.ch/conferenceDisplay.py?confId=72055. Any comments on/suggestions for the agenda?

      UK NGI
      *********
      - At the moment work "on the ground" has not changed. NGS has developed a transition plan to an NGI and this includes deploying more gLite components. GridPP will develop a similar plan in the coming weeks. In the meantime there is a series of NGS surgery meetings where more technical exchange will take place. The next is tomorrow on the CREAM CE (with Kashif and Dug presenting).

      Ticket status
      ***************
      https://gus.fzk.de/download/escalationreports/roc/html/20100419_EscalationReport_ROCs.html
      50491 - on hold. CMS transfers IC-RHUL. Probably jumbo frames issue. Opened in July 09*************.
      53834 - on hold. ECDF old CE. Waiting on second (new) CE?*******
      54923 - R-GMA. Incorrect access assigned. Is someone at the T1 aware?
      Unsolved:
      ? - LHCb Glasgow transfers
      56128 - LHCb - Sheffield. Transfer related. Same as Glasgow's problem?
      56154 - LHCb - Brunel. Again transfer related.
    • 11:30 11:40
      Monitoring 10m
      - What are you (the site admins) checking regularly?
      - An issue was raised by ATLAS: experiments are spotting problem disk servers before sites do. The question for this forum to discuss is what fabric monitoring is currently in place that may help, and is such monitoring uniform across all sites?
      - Can we revitalise the wiki pages in this area? What tools should a site have in place as a minimum and what is recommended? http://www.gridpp.ac.uk/wiki/Monitoring_Tools_for_LCG
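      As a starting point for the "minimum" discussion, the simplest kind of fabric check on a disk server is a filesystem fill-level test. The sketch below is illustrative only: the function names, the 90% and 97% thresholds and the list of mount points are assumptions rather than anything agreed by the sites, and the OK/WARNING/CRITICAL labels simply follow the familiar Nagios-style states. It would not by itself catch the failing-transfer problems that experiments tend to notice first.

```python
import shutil


def classify_usage(used_fraction, warn=0.90, crit=0.97):
    """Map a used-space fraction (0.0 to 1.0) to a status label.

    The thresholds are illustrative assumptions, not site policy.
    """
    if used_fraction >= crit:
        return "CRITICAL"
    if used_fraction >= warn:
        return "WARNING"
    return "OK"


def check_mount(path):
    """Return the status label for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; nothing to classify.
        return "OK"
    return classify_usage(usage.used / usage.total)


if __name__ == "__main__":
    # Hypothetical mount list; a real disk server would list its pool mounts.
    for mount in ["/"]:
        print(mount, check_mount(mount))
```

      A check of this shape can be run from cron or wrapped as a plugin for whatever alarm system a site already uses.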
    • 11:40 11:45
      Actions 5m
      See http://www.gridpp.ac.uk/wiki/Deployment_Team_Action_items
    • 11:45 11:46
      AOB 1m
    • 11:46 12:01
      Tier-2 Coordinators - Quarterly reports 15m