Deployment team & sites

Europe/London
EVO - GridPP Deployment team meeting

Jeremy Coles
Description
- This is the biweekly DTEAM + sites meeting.
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.
- The phone bridge number is +44 (0)161 306 6802 (CERN number +41 22 76 71400). The phone bridge ID is 44709 with code: 4880.
Minutes
    • 11:00 11:20
      Experiment problems/issues 20m
      Review of weekly issues by experiment/VO
      - LHCb
        There were no major issues with the UK over the last week for LHCb. Load was very low, primarily from users, with little Monte Carlo running. Interesting issues:
        1. A dCache-ROOT incompatibility is forcing LHCb to ban all the dCache Tier-1s, leaving only CERN, CNAF and RAL within the mask. The problem seems to stem from the ROOT developers changing their dCache plugin, and we hope to have it fixed sometime this week.
        2. DIRAC will be stopped for 24 hours beginning later today to migrate to new, more resilient hardware with a new version of DIRAC.
      - CMS
      - ATLAS
        - Any conclusions/feedback from the ATLAS T1/2/3 Jamboree? (http://indico.cern.ch/conferenceDisplay.py?confId=76900). See the last talk link for Graeme's summary slides. A couple of things to note:
        -- Chamonix workshop outcomes: 3.5TeV-3.5TeV collisions. Run until 1 fb-1 or end 2011. 2012 upgrades for higher LHC energies.
        -- Machine development Nov/Dec 2009.
        -- Expect 50% machine efficiency -> 12hrs/day. 4 days/month maintenance.
        -- Since the 15.6.3 release, builds are on SL5 only.
        -- Memory usage is improving, levelling at 2GB. Pile-up events can reach 3GB. The suggestion is not to limit memory.
        -- Some discussion about old releases at sites.
        -- CREAM testing is well underway. The suggestion is for any site with >1 CE to have one on CREAM.
    • 11:20 11:30
      ROC update 10m
      ROC update
      ***************
      - Update from on-duty
      -- Things to follow up
      From the EGEE ops meeting:
      From the site reports:
      Tier-1 update:

      Ticket status
      ***************
      https://gus.fzk.de/download/escalationreports/roc/html/20100215_EscalationReport_ROCs.html
      50491 - on hold. CMS transfers IC-RHUL. Probably jumbo frames issue. Opened in July 09******.
      53349 - on hold. Bristol. Publishing vast amount of storage. Opened in November****.
      53363 - Lancaster +2 (split). Fusion issue with .lsc file? Last update 08/01**.
      53364 - TCD ticket regarding fusion VO. No submitter follow up?** Wait on submitter. Close?**
      53598 - ATLAS T1. On hold. Channel load change request. Wait for data to test?**
      53600 - Oxford Nagios. On hold pending Savannah bug fix.******
      53834 - on hold. ECDF old CE. Waiting on second (new) CE?
    • 11:30 11:40
      WLCG (GDB) updates 10m
      There was a GDB last Wednesday. Many topics were covered: http://indico.cern.ch/conferenceDisplay.py?confId=72049. For a Tier-2 summary see Alessandra's notes here: http://www.gridpp.ac.uk/wiki/GDB_10th_February_2010.
      - EGEE minimum version control: The project wants to stop supporting very old releases. Sites will be expected to be at either the last high-priority release or a release not more than x months old. x will probably be around 6-8 months and be reduced over time.
      - ARGUS update: Release 1.1 is now in certification and is going to staged rollout. Some experiment-specific integration work and CREAM integration are ongoing. The latest version needs a recent lcmaps.
      - Middleware update: The release process uses a staged rollout. The quick adopter sites must be able to downgrade if required.
      - Operational security: Mainly about the new Pakiti release. Sites are strongly recommended to deploy this service.
      - EGI without ROSCOE: This was the support centre bid combined across HEP and a few other areas. The bid was not successful, which has several impacts. CERN is trying to find funding for the approx. 1FTE per experiment affected.
      - Distributed database workshop: Review of test results, which indicate that backup and recovery need further work.
      - Tier-1 coordination meeting
      - Virtualisation working group: The idea is to allow the distribution of machine images. There are still issues of transmission and trust. This is part of a HEPiX working group. The need to support multiple hypervisors is being worked on.
      - Site management: John reviewed the areas (such as FTS, DPM...) to check whether more support work was needed.
      - Multi-user pilot jobs: Maarten summarised the questionnaire responses so far:
      - CREAM: Some issues are being found with the CREAM-CE, but sites are now recommended to deploy it. It seems much more efficient at processing jobs than the LCG-CE. Still some bug fixes coming!
      - OSG update
      - The EGEE-EGI transition: Review of what is being done to regionalise the operations work, e.g. regional dashboards, monitoring etc. EGI.org is now in a position to set up.
    • 11:40 11:48
      SL5 status & benchmarking 8m
    • 11:48 12:03
      Site issues & updates 15m
      -
    • 12:03 12:04
      AOB 1m