Deployment team
Date/Time: Tuesday, 8 July 2008 - 11:00 (Europe/London)
Location: EVO - GridPP Deployment team meeting
Chairperson: Jeremy Coles
Description:
- This is the weekly DTEAM meeting
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the GridPP Community area.

- The phone bridge number is +41 22 76 71400. The phone bridge ID is 353551 with code: 4880.
Material: Minutes

 
 Tuesday, 8 July 2008
 11:00
Experiment problems/issues (20')    
Review of weekly issues by experiment/VO
- LHCb

- CMS
 
- ATLAS
-- atlas/uk VOMS status. Progress with additional space tokens.


- Other
-- Very low grid activity over the past week
-- "Is there anything apart from number of available machines that limits the number of camont jobs that can run? ... The running-time for a job is about two-hours, so when queuing times are small I expect to have about 200 jobs processed per day.  The number of jobs currently being processed per day is only about 50 "

- The gridpp VO was discussed at the PMB yesterday. There was wide support for requiring sites to support the VO. Please notify sites.
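The camont user's expectation can be checked with a simple steady-state throughput estimate: jobs completed per day is roughly the number of jobs running concurrently multiplied by 24 hours, divided by the job runtime. The sketch below assumes the quoted two-hour runtime is constant and that a freed slot is refilled immediately. The function names are purely illustrative.

```python
def jobs_per_day(concurrent_slots: float, runtime_hours: float) -> float:
    """Daily throughput if `concurrent_slots` jobs run back to back with no idle gaps."""
    return concurrent_slots * 24.0 / runtime_hours


def implied_slots(daily_jobs: float, runtime_hours: float) -> float:
    """Average number of jobs that must be running at once to reach `daily_jobs` per day."""
    return daily_jobs * runtime_hours / 24.0


if __name__ == "__main__":
    runtime = 2.0  # hours per job, as quoted by the camont user
    for rate in (200, 50):  # expected vs. observed jobs per day
        print(f"{rate} jobs/day -> about {implied_slots(rate, runtime):.1f} jobs running at any time")
```

Under these assumptions, the expected 200 jobs/day requires about 17 camont jobs running continuously, while the observed 50 jobs/day corresponds to an average of only about 4. The question is therefore what caps the average occupancy at that level. Possible factors include per-VO queue limits or fairshare settings, but these are not established causes.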
 11:20
ROC update (15')    
ROC update
***************
There is an SA1 coordination meeting on Thursday: http://indico.cern.ch/conferenceDisplay.py?confId=37379

WLCG update
*****************
There is a GDB this Wednesday http://indico.cern.ch/conferenceDisplay.py?confId=20231. Do we have any issues to be raised?
 
ops meeting
**************
... the decision was made to extend the duration of Phase1 of the pilot
(deployment in PPS environment) until the 22nd of July before the pilot service
starts being migrated toward the production environment

- A PPS all sites meeting was held on 1st July: http://indico.cern.ch/conferenceDisplay.py?confId=36928. 

- The latest version of LFC (3.1.12-0) contains a bug that can cause it to hang or crash.

- (DECH) SAM problem: a (network) problem with the CERN BDII used by the RB/WMS for job submission. A file needed by the host certificate test was also missing.


Ticket status
***************
https://gus.fzk.de/download/escalationreports/roc/html/20080707_EscalationReport_ROCs.html
 11:35
Availability & reliability (5')
- Review of June's performance and main issues
 11:40
Site purchase information wiki page (5')    
- Request sites to complete a set of information following each new purchase
- Provide any guidelines we can about requirements (or grid trends)
- Perhaps offer a HEPSYSMAN summary of site updates

- Guides?

1) The T1 uses 5 or 10 TB disks for performance. What should T2s use?
2) Spacetokens impact disk server/pool distribution. What factors need to be highlighted when making a purchase?
3) What network connection is the minimum (CE-WN, CE-SE...)?
4) Memory per core is a VO defined requirement. Currently 2GB/core?
5) Network links - external 1Gb/s required for clusters up to....
6) Benchmarking 
7) CEs should have dual PSU and disk... (how much memory?)
8) What is the advice about UPS?
9) What are sites using for MON and other services?
10) Which nodes can be put on a virtual host and what hardware should be used for such a host?
11) Which site last underwent procurement?
 11:45
SL end-to-end transfer tests (10')    
- http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest.html
- Initial reactions
- Explaining oddities or why the test may be wrong in some cases
- What analysis needs to be done (record site SE configuration etc.)
 11:55
Topics to revisit (5')    
- gstat publishing. Small group being formed. 
- Wiki/web page updates (see for example http://www.gridpp.ac.uk/deployment/contact.html). Admin task! 
- Completion of the GridPP-NGS site status information in http://www.gridpp.ac.uk/wiki/Working_with_NGS

- Regional Nagios monitoring (ScotGrid have made progress; who else is moving forward with it?). At DTEAM on 1st July we agreed on deployment on a September timescale; a YAIM component may be available by then.
- COD training in August

- Collecting site queue/fairshare information
- Reminder for sites to add comments to http://www.gridpp.ac.uk/wiki/SAM_availability:_October_2007_-_May_2008. 
- Look at the Site Readiness Review reports

- "We need to audit T2 sites to understand how many concurrent transfers each can cope [with]. This requires details of how many servers are available and how the pools are allocated between the VOs."

- 080630: The first public version of the Operations Automation Strategy (MSA1.1) is now in EDMS at https://edms.cern.ch/document/927171/1
 12:00
Actions review (5')    
 12:05
AOB (5')    
- Comment on the EGEE SLDs (feedback so far from SouthGrid and NorthGrid; for London, David was happy, but what about others in the T2?). Are there any further comments or objections? I will mail the T2 managers this week for agreement.

- T2 quarterly reports are due next week. We looked at them last week (http://indico.cern.ch/conferenceDisplay.py?confId=35286). Are there any further comments/issues with them? 

- The WLCG service report may be of interest: http://indico.cern.ch/conferenceDisplay.py?confId=33702.