GridPP PMB
****************
Met last week (as I think you know!). The agenda covered most areas that have been of recent interest:
0) Tier-1 site review (TD)
1) GridPP3 project planning (SP)
2) EGI/NGI plans (RM)
3) Disaster Planning (SB) + network resilience (PC to circulate update)
4) GridPP3 MOU (TD)
5) CASTOR status / CSA07 summary (DN +)
6) Middleware issues - R-GMA status and Network area plans (RM)
7) Tier-2 Hardware Allocation result/process (SL)
8) Tier-3 (RJ)
GridPP DTEAM
******************
- Recent input from Graeme, who has been getting involved with ATLAS production work. The latest on Panda: "Would like UK to be moved across to complete Panda production by start of December. Rest of EGEE cloud will follow."
- Stephen has kept us informed on ATLAS use of CASTOR. The latest release has gone well. There is now a plan to close dCache in 6 months' time.
- GridPP Tier-2 hardware allocations
- The biomed user issue. The activity has stopped, but a new way of doing things properly is being sought. [Question to geant4-supporting sites]
- Occasional APEL issues. Are there any at the moment?
EGEE/OSG/WLCG ops & ROC managers'
************************************************
- SpecInt2000 "How To" from the HEPiX working group [http://tinyurl.com/2of7vm]
-- "The SPEC CPU2000 benchmark suite has been retired by SPEC and replaced by its successor CPU2006. The CPU2000 benchmarks are, however, still widely used within the HEP community... A benchmarking working group, launched at HEPiX Fall 2006 and run by Helge Meinhard (CERN), is currently working on a strategy how to move away from CPU2000 to a more recent benchmark." - useful reference page http://hepix.caspur.it/processors/.
- A response to our request 14A: "Provide administration tools such as add/remove/suspend a VO, a user, add/remove/close/drain a queue, close a site (on site BDII), close a storage area". Claudio Grandi remarked:
1. "Start/stop are available for all services. Misbehaving
commands are bugs: submit a bug to Savannah if you
think the start or stop of a service is not doing what
expected (e.g. processes left behind, etc…)"
2. "Missing features are better identified by clients. We
propose to form a group within SA1 with the aim of
developing a service management interface for all gLite
services."
- A ROC-Site Service Level Description (SLD) document is almost final: https://edms.cern.ch/document/860386/0.5. It is similar to the MoU but with fewer constraints, e.g. minimum site availability 70%; maximum time to acknowledge GGUS tickets 2 hours; maximum time to resolve GGUS incidents 5 working days.
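As a quick back-of-the-envelope check (my arithmetic, not the document's), 70% availability over a 30-day month still allows a substantial amount of downtime:

    # What a 70% minimum availability permits over a 30-day month.
    hours_in_month = 30 * 24                      # 720 hours
    max_downtime = hours_in_month * (1 - 0.70)    # 216 hours
    print("Allowed downtime: %.0f hours (~%.0f days)" % (max_downtime, max_downtime / 24))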
- ATLAS VO Views problems (any left?)
WLCG GDB/MB
********************
- Attempting to find a way out of the long-standing deadlock on pilot jobs/glexec. Take a look at John's summary from the MB: http://tinyurl.com/2wt3t6. "WLCG sites must allow job submission by the LHC VOs using pilot jobs that submit work on behalf of other users."
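For anyone who has not looked at glexec itself, the sketch below shows roughly how a pilot could hand a payload to glexec so it runs under the payload owner's identity rather than the pilot's. The glexec path and environment variable names are my assumptions from memory of the glexec documentation, so treat this as an illustration rather than a recipe.

    # Hypothetical pilot-side sketch: run a user payload via glexec so it
    # executes under the payload owner's identity rather than the pilot's.
    # The glexec path and environment variable names are assumptions.
    import os
    import subprocess

    GLEXEC = "/usr/sbin/glexec"                  # assumed install location
    user_proxy = "/tmp/payload_user_proxy.pem"   # proxy delegated by the payload owner

    env = os.environ.copy()
    env["GLEXEC_CLIENT_CERT"] = user_proxy       # identity glexec should switch to
    env["GLEXEC_SOURCE_PROXY"] = user_proxy      # proxy made available to the payload

    status = subprocess.call([GLEXEC, "/bin/sh", "payload_wrapper.sh"], env=env)
    print("payload exit status: %d" % status)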
- The short-term work to get job priorities working is "ongoing"
- The status of the middleware is best summarised in Markus's talk to last week's LHCC Comprehensive Review: http://tinyurl.com/2qytlm (the full meeting agenda may also be of interest: http://indico.cern.ch/conferenceDisplay.py?confId=22243). The talk also gives a good summary of the current build, configuration and test process.
32-bit:
-- LCG-CE now ported (with Torque) to SL4 + VDT 1.6 (released?)
-- CREAM-CE - expect certification to start January 2008
-- WMS/LB gLite 3.1 SL4 - in testing (IC)
-- BDII on SL4 PPS this week
-- DPM & LFC - internally tested but configuration work ongoing
-- gLite-SE dCache - ready for certification. Is the 32-bit version needed?
64-bit:
-- Priorities - WN (in runtime testing), Torque_client, DPM_disk & UI
- Migration to SL4 will be complete in early 2008. In parallel, porting to SL5 will start.