- There was a GDB last Wednesday: http://indico.cern.ch/conferenceDisplay.py?confId=45475
Introduction
**********
Subjects for future meetings: Batch systems; T2 storage; Virtualisation; Site management ...
- Are there any suggestions for topics that should be covered at the GDB?
SL5 - sites to give feedback.
Events/Meetings: SRM workshop for developers this week; HEPiX 25th-29th May; STEP09 in June; STEP09 review 9th-10th July.
Pre-GDB was on NGI progress
Installed capacity - automated monitoring is about to become a reality, so please check what your site is publishing (logical/physical CPUs); see the query sketch after this list.
GridMap View. Question asked regarding how to publish the number of CPUs for sub-clusters. No real conclusion. SB
Possibility of a WLCG technical forum.
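To check what your site publishes, you can query the BDII for the GLUE 1.3 SubCluster attributes that carry the CPU counts. A minimal sketch in Python using the ldap3 library; the hostname and search base are hypothetical placeholders (substitute your own site BDII), while the attribute names come from the GLUE 1.3 schema.

    # Sketch: query a site BDII for published CPU counts (GLUE 1.3).
    # Endpoint and search base are hypothetical - use your own site's
    # BDII; the Glue* attribute names are from the GLUE 1.3 schema.
    from ldap3 import Server, Connection

    server = Server("ldap://site-bdii.example.org:2170")   # hypothetical host
    conn = Connection(server, auto_bind=True)              # anonymous bind
    conn.search(search_base="o=grid",
                search_filter="(objectClass=GlueSubCluster)",
                attributes=["GlueSubClusterName",
                            "GlueSubClusterLogicalCPUs",
                            "GlueSubClusterPhysicalCPUs"])
    for entry in conn.entries:
        print(entry.GlueSubClusterName,
              entry.GlueSubClusterLogicalCPUs,
              entry.GlueSubClusterPhysicalCPUs)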
GGUS structure and workflow changes
******************************
All sites can now be notified directly
Interfaces with other ticketing systems (regional/experiment) are being improved
Storage
******
LHCb tests running analysis jobs on large amounts of data
- to understand current limitations for user job data access
- e.g. 600 jobs - data in 100 files (200 MB) with 500 events each
- A ROOT-based application opens the files and reads events from the SEs (see the sketch after this list)
- dCache tuning was required, but an old version was used
- Promising results, but either performance or file-open-time problems were seen at most T1s
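For illustration, the access pattern being tested looks roughly like the sketch below in PyROOT. The URL, protocol and tree name are hypothetical examples, not the actual LHCb test configuration.

    # Sketch of the tested access pattern: a ROOT-based job opens a file
    # on an SE and reads events one by one. URL and tree name are
    # hypothetical.
    import ROOT

    url = "root://se.example.org//dpm/example.org/home/lhcb/file001.root"
    f = ROOT.TFile.Open(url)       # file-open time was one observed problem
    if not f or f.IsZombie():
        raise IOError("could not open %s" % url)
    tree = f.Get("Events")         # hypothetical tree name
    for i in range(tree.GetEntries()):
        tree.GetEntry(i)           # each entry is read back from the SE
    f.Close()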
SRM usage
- Measured SRM utilization patterns at CERN and RAL T1
- Looked at number of polling requests, failures etc.
- Main SRM client at CERN is FTS
- ATLAS runs 5 requests/s, which is 5x more than CMS and LHCb
SL5
***
Experiment requirements
- Existing SL4 (gcc-3.4) binaries on SL5
-- Incompatibilities found with SELinux module (impacts ROOT, Oracle client and CERNLIB)
-- Identified compatibility libraries (open question: whether the experiments distribute them, or LCG/GD does via a meta-RPM)
-- Working to make gcc-3.4 available on SLC5 systems
ATLAS
- No production release yet compatible with SL5, but ATLAS can run with SELinux partially disabled (with compatibility libs installed). 15.2.0 is expected in production by September.
CMS
- Can run on SL5 but do not want sites fiddling with SELinux!
LHCb
- Can run on SL5 for analysis of existing MC data but older releases will not run
- Testing and distribution of libs TBC
ALICE
- No problem
Native builds for SL5 with gcc-4.3 (skipping 4.1)
- Much effort required in porting C++ code
- External libraries are an issue; this needs a unified installation approach, or shipping the gcc-4.3 compiler or libraries
ATLAS
- Native build soon, to be deployed around August; will produce SL4 and SL5 binaries
- Inclined to retain SLC4/gcc-3.4/32-bit as the primary platform until after the 09-10 physics run
CMS
- Finished the native port to 64-bit; it builds, and validation is in progress
- Want to switch binaries in one go
LHCb
- Port not yet done
- Plan to use SL5/gcc-4.3/64-bit for real data and corresponding MC
ALICE
- No problem
=> Slow migration & calls for virtualization!
Experiment data flows
*****************
Presentations showed rates for the various tasks, with breakdowns by T1.
LHCb - gave rates for T1s
CMS - gave detailed rates per T1
ATLAS - detailed T1 and T2 requirements including pilot shares. Requests "feedback on how well the balance between activities works during STEP09" (jobs run/queued, CPU efficiencies each day). Also gives a spacetoken summary (slide 25).
Memory use with 64-bit
- Graeme pointed out that sites should not kill jobs based on vmem: under 64-bit there is a large overhead, with each process having a memory footprint of about 50 MB versus 5 MB for 32-bit (see the sketch below).
- It is believed that Torque kills on the basis of the memory consumption of the whole process tree, not just the payload.
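The distinction matters because a vmem limit counts address space, which balloons under 64-bit with mapped libraries, while resident memory reflects what the payload actually uses. A minimal Linux-only sketch that reads both for the current process:

    # Compare virtual size (what a vmem-based killer sees) with resident
    # memory for the current process. Linux-only; reads /proc/self/status.
    def memory_status():
        fields = {}
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith(("VmSize:", "VmRSS:")):
                    key, value, unit = line.split()
                    fields[key.rstrip(":")] = int(value)   # in kB
        return fields

    m = memory_status()
    print("VmSize %(VmSize)d kB (address space), VmRSS %(VmRSS)d kB (resident)" % m)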
Pilot jobs
*******
- SCAS/glexec pilots running (incl. Lancaster), but there are concerns and CREAM issues (a glexec invocation sketch follows this list)
Lancaster's deployment is ahead of the others, running with the lcmaps plugins. The SCAS plugin is not available for 64-bit (an ETICS issue). BNL is testing usage with the ATLAS pilot system this week.
- The WLCG Management Board had asked T1s and larger T2s to deploy SCAS/glexec for testing by the experiments, but it is not yet ready for deployment, so T1s and large T2s are not being asked to deploy for now.
- Some progress in ATLAS framework issues
- VDT asked to provide MyProxy server built with support for VOMS attributes
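For reference, a glexec call from a pilot has roughly the shape sketched below. The proxy paths and payload are hypothetical; the GLEXEC_* environment variable names and the /opt/glite/sbin path follow the gLite conventions.

    # Sketch: a pilot invoking glexec to run a user payload under the
    # user's mapped identity. Proxy paths and payload are hypothetical.
    import os
    import subprocess

    env = dict(os.environ)
    env["GLEXEC_CLIENT_CERT"] = "/tmp/user_proxy.pem"      # proxy identifying the user
    env["GLEXEC_SOURCE_PROXY"] = "/tmp/user_proxy.pem"     # proxy copied in for the payload
    env["GLEXEC_TARGET_PROXY"] = "/tmp/payload_proxy.pem"  # where glexec places the copy

    # glexec authorizes via SCAS (through the lcmaps plugins) and then
    # runs the payload under the mapped account.
    rc = subprocess.call(["/opt/glite/sbin/glexec", "/path/to/payload.sh"], env=env)
    print("glexec exit code:", rc)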
CREAM
*****
- Status vs transition criteria: https://twiki.cern.ch/twiki/bin/view/LCG/LCGCEtoCREAMCETransition
- CMS just started testing
- Request for more sites to provide CREAM CEs for testing
- MB plan has 50+ sites providing CREAM by 1st October (currently about 14 sites have it)
[Dug recently wrote a useful summary about the Glasgow installation http://scotgrid.blogspot.com/2009/05/cream-in-action-local-users-glexec.html]
For more on installation see: https://www.scotgrid.ac.uk/wiki/index.php/Glasgow_GLite_Cream_CE_installation and http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream31-devel (a minimal submission sketch follows).
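As an illustration of what testing involves, direct submission to a CREAM CE goes through glite-ce-job-submit. A minimal sketch, with a hypothetical CE endpoint and queue:

    # Sketch: submit a trivial JDL to a CREAM CE. The endpoint and queue
    # (host:8443/cream-<lrms>-<queue>) are hypothetical placeholders.
    import subprocess
    import tempfile

    jdl = """[
    Executable = "/bin/hostname";
    StdOutput = "out.txt";
    StdError = "err.txt";
    OutputSandbox = {"out.txt", "err.txt"};
    ]"""

    with tempfile.NamedTemporaryFile("w", suffix=".jdl", delete=False) as f:
        f.write(jdl)
        jdl_path = f.name

    # -a requests automatic proxy delegation
    subprocess.call(["glite-ce-job-submit", "-a",
                     "-r", "cream.example.org:8443/cream-pbs-grid",
                     jdl_path])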
A common file access protocol
***********************
- Proposal to adopt XROOT as a common protocol; consolidation would come after the 09-10 run.
- The response was not great - it may work for CASTOR, and for the security aspect there, but the experiments just want anything that does not introduce a large overhead!