StoRM
-----
* blocksize mismatch between GridFTP (hardcoded) and GPFS
* SLC4 brings twice the performance of SLC3
* tuning the number of streams
* CNAF FTS SRM Get timeout 3600 --> 3000
* CMS farm activity saturated the number of slots on disk servers
* GPFS: high random-access latency for the software area
* GPFS problems due to limited hardware and misconfiguration
* better logs needed to distinguish StoRM problems from other problems
* better configuration/admin tools needed

RAL
---
* LHCb RFIO core dumps, not yet understood
* CMS tape mounts for skimming halted production
* tape servers flaky, probably due to an older CASTOR version

CASTOR SRM v2.2
---------------
* DB deadlocks
* too many DB connections; more machines and better configuration needed
* CGSI errors unclear
* SRM stuck in recv(), cured by timeouts in the latest version
* timeouts on stager calls needed
* pinning/GC problem fixed
* logging trail being improved
* Put/Get processing typically takes 1-5 s after authentication
* moving to SL4, 2.1.7 and the new MoU all urgent
* more tests needed to avoid problems in production
* test tool to come with the release

SARA
----
* GSIDCAP server only on the SRM node, due to a bug that will be fixed
* read/write/cache pools separated
* queues for GridFTP and GSIDCAP
* full pools due to orphaned files removed from PNFS
  --> FTS timeouts increased, cron job to clean up
* slow ATLASDATATAPE --> increased the number of movers, added an extra node
* slow staging for LHCb --> more hardware needed
* DIRAC staging small (150 MB) files, bad for the tape system
* SRM reports NEARLINE also for T0D1 when the file is only on a write pool
  --> T0D1 should be made read-write
* space token VOMS checking problem fixed
* GSIDCAP no longer listening on port 22128, not understood
* LFC crashes, fix coming
* ATLAS DDM bugs: failures seen as successes and vice versa
* LHCb: bringOnline not enough to make the status ONLINE, should be fixed
* D1 <--> D0 transition function or pinning?
  --> changeSpaceForFiles not on the roadmap, a PNFS admin command is available
* dCache releases should highlight configuration changes
* patches should not be mixed with new features (was an accident)
* stage tests:
  - 500 + 50 (different tape) 2 GB files
  - bringOnline crashes with 500 files --> use "dccp -P" for now
  - 100 MB/s with the pre-stager, else ~60 MB/s

DPM at GRIF
-----------
* 1.6.10, 64-bit, 100 TB
* 250 MB/s transfers without tuning
* ATLASGRPDISK needs multiple FQANs --> feature expected in September
* XROOTD plugin rpm coming
* advanced monitoring tools by Greig Cowan

Databases
---------
* most critical service
* RAC + DataGuard, downtime << 1%
* old hardware kept on standby during the transition period
* Streams replication: online --> offline, T0 --> T1, OpenLab collaboration
* CMS: Frontier, Squid
* 3D project for sharing policies and procedures
* 24x7, but still best effort
* with more memory, fewer physical reads
* DB usage increase should be at T1, not T0
* DB dashboard for easy monitoring, technology also available to T1
* most applications guided to 1 preferred node each: better cache utilization,
  less intracluster traffic
* locked owner accounts to avoid accidents (e.g. drop table)
* SAM/GridView biggest consumers
* T1: no problems, hardware upgrades foreseen
* power cut:
  - 1 Ethernet switch not on critical power
  - faulty OEM agent scripts prevented automatic startup
* completing migration to 64-bit and 10.2.0.4, first T0, then T1 (3D)
* Streams setup improvements
* reliable, manageable service
* close collaboration between application developers and DBAs

ATLAS DB
--------
* reprocessing launched at the end of May
* need ~1k concurrent Oracle sessions
* some sites not yet OK/tested
* 3D streaming to T1 OK
* DCS (slow control) has the largest volumes
* replication to calibration sites OK
* reprocessing: average DB load OK, bursts limited by capacity
* more tests foreseen
* T1 firewall issues --> use a proxy

CASTOR DB
---------
* background bulk query for sync.
  between stagers, disk servers and the name space:
  - slowed down the name server during backup
  - sync. suspended during backup, DB disks defragmented
* stager_rm slow in certain cases --> fixed by forcing index use via a hint
* deadlocks between concurrent requests --> fix coming
* too many concurrent connections:
  - lowered the number of connections
  - lowered the number of SRM threads
  - split the DB into several RACs
* increase during CCRC'08-2 not large compared to continuous activity

Middleware
----------
* software process operated as usual: updates, priorities
* longstanding fix for job priorities released
* dCache in gLite not at the cutting edge, slightly behind
* lcg-CE Globus marshal daemons security fix
* gLite 3.1 WMS released
* baseline versions defined for services and clients
* GFAL desired pin time bug affected ATLAS reprocessing:
  - fix entering certification
  - no EMT request for a fast track yet
* FTM to be deployed at T1
* VDT bugs:
  - MyProxy linked against the wrong version of Globus
  - Globus proxy chain length limit too low
  - fixes expected shortly
* CREAM: still functional problems, stress tests started
* gLExec security problems affecting CREAM and pilot jobs, fixes expected shortly
* more lcg-CE performance improvements in the pipeline
* VDT 1.10 for SL5 in September, driven by sites
* EMT for short-term planning, TMB for medium term; ALICE not present
* Application Area repository for early access to new client versions
* ATLAS, CMS: SL5 possible in winter
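
The "SRM stuck in recv(), cured by timeouts" item in the CASTOR SRM v2.2 section above is the classic unbounded blocking read: a server thread wedges forever on a silent peer. A minimal, generic Python sketch of the cure (illustrative only; `recv_with_timeout` is a hypothetical helper, not CASTOR or CGSI code):

```python
import socket

def recv_with_timeout(sock: socket.socket, nbytes: int, timeout: float = 5.0) -> bytes:
    """Read up to nbytes from sock, but give up after `timeout` seconds
    instead of blocking forever in recv()."""
    sock.settimeout(timeout)      # bounds every blocking call on this socket
    try:
        return sock.recv(nbytes)  # raises socket.timeout if the peer stays silent
    finally:
        sock.settimeout(None)     # restore blocking mode for other callers
```

A caller that catches `socket.timeout` can retry or fail the request cleanly, freeing the thread instead of leaving the daemon stuck.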