Response from the TRIUMF Tier-1 to the proposal to centrally distribute client (glite-WN) software. These are our objections and/or questions. Wherever the subject 'you' is used in this text, we mean 'Developers and/or Central/VO Operations'. We are speaking for ourselves as a Tier-1 centre in this response, and not for the Canadian Tier-2s.

- Why do you not adapt the current timeout mechanism for CA version checking in SAM to also force sites to be at the required software level for glite-WN? http://grid.cyfronet.pl/sam-doc/CE/CE-sft-caver.html The CA check is very effective.

- Worker nodes are the easiest software for us to install! Other services are an order of magnitude more difficult to understand, configure and monitor. Why is the easiest system to manage being singled out here?

- One big advantage of an RPM-based installation is that it has built-in mechanisms to verify the integrity of the installed base, to find modified files, and to patch and downgrade. Tarball installations do not readily have these attributes.

- With RPMs we CAN downgrade to older packages if we need to (rpm --oldpackage ..), and we are very good scripters :-); see the example commands at the end of this message. So we do not understand your statement: 'Fast rollback in case of problems (not currently possible).' At any rate, in case of problems you should issue new RPMs that fix the problem, and we can update them under a timeout constraint as noted above.

- We are concerned that this is being investigated and considered so late. We have been installing and patching LCG/gLite software at TRIUMF since 2005, and we are confident that we know how to manage our software. Moreover, we expect the frequency of middleware updates to decline, given that we are on the brink of LHC startup.

- This is a big change. At some point it would require us to delete all glite-WN RPMs from the worker nodes, fix configuration files in /etc/profile.d/ that collide with the changes, and fix library loader configurations that collide with the changes (assuming this central installation method becomes a requirement).

- There are many reasons why doing this over NFS is a bad idea. No one in their right mind is going to champion NFS for its scaling and performance characteristics, particularly in single-server NFS implementations. In this scenario we would need to reconsider how we deploy NFS services for the next upgrade.

- In the current system we have control over the TIMING of updates, in full knowledge of what is RUNNING on our worker nodes and what is QUEUED in the batch system. With a central push of software we will, in all likelihood, be asleep in our beds when you intervene on our cluster. Daytime changes are infinitely preferable for a cluster located on the other side of the world from CERN.

- What will happen to currently running jobs when the NFS area holding the shared libraries and binaries is changed underneath them? Has this been investigated?

- We currently have full control of the dCache client-side software that lands on our worker nodes; we do not use the dCache RPMs present in the gLite repositories. We chose instead to manage our dCache servers separately and, when needed, we push dCache client RPMs that match what we run on our dCache servers. If a central mechanism controlled this, we would lose control of an important aspect of managing our server-side dCache changes (for example, our transition from dCache 1.6 to dCache 1.7).

In summary, we do not favour a change to central client software distribution.
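For reference, the standard rpm invocations we have in mind for the verification and rollback points above look roughly like this (the package name glob and the rollback filename are illustrative, not actual gLite release numbers):

    # verify every installed gLite package against the RPM database;
    # reports files whose size, checksum, permissions or mtime have changed
    rpm -V $(rpm -qa 'glite-*')

    # roll back to a previously released package if an update misbehaves
    # (filename here is illustrative)
    rpm -Uvh --oldpackage glite-WN-3.1.0-1.noarch.rpm

Tarball installations give us none of this without extra scripting on our side.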
cheers from Denice, speaking for the TRIUMF Tier-1