Agenda: http://agenda.cern.ch/fullAgenda.php?ida=a04986
Contact:
project-lcg-gda@cern.ch
The chairman, Ian Bird, informed the participants that Alice and CMS now use the LCG Core sites to run their Data Challenge jobs.
Dirk Duellmann (CERN IT Database group and POOL co-developer) reported on work with CMS to test and improve the performance of queries involving "like", which are still problematic. The reason is that, in the present design, metadata and mapping data are split, and the "join" operation is done at the client, which is very inefficient. A design change is ultimately required; relevant discussions have been held with EDG WP2 since last year. For the moment a fix is being attempted in POOL. CMS has now been offered direct read-only SQL access to bypass problems with metadata queries involving the RLS.
Jean-Philippe Baud is preparing, at Atlas' request, a fix for testing that allows copying and registering a file with a given GUID (Grid Unique IDentifier).
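The minutes do not name the client command that will expose this fix. Purely as a sketch, assuming the lcg-cr utility from lcg_util and its option for supplying a GUID, such a copy-and-register might look like the following (the VO, storage element, LFN and file path are placeholders):

    # Hypothetical sketch: copy a local file to a storage element and register it
    # in the catalogue under a caller-supplied GUID. Check the exact option names
    # and GUID format against the client version actually deployed.
    lcg-cr --vo atlas -d some-se.example.org \
           -l lfn:/grid/atlas/tests/myfile \
           -g <guid-to-register> \
           file:/home/user/myfile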
Markus Schulz reported that the core sites now run LCG-2 on 1800 CPUs. The system usage observed in production is minimal. CERN and Taipei suffered problems last week due to Alice jobs requiring large swap space that were submitted to nodes with limited disk space, which they quickly exhausted. Job distribution does not work as desired, but the algorithm behaves in an understandable way given the manner in which Alice submits jobs.
Joel Closier reported that the "edg-rm" command requires the specification of the VO. However, "edg-rm --vo=<VO-name> printInfo" returns all available resources, and there is no way to view only those available to the given <VO-name>, so that one can know where to install the experiment software. This is, however, not an appropriate use of the Replica Manager (edg-rm commands), as one can obtain this information by submitting a test job or by doing an "ldapsearch" (see the sketch below).
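As a sketch of the "ldapsearch" route, assuming an LCG-2 style information system publishing the GLUE schema through a BDII (the host name, port and base DN below are assumptions and vary per deployment), one could list the Computing Elements open to a given VO with:

    # Hypothetical query against a BDII; host and port are placeholders.
    # GlueCEAccessControlBaseRule lists the VOs authorised on each CE; the exact
    # attribute values depend on the GLUE schema version in use at the site.
    ldapsearch -x -H ldap://lcg-bdii.example.org:2170 \
        -b "mds-vo-name=local,o=grid" \
        "(&(objectClass=GlueCE)(GlueCEAccessControlBaseRule=lhcb))" \
        GlueCEUniqueID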
Flavia Donno reported on Experiment and Application Software Installation and presented the basic rules of today's procedures.
Ian Bird asked the audience for feedback on this point. Markus Schulz reminded the meeting that this attitude towards the use of system privileges across experiment members was already present before the Grid era.
Flavia briefly presented a prototype developed by Roberto Santinelli, based on the agent Tank, which runs on a Computing Element (CE) and accepts requests from ESMs to install software on the Worker Nodes (WN). Tank communicates the request to install the software bundle on the entire WN farm, via an authenticated channel, to Spark, a client running on each WN. A cron job that runs as a normal user on all WNs polls the bundle from Tank, at the discretion of the site administrators (a sketch follows below). The prototype will be available for testing on the EIS testbed. (***ACTION***: Sites, please send feedback on whether such a tool is desired).
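The Tank/Spark interface is not specified in these minutes; purely as an illustration of the polling model, a site could schedule the Spark client from an unprivileged account with a crontab entry such as the following (the script path, polling interval and log location are hypothetical):

    # Hypothetical crontab entry for the unprivileged software-manager account on a WN.
    # Every 30 minutes it runs a (hypothetical) Spark polling script, which asks Tank
    # on the CE for pending installation requests and fetches the software bundle.
    */30 * * * * /opt/lcg/libexec/spark-poll.sh >> /var/tmp/spark-poll.log 2>&1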
A brief discussion of alternative solutions followed. The CERN Quattor team cannot offer anything today that satisfies the requirements. A shared filesystem is the only alternative used so far, but many sites, including CERN, CNAF and NIKHEF, might not wish to continue with it in the long run. A large site that has its own tools can use them to make the software available on the WNs.
A.O.B.:
Maria Dimou, IT/GD, Grid Infrastructure Services