Discussion/Brainstorming

Issues that came up during the tutorial exercises:

Do we need anything more than a Config() method? (Ignacio, German)

RMS problems under "job storms"

During the exercises, the RMS-PBS setup was seen to block job submission when many jobs were submitted at the same time (a "job storm"). In the end all jobs appeared to have been executed, but the blocking of qsub was not very user-friendly. David Front had spoken to Markus Schultz, who said that he had never seen this happen with the vanilla PBS used at the EDG testbeds.

Fault tolerance tutorial

The exercise didn't work because of a bug in the new version of the software. At the end of the session the old version was downloaded, and a few WP4 members (Olof, Sylvain, Thomas) saw it working :). Main problems:

Installation

Rafael pointed out that, while the objective is to develop a system that scales to thousands of nodes, one shouldn’t forget small farms. For those, the system is too complex.

Monitoring

General comments