Discussion/Brainstorming
Issues that came up during the tutorial exercises:
Do we need anything more than a Config() method? (Ignacio, German)
- Comparison with SUE: the other methods (from SUE) are Installation, Update, Daily/weekly/monthly and Init.
- SUE provided methods for starting but not stopping services
- (Helge) Of all the methods provided, most SUE features use only “Update”.
- (Ignacio) If we have only one method, Configure(), it must be idempotent.
- (German) Do we also need a Deconfigure() method? It could be a hook when uninstalling an RPM. Normally it would be empty, but it could be used for removing entries from shared configuration files.
- (Olof) Is there only one application responsible for running the components? For instance, FT could call configuration components directly or through some interface to cdispd.
- Calling components directly would require locking while a component runs, and the component would have to handle that locking itself.
- (Phil) Components should never run in parallel. Is this necessary? Long discussion ...
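The idempotency, Deconfigure() and locking points above can be illustrated with a minimal Python sketch. It is not how any WP4 component is implemented: the function names, the "# BEGIN"/"# END" block markers and the ".lock" file are all assumptions made for illustration, and fcntl.flock is available on Unix only.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def component_lock(lock_path):
    """Hold an exclusive advisory lock so two runs cannot overlap."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def _strip_block(lines, name):
    """Return `lines` without the block owned by component `name`."""
    begin, end = f"# BEGIN {name}", f"# END {name}"
    kept, inside = [], False
    for line in lines:
        if line == begin:
            inside = True
        elif line == end:
            inside = False
        elif not inside:
            kept.append(line)
    return kept


def _read_lines(path):
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return handle.read().splitlines()


def _write_lines(path, lines):
    with open(path, "w") as handle:
        handle.write("".join(line + "\n" for line in lines))


def configure(path, name, entries):
    """Hypothetical Configure(): replace this component's block in a shared file.

    Running it twice with the same arguments leaves the same file contents.
    """
    with component_lock(path + ".lock"):
        lines = _strip_block(_read_lines(path), name)
        lines += [f"# BEGIN {name}", *entries, f"# END {name}"]
        _write_lines(path, lines)


def deconfigure(path, name):
    """Hypothetical Deconfigure(): remove only this component's entries."""
    with component_lock(path + ".lock"):
        _write_lines(path, _strip_block(_read_lines(path), name))
```

Because configure() always removes its own block before writing it again, a second run with the same arguments is harmless, and the lock taken inside the component means a direct caller and a dispatcher cannot modify the shared file at the same time.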
RMS problems under "job storms"
During the exercises it was noticed that the RMS-PBS setup would block job submission if a lot of jobs were sent at the same time (job storms). In the end it seemed that all jobs were executed, but the blocking qsub was not very user friendly. David Front had spoken to Markus Schultz, who said that he had never seen this happen with the vanilla PBS used at the EDG testbeds.
Fault tolerance tutorial
The exercise didn't work because of a bug in the new version of the software. At the end of the session the old software was downloaded and a few WP4 members (Olof, Sylvain, Thomas) saw it working :). Main problems:
- Need examples (templates) for how to proceed when configuring rules.
- The GUI is neither pretty nor user friendly. Need something else (HTML forms?).
- Need error messages when something goes wrong. The plan is to use a syntax checker.
- David Front was worried about the fact that actuators can be very strong. Can this be controlled? No, it’s up to the administrators to know what they are doing. A distinction between normal users and admins will come in the future.
Installation
Rafael pointed out that while the objective is to develop a system that scales to thousands of nodes, one shouldn’t forget small farms. The system is too complex for them.
Monitoring
- (All) Fix the segmentation violation when subscribing to a non-existent metric.
- (Enrico and others) The use of tabs in the MSA configuration file is not very user friendly. A mistake (tab --> space) is fatal.
- (Sylvain) A Perl interface for the MSA-sensor would be useful.
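To illustrate the tab problem raised above, here is a small Python sketch of a pre-flight check that could catch a tab --> space mistake before it becomes fatal. The MSA configuration format is not described in these notes, so the layout assumed here (one key<TAB>value pair per line, with "#" starting a comment) and the function name are purely hypothetical.

```python
def find_tab_mistakes(text):
    """Return (line_number, line) pairs that contain no tab separator.

    Assumed, hypothetical layout: each non-blank, non-comment line
    is "key<TAB>value".
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments carry no fields
        if "\t" not in line:
            problems.append((number, line))
    return problems
```

Reporting the offending line numbers up front would turn a fatal start-up failure into an error message the administrator can act on.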
General comments
- (Andrea) When will the different things be available? Monitoring could be used at CNAF, but it is not among the released EDG software. So how could one convince site admins to use it instead of other well-known tools (Nagios, Ganglia, ...)?
- (Michele) GUIs are too primitive. For instance, with the FT rule editor it took a lot of work to produce four lines of code.
- (Maite) The uneven maturity of the software is a problem for us, and we have a lot of things to integrate. Unstable software is not going to be accepted by EDG at all! For instance, the RMS interface between Globus and PBS needs to be very well tuned if it is to be accepted by EDG.