ELFms/LEMON meeting, 29/01/2004

 

Present: German, Karim, Miroslaw, Dennis, Piotr, Helge, Fabio, Sylvain, Jan

Action Items:

Closed actions:

·        [A3] 22/1/04 JanvE: DAEMON_DEAD alarm when production OraMon daemon dies.
implemented, action closed

·        [A4] 22/1/04 JanvE: Deploy the MR API RPM onto LXPLUS
RPM doesn’t contain PHP yet. With this proviso, action closed.

·        [A10] Manuel: Do we need to access external data (data not available on the node) for local correlations?
The SOAP server can transparently redirect requests from the local spool to the central repository if so configured.
Sylvain has an idea for reducing the server load (a multiplexer).

·        [A11] Dennis: enable access to historical data inside CMsensor
done
 

Ongoing actions:

·        [A1] 22/1/04 All: Provide list of 'TO DO' items to German for 2004 planning
Input received from Fabio, David, Dennis and Hugo; others (including those whose contracts will end soon) still to provide input.

·        [A2] 22/1/04 German: Check with ADC on the support level for the PSF framework
no progress

·        [A5] 22/1/04 German: Propose a new CVS layout (check with Bill Tomlin)
no progress

·        [A6] Fabio: evaluate AlarmGUI (set up with Karim and test it)
No progress; action remains open. Inform German once it happens.

·        [A7] Dennis: Test using ForSure.pl correlations once CMsensor functionality is stable
stalled

·        [A8] Jan: send out a list of legacy (RH6) metrics to be stopped on February 1st.
Done. Action now changed to removing these metrics.

·        [A9] German: schedule an ELFms meeting where Jan will present the WP4 proposal.
no progress

·        [A12] German: Provide a requirements document enumerating FIO and PS requirements for the actuator framework
ongoing

Fabio: has discussed with Bill how to automatically implement propagation from HMS to PVSS, Sure, ...; it is clear how this would be done.

 

Report from PVSS presentation (German)

(see his written report)

-          SNMP driver: no further details known, assumed to still be based on SNMP v2

-          Oracle: backend will only be used for historical data; live state data will be in their proprietary DB (History DB)

-          Linux implementation of Oracle not in initial 3.0 release, time scales somewhat vague

Discussion about the proposals made by German:

-          Fabio: Sure also needs automatisation; we should focus on that rather than on PVSS. We will also need to plan for a Sure phase-out in case we fully go the WP4 way. German: summer looks like the right time for such a decision.

-          PVSS options: automatising the addition/removal of hosts is not possible due to manpower constraints (effort should go to Sure); carrying on as before is not possible for the same reason; keeping PVSS running without maintenance is useless. Hence stopping PVSS is the only realistic choice. German to propose this to FIO management.

-          WP4/Lemon: constraint for GUI implementation is that it must support SOAP.

-          Evaluation: needs to be detailed further. Sylvain: it seems we do not need to re-evaluate the repository; OraMonServer has been picked by the FIO group leader as the production solution. Helge: we will need to check how OraMonServer and/or the new server code will behave.

Status reports:

CCS prototype 2:

-         Fabio: PVSS restarted by itself on Sunday night because it was low on virtual memory; not clear what caused this. Patches received, but probably no need to install them (see discussion above)

MSA and sensors:

-         Jan: clusters added for lxcmgmt, LCG RLS servers. FioSensor.pl moved to EDG repository.

-         Sylvain: has written developer documentation for MSA (and for other pieces of software he has written)

-         Jan: Will add more metrics, and work on the configuration of the MSA and OraMonServer. Will drop metrics as proposed previously

Lemon on Solaris:

-         Piotr: Solaris sensor still segfaults from time to time; he is investigating. Sylvain: if a bug is suspected in the library part, he will investigate. Piotr: no, the problem is probably elsewhere.

-         Jan: has ensured that ForSure.pl works on Solaris. Will remind PS about packaging and testing. Contacts with AS ongoing; they are to pick their favourite metrics.

-         IS asking about monitoring on Solaris 6

Lemon on Windows:

-         no news

MR API:

-         Jan: deployed.

-         DS is starting to use it to put up a Web page that also combines information from the CDB SQL interfaces.

-         Jan: has tried the API to access data for creating his Web page. It took a long while, but worked.

Oracle monitoring servers:

-         David: now working on metatables after having been busy with bug fixes.

Alarm display:

-         Karim: bugs fixed, still working on configuration. Has triggered a bug in OraMonServer that closes ports.

-         Has looked at WebStart for Java, looks reasonable so far; it would make the display independent of the browser.

CMsensor:

-         Dennis: continuing the development, now supporting subscriptions that fire at most once per time interval
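For readers unfamiliar with the idea, "firing at most once per time interval" can be illustrated with a minimal sketch. This is not CMsensor code: the class name ThrottledSubscription, its notify method and the injectable clock are all hypothetical, chosen only to show the behaviour described above.

```python
import time


class ThrottledSubscription:
    """Hypothetical helper: invokes `callback` at most once per `interval` seconds."""

    def __init__(self, callback, interval, clock=time.monotonic):
        self.callback = callback
        self.interval = interval
        self.clock = clock          # injectable so the behaviour can be tested
        self._last_fired = None     # time of the last delivered notification

    def notify(self, value):
        """Deliver `value` unless the subscription already fired within the interval.

        Returns True if the callback was invoked, False if the update was dropped.
        """
        now = self.clock()
        if self._last_fired is not None and now - self._last_fired < self.interval:
            return False
        self._last_fired = now
        self.callback(value)
        return True
```

In this sketch, updates arriving inside the interval are simply dropped; a real implementation might instead keep the latest value and deliver it when the interval expires.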

AOB:

-         German: new mailing list project-elfms-lemon, should stop using it-proj-ccs. German to test, then Helge to change/suppress existing list

-         Jan: script to be converted to a sensor (by Tony Cass), doesn't work on RedHat 7.3 yet. Will report on additional sensors for DS purposes next week.