Present: Miroslaw, Karim, German, Sylvain, Dennis, Hugo, Fabio, Helge
Closed actions:
·
[A1]
input received from everybody.
·
[A8] Jan: send out a list of legacy (RH6) metrics to be stopped on February 1st.
Done. Action turned to removing these metrics.
Ongoing actions:
·
[A2]
no progress, but interest somewhat reduced.
·
[A5]
stalled
·
[A6] Fabio: evaluate AlarmGUI (set it up with Karim and test it)
Fabio tried several times, but could not establish a connection to the
server. Karim: perhaps related to OraMonServer crashes (and subsequent AlarmBroker
crashes). Karim to send instructions to the list on how
to restart AlarmBroker.
·
[A7] Dennis: Test using ForSure.pl correlations once CMsensor functionality is stable
stalled
·
[A9] German: schedule an ELFms meeting where
Jan will present the WP4 proposal.
postponed
·
[A12] German: Provide a requirements
document enumerating FIO and PS requirements for the actuator framework
stalled
New actions:
·
[A13] Dennis: Check which of the metrics collected by the existing hardware sensors can be obtained via dmidecode; check for a dmidecode RPM; send DMI specification links to
service managers
·
Dennis: Metric 6310 is currently deployed, reflecting a
BIOS checksum. There are, however, no reference values
yet; one would need to take machines out of production in order to understand
all the differences. Another issue is dmidecode: a script exists to parse the information, but nothing
is currently being stored.
·
General feeling that establishing reference values,
and comparing the checksum against them, would be useful, but would require a
very significant effort to collect them. Different opinions on whether or not
this would be useful. Message to be given to FS and DS
sections that deployment could be useful, but the investment cost is high.
·
Dmidecode considered more
useful, delivering more precise information about hardware than the existing
hardware sensors. Deployment would be more straightforward than for the BIOS
checksum.
·
Jan to send a list of metrics collected by existing
hardware sensors to Dennis, Dennis to cross-check which
metrics could be obtained via dmidecode.
·
Dmidecode (as well as SMART
monitoring) is contained in kernel-utils; however, the SMART
monitoring in it was found to be too old, hence kernel-utils cannot
be installed. Dennis to check for a dmidecode
RPM; otherwise we will package it ourselves.
·
Dennis to send service managers
links to the DMI specifications and to typical dmidecode
output.
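For illustration, a minimal sketch of how dmidecode-style output could be turned into key/value pairs suitable for storing as metrics. This is not the existing parsing script mentioned above. The sample text is made up, the function name parse_dmidecode is hypothetical, and the exact output layout may differ between dmidecode versions.

```python
# Made-up sample in the usual dmidecode layout: a "Handle" line, a section
# title, then tab-indented "Key: Value" lines. Not taken from a real machine.
SAMPLE = """\
Handle 0x0000, DMI type 0, 20 bytes
BIOS Information
\tVendor: Example Vendor
\tVersion: 1.0.3
\tRelease Date: 01/15/2004

Handle 0x0100, DMI type 1, 25 bytes
System Information
\tManufacturer: Example Corp
\tProduct Name: Example Server
"""


def parse_dmidecode(text):
    """Return a list of {"title": str, "fields": dict}, one per DMI record."""
    sections = []
    current = None
    expect_title = False
    for line in text.splitlines():
        if line.startswith("Handle "):
            # A new record starts; its title is on the next non-empty line.
            current = {"title": None, "fields": {}}
            sections.append(current)
            expect_title = True
        elif current is None or not line.strip():
            continue  # skip preamble and blank separator lines
        elif expect_title:
            current["title"] = line.strip()
            expect_title = False
        elif ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.strip().partition(":")
            current["fields"][key.strip()] = value.strip()
    return sections


sections = parse_dmidecode(SAMPLE)
```

Each parsed field could then be compared against the list of metrics collected by the existing hardware sensors.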
·
Fabio: PVSS stopped, all
scripts etc. uploaded into it-ccs-sw CVS repository.
Client machines not sending data to ccs002d any more.
·
Jan: Metrics removed as agreed; Marek's
metrics added to configuration, should now be in Oracle. Another metric added
on Vlado's request for load balancing monitoring
(just another daemon to be checked).
·
Hugo: Changes to network monitoring sensor applied,
being tested. Hugo to give all deployment information to the mailing list.
·
Manuel: Work ongoing. Artur:
no news
·
German will meet Alberto in order to explain our
strategy.
·
Dennis: Has found that subscriptions cause segfaults on standard RH 7.3 clients, but not on other
client machines using RHEL 3/ia64 or other distributions. Work ongoing in order
to narrow down the problem.
·
David: [see his e-mail]
·
Jan: RPM not as important as other work requiring
Sylvain's advice (metadata for example). Server should be written such that unknown
metrics do not crash the server.
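A minimal sketch of the behaviour Jan asks for, assuming samples arrive as (metric id, node, value) tuples. All names here (KNOWN_METRICS, process_samples, the metric name "bios_checksum") are hypothetical and do not reflect the actual server code; only the metric number 6310 comes from these minutes.

```python
import logging

log = logging.getLogger("metric_server_sketch")

# Hypothetical metric table; 6310 is the BIOS checksum metric named above.
KNOWN_METRICS = {6310: "bios_checksum"}


def process_samples(samples, store):
    """Store known samples; log and skip unknown metric ids instead of failing."""
    accepted = 0
    for metric_id, node, value in samples:
        name = KNOWN_METRICS.get(metric_id)
        if name is None:
            # An unknown metric is reported but must not stop processing.
            log.warning("ignoring unknown metric %s from %s", metric_id, node)
            continue
        store.setdefault(name, []).append((node, value))
        accepted += 1
    return accepted
```

The point of the design is that a lookup miss is an expected condition that is handled explicitly, rather than an error that propagates and terminates the server.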
·
Fabio: Operator procedures for OraMonServer
restart have been updated.
·
Sylvain: progressing. Subscription
thread added, efficiency tested. Server can fully process more than
10'000 samples per second. An API (a subset of the MR API) has been established that needs to be
implemented in order to plug in a database backend (e.g. Oracle). This means that
David will have to implement it for Oracle.
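As a rough illustration of what a pluggable backend interface can look like, the sketch below defines an abstract storage interface and an in-memory stand-in. The class and method names are hypothetical and are not the MR API subset Sylvain established; an Oracle backend would implement whatever methods that API actually defines.

```python
from abc import ABC, abstractmethod


class SampleBackend(ABC):
    """Hypothetical storage interface; not the actual MR API subset."""

    @abstractmethod
    def store_sample(self, node, metric_id, timestamp, value):
        """Persist one sample."""

    @abstractmethod
    def latest(self, node, metric_id):
        """Return the most recent (timestamp, value) pair, or None."""


class InMemoryBackend(SampleBackend):
    """Stand-in backend; a database backend would implement the same methods."""

    def __init__(self):
        self._samples = {}

    def store_sample(self, node, metric_id, timestamp, value):
        self._samples.setdefault((node, metric_id), []).append((timestamp, value))

    def latest(self, node, metric_id):
        history = self._samples.get((node, metric_id))
        if not history:
            return None
        # Pick the sample with the largest timestamp.
        return max(history, key=lambda sample: sample[0])


backend = InMemoryBackend()
backend.store_sample("node1", 6310, 100, "a1b2")
backend.store_sample("node1", 6310, 200, "c3d4")
```

With such a split, the server only depends on the interface, so the storage implementation can be exchanged without touching the server code.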
·
Karim: the configuration file
problem is still present. Using Insure++ had unpleasant side effects.
·
German: Would like to see priority given to Webstart investigations as well as making the display
available to Fabio.
·
Dennis: Development ongoing, including making the
framework independent of the MSA
·
Nothing to report