ELFms/LEMON meeting, 22/01/2004
Present: Helge Meinhard, Piotr Kolet, Marek Misiowiec, Dennis Waldron, Miroslaw Siket, Manuel Guijarro, Harry Renshall, German Cancio (minutes), Fabio Trevisani, Hugo Cacote, Jan van Eldik
PSF - Python Scripting Framework
- Marek’s slides can be found here.
- PSF consists of two parts:
  - a framework that eases writing MSA sensors by shielding the sensor author from the MSA/sensor protocol details, implemented in Python;
  - Linode: a collection of C libraries for monitoring the Linux installation servers at CERN.
- Marek has chosen Python as the implementation language because of the large number of libraries available and its (true) object orientation. The framework is compact (about 1K lines of code). How much experience does FIO have with Python?
- With PSF, we now have three MSA sensor frameworks: the other two are written in C++ (Sylvain) and Perl (JanvE).
- Linode is a use-case-targeted application, but some of the code could be reused for other purposes. It does remote monitoring of the installation servers but does not provide any generic remote monitoring functionality. We could reuse some of its ideas/concepts for remote service monitoring (e.g. are all necessary servers reachable from a given rack's network segment?).
- The framework has not yet been released to other users. Marek will leave CERN in 10 weeks; what level of support will ADC provide? (German to check)
- Subject to the above: should we integrate the Linode metrics into the FIO repository?
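As a rough illustration of the remote service monitoring idea mentioned above (checking that all necessary servers are reachable from a given rack's network segment), a minimal Python sketch could attempt a TCP connection to each required service with a short timeout. The function name `check_reachable` and the idea of a (host, port) list are hypothetical; this is not part of Linode or PSF, which may work quite differently.

```python
import socket
from typing import Dict, Iterable, Tuple


def check_reachable(services: Iterable[Tuple[str, int]],
                    timeout: float = 2.0) -> Dict[Tuple[str, int], bool]:
    """Hypothetical helper: try a TCP connection to each (host, port).

    Returns a dict mapping each (host, port) to True if a connection
    could be opened within `timeout` seconds, False otherwise.
    """
    results = {}
    for host, port in services:
        try:
            # create_connection resolves the name and connects; the
            # timeout bounds the connect attempt itself.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Covers refused connections, timeouts and name-resolution
            # failures (all are OSError subclasses in Python 3).
            results[(host, port)] = False
    return results
```

Run from a node in the rack being checked, each False entry would point to a service the rack cannot reach; which services and ports to list would have to come from configuration.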
CCS prototype 2
- Fabio: the kernel on the PVSS servers was upgraded and the machines were rebooted.
- HMS integration (automated adding/removing of machines to/from the PVSS configuration): no progress.
Oracle MR
- David’s report can be found here.
- The current procedure for restarting OraMon on the production server (periodically checking whether OraMon still works) is considered insufficient. It is decided that a ‘daemon dead’ alarm will be sent to the operators, who will inform us via the mailing list (Jan van Eldik to follow up).
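How the ‘daemon dead’ condition will actually be detected and raised is not specified here. As a minimal sketch only, assuming the daemon records its PID in a pidfile (a hypothetical convention, not necessarily how OraMon works), a POSIX liveness test that would decide between an OK state and a DAEMON_DEAD alarm could look like this. Both function names and the pidfile are illustrative assumptions.

```python
import errno
import os


def daemon_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 performs only the existence/permission checks;
        # no signal is actually delivered to the process.
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return False  # no such process: the daemon is dead
        if exc.errno == errno.EPERM:
            return True   # process exists but belongs to another user
        raise
    return True


def check_oramon(pidfile: str) -> str:
    """Hypothetical check: 'OK' or 'DAEMON_DEAD' for the PID in a pidfile."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        # Missing or unreadable pidfile is treated as a dead daemon.
        return "DAEMON_DEAD"
    return "OK" if daemon_alive(pid) else "DAEMON_DEAD"
```

Note that a PID check only shows the process exists, not that OraMon is still doing useful work; a heartbeat or a query against OraMon's output would be a stronger test.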
MSA / sensors
- Jan: repackaging of the MSA and sensors is in progress. The MSA RPM should come from the EDG autobuild; an additional RPM provides the CERN-specific sensors and configurations. Jan will deploy the MR API RPM onto LXPLUS.
- Hugo: how should switch sensor measurements be stored? Helge: as I/O per time unit, e.g. KB/s.
- Currently, our software is spread over three CVS servers: the PVSS repository (on Helge's server), the EDG repository (in Lyon), and the FIO repository (on the CERN central CVS servers). As the EDG repository will be frozen, the repository module containing the Lemon sources will have to be moved to CERN. At the same time, the monolithic source structure and the build makefiles should be revisited. Bill has proposed some FIO-wide guidelines for structuring CVS repositories. German suggests using the quattor build tools for package generation. Manuel raises the point that PS should have write access to this new repository. German to check and propose a new CVS layout.
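Storing switch measurements as I/O per time unit means converting raw counter readings into rates. A minimal sketch, assuming the switch sensor reads a cumulative byte counter that wraps at 2^32 (as 32-bit SNMP interface octet counters do; the actual switch counters are not specified in these minutes), computes KB/s from two timestamped samples. The function name is hypothetical.

```python
def kb_per_second(prev_bytes: int, curr_bytes: int,
                  prev_time: float, curr_time: float,
                  counter_bits: int = 32) -> float:
    """Convert two samples of a cumulative byte counter into KB/s.

    Assumes an unsigned counter that wraps at 2**counter_bits and that
    at most one wrap occurred between the two samples.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # The counter wrapped around since the previous sample.
        delta += 2 ** counter_bits
    # 1 KB is taken as 1024 bytes here; use 1000 for decimal units.
    return delta / 1024.0 / elapsed
```

For example, 614400 bytes counted over 60 seconds gives 10.0 KB/s. Storing the rate rather than the raw counter keeps values comparable across sampling intervals, at the cost of needing the previous sample at collection time.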
Derived metrics
- Dennis: working on adding wall-clock time restrictions to Cmsensor. The XML parsing (now using Apache Xerces), the MSA parameter handling and the debugging instrumentation have been improved.
AOB
- It is decided to track actions via an action list (see below).
- For the next meeting, German asks all project participants for a ‘to do’ list of the items that, in their understanding, need to be addressed in 2004. The collected items will be used as input for detailed planning for 2004.
- Examples:
  - CDB integration of MSA, OraMon, AlarmDisplay configurations
  - Integrating the new MR server (S_Server) into OraMon
  - Software re-packaging, in particular sensors
  - OraMon stability, including fail-over / redundancy techniques
  - AlarmDisplay GUI using WebStart
  - MR API improvements (e.g. converting SOAP lists to arrays)
  - Providing MR access to experiment users
  - Deploying Cmsensor locally and globally
  - Local Recovery actuator framework
  - ...
Action Items:
New actions:
· [A1] 22/1/04 All: Provide a list of 'TO DO' items to German for the 2004 planning
· [A2] 22/1/04 German: Check with ADC the support level for the PSF framework
· [A3] 22/1/04 JanvE: DAEMON_DEAD alarm when the production OraMon daemon dies
· [A4] 22/1/04 JanvE: Deploy the MR API RPM onto LXPLUS
· [A5] 22/1/04 German: Propose a new CVS layout (check with Bill Tomlin)