FIO - ELFms Meeting, 10th February 2004
- Present: Vladimir Bahyl, German Cancio, Benjamin Chardi, Jan van Eldik,
Harry Renshall, Thorsten Kleinwort, Veronique Lefebure, Miroslav Siket,
Dennis Waldron
- Minutes: Vladimir Bahyl
1) Linux Virtual Server
Miroslav Siket gave a presentation on his investigation into using Linux
Virtual Server to resolve our scalability problems with the software
repository. The presentation can be downloaded from here.
The outcome of the discussion that followed was that the Linux Virtual Server
solution might in fact not be needed, as we should soon have serial console
nodes available as head software repository nodes in each rack.
Harry Renshall also proposed that, if we plan to implement this, Linux
Support should be involved; hence Miroslav should give the presentation to
them as well.
2) sysctl NCM component
Vladimir Bahyl briefly explained the structure of the /etc/sysctl.conf file.
The information in the file should be kept in CDB in some format.
German Cancio pointed Vladimir Bahyl to the spma NCM component, where all
necessary key/value options are defined in the profile.
Vladimir Bahyl will look into the spma NCM component and supervise Benjamin
Chardi in writing the sysctl SPMA component.
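For illustration only, the key/value structure of /etc/sysctl.conf mentioned above can be sketched in Python. This is not the NCM component itself, and it makes no assumption about how the values would be laid out in CDB; the function names and the sample settings are hypothetical. It relies only on the standard file format: one "key = value" pair per line, with blank lines and lines starting with '#' or ';' ignored.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into a dict of key -> value strings."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and lines starting with '#' or ';' are ignored.
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line; skipped in this sketch
        settings[key.strip()] = value.strip()
    return settings


def render_sysctl_conf(settings):
    """Render a dict back into sysctl.conf lines, sorted by key."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(settings.items()))


# Hypothetical sample content, not taken from any real node.
example = """\
# Controls IP packet forwarding
net.ipv4.ip_forward = 0
; alternative comment style
kernel.sysrq=1
"""

if __name__ == "__main__":
    print(render_sysctl_conf(parse_sysctl_conf(example)), end="")
```

A flat key/value mapping like this is probably the simplest form in which the file contents could be stored and regenerated, which is why it is used here.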
3) Deployment of the NCM framework on nodes running RedHat Enterprise Server
German Cancio will deploy the NCM framework on one development node: lxdev03.
Everyone is advised to test their NCM components as well as all programs that
use CCConfig. Wider deployment should then happen next week.
Jan van Eldik will talk to Vladimir Bahyl
to agree on a reasonable list of components that should be installed on a RedHat
Enterprise Server node.
None.
3) Action items:
New actions:
- [A98] Vladimir Bahyl will look into the spma NCM component and
supervise Benjamin Chardi in writing the sysctl SPMA component.
- [A99] German Cancio will deploy the NCM framework on one
development node: lxdev03. Everyone is advised to test their NCM
components as well as all programs that use CCConfig.
- [A100] Jan van Eldik will talk to Vladimir Bahyl to
agree on a reasonable list of components that should be installed on a
RedHat Enterprise Server node.
Completed actions:
- [A10] T.K. (David Hughes): The program 'check-this-host.pm' should be
installed locally through RPM
Done - RPM is deployed everywhere.
- [A11] Veronique: Put scripts in AFS C3/bin area into
CVS.
Done; but old scripts need to be moved to 'old'.
Judit will update her scripts in the AFS area.
- [A58] TS & JvE: Add support for RAID in pan partition
manipulation functions
Done by Tim Smith.
- [A62] Tim/Thorsten/German: Structure the LXSERV server
cluster into replicated "read-only" front end nodes serving clients and a back
end node hosting the CDB
Done in the sense that all servers serve as read-only nodes because of demand
from clients. To be addressed later.
- [A84] Bill: Disk HW information should be added to
CDB. Depends on [A58]
Done by Jan van Eldik.
- [A85] German: Prepare IA64 templates for OpenLab cluster
node(s), install and test quattor.
Done - a number of minor but time-consuming issues have been identified which
would need to be addressed before being able to run SPMA and the rest of
quattor on IA64.
- [A93] Vlado: DNS round-robin using service ping for SWREP
alias
SWREP.cern.ch DNS alias has been moved to the new system; functionality
didn't change, but the framework now allows it.
- [A95] Benja: enhance his maintenance script to check
for an existing instance and exit if it is already running
Done by default - check for maintenance.lock file.
- [A80] VL: write a 'change_state' (aka
"bring_me_down") script, which will undertake the necessary actions to
bring the node to the desired state.
Done
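As an illustration of the single-instance check noted under [A95], a lock-file guard can be sketched as follows. This is a minimal sketch and not Benja's actual script: only the file name maintenance.lock comes from the minutes, while the function names, the use of Python, and the choice to store the process ID in the file are assumptions.

```python
import os
import tempfile


def acquire_lock(path):
    """Create the lock file atomically; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so check and create
        # happen in a single step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # assumption: record the owner's PID
    return True


def release_lock(path):
    """Remove the lock file so that a later run can start."""
    os.remove(path)


def run_maintenance(lock_path, task):
    """Run task() unless another instance holds the lock; return True if run."""
    if not acquire_lock(lock_path):
        return False  # another instance is already running: exit quietly
    try:
        task()
    finally:
        release_lock(lock_path)
    return True


if __name__ == "__main__":
    # Hypothetical location; a real script would use a fixed, agreed path.
    lock = os.path.join(tempfile.mkdtemp(), "maintenance.lock")
    print(run_maintenance(lock, lambda: print("maintenance running")))
```

Creating the file with an exclusive open avoids the race between testing for the file and creating it; a stale lock left behind by a crashed run would still have to be cleaned up by hand in this sketch.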
Stalled actions:
- [A51] TK: automounter SUE feature: should also start
the portmapper, so that it can be disabled on nodes not needing it.
Stalled - will be done as an NCM component for the next RH release.
- [A63] German: LCG to provide a list of services per node
type, including the list of RPMs.
Stalled - LCG provides a manual installation guide, but no service-to-RPM
mapping.
- [A74] German: New RH release needs to be added to the
Software Repository platforms
Stalled - on hold until
new RedHat version is decided, IA64 RH ES3 was added
- [A75] All: The CDB templates need to be cloned/adapted
for new RH release
Stalled - on hold until new RedHat
version is decided
- [A86] TK: One new SEIL node should be added to the LXDEV
cluster.
Stalled until the new batch of machines arrives.
Ongoing actions:
- [A6] GCM: SWRep->CDB interfacing. Use new CDBOP
No progress.
- [A8] T.K. suggested having a file in /etc/ with
some installation-related information, such as the date and time of the
install and the version and release of the installed Linux, and possibly the
serial number.
No progress.
- [A48] Tim, German: ACL feature modification needed for
serial console users.
No update.
- [A57] Judith: CASTOR CLI. For testing purposes, the
corresponding SUE feature will generate a dummy shift.conf.new configuration
file with real data. The CLI needs to be packaged as RPM. Real data has to be
put in CDB.
More testing to be performed before handing over to CASTOR
operations team.
- [A70] TK (All): lxdev01 should be reconfigured to use the
same setup as lxserv01. Thorsten to reinstall it with lxserv01 setup once
lxservb01 is done.
No progress
- [A76] Harry/Uli: Find out what is the full information
needed for server nodes in LSF 5.1 (split from [A61])
Action changed from Tim to Harry
- [A77] Piotr: implement XML API allowing pan to invoke
external programs
No update
- [A87] Bill: Create new Oracle9i based CDB SQL
database.
Some emphasis should be put into making CDB
SQL interface usable (= faster) to help justify CDB as a wider deployable
solution.
- [A89] GCM to write the new set_partition function that
allows multiple SWAP partitions
Depends on [A58].
- [A90] Vlado: check LXBATCH SW templates division into
interactive and non-interactive parts
Discussion done; a list of RPMs that will be removed has
been prepared. Thorsten Kleinwort will send an e-mail to the experiments.
- [A92] German: Provide documentation on how to manage the
SPMA cache.
German will send an e-mail describing where to find the documentation
(man spma).
- [A94] TK: Deploy NCM on remaining clusters: LXMASTER, LXFONT, SURE (and
others?)
Vlado deployed the NCM framework on many clusters; the
remaining ones (LXMASTER, LXFONT, SURE) will be done by Thorsten Kleinwort.
- [A96] Bill: Implement left join on existing CDB2SQL
views
No update
- [A97] Veronique/Jan: Update/refine HostState.pl (Vero),
'sensorify' HostState.pl (Jan)
Jan working on providing a generic command handler for
sensors
Vladimir Bahyl, 10th February 2004