Speaker
Hegoi Garitaonandia Elejabarrieta
(Instituto de Fisica de Altas Energias (IFAE))
Description
The ATLAS Trigger & DAQ software, at six gigabytes per release, will be installed on
about two thousand machines in the final system. Already during the development phase,
it is tested and debugged on various Linux clusters of different sizes and network
topologies. There are at least two possible approaches to distributing the software
across the network: fixed routing points and adaptive distribution. The first has
been implemented with the SSH worm Nile, a utility that launches
connections in a mixture of parallel and cascaded modes, in order to synchronize
software repositories incrementally or to execute commands. A system administrator
configures, in a single file, the routes for the propagation. Delivery is therefore
scalable and can be matched to the network topology. Installing Nile is trivial:
being implemented as a worm, it replicates itself into the memory of other
computers. Moreover, routing and status monitoring protocols, together with an
adaptive runtime algorithm that compensates for broken paths, make it very reliable.
The other approach, adaptive distribution, is implemented with peer-to-peer (P2P)
protocols. In these solutions,
a node interested in a file acts as both client and server for small pieces of the
file. The strength of P2P comes from the adaptive algorithm that runs in every
peer. Its goal is to maximize both the peer's own throughput and the overall
throughput of the network. Hence network resources are used efficiently, with no
configuration effort. The tool selected in this case is BitTorrent. This paper
describes tests performed with both technologies on CERN clusters of 50 to 600 nodes
and compares the benefits of each.
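
As an illustration of the fixed-routing idea only, the following minimal Python sketch models cascaded propagation along administrator-defined routes and a simple bypass of a broken node. The route table, the host names, and the functions propagation_hops and bypass are hypothetical; they do not reproduce Nile's configuration format, its protocols, or its actual recovery algorithm.

```python
from collections import deque

# Hypothetical route table: each key pushes the software to the hosts in its list.
# This is an illustration only, NOT Nile's configuration file format.
ROUTES = {
    "server": ["rack1-head", "rack2-head"],
    "rack1-head": ["rack1-n1", "rack1-n2"],
    "rack2-head": ["rack2-n1", "rack2-n2"],
}


def propagation_hops(routes, root):
    """Return {host: number of cascaded hops from root} via a breadth-first walk."""
    hops = {root: 0}
    queue = deque([root])
    while queue:
        host = queue.popleft()
        for child in routes.get(host, []):
            if child not in hops:  # ignore duplicate routes to the same host
                hops[child] = hops[host] + 1
                queue.append(child)
    return hops


def bypass(routes, dead):
    """Return new routes in which the parent of `dead` serves its children directly.

    Assumed recovery policy for this sketch; the real algorithm may differ.
    """
    new_routes = {}
    for host, children in routes.items():
        if host == dead:
            continue  # the dead host no longer forwards anything
        updated = []
        for child in children:
            if child == dead:
                # splice the dead host's children into its parent's list
                updated.extend(routes.get(dead, []))
            else:
                updated.append(child)
        new_routes[host] = updated
    return new_routes


if __name__ == "__main__":
    print(propagation_hops(ROUTES, "server"))
    print(propagation_hops(bypass(ROUTES, "rack1-head"), "server"))
```

In this toy layout every worker node is reached in two hops through its rack head. If one rack head is bypassed, its nodes are served directly by the server, which keeps them reachable at the cost of more connections from the upstream host.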
Primary author
Hegoi Garitaonandia Elejabarrieta
(Instituto de Fisica de Altas Energias (IFAE))
Co-authors
Gokhan Unel
(CERN)
Mr Haimo Zobernig
(CERN)