Speaker
Dr Nick Garfield (CERN)
Description
As computing systems become more distributed, as network throughput increases,
and as resources are dispersed across multiple administrative domains and even
continents, there is a growing need to know the performance limits of the
underlying protocols that form the foundations of complex computing and
networking architectures. One such protocol is the Network Time Protocol (NTP),
whose importance to any large-scale computing system is often overlooked. With
the adoption of new highly distributed technologies, such as those employed in
grid computing, the increasing number of users and resources will test not only
the synchronization of these resources but also the transaction logging and
event correlation in any problem resolution or diagnostic system. In essence,
reliable, good-quality time synchronization is a key component in the operation
of any large-scale production system that incorporates many components. In this
paper we present the CERN NTP server and client architecture and discuss the
statistical quality of time synchronization in four computing clusters, ranging
in size from approximately 50 to 3000 nodes and interconnected via a
high-performance, symmetrically routed 10 Gbit/s network backbone. Each cluster
is dedicated to a specific task or application, resulting in various I/O load
profiles, some more deterministic than others. The relationship between the
reliability of time synchronization, system load and network I/O is analysed,
and optimization suggestions are presented.
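
As a rough illustration of what "statistical quality of time synchronization"
can mean in practice, the sketch below computes the standard NTP clock offset
and round-trip delay from the four timestamps of one client/server exchange,
then summarizes a set of offsets. The function names and the choice of summary
statistics are assumptions made for this example and are not taken from the
paper; the offset and delay formulas are the ones defined by the NTP
specification.

```python
from statistics import mean, pstdev


def ntp_offset_and_delay(t1, t2, t3, t4):
    """Return (offset, delay) in seconds for one NTP exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    # Offset of the server clock relative to the client clock,
    # assuming the network path is symmetric.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    # Round-trip network delay, excluding server processing time.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def summarize_offsets(offsets):
    """Hypothetical summary of per-node offsets for one cluster."""
    return {
        "mean": mean(offsets),
        "stdev": pstdev(offsets),  # population standard deviation
        "max_abs": max(abs(o) for o in offsets),
    }
```

The offset formula assumes equal delay in both directions, which is why a
symmetrically routed backbone matters for the accuracy of the estimate.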
Primary author
Dr Nick Garfield (CERN)
Co-author
Mr Vlado Bahyl (CERN)