Speakers
Mr
Mario Lassnig
(CERN & University of Innsbruck, Austria)
Mr
Mark Michael Hall
(Cardiff University, Wales, UK)
Description
In highly data-driven environments such as the LHC experiments, a reliable,
high-performance distributed data management system is a primary requirement.
Existing work shows that intelligent data replication is key to achieving
such a system, but current replication strategies in distributed middleware
rely mostly on computing, network and storage properties when deciding how to
replicate data-sets across a global set of data centres.
While the distributed nature of such data management systems reduces the
need to co-locate data with the users interested in it, reliability and
performance considerations mean that co-location, or at least close
proximity, is still preferred wherever possible.
We present an approach that improves existing replication strategies using
geographical information already available in communication infrastructures.
Information on the geographical distribution of interested users is
extracted from these infrastructures through automated analysis of locational
expressions in research documentation, operational logbooks, e-mail
correspondence or web pages. Combined with links between data-sets and the
users interested in them, this enables an intelligent, anticipatory
replication strategy that places data at locations close to those users.
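The two steps described above (extracting user locations from text, then
placing data near the users linked to each data-set) can be illustrated with
a minimal sketch. It assumes a small hypothetical gazetteer of place names
with coordinates, a hypothetical list of candidate data-centre sites, and
hypothetical users, documents and data-set names. The whole-word gazetteer
match is a naive stand-in for the natural language analysis of locational
expressions, not the authors' method, and sites are scored by
mention-weighted great-circle (haversine) distance to the interested users.

```python
import math
import re
from collections import Counter

# Hypothetical gazetteer: place name -> (latitude, longitude) in degrees.
GAZETTEER = {
    "Geneva": (46.20, 6.15),
    "Innsbruck": (47.27, 11.39),
    "Cardiff": (51.48, -3.18),
    "Chicago": (41.88, -87.63),
}

# Hypothetical candidate data centres: site name -> (latitude, longitude).
SITES = {
    "SITE-EU": (46.23, 6.05),
    "SITE-UK": (51.57, -1.31),
    "SITE-US": (41.84, -88.26),
}


def extract_locations(text):
    """Naive stand-in for locational-expression analysis: count whole-word
    occurrences of gazetteer place names in the text."""
    counts = Counter()
    for place in GAZETTEER:
        counts[place] += len(re.findall(r"\b" + re.escape(place) + r"\b", text))
    return +counts  # unary plus drops places with zero mentions


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_sites(dataset_users, user_docs):
    """For each data-set, rank candidate sites by the total distance to the
    locations mentioned by its interested users, weighted by mention count.
    Sites are scored individually; this is not a joint multi-replica optimum."""
    # Aggregate location mentions per user across all of that user's documents.
    user_locs = {u: extract_locations(" ".join(docs)) for u, docs in user_docs.items()}
    ranking = {}
    for dataset, users in dataset_users.items():
        demand = Counter()
        for u in users:
            demand.update(user_locs.get(u, Counter()))

        def cost(site):
            return sum(n * haversine_km(SITES[site], GAZETTEER[p]) for p, n in demand.items())

        ranking[dataset] = sorted(SITES, key=cost)
    return ranking


# Usage with hypothetical users, documents and data-set names.
user_docs = {
    "alice": ["Shift notes from Geneva.", "Meeting in Geneva next week."],
    "bob": ["Greetings from Cardiff."],
    "carol": ["Analysis group based in Chicago."],
}
dataset_users = {
    "dataset-A": ["alice", "bob"],
    "dataset-B": ["carol"],
}
plan = rank_sites(dataset_users, user_docs)
print(plan)
```

Here dataset-A, whose users mention Geneva twice and Cardiff once, ranks
SITE-EU first, while dataset-B, whose only user mentions Chicago, ranks
SITE-US first. A production strategy would also fold in the computing,
network and storage properties that existing middleware already uses.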
Summary
Replica placement strategies based on feature extraction by natural language parsing.
Authors
Mr
Mario Lassnig
(CERN & University of Innsbruck, Austria)
Mr
Mark Michael Hall
(Cardiff University, Wales, UK)