Speaker
K. Nienartowicz
(CERN)
Description
Data management is one of the cornerstones of the distributed production computing
environment that the EGEE project aims to provide for a European e-Science
infrastructure. We have designed a set of services based on previous experience in
other Grid projects, trying to address the requirements of our user communities.
In this paper we summarize the most fundamental requirements and constraints as well
as the security, reliability, stability and robustness considerations that have
driven the architecture and the particular choice for service decomposition in our
service-oriented architecture. We discuss how our services interact with each
other, their deployment models and how failures are handled.
The data management services fall into three groups: the Storage Element, the Data
Scheduling services and the Catalog services. The Storage Element exposes interfaces
to Grid-managed storage, with the appropriate semantics for the distributed Grid
environment. The Catalog services contain all the metadata related to data: the File
Catalog maintains a file-system-like view of the files in the Grid in a logical user
namespace, the Replica Catalog keeps track of identical copies of the files
distributed across different Storage Elements, and the Metadata Catalog keeps
application-specific information about the files. The Data Scheduling services take care of
controlled data transfer and keep the information in the Catalog services consistent
with what is actually available in the Storage Elements, acting as the binding
between the two.
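The division of responsibilities described above can be illustrated with a small
in-memory sketch. It is not the EGEE implementation or its interfaces: every class,
method and name below (StorageElement, Catalogs, DataScheduler, replicate, the
"se-a"/"se-b" endpoints, the logical and physical file names) is a hypothetical
stand-in chosen for illustration. The sketch only assumes that a replica is
registered in the catalog after the copy to the target storage has succeeded, which
is one simple way to keep the catalog consistent with the storage.

```python
from dataclasses import dataclass, field


@dataclass
class StorageElement:
    """Hypothetical stand-in for a Grid-managed storage endpoint."""
    name: str
    files: dict = field(default_factory=dict)  # physical name -> content

    def put(self, physical_name, data):
        self.files[physical_name] = data

    def get(self, physical_name):
        return self.files[physical_name]


@dataclass
class Catalogs:
    """Toy File, Replica and Metadata Catalogs (illustrative only)."""
    file_catalog: dict = field(default_factory=dict)      # logical name -> file id
    replica_catalog: dict = field(default_factory=dict)   # file id -> {(SE name, physical name)}
    metadata_catalog: dict = field(default_factory=dict)  # file id -> application attributes

    def register(self, logical_name, file_id, se_name, physical_name, **attrs):
        self.file_catalog[logical_name] = file_id
        self.replica_catalog.setdefault(file_id, set()).add((se_name, physical_name))
        self.metadata_catalog.setdefault(file_id, {}).update(attrs)

    def replicas(self, logical_name):
        file_id = self.file_catalog[logical_name]
        return sorted(self.replica_catalog[file_id])


class DataScheduler:
    """Binds catalogs and storage: copies data, then records the new replica."""

    def __init__(self, catalogs, storage_elements):
        self.catalogs = catalogs
        self.ses = {se.name: se for se in storage_elements}

    def replicate(self, logical_name, target_se):
        file_id = self.catalogs.file_catalog[logical_name]
        source_se, physical_name = self.catalogs.replicas(logical_name)[0]
        data = self.ses[source_se].get(physical_name)
        # If the lookup or the copy fails, an exception is raised here and
        # the catalog is left untouched.
        self.ses[target_se].put(physical_name, data)
        # Register the replica only after the copy has succeeded.
        self.catalogs.replica_catalog[file_id].add((target_se, physical_name))


# Usage: one file on "se-a", replicated to "se-b" through the scheduler.
se_a = StorageElement("se-a")
se_b = StorageElement("se-b")
catalogs = Catalogs()
se_a.put("/data/f001", b"events")
catalogs.register("/grid/hep/run1/events", "guid-001", "se-a", "/data/f001", run=1)

scheduler = DataScheduler(catalogs, [se_a, se_b])
scheduler.replicate("/grid/hep/run1/events", "se-b")
```

In this sketch the user only ever refers to the logical name; the scheduler resolves
it through the catalogs, moves the data between Storage Elements and updates the
Replica Catalog, so a failed transfer never leaves a catalog entry pointing at a copy
that does not exist.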
We conclude with first experiences and examples of use cases for High Energy Physics
applications.
Primary authors
A. Frohner
(CERN)
G. McCance
(CERN)
K. Nienartowicz
(CERN)
P. Badino
(CERN)
P. Kunszt
(CERN)
R. Da Rocha
(CERN)