13–17 Feb 2006
Tata Institute of Fundamental Research

A Distributed File Catalog based on Database Replication

15 Feb 2006, 09:00
9h 10m
Tata Institute of Fundamental Research

Homi Bhabha Road Mumbai 400005 India
Poster (track: Distributed Event production and processing)

Speaker

Andreas Joachim Peters (CERN)

Description

The LHC experiments at CERN will collect data at a rate of several petabytes per year and produce several hundred files per second. These data have to be processed and transferred to many tier centres for distributed analysis in different physics data formats, which further increases the number of files to handle. All these files must be accounted for and tracked reliably and securely in a GRID environment, so that users can analyze subsets of them transparently.

This contribution describes a distributed file catalogue designed around the distributed nature of these requirements. In a GRID environment, job scheduling needs a centralized view of all existing files; at the same time, each site should, for performance reasons, be able to access its files autonomously, without depending on centralized services. The proposed solution supports both a local and a global operation mode: commands can be executed autonomously in a single local catalogue branch or across all branches at once. The catalogue provides a file-system-like view of a logical namespace, user-defined metadata with schema evolution, access control lists, and common POSIX user/group file permissions. The architecture, interface functionality, performance tests, and promising results in comparison with other existing GRID catalogues will be presented.
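To make the local versus global operation modes and the POSIX-style permissions more concrete, here is a minimal in-memory Python sketch. It is a hypothetical model, not the catalogue's actual implementation: the names `Entry`, `CatalogueBranch`, `DistributedCatalogue`, `can_read` and `find`, the site names and the logical paths are all assumptions made for illustration, and database replication, ACLs and metadata schema evolution are not modelled.

```python
import stat
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A logical file in the namespace (hypothetical layout)."""
    owner: str
    group: str
    mode: int  # POSIX permission bits, e.g. 0o640
    metadata: dict = field(default_factory=dict)  # user-defined tags; no schema evolution here


def can_read(entry: Entry, user: str, groups: set[str]) -> bool:
    """POSIX-style read check: owner bits, else group bits, else other bits."""
    if user == entry.owner:
        return bool(entry.mode & stat.S_IRUSR)
    if entry.group in groups:
        return bool(entry.mode & stat.S_IRGRP)
    return bool(entry.mode & stat.S_IROTH)


class CatalogueBranch:
    """One site's local branch of the logical namespace (hypothetical)."""

    def __init__(self, site: str):
        self.site = site
        self.entries: dict[str, Entry] = {}

    def add(self, path: str, entry: Entry) -> None:
        self.entries[path] = entry

    def find(self, directory: str, user: str, groups: set[str]) -> list[str]:
        """Local mode: list readable entries under `directory` using only this branch."""
        prefix = directory.rstrip("/") + "/"
        return sorted(
            path for path, entry in self.entries.items()
            if path.startswith(prefix) and can_read(entry, user, groups)
        )


class DistributedCatalogue:
    """Global mode: run the same command on every branch and merge the results."""

    def __init__(self, branches: list[CatalogueBranch]):
        self.branches = branches

    def find(self, directory: str, user: str, groups: set[str]) -> list[str]:
        merged: set[str] = set()
        for branch in self.branches:
            merged.update(branch.find(directory, user, groups))
        return sorted(merged)


# Usage with two hypothetical site branches.
cern = CatalogueBranch("CERN")
tifr = CatalogueBranch("TIFR")
cern.add("/lhc/run1/a.root", Entry("alice", "phys", 0o640))  # group-readable
tifr.add("/lhc/run1/b.root", Entry("bob", "phys", 0o600))    # owner-only
grid = DistributedCatalogue([cern, tifr])

print(cern.find("/lhc/run1", "carol", {"phys"}))  # local view of one site
print(grid.find("/lhc/run1", "bob", set()))       # global view across sites
```

In this sketch a local query touches only one branch and needs no central service, while the global query fans out over all branches. A real deployment would keep the branches consistent through database replication rather than in-process objects.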