1–3 Mar 2006
CERN
Europe/Zurich timezone

Data Grid Services for National Digital Archives Program in Taiwan

1 Mar 2006, 14:30
15m
40-SS-D01 (CERN)


Oral contribution Earth Observation - Archaeology - Digital Library 1c: Earth Observation - Archaeology - Digital Library

Speaker

Mr Eric Yen (Academia SINICA Grid Computing Centre, Taiwan)

Description

Digital archives and libraries are widely recognized as a crucial component of the global information infrastructure for the new century. Research and development projects in many parts of the world are applying advanced information technologies to the management and manipulation of digital information, from data storage, preservation, indexing, searching, presentation, and dissemination to the organization and sharing of information over networks. Digital archives demand reliable storage systems for persistent digital objects, a well-organized information structure for effective content management, efficient and accurate information retrieval mechanisms, and flexible services for varying user needs.

Hundreds of petabytes of digital information have been created and dispersed across the Internet since computers were first used for information processing, and the amount is still growing at a rate of tens of petabytes per year. Grid technology offers a possible solution for aggregating and processing diverse, heterogeneous, petabyte-scale digital archives. Metadata-based information representation makes the retrieval of specific and related information more accurate, makes information resources interoperable, and paves the way for formal knowledge discovery. With advancing IT, semantic-level indexing, categorizing, analyzing, tracking, retrieving, and correlating of information could be implemented. A Data Grid aims to set up a computational and data-intensive grid of resources for data analysis. It requires coordinated resource sharing and collaborative processing and analysis of huge amounts of data produced and stored by many institutions.

In Taiwan, the National Digital Archive Project (NDAP) was initiated in 2002, with its pilot phase starting in 2001. According to records from 2005, more than 60 terabytes of digital objects had been generated and archived by 9 major content holders in Taiwan. Not only can delicate and gracious Chinese cultural assets be preserved and made available via the Internet, but this approach could also be proposed as a new paradigm for academic research based on digital and integrated information resources. The design and implementation phase is ongoing, and we would like to illustrate it at the EGEE User Forum.

Academia SINICA Grid Computing Centre (ASGC) is in charge of building a new generation of Grid-based research infrastructure in Academia SINICA and in Taiwan, based on EGEE and OSG as the Grid middleware. This infrastructure is a major component in the development and deployment of the NDAP, providing long-term preservation of digital content and unified data access. These services will be built upon the e-Science infrastructure of Taiwan.

The Storage Resource Broker (SRB), developed at SDSC, is middleware that enables scientists to create, manage, and collaborate with flexible, unified "virtual data collections" that may be stored on heterogeneous data resources distributed across a network. The SRB system is currently the first and the largest (in terms of data volume) data store in Academia SINICA. The system, deployed by ASGC in early 2004, consists of 7 sites in different institutes linked by a dedicated fibre campus network, and provides 60 TB of capacity in total. In early 2006 it will expand to 120 TB. As of January 2006, more than 30 TB and 1.4 million files have been archived in the distributed mass storage environment. All files are also preserved in two copies at different sites.

In this presentation, ideas for utilizing Data Grid infrastructure for NDAP will be described and discussed. We will describe the use of SRB in building a collaborative environment for the Data Grid Services of NDAP, in which many data-intensive applications are being developed. We also describe our experience integrating NDAP applications. For each application we characterize the essential data virtualization services provided by the SRB for distributed data management.
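
As a rough illustration of the storage figures quoted above, the following sketch derives an average file size and the share of the 60 TB capacity in use. It is not part of the original contribution. It assumes decimal terabytes and treats the "more than" figures as exact lower bounds. The function `logical_volume_tb` further assumes that the 30 TB total already counts both copies of every file, which the abstract does not state. All names in the code are hypothetical.

```python
# Figures quoted in the abstract (treated here as lower bounds).
ARCHIVED_BYTES = 30 * 10**12   # 30 TB archived, assuming decimal terabytes
ARCHIVED_FILES = 1_400_000     # 1.4 million files
CAPACITY_BYTES = 60 * 10**12   # 60 TB total capacity across the 7 sites
COPIES = 2                     # each file is kept at two different sites


def average_file_size_mb(total_bytes: int, file_count: int) -> float:
    """Mean size per file in megabytes (10**6 bytes)."""
    return total_bytes / file_count / 10**6


def utilization(total_bytes: int, capacity_bytes: int) -> float:
    """Fraction of the total capacity occupied by the archived data."""
    return total_bytes / capacity_bytes


def logical_volume_tb(total_bytes: int, copies: int) -> float:
    """Unique data volume in TB, ASSUMING total_bytes counts every copy."""
    return total_bytes / copies / 10**12


if __name__ == "__main__":
    print(f"average file size: {average_file_size_mb(ARCHIVED_BYTES, ARCHIVED_FILES):.1f} MB")
    print(f"capacity in use:   {utilization(ARCHIVED_BYTES, CAPACITY_BYTES):.0%}")
    print(f"unique data:       {logical_volume_tb(ARCHIVED_BYTES, COPIES):.1f} TB (if copies are counted)")
```

Under these assumptions the average archived file is on the order of 20 MB, and about half of the deployed capacity is occupied.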

Primary author

Mr Wei-Long Ueng (Academia SINICA Grid Computing Centre, Taiwan)

Co-authors

Mr Eric Yen (Academia SINICA Grid Computing Centre, Taiwan) Mr Hui-Min Lin (Academia SINICA Grid Computing Centre, Taiwan)
