Speaker
Karsten Schwank
(DESY)
Description
We report on the status of the data preservation project at DESY for the
HERA experiments and present the latest design of the storage, which is a
central element for bit-preservation. The HEP experiments based at the
HERA accelerator at DESY collected large and unique datasets during the
period 1992 to 2007. In addition, corresponding Monte Carlo simulation
datasets were produced, which are significantly larger by volume and are
still being added to as the final analyses are completed. As part of the ongoing
DPHEP data preservation efforts at DESY, these datasets must be
transferred into storage systems that guarantee reliable long-term
access. At the same time, given that the experiments are still active,
easy access to the data must be guaranteed for the coming years.
The long-term storage system has two parts: an archive part, where the
data are stored on two tape copies, and an online part, where the full dataset
is kept available for easy access to all HERA data. The archive
and online parts are physically separate. The demanding aspect of these
data is not only the size of about 1 PB but also the large number (about 4
million) of files and the broad range of file sizes, from a few KB to a few
hundred GB. To achieve a high level of reliability, we use the dCache
distributed storage solution and make use of its replication capabilities
and tape interfaces. We describe the dCache installation with a tape backend
that is used as mass storage, together with a newly introduced small-files
service that automatically creates tape-friendly container
files, each holding many individual small files. From the user's point of
view, this is fully transparent for both creating and accessing
the data.
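The idea of grouping many small files into tape-friendly containers can be illustrated with a minimal sketch. This is not the dCache small-files service itself: the function name `plan_containers`, the thresholds `small_limit` and `container_target`, and the greedy grouping rule are all assumptions made for illustration. The sketch only plans which files would share a container; it does not write archives or talk to tape.

```python
from typing import Iterable, List, Tuple

FileEntry = Tuple[str, int]  # (path, size in bytes); hypothetical input format


def plan_containers(
    files: Iterable[FileEntry],
    small_limit: int,
    container_target: int,
) -> Tuple[List[List[str]], List[str]]:
    """Greedily group small files into containers (illustrative only).

    Files of at least `small_limit` bytes are returned separately, since
    they are assumed to be large enough to go to tape on their own.
    Smaller files are appended to the current container, which is closed
    once its total size reaches `container_target` bytes.
    """
    containers: List[List[str]] = []
    large: List[str] = []
    current: List[str] = []
    current_size = 0

    for path, size in files:
        if size >= small_limit:
            large.append(path)
            continue
        current.append(path)
        current_size += size
        if current_size >= container_target:
            containers.append(current)
            current, current_size = [], 0

    if current:  # keep the last, partially filled container
        containers.append(current)
    return containers, large


if __name__ == "__main__":
    example = [("a", 10), ("b", 500), ("c", 20), ("d", 80), ("e", 5)]
    print(plan_containers(example, small_limit=100, container_target=100))
```

In a real service the container would typically also carry an index so that a single file can be read back without the user knowing which container holds it; that part is omitted here.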
Authors
Dirk Kruecker
(DESY)
Karsten Schwank
(DESY)