Feb 13 – 17, 2006
Tata Institute of Fundamental Research
Europe/Zurich timezone

Managing small files in Mass Storage systems using Virtual Volumes

Feb 15, 2006, 2:40 PM
20m
D405 (Tata Institute of Fundamental Research)

Homi Bhabha Road Mumbai 400005 India
oral presentation Computing Facilities and Networking

Speaker

Prof. Manuel Delfino Reznicek (Port d'Informació Científica)

Description

Efficient hierarchical storage management of small files continues to be a challenge. Storing such files directly on tape-based tertiary storage leads to extremely low operational efficiency. Commercial tape virtualization products are few, expensive, and proven only in mainframe environments. Asking users to deal with the problem by “bundling” their files leads to a plethora of solutions with high maintenance costs. Part of the problem is that data processing environments have evolved towards the illusion of an infinite file store with a subdirectory structure, eliminating the concept of a volume, be it physical or logical. Research has been undertaken to deal with these issues at the data center level; the resulting approach is quite simple and can be applied generally. Results are presented from prototype implementations of a paradigm termed "Virtual Volumes", which combines standard operating system tools, such as symbolic links and auto-mounters, with techniques for representing a volume as a file, such as the ISO 9660 specification. Virtual Volumes allow large numbers of files to be handled as a single item in tertiary tape storage systems, whilst maintaining the infinite file store illusion for the user by mounting these single items as branches within a file system. Whereas a totally general implementation of Virtual Volumes would require quite complex coding, the prototypes presented are optimized for the Write-Once-Read-Many (WORM) environment often found in scientific data applications. Because the Virtual Volume implementation is based on standard operating system tools and techniques, it can be easily combined with existing or future tools used in HEP experiments and Grid infrastructures. Examples are given of handling HEP data using Virtual Volumes integrated into the existing data frameworks of several experiments.
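
As a rough illustration of the idea described above, and not the authors' prototype, the following Python sketch bundles a directory of small files into a single volume file and replaces the directory with a symbolic link into an automounter-managed tree. Several things here are assumptions made for the example: a tar archive stands in for the ISO 9660 image, so the sketch runs without external tools or root privileges; the function names `bundle_as_volume` and `autofs_map_entry` and all paths are hypothetical; and the map line follows the common autofs `key -options location` layout, which should be checked against the local automounter documentation.

```python
import os
import shutil
import tarfile
from pathlib import Path


def bundle_as_volume(src_dir, volume_store, mount_root):
    """Pack src_dir into one volume file and leave a symlink in its place.

    Assumption: a tar archive stands in for the ISO 9660 image; a real
    setup would build an image with a dedicated tool instead.
    """
    src_dir = Path(src_dir)
    volume_store = Path(volume_store)
    mount_root = Path(mount_root)

    name = src_dir.name
    volume_store.mkdir(parents=True, exist_ok=True)
    volume_file = volume_store / f"{name}.tar"

    # Write-once: the whole directory becomes a single item that the
    # tape system can migrate and recall as one file.
    with tarfile.open(volume_file, "w") as tar:
        tar.add(src_dir, arcname=name)

    # Replace the original directory with a symlink to the path where
    # the automounter is expected to expose the volume. The link
    # dangles until that mount actually happens.
    shutil.rmtree(src_dir)
    os.symlink(mount_root / name, src_dir)
    return volume_file


def autofs_map_entry(name, image_path):
    """Return one automounter map line for a loop-mounted image.

    Assumption: common autofs 'key -options location' layout.
    """
    return f"{name} -fstype=iso9660,ro,loop :{image_path}"
```

From the user's point of view the directory path is unchanged, while the storage system only sees one large file per volume. Deleting the originals immediately after writing the archive is a simplification; a production workflow would verify the volume first.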

Primary author

Prof. Manuel Delfino Reznicek (Port d'Informació Científica)

Co-authors

Dr Andreu Pacheco (Institut de Física d'Altes Energies)
Prof. Emilio Hernández (Universidad Simón Bolívar)
Ms Esther Acción (Port d'Informació Científica)
