CHEP04

Name: CHEP04
Start: 2004-09-27T08:30:00+02:00
End: 2004-10-01T18:00:00+02:00
Location: Interlaken, Switzerland

LHC data files meet mass storage and networks: going after the lost performance

29 Sept 2004, 10:00

Coffee (Interlaken, Switzerland)

Board: 64

Type: poster
Track: Track 4 - Distributed Computing Services
Session: Poster Session 2

Speaker: L. Tuura (Northeastern University, Boston, MA, USA)

Experiments frequently produce many small data files for reasons beyond their control, such as output splitting into physics data streams, parallel processing on large farms, database technology incapable of concurrent writes into a single file, and the constraints of running farms reliably. The resulting file sizes are often far from ideal for network transfer and mass storage performance. Provided that time to analysis does not deteriorate significantly, files arriving from a farm could easily be merged into larger logical chunks, for example by physics stream and file type within a configurable time and size window. Uncompressed zip archives seem an attractive candidate for such file merging and are currently being tested by the CMS experiment. We describe the main components now in use: the merging tools, tools to read and write zip files directly from C++, plug-ins to the database system, mass-storage access optimisation, consistent handling of application and replica metadata, and integration with catalogues and other grid tools. We report on the file size ratio obtained in the CMS 2004 data challenge, on observed changes to data access and their analysis, and on the estimated impact on network usage.
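The merging idea in the abstract can be illustrated with a minimal sketch. It is not the CMS merging tool: the function names `plan_merges` and `write_stored_zip`, the record layout `(name, stream, ftype, size)`, and the `max_bytes` threshold are all hypothetical, and the time window mentioned in the abstract is left out for brevity. The sketch groups incoming files by physics stream and file type, closes a batch once a size limit would be exceeded, and writes members into an uncompressed zip archive using Python's standard `zipfile` module.

```python
import io
import zipfile
from collections import defaultdict


def plan_merges(files, max_bytes):
    """Group (name, stream, ftype, size) records into batches per (stream, ftype).

    Hypothetical helper: a batch is closed as soon as adding the next file
    would push it past max_bytes. No time window is modelled here.
    """
    open_batches = defaultdict(list)  # (stream, ftype) -> file names so far
    open_sizes = defaultdict(int)     # (stream, ftype) -> bytes so far
    closed = []
    for name, stream, ftype, size in files:
        key = (stream, ftype)
        if open_batches[key] and open_sizes[key] + size > max_bytes:
            closed.append((key, open_batches.pop(key)))
            open_sizes.pop(key)
        open_batches[key].append(name)
        open_sizes[key] += size
    # Flush the batches that are still open, in a deterministic order.
    for key in sorted(open_batches):
        if open_batches[key]:
            closed.append((key, open_batches[key]))
    return closed


def write_stored_zip(members):
    """Write {member name: bytes} into an uncompressed archive; return its bytes."""
    buf = io.BytesIO()
    # ZIP_STORED keeps every member byte-for-byte, with no compression applied.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()
```

Because the members are stored rather than compressed, each one occupies a contiguous, unmodified byte range inside the archive, which is presumably what makes direct reads from within the merged file practical.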

Authors: L. Tuura (Northeastern University, Boston, MA, USA); T. Barrass (Bristol University, UK); V. Innocente (CERN, PH/SFT)

Presentation materials: paper, slides
