Jack Cranshaw (Argonne National Laboratory (US))
Choices in persistent data models and data organization have significant performance ramifications for data-intensive scientific computing. In experimental high energy physics, organizing file-based event data for efficient per-attribute retrieval may improve the I/O performance of some physics analyses but hamper the performance of processing that requires full-event access. In-file data organization tuned for serial access by a single process may be less suitable for opportunistic sub-file-based processing on distributed computing resources. Unique I/O characteristics of high-performance computing platforms pose additional challenges. This paper describes work in the ATLAS experiment at the Large Hadron Collider to provide an I/O framework and tools for persistent data organization to support an increasingly heterogeneous array of data access and processing models.
ATLAS Collaboration