Speaker
Jack Cranshaw
(Argonne National Laboratory (US))
Description
Choices in persistent data models and data organization have significant performance ramifications for data-intensive scientific computing.
In experimental high energy physics, organizing file-based event data for efficient per-attribute retrieval may improve the I/O performance of some physics analyses
but hamper the performance of processing that requires full-event access.
In-file data organization tuned for serial access by a single process may be less suitable for opportunistic sub-file-based processing on distributed computing resources.
Unique I/O characteristics of high-performance computing platforms pose additional challenges.
This paper describes work in the ATLAS experiment at the Large Hadron Collider to provide an I/O framework and tools for persistent data organization
to support an increasingly heterogeneous array of data access and processing models.
Primary author
ATLAS Collaboration
(ATLAS)