Speaker
Christopher Jung
(KIT - Karlsruhe Institute of Technology (DE))
Description
Modern data processing increasingly relies on data locality for performance and scalability, whereas the common HEP approaches aim for uniform resource pools with minimal locality, recently even across site boundaries.
To combine advantages of both, the High Performance Data Analysis (HPDA) Tier 3 concept opportunistically establishes data locality via coordinated caches.
In accordance with HEP Tier 3 activities, the design incorporates two major assumptions:
1. Only a fraction of data is accessed regularly and is thus the deciding factor for overall throughput.
2. Data access may fall back to non-local sources, making permanent local data availability an inefficient resource usage strategy.
Based on this, the HPDA design generically extends available storage hierarchies into the batch system.
Using the batch system itself for scheduling file locality, an array of independent caches on the worker nodes is dynamically populated with high-profile data.
Cache state information is exposed to the batch system both for managing caches and scheduling jobs.
As a result, users directly work with a regular, adequately sized storage system.
However, their automated batch processes are presented with local replicas of data whenever possible.
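The scheduling idea described above can be illustrated with a minimal sketch. It is not the HPDA implementation: the names `WorkerNode`, `locality_score`, `rank_nodes`, and `plan_access` are hypothetical, and the model assumes that each worker node simply reports the set of files in its cache. Nodes are ranked by the fraction of a job's input files they already hold, and any file missing from the chosen node's cache falls back to non-local access.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    """Hypothetical worker node that exposes its cache content to the scheduler."""
    name: str
    cached_files: set = field(default_factory=set)


def locality_score(node, input_files):
    """Fraction of the job's input files already present in the node's cache."""
    if not input_files:
        return 0.0
    return len(node.cached_files & set(input_files)) / len(input_files)


def rank_nodes(nodes, input_files):
    """Order nodes by descending locality; ties are broken by node name."""
    return sorted(nodes, key=lambda n: (-locality_score(n, input_files), n.name))


def plan_access(node, input_files):
    """Mark each input file as served from the local cache or from remote storage."""
    return {
        f: "local" if f in node.cached_files else "remote"
        for f in input_files
    }


if __name__ == "__main__":
    nodes = [
        WorkerNode("wn1", {"a.root"}),
        WorkerNode("wn2", {"a.root", "b.root"}),
        WorkerNode("wn3"),
    ]
    job_inputs = ["a.root", "b.root", "c.root"]
    best = rank_nodes(nodes, job_inputs)[0]
    print(best.name, plan_access(best, job_inputs))
```

In this toy model a job is never blocked by a cache miss: the ranking only expresses a preference, which mirrors the assumption that data access may always fall back to non-local sources.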
We highlight the potential and limitations of currently available technologies in light of HEP Tier 3 activities, showcase the current design and implementation of the HPDA data locality, and present first experiences with our prototype.
Primary author
Max Fischer
(KIT - Karlsruhe Institute of Technology (DE))
Co-authors
Christopher Jung
(KIT - Karlsruhe Institute of Technology (DE))
Eileen Kuhn
(KIT - Karlsruhe Institute of Technology (DE))
Gunter Quast
(KIT - Karlsruhe Institute of Technology (DE))
Manuel Giffels
(KIT - Karlsruhe Institute of Technology (DE))