Speaker
Vakho Tsulaia
(Lawrence Berkeley National Lab. (US))
Description
High-performance computing (HPC) facilities present unique challenges and opportunities for high energy and nuclear physics (HENP) event processing. The massive scale of many HPC systems means that fractionally small utilizations can yield large returns in processing throughput. Parallel applications that can dynamically and efficiently fill any scheduling opportunities the resource presents benefit both the facility (maximal utilization) and the (compute-limited) science. The ATLAS Yoda system provides this capability to HENP-like event processing applications by implementing event-level processing in an MPI-based master-client model that integrates seamlessly with the more broadly scoped ATLAS Event Service. Fine-grained, event-level work assignments are intelligently dispatched to parallel workers to sustain full utilization on all cores, with outputs streamed off to destination object stores in near real time with similarly fine granularity, such that processing can proceed until termination with full utilization. The system offers the efficiency and scheduling flexibility of preemption without requiring that the application actually support or employ checkpointing. We will present the new Yoda system, its motivations, architecture, implementation, and applications in ATLAS data processing at several US HPC centers.
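The event-level dispatch idea described above can be illustrated with a small sketch. This is not Yoda code: it assumes Python threads in place of MPI ranks, an in-memory dictionary in place of the destination object store, and a hypothetical `run_pool` helper with a `stop_after` parameter that imitates the allocation ending early. It only aims to show why fine-grained assignment plus immediate output streaming means an abrupt stop loses no completed work, even without checkpointing.

```python
import queue
import threading


def run_pool(event_ids, n_workers, process, stop_after=None):
    """Hand out events one at a time; return (outputs, unprocessed_ids).

    Hypothetical illustration only: threads stand in for MPI ranks and
    the `outputs` dict stands in for a remote object store.
    """
    todo = queue.Queue()
    for eid in event_ids:
        todo.put(eid)

    outputs = {}
    lock = threading.Lock()
    stop = threading.Event()  # set when the allocation "ends"

    def worker():
        while not stop.is_set():
            try:
                eid = todo.get_nowait()  # pull one event-level assignment
            except queue.Empty:
                return
            result = process(eid)
            with lock:
                outputs[eid] = result  # streamed out per event, not per job
                if stop_after is not None and len(outputs) >= stop_after:
                    stop.set()  # simulate the slot being taken away

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Whatever was never handed out can be reassigned elsewhere later.
    leftover = []
    while True:
        try:
            leftover.append(todo.get_nowait())
        except queue.Empty:
            break
    return outputs, leftover


if __name__ == "__main__":
    done, remaining = run_pool(range(100), 4, lambda e: e * e, stop_after=10)
    print(len(done), "events stored,", len(remaining), "left to reassign")
```

In this sketch every event is either stored or still in the queue when the run stops, so the cost of an early stop is bounded by the events in flight rather than by the whole job.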
Author
Paolo Calafiura
(Lawrence Berkeley National Lab. (US))
Co-authors
Danila Oleynik
(Joint Inst. for Nuclear Research (RU))
Paul Nilsson
(Brookhaven National Laboratory (US))
Dr Peter Van Gemmeren
(Argonne National Laboratory (US))
Sergey Panitkin
(Brookhaven National Laboratory (US))
Tadashi Maeno
(Brookhaven National Laboratory (US))
Dr Torre Wenaus
(Brookhaven National Laboratory (US))
Vakho Tsulaia
(Lawrence Berkeley National Lab. (US))
Wen Guan
(University of Wisconsin (US))