Conference on Computing in High Energy and Nuclear Physics

Name: Conference on Computing in High Energy and Nuclear Physics
Start: 2024-10-19T08:00:00+02:00
End: 2024-10-25T18:30:00+02:00
Location: No location set

19–25 Oct 2024

Europe/Zurich timezone

Contact Program Chairs

chep2024-pc@cern.ch

Checkpoint-Restart for HPC

Not scheduled

18m

Talk

Track 7 - Computing Infrastructure

Parallel (Track 7)

Speaker: Dr Madan Timalsina (NERSC/LBNL)

This presentation delves into the implementation and optimization of checkpoint-restart (C/R) mechanisms in High-Performance Computing (HPC) environments, with a particular focus on Distributed MultiThreaded CheckPointing (DMTCP). We explore the use of DMTCP both inside and outside containerized environments, emphasizing its application on NERSC Perlmutter, a cutting-edge supercomputing system. The discussion highlights the benefits of C/R techniques for managing complex, long-duration computations and showcases the efficiency and reliability of these methods. The techniques have been thoroughly tested with Geant4, a crucial tool for High Energy and Nuclear Physics, and have passed the assessments. We further examine the integration of HPC container runtimes, such as Shifter and Podman-HPC, which enhance computational task management and ensure consistent performance across various environments. Through real-world application examples, we illustrate the advantages of DMTCP in multi-threaded and distributed computing scenarios. Additionally, we present the methods and results, which demonstrate the impact of C/R on resource utilization, and discuss the future directions of this research and its potential across various scientific domains.

Author: Dr Madan Timalsina (NERSC/LBNL)

Co-authors: Dr Johannes Blaschke (NERSC/LBNL), Dr Lisa Gerhardt (NERSC/LBNL), Dr Nicholas Tyler (NERSC/LBNL), Urjoshi Sinha (NERSC/LBNL), William Arndt (NERSC/LBNL)

There are no materials yet.
