Description
Lattice QCD (LQCD) is a well-established non-perturbative approach to solving quantum chromodynamics (QCD), the theory of quarks and gluons. Future LQCD calculations are expected to require exascale computing capacity and a workload management system (WMS) to manage them efficiently.
In this talk we will discuss the use of the PanDA WMS for LQCD simulations. The PanDA WMS was developed by the ATLAS Experiment at the LHC to manage data analysis and detector simulations on distributed, heterogeneous computing resources, including hundreds of Grid and Cloud sites as well as HPC machines. PanDA is now also used by projects and experiments outside of ATLAS.
For this project we have created a prototype on the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). To provide communication with the PanDA server and job submission to the local batch system, we have deployed dedicated PanDA edge services on Titan’s data transfer nodes. The system was tested with realistic LQCD workloads submitted via a PanDA server instance running in a Docker container at the OLCF. In our talk we will present the results of these tests and discuss plans for extending the current setup to other HPC sites.