Speaker
Jan Balewski
(Lawrence Berkeley National Lab. (US))
Description
PDSF, the Parallel Distributed Systems Facility, has been in continuous operation since 1996, serving high-energy and nuclear physics research. It is currently a tier-1 site for STAR, a tier-2 site for ALICE, and a tier-3 site for ATLAS. We are in the process of migrating the PDSF workload from the existing commodity cluster to the Cori Cray XC40 system. Docker containers enable running the PDSF software stack on different hardware and operating systems. We will discuss the challenges of using the highly scalable Cori resources, where thousands of user jobs can start within a second and easily saturate I/O resources, CVMFS, or external database connectivity.
Desired length: 15
Author
Jan Balewski
(Lawrence Berkeley National Lab. (US))
Co-authors
Georg Rath
(Lawrence Berkeley National Laboratory)
Jeff Porter
(Lawrence Berkeley National Lab. (US))
Rei Lee
(Lawrence Berkeley National Laboratory)
Tony Quan
(LBL)