Automation and Job Management for LZ Simulations at NERSC

The LUX-ZEPLIN (LZ) experiment is a world-leading direct dark matter detection experiment based on a dual-phase xenon (Xe) Time Projection Chamber (TPC). Its success requires an in-depth characterization of the relevant backgrounds, which in turn implies a heavy simulation workload. In this talk, I will present the infrastructure developed to allocate and manage this workload on Perlmutter, NERSC's most recent HPC system. The pipeline includes a system that automatically generates production configurations from requests submitted by the simulations team, along with utilities to monitor job progress and success. A RabbitMQ queue coordinates job dispatch among a set of workers running on specially allocated compute nodes, allowing fine-grained control over how the available computational resources are used.
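The queue-based dispatch mentioned above follows the general "competing consumers" pattern: jobs sit in a shared queue and each worker pulls the next one only when it is free. The sketch below is not the LZ production code. It is a minimal, self-contained illustration that uses Python's standard-library `queue.Queue` and threads as a local stand-in for a RabbitMQ queue and worker nodes. The job tuple format, the `make_jobs`/`worker`/`dispatch` helpers and the `node-k` worker names are all hypothetical.

```python
import queue
import threading


def make_jobs(n):
    """Hypothetical job descriptions: (job_id, macro_name). Not the LZ schema."""
    return [(i, f"background_{i % 3}") for i in range(n)]


def worker(name, jobs, results, lock):
    """Competing consumer: pull one job at a time until a None sentinel arrives.

    Fetching a single job per iteration is analogous to RabbitMQ's
    prefetch_count=1 setting, so a busy worker never hoards queued jobs.
    """
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel: no more work for this worker
                return
            job_id, macro = job
            # Placeholder for launching the simulation; here we only record it.
            with lock:
                results.append((name, job_id, macro))
        finally:
            jobs.task_done()


def dispatch(n_jobs, n_workers):
    """Enqueue all jobs, then one sentinel per worker, and run the workers."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()
    for job in make_jobs(n_jobs):
        jobs.put(job)
    for _ in range(n_workers):
        jobs.put(None)  # FIFO order guarantees sentinels come after every job
    threads = [
        threading.Thread(target=worker, args=(f"node-{k}", jobs, results, lock))
        for k in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    done = dispatch(n_jobs=12, n_workers=4)
    print(f"{len(done)} jobs completed")
```

Because each worker takes a new job only after finishing the previous one, faster or less loaded nodes naturally absorb more of the work; limiting which nodes run workers is one way such a design can give control over resource usage.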

Jacopo Siniscalco

