9-13 July 2018
Sofia, Bulgaria
Europe/Sofia timezone

Experience running IceCube simulation workloads on the Titan supercomputer

9 Jul 2018, 11:00
15m
Hall 7 (National Palace of Culture)

Hall 7

National Palace of Culture

Presentation | Track 3 – Distributed computing

Speaker

David Schultz (University of Wisconsin-Madison)

Description

The IceCube Neutrino Observatory is a neutrino detector located at the South Pole. Here we present experiences acquired when using HTCondor to run IceCube’s GPU simulation workloads on the Titan supercomputer. Titan is a large supercomputer geared toward High Performance Computing (HPC). Several factors make it challenging to use Titan for IceCube’s High Throughput Computing (HTC) workloads: (1) Titan is designed for MPI applications, (2) Titan’s scheduling policies heavily favor very large resource reservations, (3) Titan’s compute nodes run a customized version of Linux, and (4) Titan’s compute nodes cannot access outside networks. In contrast, IceCube’s simulation workloads consist of large numbers of relatively small independent jobs intended to run in standard Linux environments, and they may require connectivity to public networks. Here we present how we leveraged the HTCondor batch scheduler within Singularity containers to provide an HTC-friendly interface to Titan suitable for IceCube’s GPU workloads.
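The general pattern described above can be sketched as a batch job script that starts an HTCondor worker inside a Singularity container on a compute node. This is a minimal illustration only; the project ID, image path, config directory, and the assumption of a site-local HTCondor pool are all hypothetical, not the authors' actual setup.

```shell
#!/bin/bash
#PBS -A PROJECT_ID                     # hypothetical allocation ID
#PBS -l nodes=1,walltime=02:00:00
# Illustrative sketch: run an HTCondor daemon inside a Singularity
# container on a Titan-style compute node. Because the compute nodes
# cannot reach outside networks, the worker is assumed to join a
# site-local HTCondor pool rather than a remote one.

# Hypothetical paths -- real image and config locations would differ.
IMAGE=/lustre/proj-shared/icecube/worker.sif
CONDOR_CONFIG_DIR=/lustre/proj-shared/icecube/condor-etc

# aprun places the command on the compute node; Singularity supplies a
# standard Linux userland on top of Titan's customized OS, and --nv
# exposes the node's GPU inside the container.
aprun -n 1 singularity exec --nv \
    --bind "$CONDOR_CONFIG_DIR":/etc/condor \
    "$IMAGE" \
    condor_master -f
```

Running `condor_master` in the foreground (`-f`) keeps the HTCondor daemons alive for the duration of the batch reservation, so many small independent IceCube jobs can be matched to the node within a single large allocation, which is what Titan's scheduling policies favor.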

Primary authors

Vladimir Brik (University of Wisconsin-Madison), David Schultz (University of Wisconsin-Madison), Gonzalo Merino (IceCube)

Presentation Materials