Speaker
Description
The Instituto de Física Corpuscular (IFIC) is a joint research center of the Spanish National Research Council (CSIC) and the University of Valencia, focused on fundamental physics, from particle physics to cosmology. It hosts over 400 researchers, engineers, and technical staff working on national and international projects.
In this talk, we will present how IFIC uses HTCondor to manage two compute clusters, GLUON (CPU) and Artemisa (GPU). These clusters serve both internal users and external collaborators and support a wide range of workloads, from classical simulations to deep learning applications. We will describe the general architecture of each pool, our strategies for efficient CPU and GPU resource allocation, how we manage usage policies and priorities, and lessons learned from operating a hybrid infrastructure.
Additionally, we will describe how we handle parallel jobs over InfiniBand in GLUON alongside traditional serial jobs through HTCondor’s vanilla universe.
| Desired slot length | 20 minutes |
|---|---|
| Speaker release | Yes |