
9–13 Jul 2018
Sofia, Bulgaria
Europe/Sofia timezone

Managing a heterogeneous scientific computing cluster with cloud-like tools: ideas and experience

10 Jul 2018, 16:00
1h

National Culture Palace, Boulevard "Bulgaria", 1463 NDK, Sofia, Bulgaria
Poster
Track 7 – Clouds, virtualization and containers
Posters

Speaker

Stefano Bagnasco (Istituto Nazionale di Fisica Nucleare, Torino)

Description

Current computing paradigms often involve concepts like microservices, containerisation and, of course, Cloud Computing.
Scientific computing facilities, however, are usually managed conservatively through plain batch systems and as such can cater only to a limited range of use cases. On the other hand, the requirements of different scientific computing applications are in general orthogonal to each other in several dimensions.
We have been operating the Open Computing Cluster for Advanced data Manipulation (OCCAM), a multi-purpose heterogeneous HPC cluster, for more than one year under a cloud-like paradigm. Each computing application is mapped to a dynamically expandable virtual farm, tuned and configured to the application’s needs and able to access special hardware such as GPU accelerators or low-latency networks as needed. This lets us deliver computational frameworks that are well consolidated within the community, for a smooth end-user experience, while leveraging modern computing paradigms.
By relying mostly on mainstream software tools, namely Docker (used throughout our architecture to run both service and computational tasks), Calico for virtual network management, and Mesos and Marathon for orchestration, and by exploiting some of the work done in the context of the INDIGO-DataCloud project, we aimed to minimise the development and maintenance effort while using a high-quality software stack.
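As an illustration of the "virtual farm" idea, the following minimal Python sketch builds an application definition of the kind that could be submitted to Marathon to run a farm's workers as Docker containers, and then resizes it. This is not the OCCAM configuration: the application id, image name and resource figures are hypothetical, and the field names (`id`, `instances`, `cpus`, `mem`, `container`) are assumed to follow Marathon's JSON app-definition format. No request is sent to any service; the sketch only prints the JSON document.

```python
import json


def farm_app(app_id, image, workers, cpus, mem_mb):
    """Build a Marathon-style app definition for one virtual farm.

    All argument values used below are hypothetical examples.
    """
    return {
        "id": app_id,
        "instances": workers,  # number of worker containers in the farm
        "cpus": cpus,          # CPU share requested per worker
        "mem": mem_mb,         # memory requested per worker, in MB
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


def scaled(app, workers):
    """Return a copy of the definition resized to `workers` instances."""
    if workers < 0:
        raise ValueError("workers must be non-negative")
    return {**app, "instances": workers}


# Hypothetical farm: four workers, later expanded to sixteen.
app = farm_app(
    "/farms/example-analysis",
    "example.org/analysis-worker:latest",
    workers=4,
    cpus=2.0,
    mem_mb=4096,
)
bigger = scaled(app, 16)
print(json.dumps(bigger, indent=2))
```

In this picture, expanding or shrinking a farm amounts to changing a single field of a declarative description, while the orchestrator takes care of placing the containers on the cluster.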
In this work we present the status of the system, our operational experience, the lessons learnt, and our outlook for further development. We will also present some preliminary performance comparisons between containerised and bare-metal scientific computing applications in an HPC environment.

Primary authors

Stefano Bagnasco (Istituto Nazionale di Fisica Nucleare, Torino)
Sara Vallero (Universita e INFN Torino (IT))
Stefano Lusso (INFN-TO)
Mr Matteo Concas (INFN e Politecnico di Torino (IT))
Prof. Marco Aldinucci (Dept. of Computer Science, University of Torino)
Dr Sergio Rabellino (Dept. of Computer Science, University of Torino)
