Mar 6 – 8, 2023
Europe/Zurich timezone

From 1VM+1LUN to k8s+Ceph - the uneasy way...

Mar 8, 2023, 9:15 AM
15m
Presentation
Scalable Storage Backends for Cloud, HPC and Global Science
Scalable Storage Backends

Speaker

Krzysztof Wadówka (PSNC)

Description

From 1VM+LUN to k8s+S3 - an uneasy way…

Since 2015 PSNC has provided a sync & share service for science and academia in Poland, based on the Seafile software. We started small, running a 1VM+1LUN setup with the community version of the software, integrated with PSNC's local LDAP. In 2016 we began to build a fully-fledged setup based on a cluster of application servers, background job servers, DB servers, and a dedicated two-server GPFS cluster as a storage backend; it became operational in 2017 and serves most of our users to this day. We are currently migrating our service towards the most modern and fancy setup based on k8s and Ceph/S3.

In our presentation, we discuss our experiences and observations on the impact that changing cloud and storage technologies, and the changing features of compute/storage infrastructures, have on your systems, services, data, and users while you try to follow the trends in system architectures, service deployment approaches, management practices, etc. We also discuss the pros and cons of the simplified 1-VM setup vs the bare-metal multi-server infrastructure vs the fully containerized setup with lots of automation.

Surprisingly (or not), we faced the non-obvious and hard-to-accept fact that the ‘ancient’, simplistic setup of our sync & share system required the least effort to keep running and caused almost no operational issues, faults or failures over 8+ years of operation, while the complexity of management processes such as data and user migration and system and application upgrades grows non-linearly with the level of ‘fanciness’ and ‘intelligence’ of the infrastructure and application setup. Obviously, it would not be fair to say that the capabilities of a 1VM+1LUN platform intended to serve ~500 users with <40TB of data and ~8 million files are comparable to those of a fully-fledged clustered setup for 1000s of users with ~1PB of storage and a quarter of a billion files.

Therefore we will explain the reasoning behind the design decisions made since the start of the minimalistic service, where the HA features were based on a rock-solid 1VM + hypervisor + orchestration platform :), through its extension to a full bare-metal setup comprising almost a whole rack of servers and disk-array junk ;), up to the cutting-edge setup based on a top-down software-defined infrastructure including k8s-fuelled containers, a software-defined storage back-end (Ceph with S3 gateways) and SDN used for network management.

We will also discuss the impact that particular decisions had on the complexity of system management, with a special focus on upgrades of the sync & share application and of the underlying operating systems and platforms (DB engines). For this purpose we will provide a deep dive into the process of upgrading Seafile 7.x to Seafile 9, along with the required operating system updates and the migration of users and their data from a GPFS-based, POSIX-speaking backend to a Ceph-based S3/object storage backend.

We will also give an overview of our efforts to prepare a fully containerized setup that allows us to deploy, manage and maintain an arbitrary number of testing, development, and production Seafile instances, ensuring full coverage of system manageability, high availability, scalability, high performance, and data security.
