Seafile - great but vendor lock-in? Evaluating Seafile openness in practice

29 Jan 2020, 14:35
20m
Presentation · Scalable Storage Backends for Cloud, HPC and Global Science

Speaker

Maciej Brzezniak (PSNC Poznan Poland)

Description

Sync&share systems are widely used at universities and commercial institutions to address data storage, sharing and synchronisation needs. Academic users mostly use open-source solutions, while companies, especially SMEs, prefer commercial products with paid support.
PSNC decided to use Seafile, a scalable, purpose-built, reliable and performant sync&share system. The main motivations for choosing Seafile were its high performance, low overheads and known reliability. PSNC built a local pilot service in 2015 based on the community version of the software and expanded it in 2016 into the country-wide production system, box.pionier.net.pl, using Seafile Pro deployed in a fully redundant setup with two application servers, a database cluster and a cluster file system.
PSNC made this decision aware that using Seafile in an academic context may also bring challenges, such as possible vendor lock-in (including difficulties in opting out of the paid version), obscurity of the code and more complicated integration with surrounding systems and applications.
In our presentation we analyse the openness of Seafile in the context of data and meta-data migration. We will also briefly discuss the Seafile server API, which can be used to integrate Seafile into software stacks. PSNC has maintained two instances of the sync&share service since 2015. During 2018 we made preparatory efforts to integrate both instances, which included migrating users’ data (organised in ‘libraries’) from the old system, based on the community version, to the new one, based on Seafile Pro. While Seafile provides basic tools for exporting data from the system, these tools do not export meta-data such as public share links to data objects or information on data shared among named users.
In our presentation we will discuss the features of the more advanced data migration tool we have developed at PSNC and tested on the demonstration and production instances of the service. We will also discuss our experience with the data migration process specific to Seafile and share general comments on massive data migration. Finally, we will show that Seafile’s internal data and meta-data organisation makes it possible to explore relationships among data objects and users, including data sharing and other user-level meta-data, and to extract this important information for use outside Seafile.
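To illustrate the kind of relationship extraction described above, here is a minimal Python sketch. It assumes that share information has already been copied into two simplified tables, loosely modelled on Seafile's user-to-user library shares and public share links. The table and column names (`shared_repo`, `share_link`, `from_email`, `to_email`, ...) and the function `user_relationships` are hypothetical; a real Seafile schema uses different names and varies between versions.

```python
import sqlite3

# Hypothetical, simplified tables loosely modelled on Seafile's internal
# meta-data (user-to-user library shares and public share links).
# Real table and column names differ; check your own Seafile databases.
SCHEMA = """
CREATE TABLE shared_repo (repo_id TEXT, from_email TEXT, to_email TEXT, permission TEXT);
CREATE TABLE share_link (token TEXT, username TEXT, repo_id TEXT, path TEXT);
"""


def user_relationships(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Map each user to every library ID they are linked to via a share or a public link."""
    rel: dict[str, set[str]] = {}
    rows = conn.execute(
        "SELECT from_email, repo_id FROM shared_repo "   # user who shared a library
        "UNION SELECT to_email, repo_id FROM shared_repo "  # user who received it
        "UNION SELECT username, repo_id FROM share_link"    # creator of a public link
    )
    for user, repo_id in rows:
        rel.setdefault(user, set()).add(repo_id)
    return rel


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO shared_repo VALUES ('r1', 'alice@example.org', 'bob@example.org', 'rw')")
    conn.execute("INSERT INTO share_link VALUES ('t1', 'alice@example.org', 'r2', '/docs')")
    print(user_relationships(conn))
```

The same per-user view can then be exported to a neutral format (CSV, JSON) for use outside Seafile.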
While the migration tool is still a work in progress (for now we can only migrate external sharing links; user-level shares are not yet supported), the fact that we can export both data and important meta-data demonstrates that Seafile’s internal architecture is transparent enough for Seafile to be used as a technical component of a long-term service. By developing a tool for comprehensive data and meta-data migration we have further decreased the lock-in risk related to vendor-specific data organisation.
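One step any share-link migration has to handle is pointing exported links at the libraries as they exist on the target instance. The sketch below assumes that libraries may receive new IDs on the target system and that an old-to-new ID mapping was recorded while migrating the data; `ShareLink` and `remap_links` are hypothetical names for illustration, not part of the PSNC tool.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShareLink:
    token: str    # public link token from the old instance
    owner: str    # user who created the link
    repo_id: str  # library ID on the old instance
    path: str     # file or folder path inside the library


def remap_links(links, repo_id_map):
    """Rewrite old library IDs to new ones; return (migrated, skipped)."""
    migrated, skipped = [], []
    for link in links:
        new_id = repo_id_map.get(link.repo_id)
        if new_id is None:
            skipped.append(link)  # library not (yet) migrated: keep for a later run
        else:
            migrated.append(replace(link, repo_id=new_id))
    return migrated, skipped
```

Keeping the original token unchanged would, in principle, let previously distributed public URLs keep working on the new server, provided the link URL scheme stays the same; the skipped list makes partial, repeatable migration runs possible.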
Another possible approach to large-scale data migration is to use the Seafile API, which is documented and exposes the complete functionality needed to build Seafile clients and user interfaces. It is in fact used to implement all Seafile clients, including the web interface, GUI, CLI and mobile clients and virtual drives. In particular, it could be used to access data objects and explore sharing information. However, developing a migration solution based on the API requires more effort, and deciding on this approach requires more analysis and goes beyond the capabilities of the system administrator team.
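For comparison, reading sharing information through the Web API could start from a request like the one below. This is a minimal sketch using only the Python standard library; the endpoint path `/api/v2.1/share-links/`, the optional `repo_id` query parameter and the `Authorization: Token ...` header are assumptions based on the public Seafile Web API documentation and should be checked against the server version in use. Depending on the account used, the endpoint may return only links owned by the authenticated user.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed Seafile Web API endpoint for listing share links; verify per version.
SHARE_LINKS_PATH = "/api/v2.1/share-links/"


def build_share_links_request(base_url: str, token: str,
                              repo_id: Optional[str] = None) -> urllib.request.Request:
    """Build an authenticated GET request for share links, optionally for one library."""
    url = base_url.rstrip("/") + SHARE_LINKS_PATH
    if repo_id is not None:
        url += "?" + urllib.parse.urlencode({"repo_id": repo_id})
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def list_share_links(base_url: str, token: str, repo_id: Optional[str] = None):
    """Perform the request and decode the JSON response body."""
    req = build_share_links_request(base_url, token, repo_id)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Separating request construction from the network call keeps the part that encodes API assumptions small and easy to adjust; iterating over all users and libraries, paging and error handling are where most of the extra effort of the API route would go.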

Primary authors

Maciej Brzezniak (PSNC Poznan Poland) Krzysztof Wadówka (PSNC) Eugeniusz Pokora (PSNC) Norbert Meyer (Unknown)
