26–30 Aug 2024
Aachen, Germany
Europe/Brussels timezone

File synchronization between Linux systems in Python with yarsync

29 Aug 2024, 09:25
20m
Aachen, Germany

Erholungs-Gesellschaft Reihstraße 13, 52062 Aachen

Speaker

Yaroslav Nikitenko

Description

Yet Another Rsync (yarsync) is a Python wrapper around rsync, a well-established Linux tool, with a simple and familiar git-like interface. Python allows us to build a higher-level tool that is safer and sometimes more efficient than the original binary.
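To illustrate the general idea of a safer Python layer over rsync, here is a minimal sketch. It is not yarsync's implementation, and the helper names `build_rsync_command` and `sync` are hypothetical. The sketch assembles an rsync argument list from standard rsync options (`-a`, `-H`, `-i`, `--dry-run`, `--delete`) and always runs a dry-run preview, asking for confirmation before any real transfer.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_rsync_command(source: str, destination: str,
                        dry_run: bool = True,
                        delete: bool = False,
                        extra: Sequence[str] = ()) -> List[str]:
    """Assemble an rsync argument list (hypothetical helper, not yarsync's API)."""
    # -a: archive mode (recursive; preserves permissions, times, symlinks, ...)
    # -H: preserve hard links; -i: itemize changes for a readable summary
    cmd = ["rsync", "-aHi"]
    if dry_run:
        # Report what would be transferred without changing anything.
        cmd.append("--dry-run")
    if delete:
        # Remove destination files that no longer exist at the source.
        cmd.append("--delete")
    cmd.extend(extra)
    # A trailing slash on the source means "copy the directory's contents".
    cmd.extend([source.rstrip("/") + "/", destination])
    return cmd


def sync(source: str, destination: str, delete: bool = False) -> int:
    """Preview the transfer, ask for confirmation, then run it for real."""
    preview = build_rsync_command(source, destination,
                                  dry_run=True, delete=delete)
    print("Preview:", shlex.join(preview))
    result = subprocess.run(preview)
    if result.returncode != 0:
        # Stop early if even the dry run failed (e.g. unreachable host).
        return result.returncode
    if input("Proceed with the transfer? [y/N] ").strip().lower() != "y":
        return 0
    real = build_rsync_command(source, destination,
                               dry_run=False, delete=delete)
    return subprocess.run(real).returncode
```

For example, `sync("data", "user@server:/srv/data")` would print the rsync preview and transfer files only after the user types `y` (the paths are illustrative, and rsync must be installed). Passing an argument list to `subprocess.run` instead of a shell string avoids quoting problems with unusual file names.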

While many data analysts today rely heavily on databases and cloud computing, other approaches also have their benefits. Many kinds of data are difficult or time-consuming to represent in relational databases. Files in a user-defined format are then a simpler and more general solution, which is often cheaper and less error-prone. Linux runs on a considerable share of servers today, and many data analysts also use Linux as a good programming environment. Our approach is inspired by the data analysis workflow in high-energy physics (HEP). We will discuss creating data repositories with yarsync, the relevant rsync features, and how the tool helps guard against possible problems in data synchronization.