For many scientific projects, data management is an increasingly complicated challenge. The number of data-intensive instruments generating unprecedented volumes of data is growing, and their accompanying workflows are becoming more complex. Their storage and computing resources are heterogeneous and are distributed across numerous geographical locations belonging to different administrative domains and organizations. These locations do not necessarily coincide with the places where data is produced, stored, analyzed by researchers, or archived for safe long-term storage. To address these challenges, the data management system Rucio was developed to allow the high-energy physics experiment ATLAS to manage its large volumes of data in an efficient and scalable way.
But ATLAS is not alone, and several diverse scientific projects have started evaluating, adopting, and adapting the Rucio system for their own needs. As the Rucio community has grown, many improvements have been introduced, customisations have been added, and many bugs have been fixed. Additionally, new dataflows have been investigated and operational experiences have been documented. In this article we collect and compare the common successes, pitfalls, and oddities that arose in the evaluation efforts of multiple diverse experiments, and contrast them with the ATLAS experience. These experiments include the high-energy physics experiments CMS and Belle II, the neutrino experiment DUNE, and the LIGO and SKA astronomical observatories.