25–29 May 2026
Chulalongkorn University
Asia/Bangkok timezone

A Rucio-Based Global Data Lake for the SKA Regional Centre Network

25 May 2026, 16:51
18m
Chulalongkorn University

Oral Presentation Track 1 - Data and metadata organization, management and access

Speaker

James Collinson (SKAO)

Description

The Square Kilometre Array (SKA) telescopes, currently under construction in South Africa and Australia, are due to enter Science Verification at the end of 2026. From this point, these interferometers will generate an increasing volume of data, with the science data processors eventually producing of order 1 PB per day of science-ready data products. Managing this archive across the globally federated SKA Regional Centre Network (SRCNet) of data centres is a key challenge in enabling timely and reliable access to SKA data for the astronomy community.
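
For a rough sense of scale, the figure of order 1 PB per day can be converted into a sustained average rate. The sketch below assumes a decimal petabyte (10^15 bytes) and a perfectly uniform rate over 24 hours, neither of which is stated in the abstract; the function name is illustrative only.

```python
# Back-of-envelope conversion of a daily data volume to a sustained rate.
# Assumptions (not from the abstract): decimal petabyte, uniform production.
PETABYTE = 10**15               # bytes
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def sustained_rate(bytes_per_day):
    """Return (gigabytes per second, gigabits per second) for a daily volume."""
    bytes_per_s = bytes_per_day / SECONDS_PER_DAY
    return bytes_per_s / 1e9, bytes_per_s * 8 / 1e9


gb_per_s, gbit_per_s = sustained_rate(PETABYTE)
print(f"{gb_per_s:.1f} GB/s, {gbit_per_s:.1f} Gbit/s")  # 11.6 GB/s, 92.6 Gbit/s
```

Under these assumptions the average is roughly 11.6 GB/s, or about 93 Gbit/s, before any replication between sites is counted.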

To address this, the SRCNet data lake is built around Rucio and FTS as its core data management and transfer technologies, complemented by a suite of auxiliary services tailored to SKA-specific requirements. In this talk, we outline the SRCNet data lake use case and describe the key challenges encountered when adapting Rucio to this environment. We highlight the supporting services developed and summarise the full data lifecycle, from ingestion at SKA Observatory interfaces, through global replication and distribution, to staging for scientific processing at SRCNet sites.

In contrast to traditional High Energy Physics workflows, where data access is typically organised around predefined datasets, astronomy requires multi-mission discovery with tight integration between physical replica management and rich, standards-based metadata systems, alongside support for proprietary data embargoes. We discuss the mechanisms implemented to manage and expose science metadata within the data lake and to control access to embargoed data products, both increasingly important requirements for large-scale distributed astronomical archives. These experiences are expected to be relevant to the wider community considering Rucio-based data lakes with complex metadata and federation requirements.
