Description
The Square Kilometre Array (SKA) telescopes, currently under construction in South Africa and Australia, are due to enter Science Verification at the end of 2026. From this point, these interferometers will generate an increasing volume of data, with the science data processors eventually producing on the order of 1 PB per day of science-ready data products. Managing this archive across the globally federated SKA Regional Centre Network (SRCNet) of data centres is a key challenge in enabling timely and reliable access to SKA data for the astronomy community.
To address this, the SRCNet data lake is built around Rucio and FTS as its core data management and transfer technologies, complemented by a suite of auxiliary services tailored to SKA-specific requirements. In this talk, we outline the SRCNet data lake use case and describe the key challenges encountered when adapting Rucio to this environment. We highlight the supporting services developed and summarise the full data lifecycle, from ingestion at SKA Observatory interfaces, through global replication and distribution, to staging for scientific processing at SRCNet sites.
In contrast to traditional High Energy Physics workflows, where data access is typically organised around predefined datasets, astronomy requires multi-mission discovery with tight integration between physical replica management and rich, standards-based metadata systems, alongside support for proprietary data embargoes. We discuss the mechanisms implemented to manage and expose science metadata within the data lake and to control access to embargoed data products, both increasingly important requirements for large-scale distributed astronomical archives. These experiences are expected to be relevant to the wider community considering Rucio-based data lakes with complex metadata and federation requirements.