Description
In this talk we describe the operational and conceptual design of the bulk archive management system developed as part of the prototyping activities for the Cherenkov Telescope Array Observatory (CTAO) Bulk Archive. This archive stores and manages the lower-level data products coming from the Cherenkov telescopes, including their cameras, auxiliary subsystems and simulations. Scientific raw data produced at the two CTAO telescope sites, one in the Northern hemisphere and one in the Southern, will be transferred to four off-site data centers, where they will be accessed and automatically reduced to higher-level data products. The archive system will provide a set of tools based on the OAIS (Open Archival Information System) standard, including a data transfer system, a general, replicated catalog that can be queried, a simple interface to retrieve and access data, and a customizable, versatile data organization that adapts to user requirements.
We have already developed the first version of the user interface, based on the Rucio package, which performs three basic functions in accordance with the CTA Observatory requirements: file ingestion, search and retrieval. This version has been deployed and tested successfully on a test cluster at DESY (Zeuthen) with a pre-installed Kubernetes framework.
• a newly recorded data file is ingested as a replica into the Rucio cluster, using a JSON schema file as input, and automatically acquires a unique Physical File Name (PFN) provided by Rucio; a second replica can be created on demand as a backup for safety
• an already ingested file can be searched for by its DID, or Data Identifier («scope»:«filename»), or by its unique PFN
• finally, an already ingested file can be retrieved and saved locally through the same interface, again using only its filename
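The three functions above can be illustrated with a toy, in-memory sketch in Python. It does not use the Rucio API or the CTAO interface: the class `ToyCatalog`, the descriptor keys (`scope`, `name`), the storage prefix and the PFN layout are all assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def parse_did(did: str) -> Tuple[str, str]:
    """Split a DID of the form 'scope:filename' into (scope, filename)."""
    scope, sep, name = did.partition(":")
    if not sep or not scope or not name:
        raise ValueError(f"malformed DID: {did!r}")
    return scope, name


@dataclass
class ToyCatalog:
    """In-memory stand-in for a replica catalog (hypothetical, not Rucio)."""

    # Assumed storage endpoint; a real PFN is assigned by the archive itself.
    storage_prefix: str = "root://storage.example.org/ctao"
    _by_did: Dict[str, str] = field(default_factory=dict)     # DID -> PFN
    _content: Dict[str, bytes] = field(default_factory=dict)  # PFN -> data

    def ingest(self, descriptor_json: str, payload: bytes) -> str:
        """Register a new file from a JSON descriptor and return its PFN."""
        desc = json.loads(descriptor_json)
        did = f"{desc['scope']}:{desc['name']}"
        if did in self._by_did:
            raise ValueError(f"{did} is already ingested")
        pfn = f"{self.storage_prefix}/{desc['scope']}/{desc['name']}"
        self._by_did[did] = pfn
        self._content[pfn] = payload
        return pfn

    def search(self, did: Optional[str] = None,
               pfn: Optional[str] = None) -> Optional[Tuple[str, str]]:
        """Look a file up by DID or by PFN; return (DID, PFN) or None."""
        if did is not None:
            parse_did(did)  # reject malformed identifiers early
            found = self._by_did.get(did)
            return (did, found) if found is not None else None
        for known_did, known_pfn in self._by_did.items():
            if known_pfn == pfn:
                return known_did, known_pfn
        return None

    def retrieve(self, scope: str, name: str) -> bytes:
        """Return the stored bytes of an ingested file, given its filename."""
        return self._content[self._by_did[f"{scope}:{name}"]]
```

In this sketch a file is ingested once, found either by `search(did=...)` or `search(pfn=...)`, and read back with `retrieve(scope, name)`; creating a second replica would amount to registering an additional PFN for the same DID.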
Currently, we are working on a Kubernetes database module that stores all relevant metadata scanned during ingestion in a dedicated, external, non-relational database. This module will extend the Release 0 BDMS functionality and allow archived data to be searched and retrieved by their physical information. The database technology currently used for this purpose is a replicated and sharded RethinkDB cluster. We are about to deploy it on a Kubernetes (K8s) instance in order to test the developed interfaces in the coming weeks.
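As a rough illustration of searching by physical information, the sketch below filters a list of metadata documents of the kind a document store could hold. It is plain Python and does not use the RethinkDB driver; the document layout and the field names (`site`, `tel_id`, `run`) are hypothetical and are not taken from the CTAO data model.

```python
from typing import Any, Dict, List

# Hypothetical metadata documents, one per ingested file.
DOCUMENTS: List[Dict[str, Any]] = [
    {"did": "ctao_test:run001.fits",
     "metadata": {"site": "North", "tel_id": 1, "run": 1}},
    {"did": "ctao_test:run002.fits",
     "metadata": {"site": "North", "tel_id": 2, "run": 2}},
    {"did": "ctao_test:run003.fits",
     "metadata": {"site": "South", "tel_id": 1, "run": 3}},
]


def find_by_metadata(documents: List[Dict[str, Any]],
                     **criteria: Any) -> List[str]:
    """Return the DIDs of all documents whose metadata match every criterion."""
    matches = []
    for doc in documents:
        meta = doc.get("metadata", {})
        if all(meta.get(key) == value for key, value in criteria.items()):
            matches.append(doc["did"])
    return matches
```

A call such as `find_by_metadata(DOCUMENTS, site="North", tel_id=1)` returns the DIDs of the matching files, which could then be passed to the existing retrieval function.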