Description
In the current ATLAS Distributed Computing model, available disk capacity is insufficient to store even a single complete copy of all data actively in use. Consequently, tape systems serve not only as long-term backups but also as primary data sources. Efficient utilization of tape at the ATLAS scale requires specialized orchestration mechanisms, as tape access is inherently slower and operationally more complex than disk access. Once data are staged from tape, they must be efficiently shared among all sites requiring them and, when likely to be reused, temporarily retained on disk to avoid redundant recalls. To address these challenges, the Data Carousel system was developed to coordinate large-scale tape staging across the distributed infrastructure. Its core functionality includes automated creation, sharing, retention, and deletion of staging rules based on dataset usage; dynamic staging profiles to balance tape load; dashboards and alert mechanisms for real-time monitoring; and both manual and automated recovery procedures for common tape issues and downtimes. In this paper, we describe the overall architecture of the Data Carousel, provide detailed usage statistics, and present a recent comprehensive refactoring of the system that significantly expands its scope. The refactored implementation integrates more closely with other Distributed Data Management activities, improves scalability and reliability, and prepares the system for the challenges of Run 4 and the HL-LHC era.
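The following sketch illustrates the kind of staging-rule lifecycle described above: a rule created when a dataset is recalled from tape is retained on disk while reuse is likely and deleted once it has gone idle. It is a minimal, hypothetical example in Python; the class, function names, and thresholds are assumptions for illustration and do not reflect the actual Data Carousel implementation.

```python
# Hypothetical illustration of a staging-rule retention decision.
# Names and thresholds are illustrative, not the real Data Carousel code.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StagingRule:
    dataset: str
    created_at: datetime
    last_accessed: datetime
    access_count: int = 0


def decide_rule_action(rule: StagingRule,
                       now: datetime,
                       min_lifetime: timedelta = timedelta(days=7),
                       idle_limit: timedelta = timedelta(days=14)) -> str:
    """Return 'retain' or 'delete' for an existing staging rule.

    A rule is kept while the dataset is newly staged or still being read;
    once it has been idle longer than `idle_limit`, the disk copy is
    released and a later request would trigger a fresh tape recall.
    """
    if now - rule.created_at < min_lifetime:
        return "retain"   # give newly staged data time to be used
    if now - rule.last_accessed < idle_limit:
        return "retain"   # dataset is still actively read
    return "delete"       # free disk space; the tape copy remains


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    rule = StagingRule(dataset="example.dataset.DAOD",  # placeholder name
                       created_at=now - timedelta(days=30),
                       last_accessed=now - timedelta(days=20),
                       access_count=5)
    print(decide_rule_action(rule, now))  # -> 'delete'
```

In practice such decisions would be driven by recorded dataset popularity and coordinated with the other staging, sharing, and monitoring components mentioned above rather than by fixed time windows alone.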