Description
The High-Luminosity upgrade of the LHC (HL-LHC) is expected to generate scientific data on the scale of multiple exabytes. To address this unprecedented data storage challenge, the ATLAS experiment launched the Data Carousel project in 2018, and it entered production in 2020. In the Data Carousel workflow, jobs receive their input data from tape, transparently to user payloads. This represents a fundamental shift from the traditional archival-only model toward a production system that executes tens of thousands of tape recalls across multiple sites every day. A key challenge in the Data Carousel model is achieving high tape bandwidth utilization during recall operations through sustained streaming reads and optimized tape mounts. This requires intelligent grouping of files that are likely to be recalled together, so-called “smart writing”. To implement smart writing, sites depend on archival metadata that the experiment supplies as grouping hints. In this paper, we present our recent analysis of tape archival metadata using the ATLAS Run 3 recall history, highlighting patterns and correlations that can inform future data-placement and grouping strategies.