19–25 Oct 2024
Europe/Zurich timezone

Data Placement Optimization for ATLAS in a Multi-Tiered Storage System within a Data Center

24 Oct 2024, 14:06
18m
Room 1.B (Medium Hall B)


Contribution type: Talk
Track: Track 1 - Data and Metadata Organization, Management and Access
Session: Parallel (Track 1)

Speakers

Carlos Fernando Gamboa (Brookhaven National Laboratory (US))
Carlos Fernando Gamboa (Department of Physics-Brookhaven National Laboratory (BNL)-Unkno)

Description

Scientific experiments and computations, especially in High Energy Physics, are generating and accumulating data at an unprecedented rate. Managing this volume of data effectively while keeping data analysis efficient poses a significant challenge for data centers, which must integrate a variety of storage technologies. This paper proposes a multi-tiered storage model that combines diverse storage technologies tailored to different data needs and covers data classification, placement, and migration.
Users and administrators currently optimize storage by hand, migrating data according to simple rules derived from human knowledge and decisions and from basic usage statistics. Evaluating where data should be placed across storage classes under I/O-intensive workloads, however, remains a complex task. To overcome this limitation, we have developed a precise data popularity prediction model based on state-of-the-art AI/ML techniques and built from an analysis of ATLAS data and access patterns. The model enables us to migrate infrequently accessed data to more economical storage media, such as tape, while keeping frequently accessed data on faster but costlier media such as HDD or SSD. Placing data in the appropriate storage class in this way makes the best use of the available storage capacity while minimizing data access latency for end users. We also provide insights into, and explore potential implementations of, an autonomous multi-tiered storage system on the BNL storage infrastructure built on dCache, discuss the outcomes, and compare different implementation strategies.
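The idea of driving tier placement from predicted popularity can be illustrated with a minimal sketch. This is not the authors' model: the exponentially weighted moving average below is a deliberately simple stand-in for the AI/ML popularity predictor, and the tier names, the thresholds `hot` and `warm`, the smoothing factor `alpha`, the dataset names, and every function name are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical storage classes, from fastest/costliest to slowest/cheapest.
    SSD = "ssd"
    HDD = "hdd"
    TAPE = "tape"


@dataclass
class Dataset:
    name: str
    weekly_accesses: list[int]  # access counts per week, oldest first
    current_tier: Tier


def predict_popularity(weekly_accesses: list[int], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average of weekly access counts.

    A simple stand-in (assumption) for a learned popularity model:
    more recent weeks receive more weight.
    """
    score = 0.0
    for count in weekly_accesses:
        score = alpha * count + (1 - alpha) * score
    return score


def choose_tier(score: float, hot: float = 10.0, warm: float = 1.0) -> Tier:
    """Map a predicted popularity score to a tier using hypothetical thresholds."""
    if score >= hot:
        return Tier.SSD
    if score >= warm:
        return Tier.HDD
    return Tier.TAPE


def plan_migrations(datasets: list[Dataset]) -> list[tuple[str, Tier, Tier]]:
    """Return (name, from_tier, to_tier) for each dataset whose tier should change."""
    moves = []
    for ds in datasets:
        target = choose_tier(predict_popularity(ds.weekly_accesses))
        if target != ds.current_tier:
            moves.append((ds.name, ds.current_tier, target))
    return moves


if __name__ == "__main__":
    catalog = [
        Dataset("hot_dataset", [40, 35, 30, 25], Tier.HDD),
        Dataset("cold_dataset", [3, 0, 0, 0], Tier.HDD),
        Dataset("steady_dataset", [2, 2, 3, 2], Tier.HDD),
    ]
    for name, src, dst in plan_migrations(catalog):
        print(f"{name}: {src.value} -> {dst.value}")
```

Run as a script, this prints `hot_dataset: hdd -> ssd` and `cold_dataset: hdd -> tape`, while `steady_dataset` stays on HDD. Keeping prediction, tier choice, and migration planning in separate functions means the placeholder predictor could be swapped for a trained model without touching the placement logic.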

Primary authors

Carlos Fernando Gamboa (Brookhaven National Laboratory (US))
Carlos Fernando Gamboa (Department of Physics-Brookhaven National Laboratory (BNL)-Unkno)
Imran Latif (Brookhaven National Laboratory)
James Leonardi (Brookhaven National Laboratory)
Qiulan Huang (Brookhaven National Laboratory (US))
Shinjae Yoo
Vincent Garonne (Brookhaven National Laboratory (US))
