Conference on Computing in High Energy and Nuclear Physics

Name: Conference on Computing in High Energy and Nuclear Physics
Start: 2024-10-19T08:00:00+02:00
End: 2024-10-25T18:30:00+02:00

19–25 Oct 2024

Europe/Zurich timezone

Contact Program Chairs

chep2024-pc@cern.ch

Data Placement Optimization for ATLAS in a Multi-Tiered Storage System within a Data Center

24 Oct 2024, 14:06

18m

Room 1.B (Medium Hall B)

Type: Talk
Track: Track 1 - Data and Metadata Organization, Management and Access
Session: Parallel (Track 1)

Speaker: Carlos Fernando Gamboa (Brookhaven National Laboratory (US); Department of Physics-Brookhaven National Laboratory (BNL)-Unkno)

Scientific experiments and computations, especially in High Energy Physics, are generating and accumulating data at an unprecedented rate. Managing this volume of data while keeping data analysis efficient is a significant challenge for data centers, which must integrate various storage technologies. This paper addresses the challenge with a multi-tiered storage model that combines diverse storage technologies tailored to different data needs, covering data classification, placement, and migration.
Users and administrators currently optimize storage by hand, migrating data according to simple rules derived from human knowledge, decisions, and basic usage statistics; evaluating the placement of data across storage classes under I/O-intensive workloads, however, remains a complex task. To overcome this limitation, we have developed a precise data popularity prediction model based on state-of-the-art AI/ML techniques. The model is built from an analysis of ATLAS data and access patterns. It allows us to migrate infrequently accessed data to more economical storage media, such as tape, while keeping frequently accessed data on faster but costlier media such as HDD or SSD. This approach places data in the appropriate storage classes, making the best use of storage capacity while minimizing data access latency for end users. We also explore potential implementations of an autonomous multi-tiered storage system on the dCache-based storage infrastructure at BNL, discuss the outcomes, and compare different implementation strategies.
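The placement idea described above (predict how popular each dataset will be, then map that prediction to a storage class) can be sketched as follows. This is a minimal illustration only, not the authors' method: the abstract's predictor is an AI/ML model trained on ATLAS access patterns, which is not reproduced here. An exponentially weighted moving average of weekly access counts stands in for it, and the thresholds, tier labels, and names (`Dataset`, `predicted_popularity`, `choose_tier`, `plan_placement`) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds (predicted accesses per week); not taken from the talk.
HOT_THRESHOLD = 10.0
WARM_THRESHOLD = 1.0


@dataclass
class Dataset:
    name: str
    weekly_accesses: list[int]  # access counts per week, oldest first


def predicted_popularity(accesses: list[int], alpha: float = 0.5) -> float:
    """Stand-in predictor: exponentially weighted moving average of access counts.

    A learned popularity model would replace this function.
    """
    estimate = 0.0
    for count in accesses:
        estimate = alpha * count + (1 - alpha) * estimate
    return estimate


def choose_tier(popularity: float) -> str:
    """Map a predicted popularity to a storage class (hypothetical tier labels)."""
    if popularity >= HOT_THRESHOLD:
        return "SSD"   # frequently accessed: fastest, costliest media
    if popularity >= WARM_THRESHOLD:
        return "HDD"   # moderately accessed
    return "TAPE"      # rarely accessed: most economical media


def plan_placement(datasets: list[Dataset]) -> dict[str, str]:
    """Return the target storage class for each dataset."""
    return {d.name: choose_tier(predicted_popularity(d.weekly_accesses)) for d in datasets}


if __name__ == "__main__":
    sample = [
        Dataset("hot", [20, 30, 40]),
        Dataset("warm", [0, 2, 4]),
        Dataset("cold", [5, 0, 0]),
    ]
    print(plan_placement(sample))  # {'hot': 'SSD', 'warm': 'HDD', 'cold': 'TAPE'}
```

Separating prediction from tier selection means a learned popularity model could be swapped in for `predicted_popularity` without changing `choose_tier`; a moving average only extrapolates past accesses, whereas the abstract describes predicting future popularity.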

Authors: Carlos Fernando Gamboa (Brookhaven National Laboratory (US); Department of Physics-Brookhaven National Laboratory (BNL)-Unkno), Imran Latif (Brookhaven National Laboratory), James Leonardi (Brookhaven National Laboratory), Qiulan Huang (Brookhaven National Laboratory (US)), Shinjae Yoo, Vincent Garonne (Brookhaven National Laboratory (US))

Presentation materials: Data Placement Optimization for ATLAS in a Multi-Tiered Storage System within a Data Center.pdf