ACAT 2024

Name: ACAT 2024
Start: 2024-03-11T08:00:00-04:00
End: 2024-03-15T14:30:00-04:00
Location: Charles B. Wang Center, Stony Brook University

US/Eastern timezone

Contact: acat-loc2024@cern.ch

AI-based Data Popularity, Placement Optimization for a Novel Multi-tiered Storage System at BNL/SDCC Facility

14 Mar 2024, 17:10

20m

Lecture Hall 2 (Charles B. Wang Center, Stony Brook University)

100 Circle Rd, Stony Brook, NY 11794

Oral, Track 2: Data Analysis - Algorithms and Tools

Speaker: Qiulan Huang (Brookhaven National Laboratory (US))

Scientific experiments and computations, particularly in Nuclear Physics (NP) and High Energy Physics (HEP) programs, are generating and accumulating data at an unprecedented rate. Big data presents opportunities for groundbreaking scientific discoveries. However, managing this vast amount of data cost-effectively while facilitating efficient data analysis within a large-scale, multi-tiered storage architecture poses a significant challenge for the Scientific Data and Computing Center (SDCC).
The storage team is currently addressing optimization challenges related to data classification, placement, and migration in the existing multi-tier storage system. Users and administrators currently optimize storage manually, migrating data according to simple rules derived from human knowledge, decisions, and basic usage statistics. Evaluating the placement of data across storage classes under I/O-intensive workloads nevertheless remains a complex task.
To address these challenges and limitations, we have developed a precise data popularity prediction model using state-of-the-art AI/ML techniques. We have also designed a data placement policy engine based on data popularity, which allows us to migrate infrequently accessed data to more economical storage media, such as tape, while keeping frequently accessed data on faster yet costlier media such as HDD or SSD. This strategy optimally places data in the proper storage classes, maximizing storage capacity while minimizing data access latency for end users. This paper covers the analysis of the data, demonstration patterns, and tag files. Specifically, we detail the design and development of an accurate AI/ML prediction model that forecasts future data popularity from an analysis of access patterns, facilitating optimal data movement and placement. We also describe the implementation of a policy engine and a data placement tool that execute automated migration actions. Finally, we present an evaluation of different strategies, including those involving AI/ML models.
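
As a rough illustration of how a popularity-driven placement policy of this kind might work, the Python sketch below maps a predicted popularity score to a storage class and lists the migrations that would follow. It is a minimal sketch under stated assumptions, not the engine presented in the talk: the tier names, the thresholds, the `FileRecord` type, and the functions `target_tier` and `plan_migrations` are all hypothetical, and the popularity score is assumed to come from some upstream prediction model, normalized to the range 0 to 1.

```python
from dataclasses import dataclass

# Hypothetical tiers and thresholds, ordered from fastest to cheapest.
# Each entry is (tier name, minimum predicted popularity for that tier).
# The abstract names the media types only; the numbers are assumptions.
TIERS = [("ssd", 0.7), ("hdd", 0.2), ("tape", 0.0)]


@dataclass
class FileRecord:
    path: str
    current_tier: str
    popularity: float  # assumed: predicted popularity in [0, 1]


def target_tier(popularity: float) -> str:
    """Return the first (fastest) tier whose threshold the score meets."""
    for tier, threshold in TIERS:
        if popularity >= threshold:
            return tier
    # Scores below every threshold fall back to the cheapest tier.
    return TIERS[-1][0]


def plan_migrations(files):
    """List (path, source tier, destination tier) for files that should move."""
    actions = []
    for f in files:
        dest = target_tier(f.popularity)
        if dest != f.current_tier:
            actions.append((f.path, f.current_tier, dest))
    return actions


if __name__ == "__main__":
    catalog = [
        FileRecord("/data/run1.root", "hdd", 0.9),   # hot: promote
        FileRecord("/data/run2.root", "ssd", 0.05),  # cold: demote
        FileRecord("/data/run3.root", "hdd", 0.5),   # already well placed
    ]
    for path, src, dst in plan_migrations(catalog):
        print(f"migrate {path}: {src} -> {dst}")
```

Keeping the prediction step separate from the tier mapping means the thresholds can be tuned, or the scoring model swapped, without touching the code that produces migration actions.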

Author: Qiulan Huang (Brookhaven National Laboratory (US))

Co-authors: Mr James Leonardi (Brookhaven National Laboratory), Dr Vincent Garonne (Brookhaven National Laboratory), Dr Shinjae Yoo (Brookhaven National Laboratory)

Presentation materials: AI-based Data Popularity, Placement Optimization for a Tiered Storage architecture at BNL:SDCC Facility.pdf
