Description
Scientific experiments and computations, especially in High Energy Physics, are generating and accumulating data at an unprecedented rate. Effectively managing this vast volume of data while ensuring efficient analysis poses a significant challenge for data centers, which must integrate various storage technologies. This paper addresses the challenge with a multi-tiered storage model that employs diverse storage technologies tailored to different data needs, covering data classification, placement, and migration.
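As a minimal sketch of what a multi-tiered model's classification and placement logic might look like, the following Python fragment assigns datasets to tiers from basic usage statistics. The tier names, thresholds, and `Dataset` fields are hypothetical illustrations, not taken from the paper:

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Storage tiers, from fastest/most expensive to slowest/cheapest."""
    SSD = 1
    HDD = 2
    TAPE = 3


@dataclass
class Dataset:
    name: str
    size_gb: float
    accesses_last_30d: int  # basic usage statistic driving placement


def classify(ds: Dataset, hot: int = 50, warm: int = 5) -> Tier:
    """Rule-based placement: frequently accessed data stays on fast media,
    rarely accessed data becomes a candidate for economical media (tape)."""
    if ds.accesses_last_30d >= hot:
        return Tier.SSD
    if ds.accesses_last_30d >= warm:
        return Tier.HDD
    return Tier.TAPE


def migration_plan(datasets, current_placement):
    """Yield (dataset, from_tier, to_tier) for every dataset whose
    classified tier differs from where it currently resides."""
    for ds in datasets:
        target = classify(ds)
        if current_placement[ds.name] != target:
            yield ds, current_placement[ds.name], target
```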
Today, users and administrators optimize storage manually, migrating data according to simple rules derived from expert knowledge and basic usage statistics; evaluating data placement across storage classes under I/O-intensive workloads, however, remains a complex task. To overcome this limitation, we have developed a precise data popularity prediction model using state-of-the-art AI/ML techniques, built from an analysis of ATLAS data and its access patterns. The model enables us to migrate infrequently accessed data to more economical storage media, such as tape, while keeping frequently accessed data on faster but costlier media such as HDDs or SSDs. This strategy places data in the appropriate storage class, maximizing usable storage capacity while minimizing data access latency for end users. Additionally, we provide insights into and explore potential implementations of an autonomous multi-tiered storage system on the storage infrastructure at BNL, leveraging dCache technology, and we discuss the outcomes and compare different implementation strategies.
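To make the prediction step concrete, here is a hedged sketch of a popularity classifier. The features, labels, and model choice (a scikit-learn gradient-boosted classifier trained on synthetic data) are illustrative assumptions; the abstract does not disclose the actual ATLAS-derived features or model architecture:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for features mined from access logs; the real
# ATLAS-derived features are not specified in the abstract:
# [accesses last 7d, accesses last 30d, days since last access, age in days]
n = 5000
X = np.column_stack([
    rng.poisson(3, n),
    rng.poisson(10, n),
    rng.exponential(30, n),
    rng.uniform(0, 1000, n),
])
# Label: 1 if the dataset will be accessed in the next window ("popular").
# Synthesized here so the example runs end to end.
y = ((X[:, 0] + X[:, 1]) > 12).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Predicted-cold datasets become candidates for migration to tape,
# mirroring the placement strategy described in the abstract.
to_tape = [i for i, hot in enumerate(model.predict(X_test)) if not hot]
print(f"{len(to_tape)} datasets flagged for tape migration")
```

In a real deployment, the synthetic features would be replaced by statistics mined from ATLAS access logs, and the predicted-cold set would feed the migration planner sketched earlier.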