Nov 4 – 8, 2019
Adelaide Convention Centre
Australia/Adelaide timezone

Data migration strategy based on file heat prediction with deep learning methods

Nov 5, 2019, 5:15 PM
Riverbank R8 (Adelaide Convention Centre)

Oral
Track 4 – Data Organisation, Management and Access


Shiyuan Fu


High-energy physics is a data-intensive computing application that requires storage and processing of data at the petabyte (PB) scale. Both the performance demands on mass storage systems and the imbalance in data access are increasing. On the one hand, traditional low-cost disk storage systems cannot serve workloads with high IOPS demands. On the other hand, a survey found that only a very small fraction of files is active in storage during any given period. Tiered storage architectures that combine tape, disk, and solid-state drives are therefore used to reduce hardware purchase costs and power consumption.

As the amount of stored data grows, tiered storage requires data management software to migrate less active data to lower-cost storage devices, so an automated data migration strategy is needed. Existing automatic migration strategies, such as LRU, CLOCK, 2Q, GDSF, LFUDA, and FIFO, are usually based on a file's recent access pattern (for example, its access frequency) and are mainly used to manage data movement between memory and disk. Because they need to run in the operating system kernel, their rules are relatively simple. These rules do not take the trend of a file's life cycle into account, so the behavior of files that are accessed at regular intervals is often predicted inaccurately. In addition, they do not consider a file's historical access records.
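The limitation of recency-based rules can be illustrated with a small sketch. The code below is not the system described in this abstract; it is a toy LRU-managed fast tier, with hypothetical names (`LRUTier`, `run_trace`, the file name `calib.db`) and an invented access trace, showing how a file that is read at regular intervals can still be demoted repeatedly when unrelated one-off files arrive in between.

```python
from collections import OrderedDict


class LRUTier:
    """Toy fast tier holding at most `capacity` files (hypothetical helper).

    When the tier is full, the least recently used file is demoted
    ("migrated") to a slower tier.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()  # insertion order tracks recency
        self.migrated = []          # files demoted to the slow tier, in order

    def access(self, name):
        """Record one access; return True on a fast-tier hit."""
        hit = name in self.files
        if hit:
            # Mark as most recently used.
            self.files.move_to_end(name)
        else:
            self.files[name] = True
            if len(self.files) > self.capacity:
                # Evict the least recently used file.
                victim, _ = self.files.popitem(last=False)
                self.migrated.append(victim)
        return hit


def run_trace(trace, capacity):
    """Replay a list of file names against a fresh LRU tier."""
    tier = LRUTier(capacity)
    hits = [tier.access(name) for name in trace]
    return tier, hits


# Invented trace: "calib.db" is read periodically, with bursts of
# one-off files in between each read.
trace = ["calib.db", "a", "b", "c", "calib.db", "d", "e", "f", "calib.db"]
tier, hits = run_trace(trace, capacity=3)
print("hits:", hits.count(True), "of", len(hits))
print("demotions of calib.db:", tier.migrated.count("calib.db"))
```

In this trace every periodic read of `calib.db` misses the fast tier, because LRU sees only recency and not the regular access pattern in the file's history.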

Data access requests are not completely random: they are driven by the behavior of users or programs, so files that are accessed consecutively should be associated with one another. This paper proposes a method for predicting file access heat and uses the predicted heat trend as the basis for migrating data to a relatively low-cost storage device. Traditional models are limited and struggle to achieve good results in such nonlinear scenarios, so this paper uses a deep learning model to predict how data access heat evolves. The paper discusses the implementation of some initial parts of the system, in particular the trace collector and the LSTM model, and reports preliminary experiments conducted with these parts.
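As a rough illustration of the data flow implied here, the sketch below turns an access trace into per-file heat series, builds sliding-window samples of the kind a sequence model such as an LSTM is typically trained on, and selects migration candidates from a predicted heat value. Everything in it is an assumption made for illustration: the trace format `(timestamp, path)`, the definition of heat as an access count per time window, the function names, the threshold, and the file paths. `predict_next` is a placeholder (a simple mean), not the LSTM model from this work.

```python
from collections import defaultdict


def heat_series(trace, window, n_windows):
    """Count accesses per file in consecutive time windows.

    trace: iterable of (timestamp_seconds, path) records (assumed format).
    """
    series = defaultdict(lambda: [0] * n_windows)
    for ts, path in trace:
        idx = int(ts // window)
        if 0 <= idx < n_windows:
            series[path][idx] += 1
    return dict(series)


def sliding_windows(values, lookback):
    """Turn one heat series into (input sequence, next value) pairs."""
    return [(values[i:i + lookback], values[i + lookback])
            for i in range(len(values) - lookback)]


def predict_next(history):
    """Placeholder predictor: mean of the history.

    A trained sequence model (for example an LSTM) would be called here.
    """
    return sum(history) / len(history)


def migration_candidates(series, lookback, threshold):
    """Files whose predicted next-window heat falls below the threshold."""
    return sorted(path for path, values in series.items()
                  if predict_next(values[-lookback:]) < threshold)


# Invented trace for illustration only.
trace = [(0, "/data/run1.root"), (5, "/data/run1.root"),
         (12, "/data/run1.root"), (3, "/data/old.root"),
         (25, "/data/run1.root"), (31, "/data/run1.root")]

series = heat_series(trace, window=10, n_windows=4)
samples = sliding_windows(series["/data/run1.root"], lookback=2)
cold = migration_candidates(series, lookback=2, threshold=0.5)
print(series)
print(samples)
print(cold)
```

Keeping the predictor behind a single function makes it straightforward to swap the placeholder for a learned model without changing how heat series are built or how candidates are chosen.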

Consider for promotion No

Primary authors

Zhenjing Cheng (INSTITUTE OF HIGH ENERGY PHYSICS)
Lu Wang (Computing Center, Institute of High Energy Physics, CAS)
Yaodong Cheng (IHEP)
Gang CHEN (INSTITUTE OF HIGH ENERGY PHYSICS)
Qingbao Hu (IHEP)
Haibo Li (Institute of High Energy Physics, Chinese Academy of Science)
Shiyuan Fu
