Prof. Gang CHEN (INSTITUTE OF HIGH ENERGY PHYSICS), Dr Wenjing Wu (IHEP, CAS)
Limitations of scheduling modules and the gradual addition of disk pools in distributed storage systems often lead to imbalances among disk pools, in both available space and number of files. These imbalances can cause problems such as single points of failure, low system throughput, and uneven resource utilization and system load. We propose an algorithm named Fuzzy Pool Balance (FPB) to solve this problem. FPB takes the current file distribution among disk pools as input and outputs a file migration plan indicating which files are to be migrated to which pools. FPB uses an array to classify files by size. This file classification array is calculated dynamically from a threshold named Tmax, which defines the allowed deviation in available space among disk pools. File classification is the basis of file migration. FPB also defines Immigration Pools (IPs) and Emigration Pools (EPs) according to the available space of the disk pools and the File Quantity Ratio (FQR), which indicates the percentage of each category of files in each disk pool. Files of a category with a high FQR in an EP are migrated to IP(s) with a lower FQR for that category. To verify the algorithm, we implemented FPB on an ATLAS Tier2 dCache production system that hosts 12 distributed disk pools with 300 TB of storage space. The results show that FPB achieves a good balance among the disk pools, and that a tradeoff between available space and file quantity can be made by adjusting the threshold Tmax and the correction factor applied to the average FQR.
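The flow described above (classify files by size, compute each pool's FQR, then move files from EPs to IPs) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: the size-class boundaries are passed in as a fixed list rather than derived from Tmax (the abstract does not give that rule), an EP is taken to be a pool whose free space is more than `tmax` below the mean, an IP is a pool with above-average free space that stays within the tolerance after receiving a file, and the correction factor to the average FQR is not modelled. All function and parameter names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def size_class(size, boundaries):
    """Index of the size class a file falls into (boundaries sorted ascending)."""
    return bisect_right(boundaries, size)


def file_quantity_ratio(files, boundaries):
    """Fraction of a pool's files in each size class (the FQR of that pool)."""
    counts = Counter(size_class(s, boundaries) for s in files.values())
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def plan_migration(pools, capacity, boundaries, tmax):
    """Return a list of (file, source_pool, target_pool) moves.

    pools:    {pool_name: {file_name: size}}
    capacity: {pool_name: total_space}
    tmax:     allowed deviation of a pool's free space from the mean (assumed use)
    """
    free = {p: capacity[p] - sum(f.values()) for p, f in pools.items()}
    mean_free = sum(free.values()) / len(free)
    floor = mean_free - tmax
    # Simplification: FQRs are computed once and not updated as moves are planned.
    fqr = {p: file_quantity_ratio(f, boundaries) for p, f in pools.items()}
    plan = []
    # Assumed EP rule: free space below (mean - tmax); fullest pools first.
    for ep in sorted((p for p in free if free[p] < floor), key=free.get):
        # Try files from the EP's most over-represented size class first.
        ranked = sorted(pools[ep].items(),
                        key=lambda kv: -fqr[ep][size_class(kv[1], boundaries)])
        for name, size in ranked:
            if free[ep] >= floor:
                break  # this EP is back within tolerance
            cls = size_class(size, boundaries)
            # Assumed IP rule: above-average free space, and still within
            # tolerance after receiving the file.
            ips = [p for p in free
                   if p != ep and free[p] > mean_free and free[p] - size >= floor]
            if not ips:
                continue
            # Prefer the IP with the lowest FQR for this class, then most free space.
            ip = min(ips, key=lambda p: (fqr[p].get(cls, 0.0), -free[p]))
            plan.append((name, ep, ip))
            free[ep] += size
            free[ip] -= size
    return plan
```

Under these assumptions a larger `tmax` tolerates more deviation in free space and therefore plans fewer moves, which reflects the tradeoff mentioned above only loosely. A production planner would also need to refresh the FQRs after each planned move.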