21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)

Name: 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)
Start: 2015-04-13T09:00:00+09:00
End: 2015-04-17T16:00:00+09:00
Location: OIST

Mean PB to Failure -- Initial results from a long-term study of disk storage patterns at the RACF

14 Apr 2015, 16:30

15m

C209 (C209)

Oral presentation; Track 3: Data store and access; Track 3 Session

Speaker: Christopher Hollowell (Brookhaven National Laboratory)

The RACF (RHIC-ATLAS Computing Facility) has operated a large, multi-purpose dedicated computing facility since the mid-1990s, serving a worldwide, geographically diverse scientific community that is a major contributor to various HEPN projects. A central component of the RACF is the Linux-based worker node cluster, which is used for both computing and data storage. It currently has nearly 50,000 computing cores and over 23 PB of storage capacity distributed over 12,000+ (non-SSD) disk drives. The majority of these drives provide cost-effective dCache/xRootd-managed storage, and a key concern is the reliability of this solution over the lifetime of the hardware, particularly as the number of disk drives and the storage capacity of individual drives grow. We report initial results of a long-term study to measure the lifetime PB read from and written to disk drives in the worker node cluster. We discuss the historical disk drive mortality rate, disk drive manufacturers' published MPBTF (Mean PB to Failure) data, and how they correlate with our results. The results help the RACF understand the productivity and reliability of its storage solutions and have implications for other highly available storage systems (NFS, GPFS, CVMFS, etc.) with large I/O requirements.
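As a rough illustration of the metric named in the abstract, the sketch below computes a fleet-wide Mean PB to Failure figure. The abstract does not define the calculation, so the formula here is an assumption made by analogy with MTBF: total PB read and written across all drives, divided by the number of failed drives. The `DriveRecord` type, its fields, the serial numbers, and the sample values are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

PB = 10**15  # one decimal petabyte, in bytes


@dataclass
class DriveRecord:
    """Hypothetical per-drive record; field names are illustrative only."""
    serial: str
    bytes_read: int
    bytes_written: int
    failed: bool


def mean_pb_to_failure(drives: Iterable[DriveRecord]) -> Optional[float]:
    """Assumed definition: total PB transferred by the fleet / failed drives.

    Returns None when no drive has failed, since the ratio is undefined.
    """
    drives = list(drives)
    total_bytes = sum(d.bytes_read + d.bytes_written for d in drives)
    failures = sum(1 for d in drives if d.failed)
    if failures == 0:
        return None
    return total_bytes / PB / failures


# Made-up sample fleet: 10 PB transferred in total, two failed drives.
drives = [
    DriveRecord("HYPOTHETICAL-001", 2 * PB, 1 * PB, False),
    DriveRecord("HYPOTHETICAL-002", 1 * PB, 1 * PB, True),
    DriveRecord("HYPOTHETICAL-003", 3 * PB, 2 * PB, True),
]
print(mean_pb_to_failure(drives))  # 5.0
```

Surviving drives are counted in the numerator because the I/O they have absorbed without failing is part of the fleet's exposure; a per-drive average over failed drives only would be a different, equally plausible definition.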

Author: Dr Tony Wong (Brookhaven National Laboratory)

Co-authors: Mr Alexandr Zaytsev (Brookhaven National Laboratory (US)); Christopher Hollowell (Brookhaven National Laboratory); Costin Caramarcu (Brookhaven National Laboratory (US)); Mr Tejas Rao (Brookhaven National Laboratory); William Strecker-Kellogg (Brookhaven National Lab)

Slides

Mean_PB_to_Failure.pdf

Mean_PB_to_Failure.pptx
