10–14 Oct 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

Evolution of the Ceph Based Storage Systems at the RACF

11 Oct 2016, 15:30
1h 15m

Poster Track 4: Data Handling Posters A / Break

Speaker

Alexandr Zaytsev (Brookhaven National Laboratory (US))

Description

Ceph-based storage solutions, and especially object storage systems built on Ceph, are now well recognized and widely used across the HEP/NP community. Both the object storage and block storage layers of Ceph now support production-ready services for HEP/NP experiments at many research organizations across the globe, including CERN and Brookhaven National Laboratory (BNL), and even the Ceph file system (CephFS) storage layer has been used for that purpose at the RHIC and ATLAS Computing Facility (RACF) at BNL for more than a year. This contribution gives a detailed status report and the foreseen evolution path for the 1 PB scale (by usable capacity, taking into account the internal data redundancy overhead) Ceph-based storage system that has been operated at the RACF since 2013 and is provided with Amazon S3 compliant RADOS gateways, OpenStack Swift to Ceph RADOS API interfaces, and dCache/xRootD over CephFS gateways. The system currently consists of two Ceph clusters deployed on top of a heterogeneous set of RAID arrays that together contain more than 3.8k 7.2krpm HDDs (one cluster with an iSCSI / 10 GbE storage interconnect and the other with a 4 Gb/s Fibre Channel storage interconnect), each provided with an independent IPoIB / 4X FDR InfiniBand based fabric for handling the internal storage traffic. Plans are being made to increase the scale of this installation to 5.0k 7.2krpm HDDs and 2 PB of usable capacity before the end of 2016. We also report the performance and stability characteristics observed with our Ceph-based storage systems over the last 3 years, and the lessons learnt from this experience. The prospects of tighter integration of the Ceph-based storage systems with the BNL ATLAS dCache storage infrastructure, and the work being done to achieve it, are discussed as well.

Primary Keyword (Mandatory): Object stores
Secondary Keyword (Optional): Storage systems
Tertiary Keyword (Optional): Distributed data handling

Authors

Alexandr Zaytsev (Brookhaven National Laboratory (US))
Hironori Ito (Brookhaven National Laboratory (US))

Co-authors

Tejas Rao (Brookhaven National Laboratory)
Tony Wong (Brookhaven National Laboratory)
Xin Zhao (Brookhaven National Laboratory (US))
