17-21 October 2016
Experience of Development and Deployment of a Large-Scale Ceph-Based Data Storage System at RAL

19 Oct 2016, 14:25
Building 50 Auditorium (LBNL)

Berkeley, CA 94720
Bruno Canning (RAL)


Over the past two years, a new data storage system, Echo, has been developed at the RAL Tier-1 as a replacement for CASTOR disk-only storage of LHC data. This presentation will share the RAL experience of developing and deploying a new Ceph-based storage service at the 13 PB scale to the standard required for production use.

This is the first new service that we have developed at this scale for some time, and Ceph is a very different technology from our existing storage solution. This presentation will explore the changes required to accommodate such a service: the location of servers in the data centre; the development of the network topology and its effect on data placement; the design and construction of a system that is more manageable, maintainable and upgradable by a system administrator; the adaptation of existing software to support LHC VO workflows; and the implementation of new software to support industry-standard protocols for both LHC VOs and other user communities. I will also discuss the changes brought by the deployment of a new major OS version and the move from SysVinit to systemd for process management, the changes to monitoring and alerting required to support continuous operation of the service, and the risks and impacts of transitioning to this technology.

