10–14 Oct 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

Application of StoRM+Lustre storage system in IHEP’s distributed computing

11 Oct 2016, 15:30
1h 15m
San Francisco Marriott Marquis

Poster Track 4: Data Handling Posters A / Break

Speaker

Dr Tian Yan (Institute of High Energy Physics, Chinese Academy of Sciences)

Description

The distributed computing system of the Institute of High Energy Physics (IHEP), China, is based on the DIRAC middleware. It integrates about 2,000 CPU cores and 500 TB of storage contributed by 16 distributed sites of various types: cluster, grid, cloud, and volunteer computing resources. The system went into production in 2012; it now supports multiple VOs and serves three HEP experiments: BESIII, CEPC, and JUNO.

Several kinds of storage elements (SEs) are used in IHEP's distributed computing system, such as dCache, BeStMan, and StoRM. At the IHEP site, a dCache SE with 128 TB of capacity has served as the central grid storage since 2012, while the local Lustre storage hosts about 4 PB of data for the three experiments. Physics data, such as random-trigger and DST data, were uploaded to this dCache SE manually and then transferred to remote SEs. Job output data were uploaded to this SE by the job wrapper and later downloaded to the local Lustre storage by end users.
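As an illustration of this upload/download workflow, the sketch below uses DIRAC's DataManager API the way a job wrapper and an end user might; it assumes a configured DIRAC client, and the LFN, local file, and SE name are hypothetical examples, not IHEP's actual configuration.

```python
# Minimal sketch of the upload/download workflow described above, using
# the DIRAC DataManager API. Assumes an installed and configured DIRAC
# client; the LFN, file path, and SE name are hypothetical examples.
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialise the DIRAC configuration

from DIRAC.DataManagementSystem.Client.DataManager import DataManager

dm = DataManager()

# Job side: upload an output file to the grid SE and register it in the catalog.
res = dm.putAndRegister("/bes/user/t/tyan/output/job_001.root",  # hypothetical LFN
                        "./job_001.root",                        # local job output
                        "IHEP-STORM")                            # hypothetical SE name
if not res["OK"]:
    print("Upload failed:", res["Message"])

# User side: download the file from the SE to local (e.g. Lustre) storage.
res = dm.getFile("/bes/user/t/tyan/output/job_001.root")
if not res["OK"]:
    print("Download failed:", res["Message"])
```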

To integrate grid storage with local Lustre storage, a StoRM+Lustre storage system has been deployed and tested since 2014. StoRM is a lightweight, scalable, flexible, SRMv2-compliant storage resource manager for disk-based storage. It runs on any POSIX file system and can take advantage of high-performance cluster file systems such as Lustre. StoRM supports both standard grid access and direct access to the data, and it relies on the underlying file system structure to locate the physical data instead of querying a database. These features allow us to integrate the grid storage of the distributed computing system with the high-capacity Lustre storage at each site.
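To make the database-free name resolution concrete, the following minimal sketch shows how a StoRM-style SURL can be mapped onto a physical Lustre path by rewriting a configured namespace prefix; the endpoint, storage-area roots, and mount points are hypothetical, not the actual IHEP namespace.

```python
# Illustrative sketch of StoRM-style, database-free name resolution: the
# physical path is derived from the SURL by rewriting a configured
# namespace prefix rather than by a catalog lookup. Endpoint and paths
# are hypothetical examples.
from urllib.parse import urlparse

# Hypothetical mapping from storage-area roots to Lustre mount points.
NAMESPACE = {
    "/bes": "/lustre/bes",
    "/juno": "/lustre/juno",
}

def surl_to_physical(surl: str) -> str:
    """Map an SRM SURL to its physical path on the Lustre file system."""
    # e.g. srm://storm.example.org:8444/srm/managerv2?SFN=/bes/data/run1/file.dst
    parsed = urlparse(surl)
    sfn = parsed.query.split("SFN=", 1)[1] if "SFN=" in parsed.query else parsed.path
    for root, mount in NAMESPACE.items():
        if sfn.startswith(root):
            return mount + sfn[len(root):]
    raise ValueError("SFN outside configured storage areas: " + sfn)

print(surl_to_physical(
    "srm://storm.example.org:8444/srm/managerv2?SFN=/bes/data/run1/file.dst"))
# -> /lustre/bes/data/run1/file.dst
```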

In this architecture, StoRM acts as the frontend to the grid environment, while Lustre serves as a locally accessible, massive, high-performance storage backend, so users and jobs see a nearly unified storage interface. Both local and remote users and jobs ultimately exchange data with the Lustre storage, without manual data movement between a grid SE and the local storage system. Moreover, the architecture can be used to expose physics data in the local Lustre to remote sites, making it a convenient way to share data between geographically distributed Lustre file systems.
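The following sketch illustrates what this nearly unified interface means for a job: at IHEP it reads a file directly through the Lustre POSIX mount, while at a remote site it falls back to the StoRM SRM endpoint for the same logical file. The site name, mount point, and endpoint are hypothetical.

```python
# Sketch of the "nearly unified" access pattern: jobs at IHEP read data
# directly through the Lustre POSIX mount, while remote jobs reach the
# same files through the StoRM grid interface. Site name, mount point,
# and endpoint are hypothetical.
import os

LUSTRE_MOUNT = "/lustre/bes"                                   # hypothetical local mount
SRM_PREFIX = "srm://storm.example.org:8444/srm/managerv2?SFN=/bes"

def access_url(relative_path: str, site: str) -> str:
    """Return the best access path for a file, depending on where the job runs."""
    if site == "IHEP" and os.path.isdir(LUSTRE_MOUNT):
        return os.path.join(LUSTRE_MOUNT, relative_path)       # direct POSIX access
    return SRM_PREFIX + "/" + relative_path                    # grid access via StoRM

print(access_url("data/run1/file.dst", site="IHEP"))
print(access_url("data/run1/file.dst", site="REMOTE"))
```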

A StoRM+Lustre instance with 66 TB of storage capacity has been set up at the IHEP site. Over the past year we performed several tests to verify its performance and reliability, including extensive data transfer tests, massive distributed job I/O tests, and large-scale concurrency stress tests, supported by a performance and stress monitoring system developed for this purpose. The results are positive. The instance has been in production since January 2015; it has shown good reliability and plays an important role in Monte Carlo production as well as in data transfers between IHEP and remote sites.
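As a rough illustration of the shape of such a concurrency stress test (not the monitoring system described above), the sketch below launches many parallel transfers against the SE with the gfal-copy client and tallies successes; the endpoint, file names, and concurrency figures are hypothetical.

```python
# Shape of a large-scale concurrency stress test: launch many parallel
# transfers against the SE and count successes. Assumes the gfal-copy
# client (gfal2-util) is installed; endpoint, paths, and concurrency
# figures are hypothetical examples.
import subprocess
from concurrent.futures import ThreadPoolExecutor

SE = "srm://storm.example.org:8444/srm/managerv2?SFN=/bes/stress"

def transfer(i: int) -> bool:
    """Copy one local test file to the SE; return True on success."""
    cmd = ["gfal-copy", "file:///tmp/testfile.dat", f"{SE}/file_{i:05d}.dat"]
    return subprocess.run(cmd, capture_output=True).returncode == 0

with ThreadPoolExecutor(max_workers=100) as pool:    # 100 concurrent transfers
    results = list(pool.map(transfer, range(1000)))  # 1000 files in total

print(f"succeeded: {sum(results)} / {len(results)}")
```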

Primary Keyword (Mandatory): Distributed data handling
Secondary Keyword (Optional): Computing middleware
Tertiary Keyword (Optional): Storage systems

Primary author

Dr Tian Yan (Institute of High Energy Physics, Chinese Academy of Sciences)

Co-authors

Prof. Weidong Li (Institute of High Energy Physics, Chinese Academy of Sciences)
Mr Xianghu Zhao (Institute of High Energy Physics, Chinese Academy of Sciences)
Mr Xiaofei Yan (Institute of High Energy Physics, Chinese Academy of Sciences)
Dr Xiaomei Zhang (Institute of High Energy Physics, Chinese Academy of Sciences)

Presentation materials