21–25 Aug 2017
University of Washington, Seattle
US/Pacific timezone

Design and implementation of data cache and access system across remote sites

21 Aug 2017, 17:10
20m
Auditorium (Alder Hall)

Oral Track 1: Computing Technology for Physics Research

Speakers

Yaodong CHENG (IHEP, Beijing; Chinese Academy of Sciences (CN))

Description

Distributed computing systems such as WLCG are widely used in high energy physics. A computing job is usually scheduled to the site where its input data has been pre-staged by a file transfer system. This leads to problems such as low CPU utilization at small sites that lack storage capacity. Furthermore, it is not flexible in a dynamic cloud computing environment, where virtual machines (VMs) are created on demand on different cloud platforms and need to access data immediately after they are created. Cloud platforms may be located in different places, for example commercial clouds such as EC2 or private clouds such as CERNCloud and IHEPCloud, and it is not possible to stage data in to all of them before a VM is created.

We therefore designed and implemented a remote data access system based on streaming and caching. The goal of the system is to export data from one site to remote sites, much as NFS exports data from one host to others. The system is called LEAF, meaning that it is an extension of a site's storage system. LEAF is composed of three components: a storage gateway, a cache daemon and a client module. The storage gateway is deployed at the main site and exports specified data repositories. The data repositories are a list of local directories or file system spaces such as HDFS or EOS. The cache daemon is deployed at the remote site; it receives requests from the client module and then fetches data from the storage gateway at the main site. Data is transferred over high-performance HTTP connections supported by the Tornado web framework. The client module is implemented as a file system based on FUSE. Most file system semantics are handled by the local cache daemon, so LEAF can achieve better performance than a file system mounted directly at the remote site.

The testbed was deployed at two sites about 2000 km apart. The test results showed that LEAF is about 5 times faster than a traditional file system such as EOS.
The paper will describe the architecture, key technologies, implementation, use cases and performance evaluation of the LEAF system.
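
As an illustration of the caching idea only, the following minimal Python sketch shows a read-through block cache of the kind a cache daemon at a remote site might use. It is not the LEAF implementation: the class `BlockCache`, the function `fake_gateway_fetch`, the block size, the LRU eviction policy and the file path are all hypothetical, and the gateway is simulated in memory rather than reached over HTTP.

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # bytes per cached block; tiny so the example is easy to follow


class BlockCache:
    """Read-through LRU cache of fixed-size blocks keyed by (path, block index)."""

    def __init__(self, fetch_block, capacity=128):
        self._fetch_block = fetch_block  # hypothetical call to the storage gateway
        self._capacity = capacity        # maximum number of blocks kept in memory
        self._blocks = OrderedDict()     # insertion order doubles as LRU order
        self.remote_fetches = 0          # number of simulated round trips

    def _get_block(self, path, index):
        key = (path, index)
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            return self._blocks[key]
        data = self._fetch_block(path, index)  # cache miss: go to the main site
        self.remote_fetches += 1
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)   # evict the least recently used block
        return data

    def read(self, path, offset, size):
        """Return `size` bytes of `path` starting at `offset`."""
        if size <= 0:
            return b""
        first = offset // BLOCK_SIZE
        last = (offset + size - 1) // BLOCK_SIZE
        data = b"".join(self._get_block(path, i) for i in range(first, last + 1))
        start = offset - first * BLOCK_SIZE
        return data[start:start + size]


# In-memory stand-in for the main-site storage gateway (an assumption).
REMOTE_FILES = {"/data/run1.dat": b"abcdefghijklmnop"}


def fake_gateway_fetch(path, index):
    content = REMOTE_FILES[path]
    return content[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]


cache = BlockCache(fake_gateway_fetch, capacity=2)
first_read = cache.read("/data/run1.dat", 2, 5)   # misses on blocks 0 and 1
second_read = cache.read("/data/run1.dat", 3, 4)  # both blocks already cached
```

In this sketch the second read overlaps the first, so it is served entirely from the local cache and `remote_fetches` stays at 2. A real daemon would also need to handle cache consistency, persistence and concurrent clients, none of which are modelled here.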

Primary author

Yaodong CHENG (IHEP, Beijing)
