Prof. Jorge Rodriguez (Florida International University); Dr Yujun Wu (University of Florida)
The CMS experiment is expected to produce a few petabytes of data per year and distribute them globally. Within the CMS computing infrastructure, most user analyses and the production of Monte Carlo events will be carried out at some 50 CMS Tier-2 sites. How to store the data and allow physicists to access them efficiently has been a challenge, especially for Tier-2 sites with limited storage resources. The CMS experiment, like the other LHC experiments, has successfully used dCache to manage and distribute large amounts of data. However, dCache lacks POSIX file access, and its dcap access protocol is relatively slow, which leads to problems when a large number of users try to access the same files simultaneously. In this paper, we present a new implementation that keeps dCache as the front end for data management and distribution while using the Lustre filesystem as the back end, giving users direct POSIX file access without going through the dCache file-read protocol. The implementation fully utilizes the dCache HSM interface, with additional functionality for mapping files between dCache and the Lustre filesystem. Running simple I/O-intensive ROOT file-dumper analysis jobs shows that jobs reading data through the Lustre filesystem complete over 60% faster than the same jobs reading the data stored in dCache. Furthermore, Lustre allows the filesystem to be mounted remotely, which also provides an alternative data-access path for regional Tier-3 sites. We believe this implementation brings both an efficient file-access technique and flexibility in data hosting to environments where storage resources are limited.