Speaker
Dr Eric HJORT (Lawrence Berkeley National Laboratory)
Description
This paper describes the integration of Storage Resource Management (SRM) technology
into the grid-based analysis computing framework of the STAR experiment at RHIC.
STAR users submit jobs to the grid through the STAR Unified Meta-Scheduler (SUMS),
which in turn uses Condor-G to send jobs to remote sites. However, the
output of each job may be large enough that existing solutions for transferring data
back to the initiator site either have not proven reliable enough in a user analysis
mode or would lock the computing resource (batch slot) while the transfer is in progress.
Using existing SRM technology, tailored for optimized and reliable transfer, is the
natural approach for STAR, which already relies on such technology for
massive (bulk) data transfer. When jobs complete, the output files are returned to the
local site by a two-step transfer utilizing a Disk Resource Manager (DRM) service
running at each site. The first step is a local transfer from the worker node
(WN) where the job is executed to a DRM cache local to the node; the second step
is a transfer from the WN local DRM cache to the initiator site DRM. The advantages of this
method include SRM management of transfers to prevent gatekeeper overload, release of
the remote worker node after initiating the second transfer (delegation) so that the
computation and data transfer can proceed concurrently, and seamless mass storage
access as needed by using a Hierarchical Resource Manager (HRM) to access HPSS.
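
As an illustration only, the two-step return with delegation might be sketched as follows. This is not the SRM or DRM API: plain file copies stand in for both transfers, a background thread stands in for the delegated second transfer, and every function and directory name here is hypothetical.

```python
import shutil
import tempfile
import threading
from pathlib import Path


def stage_to_local_cache(output: Path, local_cache: Path) -> Path:
    """Step 1 (stand-in): copy the job output from the worker node
    scratch area into a cache local to the node."""
    local_cache.mkdir(parents=True, exist_ok=True)
    staged = local_cache / output.name
    shutil.copy2(output, staged)
    return staged


def transfer_to_initiator(staged: Path, initiator_cache: Path) -> Path:
    """Step 2 (stand-in): copy the staged file from the local cache
    to the cache at the initiator site."""
    initiator_cache.mkdir(parents=True, exist_ok=True)
    dest = initiator_cache / staged.name
    shutil.copy2(staged, dest)
    return dest


def finish_job(output: Path, local_cache: Path,
               initiator_cache: Path) -> threading.Thread:
    """Run step 1 synchronously, then hand step 2 to a background
    thread and return at once, mimicking release of the batch slot
    after the second transfer has been initiated (delegation)."""
    staged = stage_to_local_cache(output, local_cache)
    worker = threading.Thread(
        target=transfer_to_initiator, args=(staged, initiator_cache)
    )
    worker.start()
    return worker  # the caller is free while the transfer continues


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        out = root / "wn_scratch" / "job_output.dat"
        out.parent.mkdir()
        out.write_bytes(b"analysis result")
        pending = finish_job(out, root / "local_cache", root / "initiator_cache")
        pending.join()  # only the demo waits; a real job would exit here
        print((root / "initiator_cache" / out.name).exists())
```

In this sketch the job blocks only for the short local copy; the longer wide-area step runs after `finish_job` has returned, which is the property the abstract attributes to delegation.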
Primary authors
Dr Eric HJORT (Lawrence Berkeley National Laboratory)
Dr Jerome LAURET (Brookhaven National Laboratory)
Mr Levente HAJDU (Brookhaven National Laboratory)
Co-authors
Dr Alex SIM (Lawrence Berkeley National Laboratory)
Dr Arie SHOSHANI (Lawrence Berkeley National Laboratory)
Dr Doug OLSON (Lawrence Berkeley National Laboratory)