Speaker
Dr
Sanjay Padhi
(UCSD)
Description
With the evolution of various grid federations, the Condor glide-ins represent a key
feature in providing a homogeneous pool of resources using late-binding technology.
The CMS collaboration uses the glide-in based Workload Management System, glideinWMS,
for production (ProdAgent) and distributed analysis (CRAB) of the data. The Condor
glide-in daemons are submitted via Condor-G and start on the worker nodes. Once
activated, they preserve the Master-Worker relationship: each worker first validates
the execution environment on its node and then pulls jobs sequentially until its
lifetime expires. The combination of late binding and validation significantly
reduces the overall failure rate visible to CMS physicists.
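The validate-then-pull behaviour described above can be illustrated with a minimal sketch. This is not the glideinWMS or Condor implementation; the names `run_pilot`, `validate`, `queue` and `clock` are hypothetical and chosen only for illustration, and jobs are modelled as plain Python callables held in a local queue.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, List


def run_pilot(
    queue: Deque[Callable[[], Any]],
    validate: Callable[[], bool],
    lifetime: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[Any]:
    """Hypothetical pilot loop: validate the node, then pull jobs until expiry."""
    # Validation happens before any job is bound to this node, so a broken
    # node never consumes a user job.
    if not validate():
        return []

    deadline = clock() + lifetime
    results: List[Any] = []

    # Late binding: a job is taken from the queue only when the pilot is
    # already running and has time left, one job at a time.
    while queue and clock() < deadline:
        job = queue.popleft()
        results.append(job())
    return results
```

In this sketch a job that starts before the deadline is allowed to finish; a real pilot would also need a policy for jobs that outlive the pilot, which is left out here.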
We discuss the extensive use of the glideinWMS since the CCRC08 computing challenge
to prepare for the forthcoming LHC data-taking period. The key features essential to
the success of large-scale production and analysis at CMS resources across major grid
federations, including EGEE, OSG and NorduGrid, are outlined. The use of glide-ins via
the CRAB server mechanism and ProdAgent, as well as first-hand experience of using the
next-generation CREAM computing element within the CMS framework, are also discussed.
Authors
Dr
Burt Holzman
(FNAL)
Dr
Eric Vaandering
(FNAL)
Prof.
Frank Wuerthwein
(UCSD)
Dr
Haifeng Pi
(UCSD)
Dr
Igor Sfiligoi
(FNAL)
Dr
Oliver Gutsche
(FNAL)
Dr
Sanjay Padhi
(UCSD)