Speaker
Mr
Dave Evans
(Fermi National Laboratory)
Description
The CMS production system has undergone a major architectural upgrade from its
predecessor, with the goals of reducing the operations manpower requirement and
preparing for the large scale production required by the CMS physics plan.
This paper discusses the CMS Monte Carlo Workload Management architecture. The
system consists of three major components, ProdRequest, ProdAgent, and ProdMgr, and can be
deployed in various distributed configurations to avoid, or at least minimize, single points
of failure. User interaction and request management take place at the
ProdRequest level. ProdAgents are responsible for job submission and tracking over
multiple Grid and Farm computing resources.
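The abstract describes ProdMgr as keeping track of request progress and dividing work among the ProdAgents that request it. The following is a minimal sketch of that pull model in Python. It assumes that ProdMgr hands out fixed-size chunks of events to whichever agent asks next. The names `Request`, `ProdMgrSketch`, `request_work`, `report_done`, and `progress`, along with the chunking policy, are hypothetical illustrations and not the actual ProdMgr interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical production request, as it might arrive from ProdRequest."""
    name: str
    total_events: int
    allocated: int = 0  # events already handed out to ProdAgents
    completed: int = 0  # events reported back as produced


class ProdMgrSketch:
    """Hypothetical accounting service: ProdAgents pull work, the manager tracks progress."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.requests: dict[str, Request] = {}
        # Record of which agent received which chunk, for accounting.
        self.assignments: dict[str, list[tuple[str, int]]] = {}

    def add_request(self, req: Request) -> None:
        self.requests[req.name] = req

    def request_work(self, agent: str) -> Optional[tuple[str, int]]:
        """Give the calling agent its next chunk of events, or None if nothing is left."""
        for req in self.requests.values():
            remaining = req.total_events - req.allocated
            if remaining > 0:
                n = min(self.chunk_size, remaining)
                req.allocated += n
                self.assignments.setdefault(agent, []).append((req.name, n))
                return req.name, n
        return None

    def report_done(self, name: str, events: int) -> None:
        self.requests[name].completed += events

    def progress(self, name: str) -> float:
        req = self.requests[name]
        return req.completed / req.total_events


mgr = ProdMgrSketch(chunk_size=400)
mgr.add_request(Request("minbias", total_events=1000))
print(mgr.request_work("agent-a"))  # ('minbias', 400)
print(mgr.request_work("agent-b"))  # ('minbias', 400)
print(mgr.request_work("agent-a"))  # ('minbias', 200)
print(mgr.request_work("agent-b"))  # None
mgr.report_done("minbias", 400)
print(mgr.progress("minbias"))      # 0.4
```

Because agents ask for work rather than having it pushed to them, an agent that goes offline simply stops pulling chunks. The abstract does not say how the real system reclaims chunks from failed agents, so the sketch leaves that out.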
The ProdAgents themselves consist of autonomous components and communicate via
asynchronous messages, thereby enhancing the robustness of the ProdAgent. Delayed and
queued message functionality enables the ProdAgent to cope with interactions with third-party
components (CMS catalogs, transfer systems) even when these components are temporarily
offline. ProdMgr provides the accounting functionality of the system, keeping track of
request progress and dividing the work among the ProdAgents that request it. Various
complementary (self-)monitoring systems provide end-to-end monitoring of the system to
track down potential problems.
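The delayed and queued messaging described above can be illustrated with a minimal in-process sketch. It assumes two things: a virtual clock, so that delivery times are deterministic, and a handler that signals an offline third-party service by raising an exception. The names `MessageServiceSketch`, `ServiceUnavailable`, and `register_in_catalog`, the `JobSuccess` event name, and the fixed retry delay are all hypothetical. The real ProdAgent message service is not reproduced here.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class ServiceUnavailable(Exception):
    """Raised by a handler when a third-party service (e.g. a catalog) is offline."""


class MessageServiceSketch:
    """Hypothetical in-process message service with delayed and queued delivery.

    Each (handler, payload) pair is queued by delivery time on a virtual clock.
    A handler that raises ServiceUnavailable has its message re-queued after
    retry_delay instead of losing it; other handlers are unaffected.
    """

    def __init__(self, retry_delay: float = 60.0) -> None:
        if retry_delay <= 0:
            raise ValueError("retry_delay must be positive, or dispatch could loop forever")
        self.retry_delay = retry_delay
        self._queue: list[tuple[float, int, Handler, Any]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO order for equal times
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def _push(self, due: float, handler: Handler, payload: Any) -> None:
        heapq.heappush(self._queue, (due, next(self._seq), handler, payload))

    def publish(self, event: str, payload: Any, now: float, delay: float = 0.0) -> None:
        """Queue one delivery per current subscriber, due at now + delay."""
        for handler in self._handlers[event]:
            self._push(now + delay, handler, payload)

    def dispatch(self, now: float) -> int:
        """Run every delivery due at or before `now`; return how many succeeded."""
        delivered = 0
        while self._queue and self._queue[0][0] <= now:
            _, _, handler, payload = heapq.heappop(self._queue)
            try:
                handler(payload)
                delivered += 1
            except ServiceUnavailable:
                # Service is down: keep the message and try again later.
                self._push(now + self.retry_delay, handler, payload)
        return delivered


catalog_online = False
registered = []


def register_in_catalog(job_id: str) -> None:
    """Hypothetical handler that registers job output in an external catalog."""
    if not catalog_online:
        raise ServiceUnavailable("catalog offline")
    registered.append(job_id)


bus = MessageServiceSketch(retry_delay=60.0)
bus.subscribe("JobSuccess", register_in_catalog)
bus.publish("JobSuccess", "job-17", now=0.0)
print(bus.dispatch(now=0.0))   # 0: catalog down, retry queued for t=60
catalog_online = True
print(bus.dispatch(now=30.0))  # 0: retry not yet due
print(bus.dispatch(now=60.0))  # 1
print(registered)              # ['job-17']
```

The sketch queues each subscriber separately, so a retry re-runs only the handler that failed. A component that has already processed the message is not asked to process it a second time.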
Submitted on behalf of Collaboration: CMS
Primary authors
Ms
Alessandra Fanfani
(INFN Sezione di Bologna and University of Bologna)
Mr
Carlos Kavka
(INFN Sezione di Trieste)
Mr
Dave Evans
(Fermi National Laboratory)
Mr
Dave Mason
(Fermi National Laboratory)
Mr
Frank van Lingen
(California Institute of Technology)
Mr
Giulio Eulisse
(Northeastern University)
Mr
Giuseppe Codispoti
(INFN Sezione di Bologna and University of Bologna)
Mr
Jose Hernandez
(CIEMAT)
Mr
Nicola De Filippis
(INFN - Sezione di Bari)
Mr
Peter Elmer
(Princeton University)
Mr
William Bacchi
(INFN Sezione di Bologna and University of Bologna)