Mr Dave Evans (Fermi National Laboratory)
The CMS production system has undergone a major architectural upgrade from its predecessor, with the goals of reducing the operations manpower requirement and preparing for the large-scale production required by the CMS physics plan. This paper discusses the CMS Monte Carlo Workload Management architecture. The system consists of three major components, ProdRequest, ProdAgent, and ProdMgr, which can be deployed in various distributed configurations to avoid or minimize single points of failure. User interaction and request management take place at the ProdRequest level. ProdAgents are responsible for job submission and tracking over multiple Grid and farm computing resources. Each ProdAgent consists of autonomous components that communicate via asynchronous messages, which makes the ProdAgent more robust. Delayed and queued message functionality enables the ProdAgent to handle interaction with third-party components (CMS catalogs, transfer systems) even when these components go offline for a while. ProdMgr provides the accounting functionality of the system: it keeps track of request progress and divides the work among the ProdAgents that request it. Various complementary (self-)monitoring systems provide end-to-end monitoring of the system to track down actual and potential problems.
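The delayed and queued messaging described above can be illustrated with a minimal sketch. It is not the ProdAgent implementation: the `MessageBus` and `CatalogClient` classes, the event names `JobSuccess` and `RegisterOutput`, and the tick-based clock are all hypothetical, chosen only to show how a component might re-queue a message with a delay while a third-party service is offline.

```python
import heapq
import itertools
from collections import defaultdict


class MessageBus:
    """Toy in-memory message queue with delayed delivery (hypothetical)."""

    def __init__(self):
        self._queue = []                  # heap of (deliver_at, seq, event, payload)
        self._seq = itertools.count()     # tie-breaker that preserves publish order
        self._subscribers = defaultdict(list)
        self.now = 0                      # simulated clock, in ticks

    def subscribe(self, event, handler):
        self._subscribers[event].append(handler)

    def publish(self, event, payload, delay=0):
        # The publisher never calls the subscriber directly: it only queues.
        heapq.heappush(
            self._queue, (self.now + delay, next(self._seq), event, payload)
        )

    def tick(self):
        """Advance the clock by one tick and deliver every message that is due."""
        self.now += 1
        while self._queue and self._queue[0][0] <= self.now:
            _, _, event, payload = heapq.heappop(self._queue)
            for handler in self._subscribers[event]:
                handler(self, payload)


class CatalogClient:
    """Stand-in for a component that talks to a third-party catalog."""

    def __init__(self):
        self.online = False
        self.registered = []

    def handle(self, bus, payload):
        if not self.online:
            # Catalog unavailable: re-queue the same message for a later retry.
            bus.publish("RegisterOutput", payload, delay=2)
            return
        self.registered.append(payload)


def run_demo():
    bus = MessageBus()
    catalog = CatalogClient()
    # A tracking component reacts to a finished job by asking for registration.
    bus.subscribe("JobSuccess", lambda b, job: b.publish("RegisterOutput", job))
    bus.subscribe("RegisterOutput", catalog.handle)

    bus.publish("JobSuccess", "job-001")
    bus.tick()                 # catalog offline, so registration is re-queued
    offline_result = list(catalog.registered)
    catalog.online = True
    bus.tick()                 # retry is not due yet
    bus.tick()                 # retry is delivered and succeeds
    return offline_result, catalog.registered
```

In this sketch the job result is never lost while the catalog is down; it simply waits in the queue until a later delivery succeeds, which is the behaviour the abstract attributes to delayed messages.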
Submitted on behalf of Collaboration (ex, BaBar, ATLAS): CMS
Ms Alessandra Fanfani (INFN Sezione di Bologna and University of Bologna); Mr Carlos Kavka (INFN Sezione di Trieste); Mr Dave Evans (Fermi National Laboratory); Mr Dave Mason (Fermi National Laboratory); Mr Frank van Lingen (California Institute of Technology); Mr Giulio Eulisse (North Eastern University); Mr Giuseppe Codispoti (INFN Sezione di Bologna and University of Bologna); Mr Jose Hernandez (CIEMAT); Mr Nicola De Filippis (INFN - Sezione di Bari); Mr Peter Elmer (Princeton University); Mr William Bacchi (INFN Sezione di Bologna and University of Bologna)