The Production Management System of LHCb was heavily used in the recent simulation data production and real data processing runs. It has shown good scalability, managing up to 30K simultaneously running jobs. Since both user and production jobs pass through the same central Task Queue, their relative priorities can be managed efficiently and accurately, including different policies for different user groups and various production activities.
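As an illustration of how a single central queue can arbitrate between user and production jobs, the following minimal sketch orders jobs by a per-group priority. It is not the DIRAC implementation: the class `TaskQueue`, the `GROUP_PRIORITY` table, and its values are hypothetical and chosen only for the example.

```python
import heapq
import itertools

# Hypothetical group priorities (larger value is served first).
# These numbers are illustrative and not actual LHCb policy settings.
GROUP_PRIORITY = {"production": 10, "user": 5}


class TaskQueue:
    """Toy central queue holding both user and production jobs."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps submission order within one priority level.
        self._counter = itertools.count()

    def submit(self, job_id, group):
        priority = GROUP_PRIORITY[group]
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), job_id, group))

    def next_job(self):
        """Return (job_id, group) of the next job to run, or None if empty."""
        if not self._heap:
            return None
        _, _, job_id, group = heapq.heappop(self._heap)
        return job_id, group


queue = TaskQueue()
queue.submit("user-1", "user")
queue.submit("prod-1", "production")
queue.submit("user-2", "user")
order = [queue.next_job()[0] for _ in range(3)]
print(order)
```

Because every job goes through the same queue, changing a group's policy in this sketch amounts to changing one entry in the priority table.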
With the start of the LHC operation and production of data from real proton-proton collisions the data flow becomes constant and necessitates versatile monitoring and management tools to be used by the production managers and members of the LHCb computing shifts. It is important that shifters are notified promptly of any abnormal situations and have tools to quickly identify problems and react accordingly. This is achieved by generating alarms and notifications to the shifters.
In the demo we intend to provide a view of the LHCb Production System with the DIRAC Web Portal. Operations and monitoring tools for the whole LHCb data processing chain will be presented.
The LHCb experiment uses the DIRAC middleware for all activities related to distributed data processing. DIRAC provides a layer between the grid resources and services and the LHCb production system in order to increase the overall efficiency as seen by the LHCb users.
The WorkloadManagement System of DIRAC, based on the Pilot Job paradigm, is used to manage the payloads coming from both users and production managers. A higher-level Production Management System is built on top of it. The Production Management System automates the submission of data processing jobs according to predefined scenarios. Production job scheduling is data driven and distributes the tasks to the Tier-1 centers according to the LHCb Computing Model.
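The combination of pilot jobs and data-driven scheduling can be sketched as follows: payloads wait in a central queue, and a pilot running at a site pulls the first payload whose input data is available there. This is a simplified model under stated assumptions, not DIRAC code; the names `Payload`, `CentralQueue`, `match`, and the site labels `T1-A` and `T1-B` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Payload:
    job_id: str
    # Sites assumed to hold a replica of the job's input data (placeholder names).
    input_sites: set


@dataclass
class CentralQueue:
    waiting: list = field(default_factory=list)

    def match(self, pilot_site):
        """Hand the pilot the first waiting payload whose input data is at its site."""
        for index, payload in enumerate(self.waiting):
            if pilot_site in payload.input_sites:
                return self.waiting.pop(index)
        return None  # nothing suitable: the pilot would simply exit


central = CentralQueue()
central.waiting.append(Payload("reco-001", {"T1-A"}))
central.waiting.append(Payload("reco-002", {"T1-B"}))
central.waiting.append(Payload("reco-003", {"T1-A", "T1-B"}))

# A pilot starting at T1-B skips reco-001 because its data is elsewhere.
first = central.match("T1-B")
second = central.match("T1-B")
third = central.match("T1-B")
print(first.job_id, second.job_id, third)
```

In this model the payload is bound to a worker node only once a pilot is already running there, so a job whose data is not at the site is left in the queue for a pilot at another site.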
The Data Management System of LHCb is also built in the DIRAC framework and allows massive data transfers to be defined and scheduled automatically. It uses the WLCG FTS service for the transfers and provides means for failure recovery.
The DIRAC Web Portal provides secure access to all the DIRAC services. It allows users to monitor the ongoing activities and the behavior of various subsystems.
Justification for delivering demo and/or technical requirements (for demos)
This contribution presents a real interactive system best illustrated with an interactive demonstration. The demo session needs a computer connected to the network with a large monitor.
Conclusions and Future Work
The LHCb Data Production System includes both Production and Data Management services and tools, all built in the same DIRAC framework. The System was successfully used in the recent processing of the real LHC data. The experience shows the need for better support of the production managers' activities. Future development will be devoted to better interactivity with the system through the Web portal, more convenient aggregation of the information necessary to solve particular problems, and a more focused notification system to allow early spotting of problems.
|Keywords||distributed computing, grid, LHC, DIRAC, data management|
|URL for further information||http://dirac.cern.ch|