12–16 Apr 2010
Uppsala University
Europe/Stockholm timezone

LHCb Data Production System based on the DIRAC project

14 Apr 2010, 17:40
10m
Aula (Uppsala University)

Aula

Uppsala University

Demonstration | Software services exploiting and/or extending grid middleware (gLite, ARC, UNICORE etc.) | Demo Session 2

Speakers

Dr Andrei TSAREGORODTSEV (CNRS-IN2P3-CPPM, MARSEILLE), Dr Stuart Paterson (CERN)

Description

LHCb is one of the four experiments at the LHC collider at CERN. It has recently started to record data coming from real proton-proton collisions. The data processing chain is managed using the tools provided by the DIRAC project. All the operations, from data transfers out of the experimental area up to the final user analysis distributed over all the LHCb Tier-1 centers, are automated. MC data production is also managed within the same framework. The production controls as well as the visualization of the monitoring information are performed through the DIRAC Web Portal.

Justification for delivering demo and/or technical requirements (for demos)

This contribution presents a live system that is best illustrated with an interactive demonstration. The demo session needs a computer connected to the network with a large monitor.

Detailed analysis

The LHCb experiment uses the DIRAC middleware for all activities related to distributed data processing. DIRAC provides a layer between the grid resources and services and the LHCb production system in order to increase the overall efficiency as seen by the LHCb users.
The Workload Management System of DIRAC, based on the Pilot Job paradigm, is used to manage the payloads coming from both users and production managers. On top of it, a higher-level Production Management System is built. The Production Management System automates the submission of data processing jobs according to predefined scenarios. The production job scheduling is data driven and allows the tasks to be distributed to the Tier-1 centers according to the LHCb Computing Model.
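As an illustration of the data-driven scheduling described above, the following minimal Python sketch groups input files by the Tier-1 site holding a replica and splits them into fixed-size processing tasks. The replica catalogue content and helper names are hypothetical and do not reflect the actual DIRAC API.

# Illustrative sketch only: data-driven grouping of production tasks by
# Tier-1 site, in the spirit of the scheduling described above.
from collections import defaultdict

# Hypothetical replica catalogue: logical file name -> sites holding a copy
replicas = {
    "/lhcb/data/run1/file_001.raw": ["LCG.CNAF.it", "LCG.GRIDKA.de"],
    "/lhcb/data/run1/file_002.raw": ["LCG.CNAF.it"],
    "/lhcb/data/run1/file_003.raw": ["LCG.IN2P3.fr"],
}

def group_tasks_by_site(replicas, files_per_task=2):
    """Assign each file to the first site holding a replica and split the
    per-site file lists into fixed-size processing tasks."""
    by_site = defaultdict(list)
    for lfn, sites in replicas.items():
        by_site[sites[0]].append(lfn)
    tasks = []
    for site, lfns in by_site.items():
        for i in range(0, len(lfns), files_per_task):
            tasks.append({"site": site, "inputData": lfns[i:i + files_per_task]})
    return tasks

for task in group_tasks_by_site(replicas):
    print(task["site"], task["inputData"])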
The Data Management System of LHCb is also built in the DIRAC framework and allows massive data transfers to be defined and scheduled automatically. It uses the WLCG FTS service for the transfers and provides means for failure recovery.
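The sketch below illustrates the idea of bulk replication with simple retry-based failure recovery. The submit_transfer() function is a hypothetical stand-in for a call to the WLCG FTS service; it is not a real FTS or DIRAC client interface.

# Illustrative sketch only: a bulk replication request with retry-based
# failure recovery, in the spirit of the Data Management System above.
import random
import time

def submit_transfer(lfn, source_se, target_se):
    """Pretend to run one third-party transfer; fail randomly to exercise retries."""
    return random.random() > 0.3  # True = success

def replicate(lfns, source_se, target_se, max_attempts=3, backoff=1.0):
    """Try each transfer up to max_attempts times; return the files that still failed."""
    failed = []
    for lfn in lfns:
        for attempt in range(1, max_attempts + 1):
            if submit_transfer(lfn, source_se, target_se):
                print("OK     %s (attempt %d)" % (lfn, attempt))
                break
            time.sleep(backoff * attempt)  # simple backoff before retrying
        else:
            print("FAILED %s after %d attempts" % (lfn, max_attempts))
            failed.append(lfn)
    return failed

replicate(["/lhcb/data/run1/file_001.raw", "/lhcb/data/run1/file_002.raw"],
          source_se="CERN-RAW", target_se="CNAF-RAW")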
The DIRAC Web Portal provides secure access to all the DIRAC services. It allows the ongoing activities and the behavior of the various subsystems to be monitored.

Impact

The Production Management System of LHCb was heavily used in the recent simulation data production and real data processing runs. It has shown good scalability properties, managing up to 30K simultaneously running jobs. Since both user and production jobs pass through the same central Task Queue, the relative priorities of user and production jobs can be managed efficiently and accurately, including different policies for different user groups and various production activities.
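As a minimal sketch of the priority mechanism mentioned above, the following Python example orders user and production payloads in a single queue according to group-dependent weights. The group names and weights are hypothetical and do not correspond to LHCb's actual configuration or to the DIRAC implementation.

# Illustrative sketch only: a single central task queue serving both user
# and production payloads with group-dependent priorities.
import heapq
import itertools

# Hypothetical priority policy per submitting group (higher = served first)
GROUP_PRIORITY = {"lhcb_prod": 10, "lhcb_user": 5, "lhcb_mc": 3}

class TaskQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves FIFO order

    def add(self, job_id, group):
        priority = GROUP_PRIORITY.get(group, 1)
        # heapq is a min-heap, so negate the priority to pop the highest first
        heapq.heappush(self._heap, (-priority, next(self._counter), job_id, group))

    def match(self):
        """Hand the highest-priority waiting payload to a requesting pilot."""
        if not self._heap:
            return None
        _, _, job_id, group = heapq.heappop(self._heap)
        return job_id, group

queue = TaskQueue()
queue.add("user_0001", "lhcb_user")
queue.add("prod_0042", "lhcb_prod")
queue.add("mc_0007", "lhcb_mc")
while True:
    picked = queue.match()
    if picked is None:
        break
    print(picked)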
With the start of LHC operation and the production of data from real proton-proton collisions, the data flow becomes constant and calls for versatile monitoring and management tools for the production managers and the members of the LHCb computing shifts. It is important that shifters are notified promptly of any abnormal situation and have tools to quickly identify problems and react accordingly. This is achieved by generating alarms and notifications to the shifters.
In the demo we intend to provide a view of the LHCb Production System through the DIRAC Web Portal. Operations and monitoring tools for the whole LHCb data processing chain will be presented.

Conclusions and Future Work

The LHCb Data Production System includes both Production and Data Management services and tools, all built in the same DIRAC framework. The system has been used successfully in the recent processing of real LHC data. This experience shows the need for better support of the production managers' activities. Future development will be devoted to better interactivity with the system through the Web Portal, more convenient aggregation of the information necessary to solve particular problems, and a more focused notification system to allow early spotting of problems.

URL for further information: http://dirac.cern.ch
Keywords: distributed computing, grid, LHC, DIRAC, data management

Primary authors

Adrian Casajus Ramo (University of Barcelona), Alexey Zhelezov (University of Heidelberg), Dr Andrei TSAREGORODTSEV (CNRS-IN2P3-CPPM, MARSEILLE), Dr Andrew Cameron Smith (CERN), Matvey Sapunov (CNRS-IN2P3-CPPM, MARSEILLE), Dr Ricardo Graciani Diaz (University of Barcelona), Dr Stuart Paterson (CERN), Zoltan Mathe (University College Dublin)

Presentation materials