Speaker
Maxim Potekhin
(Brookhaven National Laboratory (US))
Description
The ATLAS Production System is the top-level workflow manager that translates physicists' needs for production-level processing into actual workflows executed across about a hundred processing sites used globally by ATLAS. The production workload has increased in volume and complexity in recent years (the ATLAS production task count is above one million, with each task containing hundreds or thousands of jobs), so the Production System must be upgraded to meet the challenging requirements of the next LHC run while minimizing operating costs. Providing a front-end and a management layer for petascale data processing and analysis, the new Production System contains generic subsystems that can be used in a wider range of applications. The main subsystems are the Database Engine for Tasks (DEfT) and the Job Execution and Definition Interface (JEDI). Based on users' requests, the DEfT subsystem manages inter-dependent groups of tasks (Meta-Tasks) and generates the corresponding data processing workflows. The JEDI subsystem dynamically translates the task definitions from DEfT into workload jobs executed in the PanDA Workload Management System.
We present the requirements, design parameters, object model and concrete solutions used in building the DEfT subsystem, such as Component-Based Software Engineering. We also explain how the use of standard software modules and data formats led to a reduction in development and maintenance costs.
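To make the Meta-Task idea concrete, the following is a minimal sketch, not the DEfT or JEDI implementation. It assumes a Meta-Task can be modelled as a directed acyclic graph of tasks and that each task expands into a fixed number of jobs. The class names (`Task`, `MetaTask`), the helper `expand_to_jobs`, the task names and the job counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    """Hypothetical task record: a name, its upstream tasks, and a job count."""
    name: str
    depends_on: tuple = ()
    n_jobs: int = 1


@dataclass
class MetaTask:
    """Hypothetical Meta-Task: a group of inter-dependent tasks forming a DAG."""
    tasks: dict = field(default_factory=dict)

    def add(self, task):
        self.tasks[task.name] = task

    def execution_order(self):
        # Map each task to the set of tasks it depends on, then sort
        # topologically so that every task follows its prerequisites.
        graph = {t.name: set(t.depends_on) for t in self.tasks.values()}
        return list(TopologicalSorter(graph).static_order())


def expand_to_jobs(task):
    """Illustrative stand-in for translating one task definition into jobs."""
    return [f"{task.name}.job{i:04d}" for i in range(task.n_jobs)]


# A three-step chain with made-up step names and job counts.
chain = MetaTask()
chain.add(Task("evgen", n_jobs=2))
chain.add(Task("simul", depends_on=("evgen",), n_jobs=3))
chain.add(Task("recon", depends_on=("simul",), n_jobs=1))

order = chain.execution_order()
jobs = [job for name in order for job in expand_to_jobs(chain.tasks[name])]
```

In this sketch the dependency graph is kept separate from job expansion, mirroring the division of labour described above: one layer tracks how tasks depend on each other, the other turns each task definition into executable jobs. A cyclic dependency would make `TopologicalSorter` raise `graphlib.CycleError`.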
Primary author
Maxim Potekhin
(Brookhaven National Laboratory (US))
Co-authors
Alexandre Vaniachine
(ATLAS)
Dr Alexei Klimentov
(Brookhaven National Laboratory (US))
Dmitri Golubkov
(Institute for High Energy Physics (IHEP))
Kaushik De
(University of Texas at Arlington (US))