Speaker
Dr
James Shank
(Boston University)
Description
We describe experiences and lessons learned from over a year of nearly continuous
running of managed production on Grid3 for the ATLAS data challenges. Two major
phases of production were performed: first, large-scale GEANT-based Monte Carlo
simulations ("DC2"), followed by extensive production for the ATLAS "Rome"
physics workshop incorporating several new job types (digitization, reconstruction,
pileup and user analysis). We will describe the systems used to run production on
such a massive scale: over 20 Grid3 sites were involved, and together they successfully
completed over 250k jobs and produced over 50 TB of physics data. The production
system, consisting of a supervisor, an executor and a data management system, will be
described. An analysis of the performance of the various systems will be presented. Several
critical points of failure were uncovered, including the scalability of Grid services for
job submission and reliable file transfer, and efficient access to remote
resources. These lessons have been incorporated into the design principles for the
next generation production system, Panda.
Primary authors
Dr
James Shank
(Boston University)
Dr
Kaushik De
(University of Texas at Arlington)
Co-authors
Mr
David Joffe
(Southern Methodist University)
Dr
Davide Costanzo
(Brookhaven National Laboratory)
Dr
Ian Hinchliffe
(Lawrence Berkeley Laboratory)
Mr
Jerry Gerialtowski
(Argonne National Laboratory)
Dr
Marco Mambelli
(University of Chicago)
Dr
Mark Sosebee
(University of Texas at Arlington)
Dr
Nurcan Ozturk
(University of Texas at Arlington)
Dr
Robert Gardner
(University of Chicago)
Dr
Taeksu Shin
(Hampton University)
Dr
Tomasz Wlodek
(Brookhaven National Laboratory)
Dr
Alexandre Vaniachine
(Argonne National Laboratory)
Mr
Vassilios Vassilakopoulos
(Hampton University)
Dr
Wensheng Deng
(Brookhaven National Laboratory)
Dr
Xin Zhao
(Brookhaven National Laboratory)
Dr
Yuri Smirnov
(Brookhaven National Laboratory)