21–25 May 2012
New York City, NY, USA
US/Eastern timezone

PD2P : PanDA Dynamic Data Placement for ATLAS

24 May 2012, 17:50
25m
Eisner & Lubin Auditorium (Kimmel Center)

Parallel: Distributed Processing and Analysis on Grids and Clouds (track 3)

Speaker

Tadashi Maeno (Brookhaven National Laboratory (US))

Description

The PanDA Production and Distributed Analysis System is the ATLAS workload management system for processing user analysis, group analysis, and production jobs. In 2011, more than 1400 users submitted jobs through PanDA to the ATLAS grid infrastructure. The system processes more than 2 million analysis jobs per week. Analysis jobs are routed to sites based on the availability of relevant data and processing resources, taking into account the nonuniform distribution of CPU and storage resources in the ATLAS grid. The data distribution has to be optimized to fit the resource distribution, and it also has to change dynamically to meet rapidly evolving requirements for analysis use cases. The PanDA Dynamic Data Placement (PD2P) system has been developed to cope with the difficulties of data placement for ATLAS. PD2P is an intelligent subsystem of PanDA that distributes data by taking the following factors into account: the popularity, locality, and usage pattern of the data; the distribution of CPU and storage resources; the network topology between sites; and site operation downtime and reliability, among others. We will describe the design of the new system, its performance during the past year of data taking, the dramatic improvements it has brought about in the efficient use of storage and processing resources, the associated reductions in average wait time for user analysis jobs, and plans for the future.
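
As a purely illustrative sketch, and not the actual PD2P algorithm, a placement decision that weighs a few of the factors listed above might look like the following. The `Site` fields, the `choose_replica_site` function, the popularity threshold `min_accesses`, and the scoring rule are all assumptions made for this example; none of them is taken from PanDA.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # Every field is a hypothetical stand-in for a factor named in the abstract.
    name: str
    free_cpu_slots: int      # idle processing capacity
    free_storage_tb: float   # free disk space, in terabytes
    online: bool             # False while the site is in downtime
    reliability: float       # 0.0 to 1.0, e.g. recent job success fraction
    has_replica: bool        # True if the dataset is already stored there


def choose_replica_site(sites, dataset_size_tb, recent_accesses, min_accesses=5):
    """Pick a site for one extra replica, or None if no copy is warranted."""
    # Popularity gate (assumed rule): only replicate datasets in active use.
    if recent_accesses < min_accesses:
        return None

    # Skip sites that are down, already hold the data, or lack disk space.
    candidates = [
        s for s in sites
        if s.online and not s.has_replica and s.free_storage_tb >= dataset_size_tb
    ]
    if not candidates:
        return None

    # Assumed score: idle CPU capacity discounted by site reliability.
    return max(candidates, key=lambda s: s.free_cpu_slots * s.reliability)
```

Under these assumptions, a dataset that has been accessed often is copied to the reliable, online site with the most idle CPU that can still hold it, while rarely used data is not replicated at all. Network topology is left out of the sketch for brevity.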

Primary author

Co-authors

Kaushik De (University of Texas at Arlington (US))

Sergey Panitkin (Brookhaven National Laboratory (US))

Tadashi Maeno (Brookhaven National Laboratory (US))