21-27 March 2009
Prague
Europe/Prague timezone

Pilot Factory - a Condor-based System for Scalable Pilot Job Generation in the Panda WMS Framework

26 Mar 2009, 08:00
1h

Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
Board: Thursday 064
Type: poster
Track: Grid Middleware and Networking Technologies
Session: Poster session

Speaker

Dr Maxim Potekhin (BROOKHAVEN NATIONAL LABORATORY)

Description

The Panda Workload Management System is designed around the concept of the Pilot Job - a "smart wrapper" for the payload executable that can probe the environment on the remote worker node before pulling down the payload from the server and executing it. This design allows for improved logging and monitoring capabilities as well as flexibility in workload management. In a Grid environment (such as the Open Science Grid), Panda Pilot Jobs are submitted to remote sites via mechanisms that ultimately rely on Condor-G. Our experience has shown that when a large number of Panda jobs are routed to a particular remote site simultaneously, the load that Pilot Job submission places on the head node of the cluster can limit overall scalability. We have developed a Condor-inspired solution to this problem that uses a schedd-based glidein, whose role is to redirect pilots to the native batch system. Once a glidein schedd is installed and running, it can be used in exactly the same way as a local schedd; from the user's perspective, Pilots submitted this way are therefore quite similar to jobs submitted to the local Condor pool.
Presentation type: poster
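
The Pilot Job idea in the description above (probe the worker node first, then pull the payload and run it) can be illustrated with a minimal Python sketch. This is not the Panda pilot code: the function names, the disk-space check, and the stubbed "server" that hands back a command are all assumptions made for illustration, and a real pilot would contact the Panda server over the network instead.

```python
import platform
import shutil
import subprocess
import sys


def probe_environment(min_free_bytes=0, workdir="."):
    """Collect basic facts about the worker node and decide if it is usable.

    The disk-space threshold is a hypothetical acceptance criterion.
    """
    free = shutil.disk_usage(workdir).free
    return {
        "hostname": platform.node(),
        "python": platform.python_version(),
        "free_bytes": free,
        "ok": free >= min_free_bytes,
    }


def fetch_payload_stub(env):
    """Hypothetical stand-in for the request to the job server.

    Returns the command line of the payload to execute.
    """
    return [sys.executable, "-c", "print('payload ran')"]


def run_pilot(fetch_payload=fetch_payload_stub, min_free_bytes=0):
    """Probe the node, then fetch and execute the payload only if the probe passes."""
    env = probe_environment(min_free_bytes)
    if not env["ok"]:
        # The payload is never requested for an unsuitable node.
        return {"status": "rejected", "env": env}

    command = fetch_payload(env)
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "status": "finished" if result.returncode == 0 else "failed",
        "returncode": result.returncode,
        "stdout": result.stdout,
        "env": env,
    }


if __name__ == "__main__":
    report = run_pilot()
    print(report["status"], report["stdout"].strip())
```

Because the payload is requested only after the probe succeeds, a node that fails the check never receives work, and the probe results are available for logging and monitoring in either case.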

Primary authors

Barnett Chiu (BROOKHAVEN NATIONAL LABORATORY)
Dr Maxim Potekhin (BROOKHAVEN NATIONAL LABORATORY)
