Speaker
Dr
Jose Caballero
(Brookhaven National Laboratory (BNL))
Description
Worker nodes on the grid exhibit great diversity, making it difficult to offer uniform processing resources. A pilot job architecture, which probes the environment on the remote worker node before pulling down a payload job, can help. Pilot jobs become smart wrappers, preparing an appropriate environment for job execution and providing logging and monitoring capabilities.
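The probe-then-pull pattern described above can be illustrated with a minimal sketch. It is not PanDA pilot code: the function names, the checks performed, and the `fetch_payload` callable are all hypothetical, chosen only to show a pilot inspecting its worker node before it asks for work.

```python
import os
import shutil
import sys


def probe_environment(min_free_gb=1.0, workdir="."):
    """Collect a few facts about the worker node (illustrative checks only)."""
    free_gb = shutil.disk_usage(workdir).free / 1e9
    return {
        "python_version": sys.version_info[:3],
        "cpu_count": os.cpu_count(),
        "free_disk_gb": free_gb,
        # Hypothetical suitability rule: enough scratch space for a payload.
        "suitable": free_gb >= min_free_gb,
    }


def run_pilot(fetch_payload, min_free_gb=1.0):
    """Probe first; pull and run a payload only if the node looks usable.

    `fetch_payload` is a hypothetical callable standing in for the request
    a pilot would send to the workload management system. It receives the
    probe report and returns a zero-argument callable (the payload).
    """
    report = probe_environment(min_free_gb)
    if not report["suitable"]:
        # The pilot exits without ever pulling a job onto a bad node.
        return {"status": "declined", "report": report}
    payload = fetch_payload(report)
    return {"status": "finished", "result": payload(), "report": report}
```

Because the payload is requested only after the probe succeeds, an unsuitable node costs one short-lived pilot rather than a failed user job.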
PanDA (Production and Distributed Analysis), an ATLAS and OSG workload management system, follows this design. However, the simplest (and most efficient) pilot submission approach uses identical pilots that all carry the same identifying grid proxy. Under this approach, a site can perform end-user accounting only with application-level information (PanDA maintains its own end-user accounting), and end-user jobs run with the identity and privileges of the proxy carried by the pilots, which may be seen as a security risk.
To address these issues, we have enabled PanDA to use gLExec, a tool provided by EGEE which runs payload jobs under an end-user's identity. End-user proxies are pre-staged in a credential caching service, MyProxy, and the information needed by the pilots to access them is stored in the PanDA DB. gLExec then extracts from the user's proxy the proper identity under which to run.
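As a rough sketch of the hand-off just described, the snippet below shows how a pilot might assemble a gLExec invocation once the end-user proxy has been retrieved. It is not taken from PanDA: the job-record field names, the `retrieve_proxy` callable, and the default gLExec path are hypothetical, and the `GLEXEC_CLIENT_CERT` variable name should be checked against the documentation of the installed gLExec version. The sketch only builds the command and environment; it does not contact MyProxy or execute anything.

```python
import os


def build_glexec_call(user_proxy_path, payload_cmd,
                      glexec_path="/usr/sbin/glexec", base_env=None):
    """Return (argv, env) for running payload_cmd through gLExec.

    Assumptions: the install path is site-specific, and gLExec is told
    which user proxy to map from via the GLEXEC_CLIENT_CERT variable.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GLEXEC_CLIENT_CERT"] = user_proxy_path
    return [glexec_path, *payload_cmd], env


def prepare_payload(job_record, retrieve_proxy, payload_cmd):
    """Look up credential info, fetch the proxy, and build the call.

    `job_record` stands in for what a pilot would read from the PanDA DB;
    the keys used here are hypothetical. `retrieve_proxy` is a hypothetical
    callable that contacts MyProxy and returns a local proxy file path.
    """
    proxy_path = retrieve_proxy(job_record["myproxy_server"],
                                job_record["credential_name"])
    return build_glexec_call(proxy_path, payload_cmd)
```

A pilot could then pass the returned `argv` and `env` to `subprocess.run`, so that the payload starts under the account gLExec maps from the user's proxy rather than under the pilot's own identity.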
We describe the deployment, installation, and configuration of gLExec, and how PanDA components have been augmented to use it. We describe how difficulties were overcome, and how security risks have been mitigated. Results are presented from OSG and EGEE Grid environments performing ATLAS analysis using PanDA and gLExec.
Author
Dr
Jose Caballero
(Brookhaven National Laboratory (BNL))
Co-authors
Mr
John Hover
(Brookhaven National Laboratory (BNL))
Dr
Maarten Litmaath
(CERN)
Dr
Maxim Potekhin
(Brookhaven National Laboratory (BNL))
Dr
Paul Nilsson
(University of Texas at Arlington)
Tadashi Maeno
(Brookhaven National Laboratory (BNL))
Dr
Torre Wenaus
(Brookhaven National Laboratory (BNL))
Dr
Xin Zhao
(Brookhaven National Laboratory (BNL))