21–25 May 2012
New York City, NY, USA
US/Eastern timezone

Employing peer-to-peer software distribution in ALICE Grid Services to enable opportunistic use of OSG resources

22 May 2012, 17:50
25m
Eisner & Lubin Auditorium (Kimmel Center)

Parallel: Distributed Processing and Analysis on Grids and Clouds (track 3)

Speakers

Iwona Sakrejda, Jeff Porter (Lawrence Berkeley National Lab. (US))

Description

The ALICE Grid infrastructure is based on AliEn, a lightweight open-source framework built on Web Services and a Distributed Agent Model, in which job agents are submitted to a grid site to prepare the environment and pull work from a central task queue located at CERN. In the standard configuration, each ALICE grid site supports an ALICE-specific VO box as a single point of contact between the site and the ALICE central services. VO box processes monitor site utilization and job requests (ClusterMonitor), monitor dynamic job and site properties (MonaLisa), perform job agent submission (CE) and deploy job-specific software (PackMan). In particular, requiring a VO box at each site simplifies deployment of job software, which is installed on a shared file system at the site, and adds redundancy to the overall Grid system. ALICE offline computing, however, has also implemented a peer-to-peer method (based on BitTorrent) for downloading job software directly onto each worker node as needed. By combining this peer-to-peer deployment model with job agent submission to remote Open Science Grid (OSG) Compute Elements, we are able to relax the site VO box requirement and run jobs opportunistically on independent OSG resources from a single VO box. In this paper, we describe the implementation of the peer-to-peer method and the full configuration of the setup. We cover the deployment of this configuration at NERSC, which uses a VO box at PDSF and an OSG gatekeeper on the NERSC Carver system, and from which we can directly compare the performance to that of a standard ALICE Grid installation. We also describe our experience with wider deployments.
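
As an illustration only, the pull model and on-demand software download described above can be sketched in a few lines of Python. This is not AliEn code: the names Job, WorkerNode, fetch_package and run_job_agent, the package name "example-pkg" and the "central-seed" label are all hypothetical, and the peer lookup is a deliberate simplification (a real BitTorrent client downloads pieces from many peers in parallel rather than copying a whole package from the first peer that has it).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    package: str  # software package the job needs (hypothetical name)


@dataclass
class WorkerNode:
    name: str
    packages: set = field(default_factory=set)  # packages cached on this node


def fetch_package(node, package, peers, central_seed):
    """Make `package` available on `node`; return the name of the source used."""
    if package in node.packages:
        return node.name  # already cached locally, nothing to download
    for peer in peers:
        # Simplified peer lookup: take the package from the first peer holding it.
        if peer is not node and package in peer.packages:
            node.packages.add(package)
            return peer.name
    if package not in central_seed:
        raise LookupError(f"package {package!r} is not available anywhere")
    node.packages.add(package)  # fall back to the initial seed
    return "central-seed"


def run_job_agent(node, task_queue, peers, central_seed):
    """Pull jobs from the queue until it is empty; return (job_id, source) pairs."""
    log = []
    while task_queue:
        job = task_queue.popleft()  # the agent pulls work, nothing is pushed to it
        source = fetch_package(node, job.package, peers, central_seed)
        log.append((job.job_id, source))
    return log


seed = {"example-pkg"}
wn_a, wn_b = WorkerNode("wn-a"), WorkerNode("wn-b")
nodes = [wn_a, wn_b]
log_a = run_job_agent(wn_a, deque([Job(1, "example-pkg")]), nodes, seed)
log_b = run_job_agent(
    wn_b, deque([Job(2, "example-pkg"), Job(3, "example-pkg")]), nodes, seed
)
```

In this sketch the first node obtains the package from the seed, the second obtains it from the first node, and later jobs on a node reuse the local copy, so no shared file system at the site is assumed.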

Primary authors

Iwona Sakrejda, Jeff Porter (Lawrence Berkeley National Lab. (US))
