13–17 Feb 2006
Tata Institute of Fundamental Research
Europe/Zurich timezone

A generic approach to job tracking for distributed computing: the STAR approach

14 Feb 2006, 14:20
20m
AG 80 (Tata Institute of Fundamental Research)

Homi Bhabha Road Mumbai 400005 India
oral presentation Distributed Event production and processing

Speaker

Dr Valeri FINE (BROOKHAVEN NATIONAL LABORATORY)

Description

Job tracking, i.e. monitoring the behavior of a bundle of jobs or of an individual job from submission to completion, is becoming very complicated in the heterogeneous Grid environment. This paper presents the principles of an integrated tracking solution based on components already deployed at STAR, none of which is experiment-specific: a generic logging layer and the STAR Unified Meta-Scheduler (SUMS). The first component is a "generic logging layer" built on top of the logger family derived from the Jakarta "log4j" project, which includes the "log4cxx", "log4c" and "log4perl" packages. These layers provide consistency across packages, platforms and frameworks. SUMS is a "generic" gateway to user batch-mode analysis that allows the user to describe tasks in an abstract job description language (SUMS’s architecture was designed around a modular plug-and-play philosophy and is therefore not experiment-specific). We discuss how the tracking layer uses a unique ID, generated by SUMS for each task it handles and for the set of jobs it creates, to create and update job records in the administrative database along with other vital job-related information. Our approach does not require users to introduce any additional key to identify the job and associate it with the database tables, as the tree structure of the information is handled automatically. Representing (sets of) jobs in a database makes it easy to implement management, scheduling, and query operations: the user may list all previous jobs and get details such as status and the times at which each job was submitted, started and finished.
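The idea of routing job status through a logging layer keyed by a scheduler-generated ID can be illustrated with a minimal sketch. It is not the STAR or SUMS implementation: it substitutes Python's standard `logging` and `sqlite3` modules for the log4j family and the administrative database, and the table layout and the names `JobTrackingHandler`, `make_tracker`, `submit_task` and `list_jobs` are hypothetical.

```python
import logging
import sqlite3
import uuid

# Hypothetical two-level schema (task -> jobs); not the actual STAR tables.
SCHEMA = """
CREATE TABLE tasks (task_id TEXT PRIMARY KEY, owner TEXT);
CREATE TABLE jobs (
    task_id   TEXT REFERENCES tasks(task_id),
    job_index INTEGER,
    status    TEXT,
    updated   REAL,
    PRIMARY KEY (task_id, job_index)
);
"""


class JobTrackingHandler(logging.Handler):
    """Logging handler that stores each record as the latest status of a job."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn

    def emit(self, record):
        # task_id and job_index are attached by the LoggerAdapter created in
        # submit_task, so user code never passes a key explicitly.
        self.conn.execute(
            "INSERT OR REPLACE INTO jobs (task_id, job_index, status, updated) "
            "VALUES (?, ?, ?, ?)",
            (record.task_id, record.job_index, record.getMessage(), record.created),
        )
        self.conn.commit()


def make_tracker(conn):
    """Create the tables and return a logger wired to the tracking handler."""
    conn.executescript(SCHEMA)
    logger = logging.getLogger("jobtracking")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(JobTrackingHandler(conn))
    return logger


def submit_task(conn, logger, owner, n_jobs):
    """Stand-in for the scheduler: generate one task ID and per-job loggers."""
    task_id = uuid.uuid4().hex
    conn.execute("INSERT INTO tasks (task_id, owner) VALUES (?, ?)", (task_id, owner))
    conn.commit()
    job_loggers = []
    for index in range(n_jobs):
        adapter = logging.LoggerAdapter(
            logger, {"task_id": task_id, "job_index": index}
        )
        adapter.info("SUBMITTED")
        job_loggers.append(adapter)
    return task_id, job_loggers


def list_jobs(conn, task_id):
    """Return (job_index, status) for every job of a task, ordered by index."""
    cursor = conn.execute(
        "SELECT job_index, status FROM jobs WHERE task_id = ? ORDER BY job_index",
        (task_id,),
    )
    return cursor.fetchall()
```

In this sketch a job reports progress with an ordinary logging call such as `job_loggers[0].info("RUNNING")`, and `list_jobs` then reads the current state of the whole task back from the database.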

Primary authors

Dr Jerome LAURET (BROOKHAVEN NATIONAL LABORATORY) Dr Valeri FINE (BROOKHAVEN NATIONAL LABORATORY)

Co-authors

Dr Dmitry ARKHIPKIN (Particle Physics Laboratory - Dubna) Mr Levente HAJDU (BROOKHAVEN NATIONAL LABORATORY)
