27 September 2004 to 1 October 2004
Interlaken, Switzerland
Europe/Zurich timezone

Distributed Tracking, Storage, and Re-use of Job State Information on the Grid

29 Sep 2004, 10:00
Coffee (Interlaken, Switzerland)



Board: 40
Type: poster
Track: Track 4 - Distributed Computing Services
Session: Poster Session 2




The Logging and Bookkeeping (L&B) service tracks jobs passing through the Grid. It collects important events generated by both the grid middleware components and applications, and processes them at a chosen L&B server to provide the job state. The events are transported through secure, reliable channels. Job tracking is fully distributed and does not depend on a single information source; robustness is achieved through speculative job state computation when events are reordered, delayed, or lost. The state computation is easily adaptable to modified job control flow. The events are also passed to the related Job Provenance service, whose purpose is long-term storage of information on job execution, its environment, and the executable and input sandbox files. The data can be used for debugging, post-mortem analysis, or re-running jobs. They are kept by the Job Provenance storage service in a compressed format, accessible on a per-job basis. A complementary index service can find particular jobs according to configurable criteria, e.g. submission time or "tags" assigned by the user. A user client to support job re-execution is planned. Both the L&B server and the Job Provenance index server provide web-service interfaces for querying. These interfaces comply with the On-demand Producer specification of the R-GMA infrastructure, so R-GMA capabilities can be used to perform complex distributed queries across multiple servers and to provide aggregate information about job collections. The L&B service was deployed in the EU DataGrid and CERN LCG projects; the Job Provenance service will be deployed in the EGEE project.
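
The idea of speculative job state computation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the L&B implementation: the state names, the event kinds, the `Event` class, and the `compute_state` function are all hypothetical, and the real service uses a richer state machine. The sketch only shows one plausible approach, in which the reported state is the most advanced state implied by any event received so far, so that reordered or late events never move a job backwards and lost events show up as gaps rather than blocking the result.

```python
from dataclasses import dataclass

# Hypothetical, simplified job life cycle (not the actual L&B state set).
STATE_ORDER = ["SUBMITTED", "WAITING", "SCHEDULED", "RUNNING", "DONE"]
RANK = {state: i for i, state in enumerate(STATE_ORDER)}

# Hypothetical mapping from an event kind to the state that event implies.
EVENT_TO_STATE = {
    "registered": "SUBMITTED",
    "queued": "WAITING",
    "scheduled": "SCHEDULED",
    "started": "RUNNING",
    "finished": "DONE",
}


@dataclass(frozen=True)
class Event:
    job_id: str
    kind: str    # one of the keys of EVENT_TO_STATE
    source: str  # component that logged the event (illustrative only)


def compute_state(events):
    """Return (state, missing) for one job, given events in arrival order.

    state   -- most advanced state implied by any event seen so far,
               or None if no event has arrived
    missing -- earlier states for which no event has arrived yet
               (the corresponding events are delayed or lost)
    """
    seen = set()
    state = None
    for ev in events:
        implied = EVENT_TO_STATE[ev.kind]
        seen.add(implied)
        # A late or reordered event never moves the job backwards.
        if state is None or RANK[implied] > RANK[state]:
            state = implied
    if state is None:
        return None, []
    missing = [s for s in STATE_ORDER[:RANK[state]] if s not in seen]
    return state, missing


# "started" arrives before "registered"; "queued" and "scheduled" are absent.
arrived = [
    Event("job-1", "started", "worker-node"),
    Event("job-1", "registered", "user-interface"),
]
print(compute_state(arrived))  # ('RUNNING', ['WAITING', 'SCHEDULED'])
```

Because the result depends only on the set of events received, not on their arrival order, the same function can be re-run whenever a delayed event finally arrives, and the list of missing states shrinks accordingly.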

Primary authors

A. Guarise (INFN Torino)
A. Gianelle (INFN Padova, Italy)
A. Krenek (CESNET, Czech Republic)
A. Maraschini (DATAMAT)
A. Terracina (DATAMAT, Italy)
Mr D. Kouril (CESNET, Czech Republic)
D. Rebatto (INFN Milano, Italy)
E. Ronchieri (INFN CNAF, Italy)
F. Giacomini (INFN CNAF)
F. Pacini (DATAMAT, Italy)
F. Prelz (INFN Milano, Italy)
G. Avellino (DATAMAT, Italy)
G. Patania (INFN Torino, Italy)
Mr J. Pospisil (CESNET, Czech Republic)
J. Sitera (CESNET, Czech Republic)
J. Skrabal (CESNET, Czech Republic)
L. Matyska (CESNET, Czech Republic)
L. Zangrando (INFN Padova, Italy)
M. Marchi (INFN Milano)
M. Mezzadri (INFN Milano)
M. Mordacchini (INFN Padova)
M. Mulac (CESNET)
M. Pappalardo (INFN Catania, Italy)
Mr M. Ruda (CESNET, Czech Republic)
M. Sgaravatto (INFN Padova, Italy)
M. Vocu (CESNET, Czech Republic)
P. Andreetto (INFN Padova, Italy)
S. Andreozzi (INFN CNAF, Italy)
S. Beco (DATAMAT, Italy)
S. Borgia (INFN Padova, Italy)
S. Monforte (INFN Catania)
V. Ciaschini (INFN CNAF, Italy)
Mr Z. Salvet (CESNET, Czech Republic)
