Speaker
Dr
Torsten Harenberg
(University of Wuppertal)
Description
Today, one of the major challenges in science is the processing of large datasets.
The LHC experiments will produce an enormous amount of results that are
stored in databases or files. These data are processed by a large
number of small jobs, each of which reads only a chunk of the data.
Existing job monitoring tools inside the LHC Computing Grid (LCG) provide
only limited functionality to the user.
They are either command-line tools that deliver simple text strings
for every job, or tools that provide very limited information.
Other tools like GridIce focus on the monitoring of the
infrastructure rather than the user application/job.
In contrast to these concepts, we developed the Python-based "Job
Execution Monitor".
Typically, the first thing to be executed
on a worker node is not a binary executable, but a script file which
sets up the environment (including environment variables and loading
of data from a storage element, a task known to be critical). It is the goal of the Job
Execution Monitor to monitor the execution of such critical commands
and report their success or failure to the user.
The core module of the Job Execution Monitor is the script wrapper.
To gain detailed information about the job
execution, a given script file (Bash or Python) is
executed command by command. After each
command, the complete environment is checked and logged.
Together with the other components of this system, an expert system
tries to classify the reason for a failure. An integration into the Global
Grid User Support is planned.
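The command-by-command execution described above can be illustrated with a minimal Python sketch. It is not the Job Execution Monitor's actual code: the function name `run_stepwise`, the marker string, and the log record layout are hypothetical, and the sketch assumes that `bash` and an `env` supporting `-0` (as in GNU coreutils) are available. Each list entry is assumed to be one complete shell command; splitting a real script file into commands is not shown. Only exported variables are carried from one command to the next, so changes of working directory and unexported shell state are lost.

```python
import os
import subprocess

# Hypothetical separator between command output and the environment dump.
MARKER = "__JEM_SKETCH_ENV_MARKER__"
# Variables that bash changes on every start; ignored when reporting differences.
VOLATILE = {"_", "SHLVL"}


def run_stepwise(commands, env=None, stop_on_error=True):
    """Run shell commands one by one, logging exit status and environment changes."""
    env = dict(os.environ if env is None else env)
    log = []
    for cmd in commands:
        # Run the command, remember its exit status, then dump the environment.
        script = f"{cmd}\n__rc=$?\necho {MARKER}\nenv -0\nexit $__rc\n"
        proc = subprocess.run(
            ["bash", "-c", script],
            env=env,
            capture_output=True,
            text=True,
            errors="replace",
        )
        output, sep, env_dump = proc.stdout.partition(MARKER + "\n")
        if sep:
            # Parse the NUL-separated NAME=value pairs printed by `env -0`.
            new_env = dict(
                item.split("=", 1) for item in env_dump.split("\0") if "=" in item
            )
        else:
            # The command ended the shell early (e.g. `exit`); keep the old environment.
            new_env = dict(env)
        changes = {
            key: value
            for key, value in new_env.items()
            if key not in VOLATILE and env.get(key) != value
        }
        removed = sorted(k for k in env if k not in new_env and k not in VOLATILE)
        log.append(
            {
                "command": cmd,
                "returncode": proc.returncode,
                "ok": proc.returncode == 0,
                "stdout": output,
                "stderr": proc.stderr,
                "env_changes": changes,
                "env_removed": removed,
            }
        )
        if stop_on_error and proc.returncode != 0:
            break
        env = new_env  # exported variables persist into the next command
    return log


if __name__ == "__main__":
    steps = ["export JEM_DEMO=hello", 'test "$JEM_DEMO" = hello', "false", "echo never"]
    for record in run_stepwise(steps):
        status = "ok" if record["ok"] else f"FAILED ({record['returncode']})"
        print(f"{record['command']!r}: {status}, changed: {sorted(record['env_changes'])}")
```

In this sketch, the per-command records play the role of the log from which a failure report could be derived; how the real system stores or transmits this information is not specified here.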
Submitted on behalf of Collaboration (e.g. BaBar, ATLAS): D-Grid
Author
Dr
Torsten Harenberg
(University of Wuppertal)
Co-authors
Dr
David Meder-Marouelli
(University of Wuppertal)
Mr
Markus Mechtel
(University of Wuppertal)
Prof.
Peer Ueberholz
(Niederrhein University of Applied Sciences)
Prof.
Peter Mättig
(University of Wuppertal)
Dr
Stefan Borovac
(University of Wuppertal)