Speaker
Valeria Bartsch
(FERMILAB / University College London)
Description
SAM is a data handling system that provides the Fermilab HEP experiments D0, CDF and
MINOS with the means to catalog, distribute and track the usage of their collected
and analyzed data. Annually, SAM serves petabytes of data to physics groups
performing data analysis, data reconstruction and simulation at various computing
centers across the world. Given the volume of the detector data, a typical physics
analysis job consumes terabytes of information during several days of running at a
job execution site. At any stage of that process, non-systematic failures may occur,
leaving a fraction of the original dataset unprocessed. To ensure convergence to
completion of the computation request, a facility user has to employ a procedure to
identify pieces of data that need to be re-analyzed in a manner that guarantees
completeness without duplication in the final result. These issues are commonly
addressed by analyzing the output of the job. Such an approach is both fragile, since
it depends critically on the (changeable) output file format, and time-consuming. The
approach reported in this article saves the user's time and ensures consistency in
results. We present an automated method that uses SAM data handling to formalize
distributed data analysis by defining a transaction-based model of the physics
analysis job work cycle, enabling robust recovery of the unprocessed data.
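
The transaction idea can be illustrated with a minimal sketch. This is not the SAM interface: the names `TransactionLog`, `FileState`, `deliver`, `commit`, `recovery_dataset` and `run_pass` are hypothetical and chosen only for illustration. The sketch assumes that each file handed to a job opens a transaction that is closed only when the job reports success, so the recovery dataset is simply the set of files whose transactions never closed.

```python
from enum import Enum


class FileState(Enum):
    PENDING = "pending"      # not yet handed to any job
    DELIVERED = "delivered"  # handed to a job, outcome unknown
    CONSUMED = "consumed"    # job confirmed successful processing


class TransactionLog:
    """Hypothetical per-request bookkeeping; illustrative only, not the SAM API."""

    def __init__(self, dataset):
        self.state = {name: FileState.PENDING for name in dataset}

    def deliver(self, name):
        # Open a transaction. A file that was already consumed is never
        # handed out again, which rules out duplicates in the final result.
        if self.state[name] is FileState.CONSUMED:
            raise ValueError(f"{name} already consumed")
        self.state[name] = FileState.DELIVERED

    def commit(self, name):
        # Close the transaction, but only for a file that was delivered.
        if self.state[name] is not FileState.DELIVERED:
            raise ValueError(f"{name} was not delivered")
        self.state[name] = FileState.CONSUMED

    def recovery_dataset(self):
        # Every file without a closed transaction still needs processing.
        return sorted(
            name for name, s in self.state.items() if s is not FileState.CONSUMED
        )


def run_pass(log, files, process):
    """Run `process` over `files`, committing only the files that succeed."""
    for name in files:
        log.deliver(name)
        try:
            process(name)
        except Exception:
            continue  # transaction stays open, so the file remains recoverable
        log.commit(name)
```

In this sketch a user would call `run_pass` on the full dataset, then repeat it on `log.recovery_dataset()` until that list is empty. Because the bookkeeping lives outside the job output, the recovery step does not depend on the output file format.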
Primary author
Mr
Andrew Baranovski
(FNAL)
Co-authors
Mr
Adam Lyon
(FNAL)
Mr
Doug Benjamin
(FNAL)
Mr
Elliot Lipeles
(FNAL)
Mr
Igor Sfiligoi
(FNAL)
Mr
Krzysztof Genser
(FNAL)
Ms
Valeria Bartsch
(FNAL)