Speaker
Dr Douglas Smith
(Stanford Linear Accelerator Center)
Description
The BaBar high energy physics experiment has been running for many years
and has produced a data set of over a petabyte, containing over two
million files. Managing this data must support both further data
production and a physics community with vastly different needs. To meet
these needs the BaBar bookkeeping system was developed, and within it
datasets are defined for data access and use. Datasets are defined so as
to keep data from the many production cycles separate for the hundreds
of concurrent analyses, while keeping similar data together for any
specific use. In developing this system, data has been modeled as a flow
of information that constantly changes. The system has been in use for
many years and has been very successful in meeting these disparate
needs. Methods for defining and managing datasets that undergo constant
change will be discussed. Production also requires distributing data to
computing centers, and the use of datasets to control production will
also be outlined. Finally, because datasets change constantly, the
ability to analyze data from a known state and later add subsequent
changes in the dataset to the analysis will be presented.
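The abstract does not describe how the bookkeeping system is implemented. As a minimal sketch, assuming a dataset is represented as an append-only log of versioned changes, the idea of analyzing from a known state and later picking up changes might look like the following. The `Dataset` class, its methods, and the collection names are hypothetical illustrations, not the BaBar bookkeeping API or schema.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset modeled as an append-only log of changes.

    Each log entry records the version at which a collection of files was
    added to or removed from the dataset. Illustrative sketch only; this is
    not the BaBar bookkeeping schema.
    """
    name: str
    log: list = field(default_factory=list)  # entries: (version, action, collection)
    version: int = 0

    def add(self, collection: str) -> int:
        # Every change bumps the version, so any past state can be named.
        self.version += 1
        self.log.append((self.version, "add", collection))
        return self.version

    def remove(self, collection: str) -> int:
        self.version += 1
        self.log.append((self.version, "remove", collection))
        return self.version

    def snapshot(self, version: int) -> set:
        """Replay the log up to `version` to recover a known, reproducible state."""
        contents = set()
        for v, action, collection in self.log:
            if v > version:
                break  # versions increase monotonically, so later entries are newer
            if action == "add":
                contents.add(collection)
            else:
                contents.discard(collection)
        return contents

    def delta(self, old: int, new: int) -> tuple:
        """Return (added, removed) collections between two versions."""
        before, after = self.snapshot(old), self.snapshot(new)
        return after - before, before - after


# Usage with hypothetical collection names.
ds = Dataset("hypothetical-skim")
ds.add("run1-cycleA")
ds.add("run2-cycleA")
frozen = ds.version          # the analysis starts from this known state
ds.add("run3-cycleA")
ds.remove("run1-cycleA")     # e.g. superseded by a later production cycle
ds.add("run1-cycleB")
added, removed = ds.delta(frozen, ds.version)
# added == {"run3-cycleA", "run1-cycleB"}, removed == {"run1-cycleA"}
```

Because the log is only ever appended to, replaying it up to a recorded version reproduces exactly what an analysis originally saw, and `delta` identifies the collections to process (or drop) when that analysis is later brought up to date.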
Submitted on behalf of Collaboration: BaBar Computing Group
Primary author
Dr Douglas Smith
(Stanford Linear Accelerator Center)
Co-author
Dr Tim Adye
(Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, United Kingdom)