Sep 2 – 9, 2007
Victoria, Canada
Please book accommodation as soon as possible.

Data management in BaBar

Sep 3, 2007, 5:10 PM
20m
Lecture (Victoria, Canada)

oral presentation Distributed data analysis and information management

Speaker

Dr Douglas Smith (Stanford Linear Accelerator Center)

Description

The BaBar high energy physics experiment has been running for many years and has produced a data set of over a petabyte, containing over two million files. The management of this data has to support the requirements of further data production as well as a physics community with vastly different needs. To support these needs, the BaBar bookkeeping system was developed, and within it datasets are defined for data access and use. Datasets are defined so as to keep data produced in many production cycles separate for the hundreds of concurrent analyses, and to keep similar data together for any specific use. In developing this system, data has been modeled as a flow of information that constantly changes. The system has been in use for many years and has been very successful in meeting these disparate needs. The methods for defining and managing datasets that undergo constant change will be discussed. Production also requires the distribution of data to computing centers, and the control of production with datasets will be mentioned. Given a constantly changing dataset, the ability to analyze data from a known state, and then to add to the analysis the changes made to the dataset at a later time, will also be presented.
Submitted on behalf of Collaboration: BaBar Computing Group
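
The idea in the abstract of analyzing a dataset from a known state and later picking up only the changes can be illustrated with a minimal sketch. This is not the BaBar bookkeeping system or its interface. The `Dataset` class, its methods, and the file names are all hypothetical, and the sketch assumes a dataset can be modeled as an append-only log of file additions and removals.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset modeled as an append-only log of changes."""
    name: str
    # Each entry is (action, file_name); action is "add" or "remove".
    log: list = field(default_factory=list)

    def add(self, file_name):
        self.log.append(("add", file_name))

    def remove(self, file_name):
        self.log.append(("remove", file_name))

    def state(self):
        """Return a marker for the current state (here, the log length)."""
        return len(self.log)

    def files_at(self, state):
        """Replay the log up to `state` to rebuild the file set at that point."""
        files = set()
        for action, file_name in self.log[:state]:
            if action == "add":
                files.add(file_name)
            else:
                files.discard(file_name)
        return files

    def changes_since(self, state):
        """Return (added, removed) file sets relative to a known state."""
        before = self.files_at(state)
        now = self.files_at(self.state())
        return now - before, before - now


# Hypothetical usage: an analysis records the state it ran on,
# then later asks only for what changed.
ds = Dataset("example-dataset")
ds.add("file-1")
ds.add("file-2")
known = ds.state()           # the analysis is run on this state

ds.add("file-3")             # new production output arrives
ds.remove("file-1")          # a file is withdrawn

added, removed = ds.changes_since(known)
```

Because the log is never rewritten, an earlier state can always be reconstructed, so an analysis can be rerun on exactly the files it first saw or extended with only `added` and `removed`.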

Primary author

Dr Douglas Smith (Stanford Linear Accelerator Center)

Co-author

Dr Tim Adye (Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, United Kingdom)
