Data management in BaBar
Presented by Dr. Douglas SMITH on 3 Sep 2007 from 17:10 to 17:30
Type: oral presentation
Track: Distributed data analysis and information management
The BaBar high-energy physics experiment has been running for many years and has accumulated a data set of over a petabyte, comprising more than two million files. Management of this data must support continued data production as well as a physics community with widely differing needs. To meet these needs the BaBar bookkeeping system was developed, and within it datasets are defined for data access and use. Datasets are defined so as to keep data from many production cycles separate for the hundreds of concurrent analyses, while keeping similar data together for any specific use. In developing this system, data has been modeled as a constantly changing flow of information. The system has been in use for many years and has been very successful in meeting these disparate needs. The methods for defining and managing datasets that undergo constant change will be discussed. Production also requires the distribution of data to computing centers, and the use of datasets to control production will be mentioned. Finally, the ability to analyze a constantly changing dataset from a known state, and to incorporate later changes to that dataset into the analysis, will also be presented.
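The idea of analyzing from a known state and later picking up changes can be illustrated with a minimal sketch. It assumes a dataset is modeled as an append-only, versioned list of files. The `Dataset` class, its methods, and the file names are hypothetical and are not the BaBar bookkeeping system's interface. Only additions are modeled; removal or invalidation of files is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset: an append-only list of (version, file) entries.

    Illustrative only; not the BaBar bookkeeping API. Removals are not modeled.
    """
    name: str
    _entries: list = field(default_factory=list)  # (version, filename) pairs
    version: int = 0

    def add_files(self, files):
        """Append newly produced files as a single new version and return it."""
        self.version += 1
        for f in files:
            self._entries.append((self.version, f))
        return self.version

    def snapshot(self):
        """Return (version, files) describing the current known state."""
        return self.version, [f for _, f in self._entries]

    def changes_since(self, version):
        """Return only the files added after the given snapshot version."""
        return [f for v, f in self._entries if v > version]


if __name__ == "__main__":
    ds = Dataset("hypothetical-dataset")
    ds.add_files(["file-001", "file-002"])
    start_version, files = ds.snapshot()   # analysis starts from this known state
    print(start_version, files)            # 1 ['file-001', 'file-002']
    ds.add_files(["file-003"])             # new production arrives later
    print(ds.changes_since(start_version))  # ['file-003']
```

Because entries are never rewritten in this sketch, the recorded version number fully identifies the state an analysis started from. The later delta can then be fetched without revisiting files that were already processed.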
BaBar Computing Group