14–18 Oct 2013
Amsterdam, Beurs van Berlage
Europe/Amsterdam timezone

BESIII physical analysis on Hadoop platform

17 Oct 2013, 13:30 (22 min)
Graanbeurszaal (Amsterdam, Beurs van Berlage)

Oral presentation to parallel session Distributed Processing and Data Handling A: Infrastructure, Sites, and Virtualization

Speaker

Dr Gongxing Sun (INSTITUTE OF HIGH ENERGY PHYSICS)

Description

This paper brings the MapReduce parallel-processing model to BESIII physics analysis and proposes a new data-analysis system architecture based on the Hadoop framework. The data-processing workflow is optimized by building an event-level metadata (TAG) database and performing event pre-selection on the TAGs, which reduces the number of events that require further analysis by two to three orders of magnitude; this cuts the I/O volume and improves the efficiency of analysis jobs. The event storage structure in DST files is reorganized to suit the selective-reading pattern produced by event pre-selection. MapReduce models are designed for TAG generation, TAG-based event pre-selection and event analysis, and MapReduce libraries compatible with the ROOT framework are developed for tasks such as data splitting, event fetching and result merging. An 8-node cluster was used to test the system: the results show that the new system shortens data-analysis time by 80%, and that the cluster scales well as more worker nodes are added.
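The abstract does not include the MapReduce models themselves. As a rough illustration only, this sketch simulates the data flow of TAG-based pre-selection in plain Python, with no Hadoop or ROOT involved. A map step applies a cut to TAG records and emits (DST file, event ID) pairs, a shuffle groups the pairs by file, and a reduce step sorts each file's event list so that only those events need to be fetched. A second helper shows result merging as bin-by-bin summation of partial histograms from independent splits. The `Tag` fields, the cut values and the file names are hypothetical assumptions, not taken from the BESIII software or the authors' libraries.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Tag(NamedTuple):
    """Hypothetical TAG record: event-level summary plus the event's DST location."""
    event_id: int
    dst_file: str
    n_charged: int   # number of charged tracks (illustrative field)
    e_total: float   # total deposited energy in GeV (illustrative field)


def tag_preselect_mapper(tags: Iterable[Tag]) -> Iterator[Tuple[str, int]]:
    """Map phase: apply a hypothetical TAG cut and emit (dst_file, event_id)."""
    for tag in tags:
        # Example cut (assumption): exactly two charged tracks, E_total > 2.0 GeV.
        if tag.n_charged == 2 and tag.e_total > 2.0:
            yield tag.dst_file, tag.event_id


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def event_list_reducer(dst_file: str, event_ids: List[int]) -> Tuple[str, List[int]]:
    """Reduce phase: sort selected event IDs so each DST file is read in order."""
    return dst_file, sorted(event_ids)


def merge_histograms(partials: List[List[int]]) -> List[int]:
    """Result merging: sum equal-length per-split histograms bin by bin."""
    n_bins = len(partials[0])
    return [sum(hist[i] for hist in partials) for i in range(n_bins)]


if __name__ == "__main__":
    tags = [
        Tag(7, "run_a.dst", 2, 3.1),
        Tag(3, "run_a.dst", 2, 2.5),
        Tag(5, "run_a.dst", 4, 3.0),  # fails: four charged tracks
        Tag(1, "run_b.dst", 2, 1.2),  # fails: energy below the cut
        Tag(2, "run_b.dst", 2, 2.9),
    ]
    groups = shuffle(tag_preselect_mapper(tags))
    selected = dict(event_list_reducer(f, ids) for f, ids in groups.items())
    print(selected)                                     # {'run_a.dst': [3, 7], 'run_b.dst': [2]}
    print(merge_histograms([[1, 0, 2], [0, 3, 1]]))     # [1, 3, 3]
```

Sorting each file's selected events is one simple way to turn pre-selection output into a forward-only, selective read of a DST file; the reorganized DST layout used by the authors may work differently.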

Primary author

Dr Gongxing Sun (INSTITUTE OF HIGH ENERGY PHYSICS)

Co-authors

Mr Dongsong Zang (Chinese Academy of Sciences (CN)), Mr Jing Huo (IHEP), Ms Xiaofeng Lei (IHEP)
