14–18 Oct 2013
Amsterdam, Beurs van Berlage
Europe/Amsterdam timezone

Running a typical ROOT HEP analysis on Hadoop/MapReduce

17 Oct 2013, 14:14
22m
Graanbeurszaal (Amsterdam, Beurs van Berlage)

Oral presentation to parallel session Distributed Processing and Data Handling A: Infrastructure, Sites, and Virtualization

Speaker

Mr Stefano Alberto Russo (Universita degli Studi di Udine (IT))

Description

Hadoop/MapReduce is a widely used and well-supported distributed computing framework. It consists of a scalable programming model, MapReduce, and a locality-aware distributed file system, HDFS. Its defining feature is data locality: by combining computing and storage resources on the same nodes and exploiting the locality-awareness of HDFS, computation can be scheduled on the nodes where the data resides, thereby avoiding network bottlenecks and congestion. This makes a Hadoop cluster easy to scale out; together with its wide adoption and support, it considerably simplifies the management and processing of very large amounts of data.

The main difference between Hadoop's original use case and High Energy Physics (HEP) analyses is that the former processes plain-text files with simple programs, while in the latter the data is highly structured and complex programs are required to access the physics information and perform the analysis. We investigated how a typical HEP analysis can be performed with Hadoop/MapReduce in a way that is completely transparent to ROOT, to the data and to the user. The method we propose relies on a "conceptual middleware" that allows ROOT to run without any modification, stores the data on HDFS in its original format, and lets the user interact with Hadoop/MapReduce in a classical way that requires no specific knowledge of this model. Hadoop's features, and in particular data locality, are fully exploited. The resulting workflow and solutions can be easily adopted for almost any HEP analysis code, and more generally for any complex code that works on binary data and can be split into independent sub-problems.

The proposed approach has been tested on a real case: an analysis measuring the top quark pair production cross section with the ATLAS experiment. The test worked as expected, reducing the network transfers required to perform the analysis by several orders of magnitude with respect to a classic computing model.
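To illustrate the general idea of wrapping an unmodified ROOT analysis in a MapReduce job, the following is a minimal sketch of a Hadoop Streaming-style mapper in Python. It is not the authors' middleware: the macro name analysis.C, the convention that each input line is the path of one ROOT file, and the assumption that the macro prints a partial result to stdout are all illustrative assumptions.

#!/usr/bin/env python
"""Hypothetical Hadoop Streaming mapper: one ROOT file per input line.

Illustrative sketch only; macro name and output convention are assumed,
not taken from the abstract.
"""
import subprocess
import sys

for line in sys.stdin:
    rootfile = line.strip()
    if not rootfile:
        continue
    # Run the unmodified ROOT analysis macro on this single input file.
    # The (hypothetical) macro analysis.C is assumed to print its partial
    # result, e.g. selected event counts, to stdout.
    result = subprocess.run(
        ["root", "-l", "-b", "-q", 'analysis.C("%s")' % rootfile],
        capture_output=True, text=True, check=True,
    )
    # Emit one key/value pair per file so a reducer can merge partial results.
    sys.stdout.write("partial\t%s\n" % result.stdout.strip().replace("\n", " "))

A mapper like this could be submitted through the standard Hadoop Streaming interface with a list of HDFS file paths as job input and a trivial reducer that merges the per-file outputs. Note that a plain list-of-paths input does not by itself give the data locality emphasized in the abstract; that would require a locality-aware input format, which is beyond this sketch.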

Primary author

Mr Stefano Alberto Russo (Universita degli Studi di Udine (IT))

Co-authors

Marina Cobal (Universita degli Studi di Udine (IT)), Michele Pinamonti (Universita degli Studi di Udine (IT))

Presentation materials