Speaker
Wahid Bhimji
(University of Edinburgh (GB))
Description
“Big Data” is no longer merely a buzzword but business-as-usual in the private sector. High Energy Particle Physics is often cited as the archetypal Big Data use case; however, it currently shares very little of the toolkit used in the private sector or other scientific communities.
We present the initial phase of a programme of work designed to bridge this technology divide by both performing real HEP analysis workflows using predominantly industry “Big Data” tools, formats and techniques, and the reverse: performing real industry tasks with HEP tools. In doing so, this work will improve the interoperation of those tools, reveal strengths and weaknesses, and enable efficiencies within both communities.
The first phase of this work performs key elements of an LHC Higgs analysis using widely adopted Big Data tools. These elements include data serialization, filtering and data mining, and they are performed with a range of tools chosen not just for performance but also for ease of use, maturity and size of user community. The technologies include Protocol Buffers, Hadoop and the Python scikit-learn library, and for each element we compare against the same analysis performed using current HEP tools such as ROOT, PROOF and TMVA.
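To make the data-mining comparison concrete, the following is a minimal sketch (not taken from the abstract) of how scikit-learn can play the role a TMVA boosted decision tree plays in a HEP analysis. The two features and the toy signal/background samples are hypothetical stand-ins for real event kinematics:

    # Hedged sketch: a scikit-learn BDT standing in for a TMVA BDT.
    # The feature distributions below are toy data, not real events.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Toy "events": two kinematic-like features per event,
    # drawn from shifted Gaussians for signal vs. background.
    signal = rng.normal(loc=1.0, scale=1.0, size=(1000, 2))
    background = rng.normal(loc=-1.0, scale=1.0, size=(1000, 2))
    X = np.vstack([signal, background])
    y = np.concatenate([np.ones(1000), np.zeros(1000)])

    # Train/test split and BDT training, analogous to a TMVA
    # BDT trained on signal and background trees.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))

In the comparison described above, the same classification task would be run through TMVA on ROOT trees, with performance, ease of use and community support assessed for both stacks.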
Author
Wahid Bhimji
(University of Edinburgh (GB))
Co-authors
Andrew John Washbrook
(University of Edinburgh (GB))
Timothy Michael Bristow
(University of Edinburgh (GB))