10-14 October 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

Developing and optimizing applications for the Hadoop environment

13 Oct 2016, 15:15
GG A+B (San Francisco Marriott Marquis)



Oral Track 5: Software Development


Prasanth Kothuri (CERN)


This contribution shares our recent experience of building Hadoop-based applications. The Hadoop ecosystem now offers a myriad of tools, which can overwhelm new users, yet there are proven ways to leverage these tools to solve problems. We look at the factors to consider when using Hadoop to model and store data, best practices for moving data in and out of the system, and common processing patterns, at each stage relating them to the real-world experience gained while developing such an application. We share many of the design choices and the tools developed, and show how to profile a distributed application; these can be applied to other scenarios as well. In conclusion, the goal of the presentation is to provide guidance on architecting Hadoop-based applications and to share some of the reusable components developed in the process.

Secondary Keyword (Optional) Software development process and tools
Primary Keyword (Mandatory) Data processing workflows and frameworks/pipelines

Primary author


Daniel Lanza Garcia (Ministere des affaires etrangeres et europeennes (FR))
Joeri Hermans (Universiteit Maastricht (NL))
Kacper Surdy (CERN)
Zbigniew Baranowski (CERN)
