14–18 Oct 2013
Amsterdam, Beurs van Berlage
Europe/Amsterdam timezone

Sequential Data access with Oracle and Hadoop: a performance comparison

15 Oct 2013, 17:48
20m
Administratiezaal (Amsterdam, Beurs van Berlage)

Oral presentation to parallel session Data Stores, Data Bases, and Storage Systems

Speaker

Zbigniew Baranowski (CERN)

Description

The Hadoop framework has proven to be an effective and popular approach for dealing with “Big Data” and, thanks to its scaling ability and optimised storage access, Hadoop Distributed File System-based projects such as MapReduce or HBase are seen as candidates to replace traditional relational database management systems whenever scalable speed of data processing is a priority. But do these projects deliver in practice? Does migrating to Hadoop’s “shared nothing” architecture really improve data access throughput? And, if so, at what cost? We answer these questions, addressing cost/performance as well as raw performance, based on a performance comparison between an Oracle-based relational database and Hadoop’s distributed solutions, such as MapReduce or HBase, for sequential data access. A key feature of our approach is the use of an unbiased data model, since certain data models can significantly favour one of the technologies tested.
