As the demand for analyzing data sets of increasing variety and scale continues to explode, the software options for performing this analysis are beginning to proliferate. On one hand, traditional relational database technology has been extended so that database systems can approach large-scale deployments by using a "shared-nothing" architecture. At the same time, MapReduce-based options, such as the open source Hadoop framework, are becoming increasingly popular, and there has been a plethora of research publications in the past three years that demonstrate how MapReduce can be used to accelerate and scale various data analysis tasks.
Both relational databases and MapReduce-based options have strengths and weaknesses that a practitioner must be aware of before selecting an analytical data management platform. In this talk, I describe some experiences in using these systems, and the advantages and disadvantages of their popular implementations. I then discuss a hybrid system that we built at Yale University and are now commercializing (the Yale project was called HadoopDB; the company is called Hadapt), which attempts to combine the advantages of both types of platforms.
Daniel Abadi is an Assistant Professor at Yale University, where he does research in database system architecture and implementation, and cloud computing. Before joining Yale, he spent four years at MIT where he received his Ph.D. He is best known for his research in column-store database systems (the C-Store project), high performance transactional systems (the H-Store project), and Hadoop (the HadoopDB project). Abadi has been a recipient of a Churchill Scholarship, an NSF CAREER Award, a Sloan Research Fellowship, the 2008 SIGMOD Jim Gray Doctoral Dissertation Award, and the 2007 VLDB best paper award. His research on HadoopDB is currently being commercialized by Hadapt, where Abadi also serves as chief scientist.
He blogs at http://dbmsmusings.blogspot.com and tweets at @daniel_abadi.