CERN Computing Colloquium

Scientific Data Bases at Scale and SciDB

by Dr Michael Stonebraker (MIT - Massachusetts Institute of Technology - Cambridge MA, USA)

222/R-001 (CERN)





As a general rule, scientists have shunned relational database management systems (RDBMSs), choosing instead to “roll their own” on top of file system technology. We first discuss why file systems are a poor choice for science data storage, especially as data volumes become large and scalability becomes important.
Then, we continue with the reasons why RDBMSs work poorly on most science applications.  These include a data model “impedance mismatch” and missing features. We discuss array DBMSs, and why they are a much better choice for science applications, and use SciDB as an exemplar of this new class of DBMSs.
Most science applications require a mix of data management and complex analytics.  In most cases, the analytics entail a sequence of linear algebra computations.  We discuss the possible ways of integrating a DBMS with statistical calculations, and conclude with the mechanism being used by SciDB.



Dr. Stonebraker has been a pioneer of database research and technology for more than a quarter of a century. He was the main architect of the INGRES relational DBMS and the object-relational DBMS, POSTGRES. These prototypes were developed at the University of California at Berkeley, where Stonebraker was a Professor of Computer Science for twenty-five years. More recently, at M.I.T., he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, and the H-Store transaction processing engine. Currently, he is working on science-oriented DBMSs, OLTP DBMSs, and scalable data curation. He is the founder of five venture-capital-backed startups, which commercialized his prototypes. Presently he serves as Chief Technology Officer of VoltDB and Paradigm4, Inc.
Professor Stonebraker is the author of scores of research papers on database technology, operating systems and the architecture of system software services. He was awarded the ACM System Software Award in 1992, for his work on INGRES. Additionally, he was awarded the first annual Innovation award by the ACM SIGMOD special interest group in 1994, and was elected to the National Academy of Engineering in 1997. He was awarded the IEEE John von Neumann Medal in 2005, and is presently an Adjunct Professor of Computer Science at M.I.T., where he is co-director of the new Intel Science and Technology Center focused on big data.



Organized by

Dirk Duellmann
