Accelerating Scientific Analysis with SciDB

16 Apr 2015, 11:30
15m
Auditorium (Auditorium)


oral presentation
Track2: Offline software
Track 2 Session

Speaker

Dr Lisa Gerhardt (LBNL)

Description

SciDB is an open-source analytical database for scalable, complex analytics on very large array or multi-structured data from a variety of sources, and it is programmable from Python and R. It runs on HPC systems, commodity hardware grids, or in the cloud, and can manage terabytes of array-structured data and perform complex analytics in-database. We present an overview of the SciDB framework and describe its implementation at NERSC at Lawrence Berkeley National Laboratory. We then describe a case study in which SciDB is used to analyze data from the LUX dark matter detector. LUX is a 370 kg liquid xenon time-projection chamber built to directly detect galactic dark matter, located in an underground laboratory 1 mile under the Black Hills in South Dakota, USA. In its initial data run in 2013, LUX collected 86 million events and wrote 32 TB of data, of which only 160 events are retained for final analysis. The data rate for the new dark matter run starting in 2014 is expected to exceed 250 TB/year. We describe how SciDB is used to dramatically streamline data collection and analysis, and discuss future plans for a large parallel SciDB array at NERSC.
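
To give a sense of the scale of the selection problem, the figures quoted above can be combined in a short back-of-the-envelope calculation. The Python sketch below is illustrative only and is not part of the presented work. It assumes that "32 TB" means decimal terabytes and that the full volume can be attributed to the 86 million recorded events; the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic on the 2013 LUX run figures quoted in the abstract.
# Assumptions (not stated in the abstract): TB is decimal (1e12 bytes), and the
# whole 32 TB is attributed evenly to the 86 million recorded events.

TOTAL_EVENTS = 86_000_000   # events collected in the 2013 run
RETAINED_EVENTS = 160       # events retained for final analysis
RAW_BYTES = 32e12           # 32 TB written, read as decimal terabytes


def retention_fraction(retained: int, total: int) -> float:
    """Fraction of recorded events that survive to the final analysis."""
    return retained / total


def events_per_retained(retained: int, total: int) -> float:
    """Number of recorded events per event kept for final analysis."""
    return total / retained


def mean_event_size_bytes(raw_bytes: float, total: int) -> float:
    """Average stored size per recorded event under the assumptions above."""
    return raw_bytes / total


if __name__ == "__main__":
    frac = retention_fraction(RETAINED_EVENTS, TOTAL_EVENTS)
    ratio = events_per_retained(RETAINED_EVENTS, TOTAL_EVENTS)
    size_kb = mean_event_size_bytes(RAW_BYTES, TOTAL_EVENTS) / 1e3
    print(f"retention fraction: {frac:.3e}")
    print(f"recorded events per retained event: {ratio:,.0f}")
    print(f"mean event size: {size_kb:.0f} kB")
```

Under these assumptions, roughly one recorded event in 537,500 reaches the final analysis, which is the kind of highly selective filtering that benefits from being run inside the database rather than on exported files.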

Primary author

Dr Lisa Gerhardt (LBNL)

Co-authors

Dr Carlos Faham (LBNL)
Dr Yushu Yao (LBNL)
