Speaker
Kacper Surdy
(CERN)
Description
There are terabytes of data stored in a relational database (Oracle) at CERN that do not actually require a relational model. Moreover, using a relational database management system often brings significant overhead in terms of resource utilization. The problem is particularly noticeable for warehouse-type data sets. At the same time, running analytical workloads on such data sets requires a large amount of computing power combined with high storage throughput, a combination that can be achieved with a scalable distributed system. Introducing this kind of system will not only speed up processing but also open new possibilities for data mining. This presentation will discuss the advantages of using a distributed architecture such as Hadoop for scalable processing of CERN data sets, such as those stored in Oracle (the LHC logging database and SCADA systems) or even LHC experiment event data stored in Ntuples.
Authors
Kacper Surdy
(CERN)
Maciej Grzybek
(Warsaw University of Technology (PL))
Zbigniew Baranowski
(CERN)
Co-authors
Daniel Lanza Garcia
(Univ. Extremadura, Cen. Uni. Merida (ES))
Eric Grancher
(CERN)
Luca Canali
(CERN)