Big Data Analytics as a Service Infrastructure: Challenges, Desired Properties and Solutions



1919-1 Tancha, Onna-son, Kunigami-gun Okinawa, Japan 904-0495
poster presentation Track3: Data store and access


Manuel Martin Marquez (CERN)


CERN’s accelerator complex is an extreme data generator: every second, a large amount of highly heterogeneous data from control equipment and monitoring agents is persisted and needs to be analysed. Over the decades, CERN’s research and engineering teams have applied different approaches, techniques and technologies to this task. This fragmentation has limited the necessary collaboration and, more importantly, cross-domain data analytics. Both are essential to unlock hidden insights and correlations between the underlying processes, which in turn enable better and more efficient day-to-day accelerator operations and more informed decisions. The proposed Big Data Analytics as a Service Infrastructure aims to: (1) integrate the existing developments; (2) centralize and standardize the complex data analytics needs of the broad CERN research and engineering community; (3) deliver real-time and batch data analytics capabilities; and (4) provide transparent access and extraction-transformation-load (ETL) mechanisms to the different existing mission-critical data repositories. This paper presents the desired properties resulting from the analysis of CERN’s data analytics requirements; the main challenges, which are technological, collaborative and educational; and finally potential solutions and lessons learned.
