Data Analytics Forum

Europe/Zurich
513 1-024 (CERN)


Manuel Martin Marquez (CERN)
Description
Over the past decades, CERN has been gathering and storing enormous amounts of data. We cannot ignore that this process is usually costly in terms of technical and human resources. Nevertheless, the exploitation of the collected data, in other words the extraction of potential benefits from our data investments, has been pushed into the background or placed at the bottom of our priorities. Data is the new soil, and therefore it requires nurturing, enriching and managing. This will require some additional effort, but it is also clear that this effort will generate significant value. The Data Analytics Forum aims to change the current situation and demonstrate how small investments in data analytics can lead to big benefits. We will focus on discovering potential problems that can be solved using analytics techniques, introducing the analytics approaches currently applied at CERN and encouraging best practices.
    • 10:00 10:10
      Data Analytics Forum 10m
      Speaker: Manuel Martin Marquez (CERN)
    • 10:10 10:55
      Introduction to R and in-database analytics using ORE 45m
      R (http://cran.r-project.org) is an open source project that provides a programming language and environment for data manipulation, calculation, analysis and graphical display. For many reasons, among them its wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, R is becoming a de facto standard among statisticians and data miners for developing statistical and analytical software. R can be considered an integrated framework that includes:
        • Effective data handling and storage.
        • Operators for high-performance calculations on arrays and matrices.
        • A large, coherent and integrated collection of intermediate tools for data analysis.
        • Graphical facilities for data analysis and display, either on screen or on hardcopy.
        • A well-developed, simple and effective programming language, which includes conditionals, loops, user-defined recursive functions and input and output facilities.
      One of the most remarkable features of R is its highly extensible nature: it can easily be extended via packages. A huge number of packages, ranging from simple statistical models to complex artificial intelligence techniques, are available from the R repository (CRAN).
 In addition, Oracle has recently introduced Oracle R Enterprise (ORE) as a component of the Oracle Advanced Analytics option. ORE is designed for problems involving large amounts of data and integrates R with Oracle databases: the more data you have, the closer to the database you want to perform your analysis. ORE also improves on the original R capabilities in terms of parallelism and scalability, which in the opinion of many R users are the main obstacles to applying R-based approaches in production services. Beyond these improvements, ORE introduces a real integration with Oracle databases. This integration makes in-database analytics possible; in other words, it gives transparent access to the data stored in the database and therefore allows embedded analyses or statistical computations to be executed there. In an environment such as CERN, where many of the fundamental components are database driven, this is an essential feature for improving the performance of data analytics, which is itself one of the central requirements for most analytics needs.
 The session will introduce some of the basic concepts of R (objects, looping functions, data visualization and graphical representation), and finally we will review some more advanced functionality such as in-database analytics using ORE. Some best practices and tools such as RStudio and StatET will also be introduced.
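
 The in-database idea described in this abstract can be illustrated with a minimal sketch. It does not use R or ORE: it uses Python's built-in sqlite3 module purely as a stand-in, and the table name, column names and values are hypothetical. It contrasts pulling every row to the client before aggregating with letting the database compute the aggregate, so that only the summary leaves the database.

```python
import sqlite3

# Hypothetical readings; device names and values are illustrative only.
rows = [
    ("magnet_a", 1.0), ("magnet_a", 3.0),
    ("magnet_b", 10.0), ("magnet_b", 20.0), ("magnet_b", 30.0),
]

# An in-memory SQLite database stands in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)


def mean_client_side(conn):
    """Pull every row to the client, then aggregate in application code."""
    sums, counts = {}, {}
    for device, value in conn.execute("SELECT device, value FROM readings"):
        sums[device] = sums.get(device, 0.0) + value
        counts[device] = counts.get(device, 0) + 1
    return {device: sums[device] / counts[device] for device in sums}


def mean_in_database(conn):
    """Let the database aggregate; only one row per device is returned."""
    query = "SELECT device, AVG(value) FROM readings GROUP BY device"
    return dict(conn.execute(query).fetchall())


client_result = mean_client_side(conn)
db_result = mean_in_database(conn)
```

 Both functions give the same per-device means. The difference is how much data has to travel: the first transfers every reading, the second only one row per device, which is the property that matters when the tables are large.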
      Speaker: Manuel Martin Marquez (CERN)
      Slides
    • 10:55 11:25
      Automated High Voltage Trip Recovery based on Complex Event Processing (CEP) 30m
      The presentation will describe a solution for automated high voltage trip recovery based on event stream processing and complex event processing.
      Speaker: Evaldas Juska (Fermi National Accelerator Lab. (US))
      Poster
      Slides
    • 11:25 12:00
      Discussion