
Talk
Title Big Data Technologies and Physics Analysis with Apache Spark (lecture 1)
Author(s) Motesnitsalis, Evangelos (speaker) (CERN)
Corporate author(s) CERN. Geneva
Imprint 2019-03-05. - 1:06:30.
Series (Inverted CSC)
(Inverted CERN School of Computing 2019)
Lecture note on 2019-03-05T13:30:00
Subject category Inverted CSC
Abstract The Large Hadron Collider has been shut down for a two-year maintenance period since December 2018. However, the data already collected between April 2015 and November 2018, which are stored in a dedicated custom storage service, exceed 150 PB in total. To analyse these data, more and more teams at CERN are turning to Big Data technologies to perform physics analysis and "Data Reduction", i.e. producing smaller, reusable datasets for frequent access. These technologies show great potential for speeding up existing procedures. This lecture provides an overview of the latest big data technologies in the Hadoop and Spark ecosystems, with a focus on their main architectural characteristics, and then addresses a number of important questions: How can we perform physics analysis with Big Data technologies? What problems are faced? What are the challenges and the available data sources? What are the other domains in which Big Data analytics is applied at CERN?
Copyright/License © 2019-2024 CERN
Submitted by sebastian.lopienski@cern.ch

 


 Record created 2019-03-11, last modified 2022-11-02

