Spark - a modern approach for distributed analytics

Talk


Author(s)

Surdy, Kacper (speaker) ; Kothuri, Prasanth (speaker) (CERN)

Corporate author(s)

CERN. Geneva

Imprint

2016-08-03. - Streaming video.

Series

(Workshops)

Lecture note

on 2016-08-03T10:30:00

Subject category

Workshops

Abstract

The Hadoop ecosystem is the leading open-source platform for distributed storage and processing of big data. It is widely used for implementing data warehouses and data lakes. Spark has emerged as one of the leading engines for data analytics. The Hadoop platform is available at CERN as a central service provided by the IT department.

By attending the session, a participant will acquire knowledge of the essential concepts needed to benefit from the parallel data processing offered by the Spark framework. The session is structured around practical examples and tutorials.

Main topics:

Architecture overview - work distribution, concepts of a worker and a driver
Computing concepts of transformations and actions
Data processing APIs - RDD, DataFrame, and Spark SQL
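
The distinction between transformations and actions listed above can be illustrated with a small sketch. This is a toy model in plain Python, not Spark code and not material from the talk: the class name ToyDataset and its internals are hypothetical and exist only for illustration. It mimics the idea that transformations (here map and filter) only record work to be done, while actions (here collect and count) trigger the computation over each partition and bring the results back to the caller, which plays the role of the driver. In PySpark, the RDD methods of the same names play the corresponding roles.

```python
class ToyDataset:
    """Toy stand-in for a partitioned dataset (hypothetical, NOT Spark's RDD)."""

    def __init__(self, partitions, ops=()):
        # Each inner list models one partition that a worker would process.
        self._partitions = [list(p) for p in partitions]
        # Recorded transformations, applied only when an action runs.
        self._ops = tuple(ops)

    # Transformations: lazy, they only extend the recorded plan.
    def map(self, fn):
        return ToyDataset(self._partitions, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyDataset(self._partitions, self._ops + (("filter", pred),))

    def _run_partition(self, part):
        # Apply the recorded plan to one partition (the "worker" side).
        items = iter(part)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: eager, they execute the plan and return results to the caller.
    def collect(self):
        result = []
        for part in self._partitions:
            result.extend(self._run_partition(part))
        return result

    def count(self):
        return sum(len(self._run_partition(p)) for p in self._partitions)


calls = []  # records every invocation of square, to make laziness visible


def square(x):
    calls.append(x)
    return x * x


data = ToyDataset([[1, 2, 3], [4, 5, 6]])  # two partitions
evens = data.map(square).filter(lambda v: v % 2 == 0)  # nothing runs yet
calls_before_action = len(calls)  # still 0: transformations are lazy
result = evens.collect()  # the action triggers the computation
```

After the last line, result holds the even squares [4, 16, 36], and square has been called once per input element. Calling an action a second time re-runs the whole plan in this toy model, since nothing is cached.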

Submitted by

zbigniew.baranowski@cern.ch

Record created 2016-09-09, last modified 2022-11-02
