Workshops

Spark - a modern approach for distributed analytics

Name: Spark - a modern approach for distributed analytics
Start: 2016-08-03T10:30:00+02:00
End: 2016-08-03T12:00:00+02:00
Location: CERN

by Mr Kacper Surdy, Mr Prasanth Kothuri (CERN)

Wednesday 3 Aug 2016, 10:30 → 12:00 Europe/Zurich

31/3-004 - IT Amphitheatre (CERN)

Description

The Hadoop ecosystem is the leading open-source platform for distributed storage and processing of big data. It is a very popular system for implementing data warehouses and data lakes. Spark has also emerged as one of the leading engines for data analytics. The Hadoop platform is available at CERN as a central service provided by the IT department.

By attending the session, participants will acquire knowledge of the essential concepts needed to benefit from the parallel data processing offered by the Spark framework. The session is structured around practical examples and tutorials.

Main topics:

Architecture overview - work distribution, concepts of a worker and a driver
Computing concepts of transformations and actions
Data processing APIs - RDD, DataFrame, and SparkSQL
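
As a rough illustration of the "transformations and actions" topic, the sketch below models the idea in plain Python: transformations only record what should be done, and an action triggers the actual computation. It is a single-process toy, not Spark's API and not material from the session. The class `LazyDataset`, its methods, and the helper `square` are hypothetical names chosen for this example, and nothing here is distributed across a driver and workers.

```python
from typing import Any, Callable, Iterable, List


class LazyDataset:
    """Toy stand-in for a distributed dataset (hypothetical, not Spark's API)."""

    def __init__(self, source: Callable[[], Iterable[Any]]):
        # Store a recipe for producing the data, not the data itself.
        self._source = source

    @classmethod
    def from_list(cls, items: List[Any]) -> "LazyDataset":
        return cls(lambda: iter(items))

    # Transformations: build a new recipe on top of the old one.
    # No element is processed when these methods are called.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the whole recipe and return a concrete result.
    def collect(self) -> List[Any]:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)  # record when the function actually runs
    return x * x


numbers = LazyDataset.from_list([1, 2, 3, 4, 5])

# Chaining transformations only describes the pipeline.
pipeline = numbers.map(square).filter(lambda x: x % 2 == 1)
calls_before_action = len(calls)  # square has not been called yet

# The action executes the pipeline and materializes the odd squares.
result = pipeline.collect()
calls_after_action = len(calls)
```

In this toy model each action re-runs the recipe from the start, so calling `pipeline.count()` afterwards would invoke `square` on every element again.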

Webcast

There is a live webcast for this event