Hadoop Tutorials - Hadoop Foundations

by Daniel Lanza Garcia (Ministere des affaires etrangeres et europeennes (FR)), Zbigniew Baranowski (CERN)

Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)

Description

The Hadoop ecosystem is the leading open-source platform for distributed storage and processing of "big data". The Hadoop platform is available at CERN as a central service provided by the IT department.

This tutorial, organized by the IT Hadoop service, aims to introduce the main concepts of Hadoop technology in a practical way. It is aimed at those who would like to start using the service for distributed parallel data processing.

The main topics that will be covered are:

  • Hadoop architecture and available components
  • How to perform distributed parallel processing to explore example data and create reports with SQL (using Apache Impala)
  • Using Hue, the Hadoop web UI, to present results in a user-friendly way
  • How to format and structure data to make processing more efficient, using various data formats/containers and partitioning techniques (Avro, Parquet, HBase), including best practices in this area

 

Attendees will have access to a test Hadoop system on which they can perform hands-on exercises. Instructions will be provided by the speakers. To help with the preparation of the test environment, please register if you plan to attend.

Webcast
There is a live webcast for this event