Hadoop Tutorials - Hadoop Foundations
Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)
Description
The Hadoop ecosystem is the leading open-source platform for distributed storage and processing of "big data". The Hadoop platform is available at CERN as a central service provided by the IT department.
This tutorial, organized by the IT Hadoop service, introduces the main concepts of Hadoop technology in a practical way. It is aimed at those who would like to start using the service for distributed parallel data processing.
The main topics that will be covered are:
- Hadoop architecture and available components
- How to perform distributed parallel processing to explore example data and create reports with SQL (using Apache Impala).
- Using Hue, the Hadoop web UI, to present results in a user-friendly way.
- How to format and structure data to make processing more efficient, using various data formats and containers (Avro, Parquet, HBase) and partitioning techniques. Best practices in this area will also be discussed.
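As a rough illustration of why partitioning can make processing more efficient, the sketch below groups records by a partition key so that a query for one key value reads only the matching group instead of scanning everything. It is a plain-Python analogy, not Hadoop or Impala code; the record fields (`day`, `experiment`, `events`) and the helper names `partition_by` and `total_events` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical example records; field names are illustrative assumptions.
records = [
    {"day": "2016-01-01", "experiment": "A", "events": 10},
    {"day": "2016-01-01", "experiment": "B", "events": 20},
    {"day": "2016-01-02", "experiment": "A", "events": 5},
    {"day": "2016-01-02", "experiment": "B", "events": 7},
    {"day": "2016-01-02", "experiment": "C", "events": 8},
    {"day": "2016-01-03", "experiment": "A", "events": 1},
]


def partition_by(rows, key):
    """Group rows into partitions keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def total_events(partitions, day):
    """Sum events for one day, touching only that day's partition.

    Returns the total and the number of rows that had to be read.
    """
    rows = partitions.get(day, [])
    return sum(row["events"] for row in rows), len(rows)


partitions = partition_by(records, "day")
total, rows_read = total_events(partitions, "2016-01-02")
print(f"total={total}, rows read={rows_read} of {len(records)}")
```

In this toy case the query for a single day reads only that day's rows rather than the full data set. Partitioned tables on a real cluster rely on the same idea, with partitions typically stored as separate directories.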
Attendees will have the possibility to access a test Hadoop system where they will be able to perform hands-on exercises. Instructions will be provided by the speakers. To facilitate the preparation of the test environment, please register if you plan to attend.
Webcast
There is a live webcast for this event