Workshops

Machine learning with Spark MLlib

by Antonio Romero Marin (CERN), Joeri Hermans (Universiteit Maastricht (NL)), Manuel Martin Marquez (CERN)

Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)

Description

Machine learning has become a hot topic, and Spark is seeing rapid adoption as an engine and framework for working on machine learning problems at scale. In particular, Spark provides distributed computing, integration with the rest of the Hadoop ecosystem, and a specialized library for machine learning (MLlib).

In this tutorial, participants will learn why Apache Spark is a good fit for big data analysis and how to use Apache Spark with Python for machine learning. As an example, we will use the data from the Higgs Boson Machine Learning Challenge, published on Kaggle by the ATLAS experiment. The goal of that challenge is to explore the potential of machine learning methods to improve the discovery significance of the experiment.
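For orientation, the "discovery significance" above can be made concrete. The Kaggle challenge ranked submissions by the approximate median significance (AMS), computed from the weighted number of true signal events (s) and background events (b) that a classifier selects, with a regularization term of 10. The following is a minimal plain-Python sketch of that formula; the function and argument names are illustrative and are not taken from the workshop material.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance (AMS).

    s     -- weighted sum of selected signal events (true positives)
    b     -- weighted sum of selected background events (false positives)
    b_reg -- regularization term; the challenge used 10
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


# Illustrative numbers only, not results from the challenge.
print(ams(s=10.0, b=990.0))
```

When s is much smaller than b, the AMS is close to the familiar s / sqrt(b) estimate, which is why selecting more signal at the same background level increases it.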

We will guide participants through the complete analysis pipeline using MLlib, Spark's built-in machine learning library, starting with data preparation and feature selection and ending with model evaluation techniques such as cross-validation.

 

Webcast
There is a live webcast for this event