21–25 Aug 2017
University of Washington, Seattle
US/Pacific timezone

Speeding up prediction performance of the boosting decision trees-based learning models.

24 Aug 2017, 14:00
20 min
107 (Alder Hall)

Oral, Track 2: Data Analysis - Algorithms and Tools

Speaker

Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))

Description

Many machine learning algorithms produce computationally complex models, and further improvements in the quality of such models usually come at the cost of longer prediction (application) times. Nevertheless, such high-quality models are often needed in settings with limited resources (memory or CPU time).
This article discusses how to trade model quality for prediction speed in models built with CatBoost, a novel boosted decision tree algorithm. The idea is to combine two approaches: training fewer trees and merging trees into huge cubes. The proposed method yields a Pareto-optimal reduction of the computational complexity of the decision tree model with respect to its quality. In the example considered, the number of lookups was reduced from 5000 to only 6 (a speedup factor of roughly 1000), while the AUC of the model decreased by less than one per mille.
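CatBoost grows oblivious (symmetric) decision trees, in which every node at a given depth uses the same split. A depth-d tree is therefore a table of 2^d leaf values addressed by d comparison bits, and evaluating it costs one lookup. The sketch below illustrates, under that assumption, why uniting trees into cubes reduces the number of lookups: the sum of two oblivious trees is itself an oblivious tree over the concatenated splits, with a table whose size is the product of the two. All names (`ObliviousTree`, `merge`, `merge_into_cubes`) and the greedy depth cap are hypothetical; this is not the authors' algorithm or CatBoost's API, and it leaves out the other half of the method (training fewer trees).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical representation: (feature index, threshold); the split's bit is 1
# when x[feature] > threshold. Not CatBoost's internal model format.
Split = Tuple[int, float]


@dataclass
class ObliviousTree:
    """Depth-d oblivious tree: one split per level, shared by all nodes of that level."""
    splits: List[Split]
    values: List[float]  # 2**d leaf values, indexed by the split bits

    def __post_init__(self) -> None:
        if len(self.values) != 2 ** len(self.splits):
            raise ValueError("an oblivious tree of depth d needs 2**d leaf values")

    def leaf_index(self, x: Sequence[float]) -> int:
        # Bit j of the index is the outcome of the j-th split.
        index = 0
        for j, (feature, threshold) in enumerate(self.splits):
            if x[feature] > threshold:
                index |= 1 << j
        return index

    def predict(self, x: Sequence[float]) -> float:
        # Evaluating the tree costs exactly one table lookup.
        return self.values[self.leaf_index(x)]


def merge(a: ObliviousTree, b: ObliviousTree) -> ObliviousTree:
    """Fuse two trees into one cube whose single lookup returns a(x) + b(x).

    The low depth_a index bits select a's leaf and the high bits select b's leaf,
    so the table has 2**(depth_a + depth_b) entries: memory grows exponentially
    while the number of lookups drops from two to one. Repeated splits are not
    deduplicated in this sketch.
    """
    depth_a = len(a.splits)
    mask_a = (1 << depth_a) - 1
    size = 2 ** (depth_a + len(b.splits))
    values = [a.values[i & mask_a] + b.values[i >> depth_a] for i in range(size)]
    return ObliviousTree(a.splits + b.splits, values)


def merge_into_cubes(trees: List[ObliviousTree], max_depth: int) -> List[ObliviousTree]:
    """Greedily fuse consecutive trees while each cube's depth stays <= max_depth."""
    cubes: List[ObliviousTree] = []
    for tree in trees:
        if cubes and len(cubes[-1].splits) + len(tree.splits) <= max_depth:
            cubes[-1] = merge(cubes[-1], tree)
        else:
            cubes.append(tree)
    return cubes


def predict_ensemble(trees: List[ObliviousTree], x: Sequence[float]) -> float:
    # One lookup per tree (or per cube), summed as in a boosted ensemble.
    return sum(t.predict(x) for t in trees)


if __name__ == "__main__":
    trees = [
        ObliviousTree([(0, 0.5)], [1.0, 2.0]),
        ObliviousTree([(1, 0.0), (0, 1.5)], [0.1, 0.2, 0.3, 0.4]),
        ObliviousTree([(2, -1.0)], [-1.0, 1.0]),
    ]
    cubes = merge_into_cubes(trees, max_depth=4)
    x = [1.0, 0.3, 0.0]
    print(len(trees), "lookups ->", len(cubes), "lookup(s)")
    print(predict_ensemble(trees, x), predict_ensemble(cubes, x))
```

Here three trees (three lookups) collapse into a single 16-entry cube, and both ensembles return 3.2 for the sample input. The `max_depth` cap stands in for the memory budget: every merge multiplies the table size, so cube size rather than lookup count becomes the limiting resource.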

Primary authors

Mr Egor Khairullin (Moscow Institute of Physics and Technology; Yandex School of Data Analysis (RU))
Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
