ACAT 2010

Name: ACAT 2010
Start: 2010-02-22T08:00:00+01:00
End: 2010-02-27T18:00:00+01:00
Location: Jaipur, India

Europe/Zurich timezone

Classifying extremely imbalanced data sets

23 Feb 2010, 14:25

25m

Parallel Talk; Data Analysis - Algorithms and Tools; Tuesday, 23 February - Data Analysis - Algorithms and Tools

Speaker: Markward Britsch (Max-Planck-Institut fuer Kernphysik (MPI))

Imbalanced data sets that contain far more background than signal instances are very common in particle physics and will also be characteristic of the upcoming analyses of LHC data. Following up on the work presented at ACAT 2008, we apply the multivariate technique presented there (a rule-growing algorithm combined with the meta-methods bagging and instance weighting) to much more imbalanced data sets, in particular a selection of D0 decays made without particle identification. It turns out that the quality of the result depends strongly on the number of background instances used for training. We discuss methods that exploit this dependence to improve the results significantly, and, more generally, how to handle and reduce the size of large training sets without loss of result quality. We will also comment on how to take statistical fluctuations in receiver operating characteristic (ROC) curves into account when comparing classifier methods.
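The remark on statistical fluctuations in ROC curves can be illustrated with a minimal sketch. It is not the method presented in the talk; it assumes simple binomial errors on the signal and background efficiencies at a fixed cut, and independent test samples when two classifiers are compared. The function names and the example scores are hypothetical.

```python
import math


def roc_point_with_errors(signal_scores, background_scores, threshold):
    """Return (fpr, tpr, fpr_err, tpr_err) for the cut `score >= threshold`.

    Errors are binomial standard deviations sqrt(p * (1 - p) / n),
    an assumption made for this sketch only.
    """
    n_sig = len(signal_scores)
    n_bkg = len(background_scores)
    if n_sig == 0 or n_bkg == 0:
        raise ValueError("need at least one signal and one background instance")

    # Fraction of signal (true positive rate) and background
    # (false positive rate) instances passing the cut.
    tpr = sum(s >= threshold for s in signal_scores) / n_sig
    fpr = sum(b >= threshold for b in background_scores) / n_bkg

    tpr_err = math.sqrt(tpr * (1.0 - tpr) / n_sig)
    fpr_err = math.sqrt(fpr * (1.0 - fpr) / n_bkg)
    return fpr, tpr, fpr_err, tpr_err


def tpr_difference_significance(tpr_a, err_a, tpr_b, err_b):
    """Difference of two signal efficiencies in units of the combined error.

    Assumes the two efficiencies were measured on independent samples,
    so the errors add in quadrature.
    """
    combined = math.hypot(err_a, err_b)
    if combined == 0.0:
        if tpr_a == tpr_b:
            return 0.0
        return math.copysign(math.inf, tpr_a - tpr_b)
    return (tpr_a - tpr_b) / combined


# Hypothetical classifier outputs for a small, imbalanced test sample.
signal = [0.9, 0.8, 0.4, 0.2]
background = [0.7, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.01, 0.01, 0.0]
fpr, tpr, fpr_err, tpr_err = roc_point_with_errors(signal, background, 0.5)
```

With few signal instances the error on the signal efficiency dominates, so two ROC curves that differ by less than about one combined error at a given background rate should not be taken as evidence that one classifier is better.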

Author: Markward Britsch (Max-Planck-Institut fuer Kernphysik (MPI))

Co-authors: Michael Schmelling (Max-Planck-Institut fuer Kernphysik (MPI)); Nikolai Gagunashvili (University of Akureyri)

Slides

T2_Britsch.pdf
