Description
Binary decision trees are widely used for supervised classification of high-dimensional data, for example in particle physics. We propose a supervised binary divergence decision tree with a nested separation method based on kernel density estimation. A key insight is that the clustering can be driven by only a few selected physical variables. The variables selected are those that achieve the maximal divergence measure between the two subclasses of data. We then apply our method to a Monte Carlo data set from the D0 experiment at the Tevatron accelerator at Fermilab. We also introduce modified statistical tests applicable to weighted data sets, in order to test the homogeneity of the Monte Carlo simulation and the real data.
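The variable-selection idea can be illustrated with a minimal sketch. The abstract names neither the divergence measure nor the bandwidth rule, so this sketch assumes the Hellinger distance and a normal-reference rule-of-thumb bandwidth. The function names (`gaussian_kde_1d`, `hellinger_distance`, `rank_variables`) are hypothetical, and the data are synthetic rather than D0 Monte Carlo events. It ranks variables one at a time and does not reproduce the nested separation method or event weights.

```python
import numpy as np


def gaussian_kde_1d(sample, grid):
    """Unweighted 1-D Gaussian kernel density estimate evaluated on `grid`."""
    sample = np.asarray(sample, dtype=float)
    # Normal-reference rule-of-thumb bandwidth (an assumption of this sketch).
    bandwidth = 1.06 * sample.std() * sample.size ** (-1 / 5)
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


def hellinger_distance(p, q, grid):
    """Hellinger distance between two densities sampled on a uniform grid."""
    dx = grid[1] - grid[0]
    bc = np.sum(np.sqrt(p * q)) * dx  # Bhattacharyya coefficient (Riemann sum)
    return np.sqrt(max(0.0, 1.0 - bc))


def rank_variables(X, y, n_grid=256):
    """Return column indices sorted by decreasing divergence, and the scores."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        grid = np.linspace(col.min(), col.max(), n_grid)
        p = gaussian_kde_1d(col[y == 1], grid)  # density of class 1
        q = gaussian_kde_1d(col[y == 0], grid)  # density of class 0
        scores.append(hellinger_distance(p, q, grid))
    scores = np.array(scores)
    return np.argsort(scores)[::-1], scores


# Synthetic example: three variables, only column 1 separates the classes.
rng = np.random.default_rng(0)
n = 500
y = np.repeat([0, 1], n)
X = rng.normal(size=(2 * n, 3))
X[y == 1, 1] += 2.0

order, scores = rank_variables(X, y)
print("variables ranked by divergence:", order)
print("divergence scores:", np.round(scores, 3))
```

In this toy setting the shifted column should receive the largest score, so a tree node would split on it first. Any other divergence between the two estimated densities could be substituted for `hellinger_distance`.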
Keyword field | Value
---|---
Primary Keyword (Mandatory) | Algorithms
Secondary Keyword (Optional) | Analysis tools and techniques
Tertiary Keyword (Optional) | Artificial intelligence/Machine learning