23–28 Oct 2022
Villa Romanazzi Carducci, Bari, Italy
Europe/Rome timezone

The Federation - A novel machine learning technique applied on data from the Higgs Boson Machine Learning Challenge

25 Oct 2022, 17:40
20m
Sala Europa (Villa Romanazzi)

Sala Europa

Villa Romanazzi

Oral Track 2: Data Analysis - Algorithms and Tools Track 2: Data Analysis - Algorithms and Tools

Speaker

Maximilian Mucha (University of Bonn (DE))

Description

The Federation is a new machine learning technique for handling large amounts of data in a typical high-energy physics analysis. It utilizes Uniform Manifold Approximation and Projection (UMAP) to create an initial low-dimensional representation of a given data set, which is clustered by using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). These clusters can then be used for a federated learning approach, in which we separately train a classifier on the data of each individual cluster. As a requirement for this approach, we need to apply an imbalanced learning method to the data in the found clusters before the training. By using a Dynamic Classifier Selection method, the Federation can then make predictions for the whole data set. As a proof of concept for this novel technique, open data from the Higgs Boson Machine Learning Challenge is used and comparisons to results from established methods will be presented. We also investigated the issue of handling missing values and the jet-count feature for this data.

Significance

First application of UMAP and imbalanced learning methods in high-energy particle physics.

Primary author

Maximilian Mucha (University of Bonn (DE))

Co-author

Eckhard Von Torne (University of Bonn (DE))

Presentation materials

Peer reviewing

Paper