21–25 Aug 2017
University of Washington, Seattle
US/Pacific timezone

Deep Learning Method for Inferring Cause of Data Anomalies

22 Aug 2017, 16:00
45m
The Commons (Alder Hall)

Poster Track 2: Data Analysis - Algorithms and Tools Poster Session

Speaker

Fedor Ratnikov (Yandex School of Data Analysis (RU))

Description

Daily operation of a large-scale experiment is a resource-consuming task, particularly from the perspective of routine data quality monitoring. Typically, data come from different channels (subdetectors or other subsystems), and the global quality of the data depends on the performance of each channel. In this work, we consider the problem of predicting which channel has been affected by anomalies in the detector behaviour.
We introduce a generic deep learning model and prove that, under reasonable assumptions, the model learns to identify the channels affected by an anomaly. Such a model could be used to cross-check and assist data quality managers, and to identify good channels in anomalous data samples.
The main novelty of the method is that the model does not require ground truth labels for each channel; only a global flag is used. This distinguishes the model from classical classification methods.
An evaluation of the method on data collected by the CERN CMS experiment is presented.
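
The idea of learning per-channel decisions from a global flag alone can be illustrated with a toy sketch. This is not the authors' architecture: it assumes one logistic unit per channel in place of a deep subnetwork, assumes the global "good" probability is the product of the per-channel "good" probabilities, and uses synthetic Gaussian data. All names (`make_data`, `train`, `anomaly_rate`, `shift`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_data(n_samples=2000, n_channels=3, anomaly_rate=0.2, shift=4.0, seed=0):
    """Synthetic stand-in for detector data: one feature per channel."""
    rng = np.random.default_rng(seed)
    is_anomalous = rng.random((n_samples, n_channels)) < anomaly_rate
    # Anomalous channels have their feature shifted upwards (an assumption).
    x = rng.normal(size=(n_samples, n_channels)) + shift * is_anomalous
    # Global flag: 1.0 only if every channel is good.
    y_global = (~is_anomalous.any(axis=1)).astype(float)
    return x, y_global, is_anomalous


def train(x, y_global, lr=0.2, n_steps=3000):
    """Fit per-channel logistic units using only the global flag."""
    n_channels = x.shape[1]
    w = np.zeros(n_channels)
    b = np.full(n_channels, 2.0)
    y = y_global[:, None]
    for _ in range(n_steps):
        p = sigmoid(x * w + b)                    # per-channel P(good)
        p_all = p.prod(axis=1, keepdims=True)     # assumed global P(good)
        # Gradient of the global cross-entropy w.r.t. each channel's logit.
        odds = p_all / np.maximum(1.0 - p_all, 1e-12)
        grad_z = (1.0 - p) * (-y + (1.0 - y) * odds)
        w -= lr * (grad_z * x).mean(axis=0)
        b -= lr * grad_z.mean(axis=0)
    return w, b


x, y_global, is_anomalous = make_data()
w, b = train(x, y_global)

# Per-channel labels are used here for evaluation only, never for training.
predicted_anomalous = sigmoid(x * w + b) < 0.5
channel_accuracy = (predicted_anomalous == is_anomalous).mean()
print(f"per-channel accuracy: {channel_accuracy:.3f}")
```

In this toy setting the per-channel units can recover which channel is anomalous because a sample is flagged good only when every channel score is high, so the loss pushes down the score of the channel whose feature looks unusual.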

Primary author

Maxim Borisyak (Yandex School of Data Analysis (RU))

Co-authors

Denis Derkach (Yandex School of Data Analysis (RU))
Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
Olga Koval (Yandex)
Fedor Ratnikov (Yandex School of Data Analysis (RU))
