CHEP 2016 Conference, San Francisco, October 8-14, 2016

Name: CHEP 2016 Conference, San Francisco, October 8-14, 2016
Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis

America/Los_Angeles timezone

Towards automation of data quality system for CERN CMS experiment

13 Oct 2016, 15:30

1h 15m

Poster; Track 7: Middleware, Monitoring and Accounting; Posters B / Break

Speaker: Maxim Borisyak (National Research University Higher School of Economics (HSE) (RU); Yandex School of Data Analysis (RU))

Daily operation of a large-scale experimental setup is a challenging task in terms of both maintenance and monitoring. In this work we describe an approach to an automated data quality system. Based on machine learning methods, it can be trained online on data manually labeled by human experts. The trained model can assist data quality managers by filtering out obvious cases (both good and bad) and requesting further assessment only for the fraction of datasets that are hard to classify.

The system is trained on data published by the CMS experiment on the CERN Open Data Portal. We demonstrate that the system can save at least 20% of person power without increasing the pollution (false positive) or loss (false negative) rates. In addition, for data that is not labeled automatically, the system provides its estimates and hints at possible sources of anomalies, which speeds up data quality assessment overall and increases the purity of the collected data.
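The triage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-threshold rule, the score convention (higher means more likely good), the names `triage` and `evaluate`, and the exact definitions of pollution and loss used here are all assumptions made for illustration, and the scores and labels are toy values rather than CMS data.

```python
from typing import Dict, List


def triage(score: float, t_lo: float, t_hi: float) -> str:
    """Route one dataset by classifier score (assumed: higher = more likely good)."""
    if score >= t_hi:
        return "good"      # confident enough to accept automatically
    if score <= t_lo:
        return "bad"       # confident enough to reject automatically
    return "review"        # ambiguous: defer to a human expert


def evaluate(scores: List[float], labels: List[int],
             t_lo: float, t_hi: float) -> Dict[str, float]:
    """Summarise the triage on expert-labeled data (label 1 = good, 0 = bad).

    Assumed definitions for this sketch:
      automated: fraction of datasets decided without human review
      pollution: fraction of automatically accepted datasets that are bad
      loss:      fraction of good datasets that are automatically rejected
    """
    decisions = [triage(s, t_lo, t_hi) for s in scores]
    accepted = [lab for d, lab in zip(decisions, labels) if d == "good"]
    false_pos = sum(1 for lab in accepted if lab == 0)
    false_neg = sum(1 for d, lab in zip(decisions, labels)
                    if d == "bad" and lab == 1)
    n_good = sum(labels)
    n_auto = sum(1 for d in decisions if d != "review")
    return {
        "automated": n_auto / len(scores),
        "pollution": false_pos / len(accepted) if accepted else 0.0,
        "loss": false_neg / n_good if n_good else 0.0,
    }


# Toy scores and expert labels, invented for illustration only.
toy_scores = [0.97, 0.91, 0.62, 0.55, 0.40, 0.08, 0.03, 0.88, 0.15, 0.50]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
summary = evaluate(toy_scores, toy_labels, t_lo=0.2, t_hi=0.8)
print(summary)
```

In a setup like this, widening the gap between the two thresholds sends more datasets to human review and lowers the risk of pollution and loss, while narrowing it automates more decisions; the thresholds would be tuned on labeled data to keep both error rates within the limits the experts accept.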

Primary Keyword: Monitoring
Secondary Keyword: Artificial intelligence/Machine learning

Author: Maxim Borisyak (National Research University Higher School of Economics (HSE) (RU); Yandex School of Data Analysis (RU))

Co-authors:
Andrey Ustyuzhanin (National Research University Higher School of Economics (HSE) (RU); Yandex School of Data Analysis (RU))
Dmitry Smolyakov (Yandex School of Data Analysis (RU))
Dr Jean-Roch Vlimant (California Institute of Technology (US))
Maria Stenina (Yandex (RU))
Maurizio Pierini (CERN)

Highlights-259.pdf

Poster-259.pdf