24th International Conference on Computing in High Energy & Nuclear Physics

Name: 24th International Conference on Computing in High Energy & Nuclear Physics
Start: 2019-11-04T08:00:00+10:30
End: 2019-11-08T13:00:00+10:30
Location: Adelaide Convention Centre

4–8 Nov 2019

Australia/Adelaide timezone

Machine Learning Pipelines for HEP Using Big Data Tools Applied to Improving Event Filtering

5 Nov 2019, 15:30

Hall F (Adelaide Convention Centre)

Poster Track 6 – Physics Analysis Posters

Speaker: Marco Zanetti (Universita e INFN, Padova (IT))

This work addresses key technological challenges in preparing data pipelines for machine learning and deep learning at a scale of interest for HEP. A novel prototype to improve the event filtering system at LHC experiments, based on a classifier trained using deep neural networks, has recently been proposed by T. Nguyen et al. (https://arxiv.org/abs/1807.00083). This presentation covers how we implemented the data pipeline to train the neural network classifier using solutions from the Apache Spark and Big Data ecosystem, integrated with tools, software, and platforms common in the HEP environment. Data preparation and feature engineering use PySpark, Spark SQL, and Python code run in Jupyter notebooks. We will discuss the key integrations and libraries that enable Apache Spark to ingest data stored in the ROOT format and to integrate with the EOS/XRootD protocol. The presentation will also cover the neural network models, defined using the Keras API, and how they were trained in a distributed fashion on Spark clusters using BigDL and Analytics Zoo. Finally, we will discuss the implementation, the results of the distributed training, and the lessons learned from using Big Data tools to implement an end-to-end ML pipeline.
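
The data preparation stage described in the abstract can be pictured with a small, library-free sketch. This is not the authors' code: the actual pipeline runs on PySpark and Spark SQL, whereas the example below substitutes plain Python so that it runs anywhere. The `Event` record, the helpers `min_max_scale`, `one_hot`, and `prepare`, and the choice of min-max scaling with one-hot labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative only."""
    features: List[float]  # raw per-event quantities
    label: int             # integer class id


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def one_hot(label: int, num_classes: int) -> List[float]:
    """Encode an integer class id as a one-hot vector."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside [0, {num_classes})")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def prepare(events: List[Event],
            num_classes: int) -> List[Tuple[List[float], List[float]]]:
    """Turn raw events into (scaled features, one-hot label) training pairs."""
    scaled = min_max_scale([e.features for e in events])
    return [(x, one_hot(e.label, num_classes)) for x, e in zip(scaled, events)]
```

Calling `prepare(events, num_classes)` on a list of `Event` objects returns pairs that a classifier could consume directly. In a Spark-based pipeline the same two steps would typically be expressed as DataFrame transformations so that they scale across a cluster.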

Consider for promotion: No

Authors: Marco Zanetti (Universita e INFN, Padova (IT)); Matteo Migliorini (Universita e INFN, Padova (IT)); Luca Canali (CERN)
