HEPiX Spring 2019 Workshop

Name: HEPiX Spring 2019 Workshop
Start: 2019-03-25T08:00:00-07:00
End: 2019-03-29T23:35:00-07:00
Location: SDSC Auditorium

25–29 Mar 2019

America/Los_Angeles timezone

Organisers

hepix-2019spring-support@hepix.org

Evolution of interactive data analysis for HEP at CERN – SWAN, Kubernetes, Apache Spark and RDataFrame

27 Mar 2019, 14:50

25m

E-B 212 (SDSC Auditorium)

10100 Hopkins Drive La Jolla, CA 92093-0505

Computing & Batch Services Computing & Batch Systems

Speaker: Piotr Mrowczynski (CERN)

This talk focuses on recent experience and developments in providing the SWAN data analytics platform, based on Apache Spark, for High Energy Physics at CERN.

The Hadoop Service is expanding its user base to analysts who want to perform analysis with big data technologies, namely Apache Spark; its main users come from accelerator operations and infrastructure monitoring. Integrating the Hadoop Service with the SWAN Service offers scalable interactive data analysis and visualization in Jupyter notebooks, with computations offloaded to compute clusters: on-premise YARN clusters and, more recently, cloud-native Kubernetes clusters. The ROOT framework is the most widely used tool for high-energy physics analysis. Its integration with SWAN allows physicists to perform web-based interactive analysis in the cloud using standard tools and libraries.

The first part of the presentation will focus on the integration of Spark on Kubernetes into the SWAN service, which allows computations to be offloaded to elastic, virtualized, container-based infrastructure in private or public clouds, as an alternative to on-premise Hadoop clusters, which are complex to manage and operate.
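As a rough illustration of what pointing a notebook session at a Kubernetes cluster involves, the sketch below assembles a small set of standard Spark properties. It is not SWAN's actual configuration: the helper name `spark_on_k8s_conf`, the API server URL, the image name and the namespace are all hypothetical placeholders, and running the driver in client mode inside the notebook session is an assumption.

```python
def spark_on_k8s_conf(api_server, image, namespace="analytics", executors=4):
    """Return Spark properties for running executors as Kubernetes pods.

    Hypothetical helper for illustration; all default values are placeholders.
    """
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return {
        # The k8s:// prefix selects Spark's Kubernetes scheduler backend.
        "spark.master": f"k8s://{api_server}",
        # Assumption: the driver stays in the notebook session (client mode).
        "spark.submit.deployMode": "client",
        # Container image used to start the executor pods.
        "spark.kubernetes.container.image": image,
        # Kubernetes namespace in which the executor pods are created.
        "spark.kubernetes.namespace": namespace,
        # Number of executor pods to request from the cluster.
        "spark.executor.instances": str(executors),
    }


conf = spark_on_k8s_conf(
    "https://k8s.example.org:6443",      # placeholder API server
    "registry.example.org/spark:latest",  # placeholder image
)
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

With PySpark installed, the same pairs could be handed to `SparkConf().setAll(conf.items())` before creating the session; switching back to a YARN cluster would then mostly be a matter of swapping the property set.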

The second part will focus on how the use of the analytics infrastructure is evolving, namely a new development in the ROOT framework, Distributed RDataFrame, which would allow interactive, parallel and distributed analysis of large physics datasets by transparently exploiting dynamically pluggable resources in SWAN, e.g. Hadoop or Kubernetes clusters.
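The general idea behind distributing such an analysis can be sketched as a map-reduce over entry ranges: the dataset is split into ranges, each worker fills a partial histogram for its range, and the partial results are merged. The code below is only a conceptual sketch in plain Python, not the ROOT or Distributed RDataFrame API; a list stands in for the dataset, a thread pool stands in for cluster executors, and every function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_ranges(n_entries, n_parts):
    """Split [0, n_entries) into n_parts contiguous, near-equal ranges."""
    step, extra = divmod(n_entries, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        stop = start + step + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def fill_partial(values, entry_range, n_bins, lo, hi):
    """Mapper: histogram only the entries inside one range."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in values[entry_range[0]:entry_range[1]]:
        if lo <= x < hi:
            # Clamp to guard against floating-point rounding at the upper edge.
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    return counts


def merge(a, b):
    """Reducer: add two partial histograms bin by bin."""
    return [x + y for x, y in zip(a, b)]


def distributed_histogram(values, n_parts, n_bins, lo, hi):
    """Run the mappers in parallel and merge their partial histograms."""
    ranges = split_ranges(len(values), n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(
            pool.map(lambda r: fill_partial(values, r, n_bins, lo, hi), ranges)
        )
    return reduce(merge, partials)


# Toy dataset standing in for a physics column.
data = [float(i % 10) for i in range(1000)]
hist = distributed_histogram(data, n_parts=3, n_bins=5, lo=0.0, hi=10.0)
print(hist)
```

Because the merge step is a bin-by-bin sum, the result does not depend on how many ranges the data is split into, which is what lets the backend (local threads here, a Spark cluster in the setting described above) be swapped without changing the analysis code.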

Author: Piotr Mrowczynski (CERN)

Co-authors: Prasanth Kothuri (CERN), Enric Tejedor (CERN)

Presentation materials: Evolution of interactive data analysis for HEP at CERN.pdf
