ACAT 2021

Name: ACAT 2021
Start: 2021-11-29T08:30:00+09:00
End: 2021-12-03T19:30:00+09:00
Location: Virtual and IBS Science Culture Center, Daejeon, South Korea

29 November 2021 to 3 December 2021


Asia/Seoul timezone


Distributed RDataFrame: leveraging Dask and latest optimisations

Contribution ID: 645

Not scheduled (duration: 20 min)

Orange (Gather.Town)

Poster Track 2: Data Analysis - Algorithms and Tools Posters: Orange

Speaker: Vincenzo Eduardo Padulano (Valencia Polytechnic University (ES))

The declarative approach to data analysis provides high-level abstractions that let users operate on their datasets far more ergonomically than imperative interfaces do. ROOT offers such a tool with RDataFrame, which builds a computation graph from the operations issued by the user and executes it lazily, only when the final results are queried. RDataFrame has always been oriented towards parallelisation, with native support for multithreaded execution on a single machine.
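To make the lazy-execution idea concrete, here is a minimal pure-Python sketch of a computation graph that records Define and Filter steps and only runs an event loop when a result is first requested. It is a toy, not ROOT's API: ToyRDataFrame, Node and LazyResult are hypothetical names, the method names merely mirror RDataFrame's Define/Filter/Count/Sum, and Python callables replace the column expressions the real RDataFrame accepts.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Any]


class LazyResult:
    """Handle to a booked result; the value is computed on the first get()."""

    def __init__(self, df: "ToyRDataFrame") -> None:
        self.df = df
        self.value: Any = None
        self.ready = False

    def get(self) -> Any:
        # Querying a result triggers one event loop for every booked action.
        if not self.ready:
            self.df.run()
        return self.value


class Node:
    """A vertex of the toy computation graph (a Define or Filter on its parent)."""

    def __init__(self, root: "ToyRDataFrame", parent: Optional["Node"],
                 step: Callable[[Row], Optional[Row]]) -> None:
        self.root = root
        self.parent = parent
        self.step = step  # returns the transformed row, or None if filtered out

    def apply(self, row: Row) -> Optional[Row]:
        # Apply the parent's steps first, then this node's own step.
        if self.parent is not None:
            row = self.parent.apply(row)
            if row is None:
                return None
        return self.step(row)

    # Transformations only extend the graph; nothing is executed here.
    def Define(self, name: str, fn: Callable[[Row], Any]) -> "Node":
        return Node(self.root, self, lambda r: {**r, name: fn(r)})

    def Filter(self, pred: Callable[[Row], bool]) -> "Node":
        return Node(self.root, self, lambda r: r if pred(r) else None)

    # Actions book a lazy result on the root instead of computing it.
    def Count(self) -> LazyResult:
        return self.root.book(self, 0, lambda acc, r: acc + 1)

    def Sum(self, column: str) -> LazyResult:
        return self.root.book(self, 0, lambda acc, r: acc + r[column])


class ToyRDataFrame(Node):
    def __init__(self, rows: Iterable[Row]) -> None:
        super().__init__(self, None, lambda r: r)
        self.rows = list(rows)
        self.actions: List[Tuple[Node, Any, Callable[[Any, Row], Any], LazyResult]] = []
        self.n_event_loops = 0

    def book(self, node: Node, init: Any,
             update: Callable[[Any, Row], Any]) -> LazyResult:
        result = LazyResult(self)
        self.actions.append((node, init, update, result))
        return result

    def run(self) -> None:
        # One pass over the data fills every booked action at once.
        self.n_event_loops += 1
        accs = [init for _, init, _, _ in self.actions]
        for row in self.rows:
            for i, (node, _, update, _) in enumerate(self.actions):
                out = node.apply(row)
                if out is not None:
                    accs[i] = update(accs[i], out)
        for (_, _, _, result), acc in zip(self.actions, accs):
            result.value, result.ready = acc, True


df = ToyRDataFrame({"x": x} for x in range(10))
sel = df.Define("y", lambda r: r["x"] ** 2).Filter(lambda r: r["y"] > 10)
n, total = sel.Count(), sel.Sum("y")
print(df.n_event_loops)      # 0: nothing has run yet
print(n.get(), total.get())  # 6 271
print(df.n_event_loops)      # 1: both results came from one pass
```

Both results are filled during the same single pass over the data, which is why only one event loop has run after the two get() calls.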

Recently, RDataFrame has been extended with a Python layer that can steer and execute the RDataFrame computation graph over a set of distributed resources, so that an RDataFrame application needs only minimal code changes to run in a distributed environment. The new tool has a modular design that supports multiple backends: a single interface can then be connected to different distributed computing frameworks.
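The modular design can be illustrated with a small map-reduce sketch. It assumes, as a simplification, that a distributed run splits the dataset into entry ranges, processes each range independently and merges the partial results. The Backend, SerialBackend and ThreadPoolBackend classes are hypothetical and stand in for connectors to frameworks such as Spark or Dask; a standard-library thread pool replaces a real cluster so that the example runs anywhere.

```python
import operator
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Range = Tuple[int, int]  # half-open entry range [start, end)


def make_ranges(n_entries: int, n_partitions: int) -> List[Range]:
    """Split [0, n_entries) into at most n_partitions contiguous ranges."""
    step = max(1, -(-n_entries // n_partitions))  # ceiling division
    return [(s, min(s + step, n_entries)) for s in range(0, n_entries, step)]


class Backend(ABC):
    """Hypothetical backend interface: only subclasses know the framework."""

    @abstractmethod
    def map(self, mapper: Callable[[Range], T], ranges: Sequence[Range]) -> List[T]:
        """Run mapper on every range, possibly in parallel."""

    def process(self, mapper: Callable[[Range], T],
                reducer: Callable[[T, T], T], ranges: Sequence[Range]) -> T:
        # Shared frontend logic: map over ranges, then merge partial results.
        # Assumes at least one range.
        return reduce(reducer, self.map(mapper, ranges))


class SerialBackend(Backend):
    """Runs every task in the calling process, one after the other."""

    def map(self, mapper, ranges):
        return [mapper(r) for r in ranges]


class ThreadPoolBackend(Backend):
    """Stand-in for a real distributed framework, using a local thread pool."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def map(self, mapper, ranges):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(mapper, ranges))


def sum_of_squares(rng: Range) -> int:
    """Toy per-range task: the partial result for entries [start, end)."""
    start, end = rng
    return sum(i * i for i in range(start, end))


ranges = make_ranges(1000, 8)
for backend in (SerialBackend(), ThreadPoolBackend(workers=4)):
    print(type(backend).__name__, backend.process(sum_of_squares, operator.add, ranges))
```

Swapping the backend object changes where the tasks run but not the mapper, the reducer or the result, which is the property a single interface with pluggable backends is meant to provide.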

Distributed RDataFrame has been included in ROOT as an experimental feature since v6.24 and is currently under heavy development. This presentation will show current performance figures for real analyses run with two different computing frameworks: Apache Spark and Dask. It will also discuss the performance optimisations being applied to Distributed RDataFrame, namely caching, exploitation of RDataFrame's native multithreading, and compilation of C++ kernels in the distributed worker processes.

Speaker time zone: Compatible with Europe

Authors: Vincenzo Eduardo Padulano (Valencia Polytechnic University (ES)); Ivan Donchev Kabadzhov (Albert Ludwig University of Freiburg); Enric Tejedor Saavedra (CERN); Enrico Guiraud (EP-SFT, CERN)

Presentation materials: 645_poster.pdf, 645_poster.png, 645_preview.png