10–13 Sept 2018
Academy of Sciences and Arts (Akademija nauka i umjetnosti Bosne i Hercegovine)
Europe/Sarajevo timezone
FUN WITH DATA!

Integrating ROOT I/O with Apache Spark

10 Sept 2018, 14:15
15m
Academy of Sciences and Arts (Akademija nauka i umjetnosti Bosne i Hercegovine)

7, Bistrik Sarajevo 71000, Bosnia and Herzegovina https://goo.gl/maps/Ct9jKrSER4z

Speaker

Viktor Khristenko (CERN)

Description

DEEP-EST is a European project building a new generation of the Modular Supercomputer Architecture (MSA). The MSA is a blueprint for heterogeneous HPC systems that support high-performance computing and data analytics workloads with high efficiency and scalability.

Within the project, we are working on a JVM-based implementation of the ROOT file format, spark-root/root4j, together with an Apache Spark Data Source. The current implementation makes it possible to ingest HEP data directly, perform stream and batch processing, and integrate machine learning pipelines with Apache Spark.
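As a rough illustration of what ingesting a ROOT file through a Spark Data Source might look like from PySpark, here is a minimal sketch. It is not taken from the talk. The helper name `load_root_events` is hypothetical, and the data source name `org.dianahep.sparkroot` is an assumption that should be checked against the spark-root release in use. Only the standard `spark.read.format(...).load(...)` reader calls are relied on.

```python
def load_root_events(spark, path, source="org.dianahep.sparkroot"):
    """Return a DataFrame backed by the ROOT file(s) at `path`.

    `spark` is an existing pyspark.sql.SparkSession.
    `source` is the ASSUMED data source name for spark-root; verify it
    against the version of the package on the classpath.
    """
    # Standard Spark DataFrameReader chain: pick a data source, then load.
    return spark.read.format(source).load(path)
```

With a running session that has the spark-root package available, a call such as `load_root_events(spark, "events.root")` would be expected to yield a DataFrame whose schema can be inspected with `printSchema()` before building batch or machine learning pipelines on top of it.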

In this talk, we first discuss the intricacies and internals of the JVM-based implementation. Examples of "bootstrapping" the ROOT file format will be presented to demonstrate the robustness and simplicity of the format's structure.

Furthermore, since Apache Spark is a query execution engine, comparisons of ROOT/C++-based workloads with Apache Spark-based ones will be presented and discussed.
