ACAT 2017

Name: ACAT 2017
Start: 2017-08-21T07:45:00-07:00
End: 2017-08-25T18:00:00-07:00
Location: University of Washington, Seattle

21–25 Aug 2017

US/Pacific timezone

Exploiting Apache Spark platform for CMS computing analytics

22 Aug 2017, 15:40

20m

Auditorium (Alder Hall)

Oral Track 1: Computing Technology for Physics Research

Marco Meoni (INFN Sezione di Pisa, Universita' e Scuola Normale Superiore, P)

CERN IT provides a set of Hadoop clusters featuring more than 5 PB of raw storage, with a range of open-source user-level tools installed for analytics purposes. For this reason, since early 2015 the CMS experiment has been storing a large set of computing metadata, including, for example, a massive number of dataset access logs. Several streamers have registered billions of traces from heterogeneous providers. These trace logs represent a valuable yet scarcely investigated set of information that needs to be cleansed, categorized and correlated; in the case of the CMS dataset access information, this work may lead to the discovery of useful patterns that enhance the overall efficiency of the distributed infrastructure in terms of CPU utilization and task completion time. This work presents an evaluation of the Apache Spark platform for CMS needs. Through a few use cases, we demonstrate how to process metadata stored on the CERN HDFS system efficiently and scalably, using a variety of programming languages. Among them, Scala and Python offer the best approach to CMS use cases, both for executing extremely I/O-intensive queries that leverage the Spark in-memory and persistence APIs and for assessing the streamlining of predictive models that learn dataset properties using machine learning approaches.
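The cleanse-then-correlate step described above can be illustrated with a minimal sketch. It is plain Python over an in-memory list, not Spark and not the authors' code: the record layout, dataset names, sample values and the helper names `cleanse`, `aggregate` and `most_accessed` are all assumptions made for illustration. In a Spark job the same shape would typically be expressed as a filter followed by a per-key reduction on a cached dataset.

```python
from collections import defaultdict

# Hypothetical access traces: (dataset name, site, CPU seconds, succeeded?).
# All values are invented sample data, not real CMS records.
TRACES = [
    ("/A/Run1/AOD", "SITE_1", 120.0, True),
    ("/A/Run1/AOD", "SITE_2", 80.0, True),
    ("/B/Run2/MINIAOD", "SITE_1", 50.0, False),
    ("/B/Run2/MINIAOD", "SITE_3", 200.0, True),
    ("", "SITE_1", 10.0, True),  # malformed record: empty dataset name
]


def cleanse(traces):
    """Drop records with no dataset name or non-positive CPU time (filter step)."""
    return [t for t in traces if t[0] and t[2] > 0]


def aggregate(traces):
    """Per-dataset access count, total CPU time and failure count (reduce-by-key step)."""
    stats = defaultdict(lambda: {"accesses": 0, "cpu_s": 0.0, "failures": 0})
    for dataset, _site, cpu_s, ok in traces:
        entry = stats[dataset]
        entry["accesses"] += 1
        entry["cpu_s"] += cpu_s
        entry["failures"] += 0 if ok else 1
    return dict(stats)


def most_accessed(stats, n=1):
    """Return the n dataset names with the most accesses; ties broken by name."""
    return sorted(stats, key=lambda d: (-stats[d]["accesses"], d))[:n]


summary = aggregate(cleanse(TRACES))
for name in sorted(summary):
    print(name, summary[name])
```

Per-dataset summaries of this kind (access counts, CPU time, failures) are the sort of features a predictive model of dataset properties could be trained on, although the abstract does not specify which features were actually used.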

Prof. Daniele Bonacorsi (University of Bologna); Valentin Y Kuznetsov (Cornell University (US)); Tommaso Boccali (INFN Sezione di Pisa, Universita' e Scuola Normale Superiore, P); Marco Meoni (INFN Sezione di Pisa, Universita' e Scuola Normale Superiore, P); Luca Menichetti (CERN)

marcomeoni_96.pdf

Video

exploiting-apache-spark.pdf
