PyHEP 2021 (virtual) Workshop

Name: PyHEP 2021 (virtual) Workshop
Start: 2021-07-05T12:00:00+02:00
End: 2021-07-09T21:00:00+02:00

5–9 Jul 2021

Europe/Zurich timezone

Contact us

pyhep2021-organisation@cern.ch

Computing tag-and-probe efficiencies with Apache Spark and Apache Parquet

5 Jul 2021, 17:40

30m

Notebook talk, Plenary Session Monday

Andre Frankenthal (Princeton University (US))

In this talk we demonstrate a new framework, developed by the muon physics object group in CMS, that computes tag-and-probe (T&P) efficiencies and scale factors on Apache Spark clusters. The package, named “spark_tnp”, lets physics analyzers and other users quickly compute efficiencies for their own custom objects and identification criteria, which are developed to meet a diverse set of physics goals within the Collaboration. For the backend cluster, we use CERN’s Spark and Hadoop services (the “analytix” cluster). The ntuples with event information are produced separately in ROOT, converted to Apache Parquet format, and stored on CERN’s Hadoop filesystem (HDFS). Combining Spark with Parquet files in HDFS substantially speeds up T&P computations: custom scale factors can be derived in a matter of minutes, compared to days with a previous framework. The tutorial itself focuses on a Jupyter notebook example of a T&P computation, using CERN’s SWAN service for easy access to the analytix cluster within an interactive environment; the package also supports scripted execution for official production.
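
To make the quantity being computed concrete, here is a minimal cut-and-count sketch of a binned T&P efficiency in plain Python. It is not the spark_tnp implementation and does not use Spark: the probe records, the pT bin edges, and the function name `tnp_efficiency` are hypothetical, and background subtraction is ignored. The efficiency in each bin is taken as the number of passing probes divided by the number of all probes, with a simple binomial uncertainty.

```python
from bisect import bisect_right
from collections import defaultdict
from math import sqrt

# Hypothetical probe records: (probe pT in GeV, passes the ID under study).
probes = [
    (22.0, True), (27.5, True), (28.1, False),
    (35.0, True), (41.2, True), (44.9, False), (48.0, True),
    (63.0, True), (75.5, True),
]

# Hypothetical pT bin edges in GeV; bin i covers [edges[i], edges[i + 1]).
pt_edges = [20.0, 30.0, 50.0, 100.0]


def tnp_efficiency(probes, edges):
    """Return {bin_index: (efficiency, binomial_error)} for each populated bin."""
    n_all = defaultdict(int)
    n_pass = defaultdict(int)
    for pt, passed in probes:
        # Skip probes outside the binned range.
        if not (edges[0] <= pt < edges[-1]):
            continue
        i = bisect_right(edges, pt) - 1  # index of the bin containing pt
        n_all[i] += 1
        n_pass[i] += int(passed)

    result = {}
    for i, total in sorted(n_all.items()):
        eff = n_pass[i] / total
        # Simple binomial error; it is zero when eff is exactly 0 or 1.
        err = sqrt(eff * (1.0 - eff) / total)
        result[i] = (eff, err)
    return result


efficiencies = tnp_efficiency(probes, pt_edges)
for i, (eff, err) in efficiencies.items():
    lo, hi = pt_edges[i], pt_edges[i + 1]
    print(f"pT [{lo:.0f}, {hi:.0f}) GeV: eff = {eff:.3f} +/- {err:.3f}")
```

The per-bin counting here is a grouped aggregation, which is the kind of operation a distributed engine such as Spark can parallelize over columnar files like Parquet.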

PyHEP_2021_MuonPOG.pdf

YouTube Recording
