ACAT 2021

Name: ACAT 2021
Start: 2021-11-29T08:30:00+09:00
End: 2021-12-03T19:30:00+09:00
Location: Virtual and IBS Science Culture Center, Daejeon, South Korea

Asia/Seoul timezone

Evaluating awkward arrays, uproot, and coffea as a query platform for High Energy Physics data

contribution ID 587

2 Dec 2021, 11:00

20m

S221-A (Virtual and IBS Science Culture Center)

55 EXPO-ro, Yuseong-gu, Daejeon, South Korea; email: library@ibs.re.kr; phone: +82 42 878 8299

Oral

Track 1: Computing Technology for Physics Research

Nick Smith (Fermi National Accelerator Lab. (US))

Query languages for High Energy Physics (HEP) are an ever-present topic within the field. A query language that can efficiently represent the nested data structures that encode the statistical and physical meaning of HEP data will help analysts by making their code clearer and more pertinent. As the result of a multi-year effort to develop an in-memory columnar representation of HEP data, the NumPy, Awkward Array, and Uproot Python packages present a mature and efficient interface to HEP data. Atop that base, the coffea package adds functionality to launch queries at scale, manage and apply experiment-specific transformations to data, and present a rich object-oriented columnar data representation to the analyst. Recently, a set of Analysis Description Language (ADL) benchmarks has been established to compare HEP queries in multiple languages and frameworks. In this paper we present these benchmark queries implemented within the coffea framework and discuss their readability and performance characteristics. We find that the columnar queries perform as well as or better than the implementations given in previous studies.
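To make the idea of a columnar representation of nested HEP data concrete, the following is a minimal sketch in pure Python. It does not use the Awkward Array, Uproot, or coffea APIs. The `JaggedColumn` class, the `select_events` helper, and the toy values are all hypothetical and serve only to illustrate how a variable-length collection (jets per event) might be stored as one flat list plus per-event offsets, and how an event-level query (keep events with at least two jets above a pT threshold) can then be phrased over whole columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JaggedColumn:
    """Hypothetical stand-in for a jagged column: flat values plus offsets."""
    content: List[float]  # all jet pT values, concatenated across events
    offsets: List[int]    # event i owns content[offsets[i]:offsets[i + 1]]

    def __len__(self) -> int:
        # Number of events, not number of jets.
        return len(self.offsets) - 1

    def count_above(self, threshold: float) -> List[int]:
        """Per-event count of values strictly above the threshold."""
        counts = []
        for i in range(len(self)):
            start, stop = self.offsets[i], self.offsets[i + 1]
            counts.append(sum(1 for v in self.content[start:stop] if v > threshold))
        return counts


def select_events(jet_pt: JaggedColumn, met: List[float],
                  min_jets: int, pt_cut: float) -> List[float]:
    """Return the MET of events with at least min_jets jets above pt_cut."""
    counts = jet_pt.count_above(pt_cut)
    return [m for m, n in zip(met, counts) if n >= min_jets]


# Made-up toy data: four events holding 2, 0, 3, and 1 jets respectively.
jet_pt = JaggedColumn(
    content=[55.0, 42.0, 80.0, 45.0, 20.0, 41.0],
    offsets=[0, 2, 2, 5, 6],
)
met = [12.5, 30.0, 7.25, 18.0]

selected_met = select_events(jet_pt, met, min_jets=2, pt_cut=40.0)
print(selected_met)
```

In this sketch the per-event loop is written out explicitly for readability. Array libraries of the kind the abstract describes are designed so that the analyst expresses the same selection as operations on whole columns, without writing the loop.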

References

https://arxiv.org/abs/2008.12712

Significance

This paper contains new results, not given in any incremental or project status update, that present a fair and accurate performance evaluation of this framework on open data. The results further justify the viability of scientific Python ecosystem tools for HEP analysis.

Speaker time zone: Compatible with America

Nick Smith (Fermi National Accelerator Lab. (US)) The CMS Collaboration

ncsmith_acat2021_coffea.pdf

Recording
