Speaker
Description
In the domain of high-energy physics (HEP), query languages in general and SQL in particular have found limited acceptance. This is surprising since HEP data analysis matches the SQL model well: the data is fully structured and queried using mostly standard operators. To understand why adoption has nevertheless remained limited, we perform a comprehensive analysis of six diverse, general-purpose data processing platforms and compare them with ROOT's RDataFrame interface, using the Analysis Description Languages (ADL) benchmark as the workload. We identify 16 language features that are useful in implementing typical query patterns found in HEP analyses, categorize them by how essential they are, and analyze how well the different query interfaces implement them. The evaluation yields a rather complex picture of existing solutions: their query languages vary greatly in how naturally and concisely HEP query patterns can be expressed, but the best-suited languages arguably allow for more elegant query formulations than RDataFrame. At the same time, most of the platforms are between one and two orders of magnitude slower than RDataFrame when tested on large data sets. These observations suggest that, while database systems and their query languages are in principle viable tools for HEP, significant performance improvements are necessary to make them relevant in practice.
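To make the notion of a HEP query pattern concrete, the following is a minimal sketch, not taken from the study, of one such pattern expressed in SQL: select the missing transverse energy (MET) of events that contain at least two jets above a transverse-momentum threshold. The flat `events` and `jets` tables, their column names, and the toy values are all assumptions made for illustration; real HEP data usually stores the particles of an event as nested collections rather than as a separate table. SQLite is used here only because it ships with Python and is not implied to be one of the evaluated platforms.

```python
import sqlite3

# Hypothetical toy data (made-up values, not from any real data set):
# events as (event_id, met) and jets as (event_id, pt).
EVENTS = [(1, 12.5), (2, 48.0), (3, 30.2)]
JETS = [(1, 55.0), (1, 42.0), (2, 61.0), (2, 20.0),
        (3, 45.0), (3, 44.0), (3, 10.0)]


def met_of_multijet_events(events, jets, pt_min=40.0, min_jets=2):
    """Return (event_id, met) for events with at least `min_jets` jets with pt > `pt_min`."""
    con = sqlite3.connect(":memory:")
    # Assumed flat schema: one row per event, one row per jet.
    con.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, met REAL)")
    con.execute("CREATE TABLE jets (event_id INTEGER, pt REAL)")
    con.executemany("INSERT INTO events VALUES (?, ?)", events)
    con.executemany("INSERT INTO jets VALUES (?, ?)", jets)
    rows = con.execute(
        """
        SELECT e.event_id, e.met
        FROM events AS e
        JOIN jets AS j ON j.event_id = e.event_id
        WHERE j.pt > ?              -- per-particle cut
        GROUP BY e.event_id, e.met
        HAVING COUNT(*) >= ?        -- per-event cut on the number of passing jets
        ORDER BY e.event_id
        """,
        (pt_min, min_jets),
    ).fetchall()
    con.close()
    return rows


print(met_of_multijet_events(EVENTS, JETS))
```

The per-particle cut maps to `WHERE` and the per-event cut to `HAVING`, which illustrates why such selections fit the relational model using standard operators only. How gracefully a language handles the nested form of the same data is a separate question that this flat sketch does not address.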
Significance
The talk presents the outcome of a collaboration of particle physicists at the University of Washington and database systems researchers at ETH Zurich. The results of the study are completely novel and have not been submitted or published before (except for the reference below).
References
In revision for the Proceedings of the VLDB Endowment Vol. 15 (https://vldb.org/pvldb/vol15-volume-info/). Will be presented at the 48th International Conference on Very Large Data Bases 2022 (VLDB, http://vldb.org/2022/) if accepted. Preprint available on arXiv: https://arxiv.org/abs/2104.12615.
Speaker time zone
Compatible with Europe