Building on work in the ROOTLINQ project, we have rewritten a functional, declarative analysis language in Python. With a declarative language, the physicist specifies what they want to do with the data rather than how to do it; the system then translates that intent into actions. Declarative languages would bring numerous benefits to the LHC community, ranging from analysis preservation that outlasts the lifetimes of experiments or analysis software, to facilitating the abstraction, design, validation, combination, interpretation, and overall communication of the contents of LHC analyses. This talk focuses on an ongoing effort to define a query-based analysis language, designed to loop over structured data and to include a complete set of unambiguous operations. The project has several implementation goals, among them: 1) design a syntax that matches how physicists think about event data, and 2) run on different backend formats, including binary data (xAODs from ATLAS, for example), flat TTrees using RDataFrame, and columnar data in Python. This work will further help to understand the differences between Analysis Languages and Data Query Languages in HEP, how hard it is to translate data manipulation from a row-wise-centric layout to a column-wise-centric layout, and, finally, how to scale from a small laptop-like environment to a larger cluster. The system currently has all three backends implemented to varying degrees and is being used in a full Run 2 analysis in ATLAS. The plans, goals, design, progress, and pitfalls will be described in this presentation.
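To make the "what, not how" idea concrete, here is a minimal sketch of a query that is described once and then handed to two different executors. Every name in it (`Query`, `where`, `select`, `run_rowwise`, `run_columnwise`) is hypothetical and chosen for illustration; none of it is the project's actual syntax or API, and the event fields `n_jets` and `met` are made-up toy data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical names throughout: this is an illustration of the declarative
# idea, not the syntax or API of the language described in the abstract.


@dataclass(frozen=True)
class Query:
    """An immutable description of what to compute; it holds no data."""

    steps: Tuple[Tuple[str, Callable[[Any], Any]], ...] = ()

    def where(self, predicate: Callable[[Any], bool]) -> "Query":
        # Record a filter step; nothing is executed here.
        return Query(self.steps + (("where", predicate),))

    def select(self, projection: Callable[[Any], Any]) -> "Query":
        # Record a projection step; nothing is executed here.
        return Query(self.steps + (("select", projection),))


def run_rowwise(query: Query, events: Iterable[Dict[str, Any]]) -> List[Any]:
    """Execute the query over row-wise data (one dict per event)."""
    results = []
    for event in events:
        item: Any = event
        keep = True
        for kind, fn in query.steps:
            if kind == "where":
                if not fn(item):
                    keep = False
                    break
            else:  # "select"
                item = fn(item)
        if keep:
            results.append(item)
    return results


def run_columnwise(query: Query, columns: Dict[str, List[Any]]) -> List[Any]:
    """Execute the same query over column-wise data (one list per field).

    For simplicity this rebuilds a per-event view from the columns; a real
    columnar backend would instead translate the query into array operations.
    """
    n_events = len(next(iter(columns.values()))) if columns else 0
    rows = ({name: col[i] for name, col in columns.items()} for i in range(n_events))
    return run_rowwise(query, rows)


# Toy data in two layouts (field names are made up for the example).
events = [
    {"n_jets": 2, "met": 35.0},
    {"n_jets": 0, "met": 12.5},
    {"n_jets": 3, "met": 80.0},
]
columns = {"n_jets": [2, 0, 3], "met": [35.0, 12.5, 80.0]}

# The analysis intent, stated once: MET of events with at least two jets.
query = Query().where(lambda e: e["n_jets"] >= 2).select(lambda e: e["met"])

rowwise_result = run_rowwise(query, events)
columnwise_result = run_columnwise(query, columns)
```

The query object only records the requested operations, so the choice of data layout and execution strategy stays with the executor. That separation is what would let the same analysis description target different backends.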
|Consider for promotion||Yes|