PyHEP WG topical meeting - RootInteractive

Eduardo Rodrigues (University of Liverpool (GB)), Jim Pivarski (Princeton University), Oksana Shadura (University of Nebraska Lincoln (US))

“Python Module of the Month” topical meetings follow the idea of the Python 3 Module of the Week, but with a spirit adapted to our needs: presentations with a focus on libraries relevant to data analysis in Particle Physics.

They typically take place on the first Wednesday of each month, by default at 16h00 CET.

    • 4:00 PM - 5:00 PM
      RootInteractive tool for multidimensional statistical analysis, machine learning and analytical model validation (1h)

      ALICE, one of the four large experiments at the CERN LHC, is a detector dedicated to heavy-ion physics. At high interaction rates, the pile-up of multiple events creates conditions that require advanced multidimensional data analysis methods.

      Machine learning (ML) has become popular in multidimensional data analysis in recent years. Compared to the simple, low-dimensional analytical approaches used in the past, machine learning models are more difficult to interpret, and their uncertainties are harder to evaluate. On the other hand, oversimplifying the analysis and reducing its dimensionality can make explanations more complex or simply wrong.

      Our goal was to provide a tool for N-dimensional problems: to simplify data analysis in many (ideally all relevant) dimensions, to fit and visualize N-dimensional functions including their uncertainties and biases, to validate assumptions and approximations, and to define multidimensional "invariant" functions/alarms.

      RootInteractive is a general-purpose tool for multidimensional statistical analysis. It follows a declarative programming paradigm: the user describes the structure and elements of a program and expresses the logic of a computation without describing its control flow. This approach makes it easy to use for domain experts, students and educators. RootInteractive provides functions for interactive, easily configurable visualization of unbinned and binned data, interactive N-dimensional histogramming/projection, and extraction of derived aggregate information on the server (Python/C++) and on the client (JavaScript). We support client/server applications using Jupyter, or we can create a stand-alone client-side application/dashboard.
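
      As an illustration of what N-dimensional histogramming and projection mean in practice, here is a minimal sketch using plain NumPy. It does not use the RootInteractive API; the toy data, the binning and all variable names are assumptions made for this example only.

```python
import numpy as np

# Hypothetical toy dataset: 10,000 entries x 3 attributes (not ALICE data).
rng = np.random.default_rng(seed=0)
data = rng.normal(size=(10_000, 3))

# Bin all three attributes at once into a 3-dimensional histogram.
counts, edges = np.histogramdd(
    data, bins=(20, 20, 10), range=[(-5.0, 5.0)] * 3
)

# Project onto the first two attributes by summing over the third axis.
proj_xy = counts.sum(axis=2)

# Project onto the first attribute by summing over the other two axes.
proj_x = counts.sum(axis=(1, 2))

print(counts.shape, proj_xy.shape, proj_x.shape)
```

      Once the N-dimensional histogram exists, any lower-dimensional projection is a sum over the unwanted axes, so the raw entries do not have to be rebinned for each new view.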

      Using a combination of lossy and lossless data compression, datasets with, for example, O(10^7) entries x O(10-50) attributes can be analyzed interactively in the browser in a standalone application of O(500 MB). By applying a suitable representative down-sampling, by a factor of O(10^-2) to O(10^-3), followed by re-weighting or pre-aggregation on the server or batch farm, the effective monthly/annual ALICE statistics can be analyzed interactively in many dimensions for calibration/reconstruction validation, QA/QC, or statistical/physics analysis.
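
      The down-sampling and re-weighting idea can be sketched in a few lines of NumPy. This is an assumption-laden toy example, not the RootInteractive implementation: it uses uniform random sampling with a hypothetical keep fraction of 10^-2, and gives each kept entry a weight of 1/fraction so that weighted histograms estimate the full-statistics ones.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical full dataset: one attribute, 10^6 entries.
n_full = 1_000_000
values = rng.exponential(scale=2.0, size=n_full)

# Keep roughly 1% of the entries, chosen uniformly at random.
fraction = 1e-2
keep = rng.random(n_full) < fraction
sample = values[keep]

# Each kept entry stands in for 1/fraction original entries.
weights = np.full(sample.shape, 1.0 / fraction)

bins = np.linspace(0.0, 10.0, 21)
full_hist, _ = np.histogram(values, bins=bins)
est_hist, _ = np.histogram(sample, bins=bins, weights=weights)

# The weighted histogram of the sample approximates the full histogram.
print(full_hist.sum(), est_hist.sum())
```

      Uniform sampling is only the simplest choice; a representative sample in a real analysis may need non-uniform sampling probabilities, in which case each weight is the inverse of that entry's own sampling probability.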

      Speakers: Marian I Ivanov (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE)), Marian Ivanov (Comenius University (SK))