Conveners
T6 - Machine learning and physics analysis: S1
- Sergei Gleyzer
- Andrea Rizzi (INFN Sezione di Pisa, Universita' e Scuola Normale Superiore, P)
T6 - Machine learning and physics analysis: S2
- Sofia Vallecorsa (Gangneung-Wonju National University (KR))
T6 - Machine learning and physics analysis: S3
- Sofia Vallecorsa (Gangneung-Wonju National University (KR))
T6 - Machine learning and physics analysis: S4
- Sergei Gleyzer
T6 - Machine learning and physics analysis: S5
- Andrea Rizzi (INFN Sezione di Pisa, Universita' e Scuola Normale Superiore, P)
T6 - Machine learning and physics analysis: S6
- Sofia Vallecorsa (Gangneung-Wonju National University (KR))
T6 - Machine learning and physics analysis: S7
- Andrea Rizzi (INFN Sezione di Pisa, Universita' e Scuola Normale Superiore, P)
The High Luminosity LHC (HL-LHC) represents an unprecedented computing challenge: for the program to succeed, current estimates from the LHC experiments put the required processing and storage capacity at roughly 50 times what is deployed today. Although some of the increased capacity will be provided by technology improvements over time, the computing budget is expected to...
In this talk, we will describe the latest additions to the Toolkit for Multivariate Analysis (TMVA), the machine learning package integrated into the ROOT framework. In particular, we will focus on the new deep learning module that contains robust fully-connected, convolutional and recurrent deep neural networks implemented on CPU and GPU architectures. We will present the performance of these new...
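A minimal PyROOT sketch of booking a dense network through the TMVA deep-learning method is given below; the input file, tree and variable names are placeholders, and the exact option-string syntax depends on the ROOT version.

```python
import ROOT
from ROOT import TMVA

TMVA.Tools.Instance()
outfile = ROOT.TFile.Open("tmva_out.root", "RECREATE")
factory = TMVA.Factory("TMVAClassification", outfile, "!V:AnalysisType=Classification")

loader = TMVA.DataLoader("dataset")
loader.AddVariable("var1", "F")
loader.AddVariable("var2", "F")
infile = ROOT.TFile.Open("input.root")              # hypothetical file with signal/background trees
loader.AddSignalTree(infile.Get("TreeS"), 1.0)
loader.AddBackgroundTree(infile.Get("TreeB"), 1.0)
loader.PrepareTrainingAndTestTree(ROOT.TCut(""), "SplitMode=Random:NormMode=NumEvents")

# Book the deep-learning method; option-string details vary between ROOT versions
dnn_options = ("!H:!V:ErrorStrategy=CROSSENTROPY:VarTransform=N:"
               "InputLayout=1|1|2:"
               "Layout=DENSE|64|RELU,DENSE|64|RELU,DENSE|1|LINEAR:"
               "TrainingStrategy=LearningRate=1e-3,BatchSize=128,MaxEpochs=20:"
               "Architecture=CPU")                  # Architecture=GPU if ROOT was built with CUDA
factory.BookMethod(loader, TMVA.Types.kDL, "DNN_CPU", dnn_options)

factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()
outfile.Close()
```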
The Scikit-HEP project is a community-driven and community-oriented effort with the aim of providing Particle Physics at large with a Python scientific toolset containing core and common tools. The project builds on five pillars that embrace the major topics involved in a physicist’s analysis work: datasets, data aggregations, modelling, simulation and visualisation. The vision is to build a...
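As one illustration of the kind of Pythonic workflow the project targets, the sketch below reads a ROOT tree into NumPy arrays with uproot, one of the Scikit-HEP packages; the file and branch names are hypothetical.

```python
import uproot
import numpy as np

# Open a ROOT file and read two (hypothetical) branches without a ROOT installation
tree = uproot.open("analysis.root")["Events"]
arrays = tree.arrays(["muon_pt", "muon_eta"], library="np")

# Simple dataset-level aggregation: mean transverse momentum of forward muons
forward = np.abs(arrays["muon_eta"]) > 2.0
print("mean forward-muon pT:", arrays["muon_pt"][forward].mean())
```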
HIPSTER (Heavily Ionising Particle Standard Toolkit for Event Recognition) is an open source Python package designed to facilitate the use of TensorFlow in a high energy physics analysis context. The core functionality of the software is presented, with images from the MoEDAL experiment Nuclear Track Detectors (NTDs) serving as an example dataset. Convolutional neural networks are selected as...
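The sketch below shows a generic Keras/TensorFlow convolutional classifier of the kind such a package could drive; the image shape and network layout are illustrative, not MoEDAL specifics.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),          # one-channel detector-image patch (assumed size)
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # signal vs background image
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_split=0.2, epochs=10)
```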
In the traditional HEP analysis paradigm, code, documentation, and results are separate entities that require significant effort to keep synchronized, which hinders reproducibility. Jupyter notebooks allow these elements to be combined into a single, repeatable narrative. HEP analyses, however, commonly rely on complex software stacks and the use of distributed computing resources,...
Neural networks, and recently, specifically deep neural networks, are attractive candidates for machine learning problems in high energy physics because they can act as universal approximators. With a properly defined objective function and sufficient training data, neural networks are capable of approximating functions for which physicists lack sufficient insight to derive an analytic,...
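A toy illustration of the universal-approximation point: a small dense network learns a nonlinear target function from samples alone, with no analytic ansatz. The target function and network size below are arbitrary choices.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(5000, 1))
y = np.sin(x[:, 0]) * np.exp(-0.1 * x[:, 0] ** 2)    # "unknown" target function

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(x, y)
print("approximation at x=1.0:", model.predict([[1.0]]))
```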
High Energy Physics experiments often rely on Monte Carlo event generators. Such generators often contain a large number of parameters and need fine-tuning to closely match experimentally observed data. This task traditionally requires expert knowledge of the generator and the experimental setup as well as vast computing power. Generative Adversarial Networks (GANs) are a powerful method to match...
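A heavily simplified sketch of the adversarial setup on a one-dimensional toy observable is given below; the architectures, the target distribution and all hyperparameters are placeholders, not the configuration used in the contribution.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim, batch = 8, 128

generator = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),                                   # generated value of one observable
])
discriminator = models.Sequential([
    layers.Input(shape=(1,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),             # P(sample comes from real data)
])
g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy()

# Stand-in for the experimentally observed distribution the generator must reproduce
real_data = np.random.normal(1.5, 0.3, size=(10000, 1)).astype("float32")

for step in range(2000):
    real = real_data[np.random.randint(0, len(real_data), batch)]
    noise = tf.random.normal((batch, latent_dim))
    # Discriminator update: separate real samples from generated ones
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        d_loss = (bce(tf.ones((batch, 1)), discriminator(real, training=True)) +
                  bce(tf.zeros((batch, 1)), discriminator(fake, training=True)))
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    # Generator update: produce samples the discriminator labels as real
    noise = tf.random.normal((batch, latent_dim))
    with tf.GradientTape() as tape:
        g_loss = bce(tf.ones((batch, 1)),
                     discriminator(generator(noise, training=True), training=True))
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
```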
The certification of the CMS data as usable for physics analysis is a crucial task to ensure the quality of all physics results published by the collaboration. Currently, the certification conducted by human experts is labor-intensive and can only be segmented on a run-by-run basis. This contribution focuses on the design and prototype of an automated certification system assessing data...
Online Data Quality Monitoring (DQM) in High Energy Physics experiments is a key task which, nowadays, is extremely expensive in terms of human resources and required expertise.
We investigate machine learning as a solution for automated DQM. The contribution focuses on the peculiar challenges posed by the requirement of setting up and evaluating the AI algorithms in the online environment;...
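One commonly used approach, shown here purely as an illustration and not necessarily the one adopted in the contribution, is an autoencoder trained on monitoring histograms from certified good data, which flags anomalies by reconstruction error; the input dimensionality and threshold are placeholders.

```python
import numpy as np
from tensorflow.keras import layers, models

n_bins = 100                                 # flattened monitoring histogram
autoencoder = models.Sequential([
    layers.Input(shape=(n_bins,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(8, activation="relu"),      # bottleneck
    layers.Dense(32, activation="relu"),
    layers.Dense(n_bins, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(good_histograms, good_histograms, epochs=50, validation_split=0.1)

def anomaly_score(hist_batch):
    """Per-histogram mean squared reconstruction error."""
    recon = autoencoder.predict(hist_batch, verbose=0)
    return np.mean((hist_batch - recon) ** 2, axis=1)

# flagged = anomaly_score(new_histograms) > threshold   # threshold tuned on reference data
```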
In 2015 ATLAS Distributed Computing started to migrate its monitoring systems away from Oracle DB and decided to adopt new big data platforms that are open source, horizontally scalable, and offer the flexibility of NoSQL systems. Three years later, the full software stack is in place, the system is considered in production and operating at near maximum capacity (in terms of storage capacity...
The revalidation, reinterpretation and reuse of research data analyses requires having access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps which were used by the researcher to produce the original scientific results in the first place.
REANA (=Reusable Analyses) is a nascent platform enabling researchers to...
We present recent work within the ATLAS collaboration to centrally provide tools to facilitate analysis management and highly automated container-based analysis execution, in order both to enable non-experts to benefit from these best practices and to allow the collaboration to track and re-execute analyses independently, e.g. during their review phase.
Through integration with the ATLAS GLANCE...
The distributed data management system Rucio manages all data of the ATLAS collaboration across the grid. Automation, such as replication and rebalancing, is an important part of ensuring minimal workflow execution times. In this paper, a new rebalancing algorithm based on machine learning is proposed. First, it can run independently of the existing rebalancing mechanism and can be...
In the last stages of data analysis, only order-of-magnitude computing speedups translate into increased human productivity, and only if they're not difficult to set up. Producing a plot in a second instead of an hour is life-changing, but not if it takes two hours to write the analysis code. Fortunately, HPC-inspired techniques can result in such large speedups, but unfortunately, they can be...
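As an example of such a technique (our choice for illustration, not necessarily the authors' toolchain), the sketch below JIT-compiles a per-event histogramming loop with Numba so that an interactive Python analysis runs at compiled-code speed after the first call.

```python
import numpy as np
import numba

@numba.njit
def fill_hist(values, nbins, lo, hi):
    """Compiled per-event loop: fill a fixed-binning histogram."""
    counts = np.zeros(nbins, dtype=np.int64)
    width = (hi - lo) / nbins
    for i in range(values.size):
        x = values[i]
        if lo <= x < hi:
            b = int((x - lo) / width)
            if b == nbins:                 # guard against floating-point edge effects
                b = nbins - 1
            counts[b] += 1
    return counts

pt = np.random.exponential(30.0, size=50_000_000)   # stand-in for a per-event quantity
counts = fill_hist(pt, 100, 0.0, 300.0)
```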
The HEP community is approaching an era where the excellent performance of the particle accelerators in delivering collisions at high rate will force the experiments to record a large amount of information. The growing size of the datasets could potentially become a limiting factor in the capability to produce scientific results in a timely and efficient manner. Recently, new technologies and new...
Many analyses in CMS are based on histograms, used throughout the workflow from data validation studies to fits for physics results. Binned data frames are a generalisation of multidimensional histograms, in a tabular representation where histogram bins are denoted by category labels. Pandas is an industry-standard tool, providing a data frame implementation that allows easy access to "big...
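A small sketch of the binned-data-frame idea with pandas: histogram bins become categorical index levels, so standard DataFrame operations act directly on histograms. The columns and binnings below are illustrative.

```python
import numpy as np
import pandas as pd

events = pd.DataFrame({
    "pt":  np.random.exponential(30.0, 100_000),
    "eta": np.random.uniform(-2.5, 2.5, 100_000),
})

# Bin the event table; each (pt bin, eta bin) category label denotes a histogram bin
binned = events.groupby([
    pd.cut(events["pt"], bins=np.linspace(0, 150, 16)),
    pd.cut(events["eta"], bins=np.linspace(-2.5, 2.5, 11)),
]).size().rename("count")

# The result is a 2-D histogram in tabular form; projections are just another groupby
pt_projection = binned.groupby(level=0).sum()
```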
The Physics programmes of LHC Run III and HL-LHC challenge the HEP community. The volume of data to be handled is unprecedented at every step of the data processing chain: analysis is no exception.
Physicists need to be provided with first-class analysis tools that are easy to use, exploit bleeding-edge hardware technologies and allow parallelism to be expressed seamlessly.
This contribution...
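ROOT's RDataFrame is one example of such a declarative, implicitly parallel interface; the sketch below assumes hypothetical file, tree and branch names.

```python
import ROOT

ROOT.ROOT.EnableImplicitMT()                       # parallelise over the available cores
df = ROOT.RDataFrame("Events", "data.root")        # hypothetical tree and file

h = (df.Filter("nMuon >= 2", "at least two muons")
       .Define("leading_pt", "Muon_pt[0]")
       .Histo1D(("leading_pt", ";p_{T} [GeV];events", 100, 0.0, 200.0), "leading_pt"))

h.Draw()                                           # the lazily built computation graph runs here
```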
A new event data format has been designed and prototyped by the CMS collaboration to satisfy the needs of a large fraction of physics analyses (at least 50%) with a per-event size of order 1 kB. This new format is more than a factor of 20 smaller than the MINIAOD format and contains only the top-level information typically used in the last steps of an analysis. The talk will review the current...
The HEP community is preparing for the LHC's Run 3 and 4. One of the big challenges for physics analysis will be developing tools that make it easy to express an analysis and that can efficiently process the roughly tenfold increase in data expected. Recently, interest has focused on declarative analysis languages: a way of specifying a physicist's intent and leaving everything else to the underlying system. The...
Within the field of dark matter direct detection, there has been very little penetration of machine learning. This is primarily due to the difficulty of modeling such low-energy detectors for training sets (the keV energies involved are a factor of $10^{-10}$ smaller than at the LHC). Xenon detectors have been leading the field of dark matter direct detection for the last decade. The current front runner is XENON1T,...
The Cherenkov Telescope Array (CTA) is the next generation of ground-based gamma-ray telescopes for gamma-ray astronomy. Two arrays will be deployed, composed of 19 telescopes in the Northern hemisphere and 99 telescopes in the Southern hemisphere. Observatory operations are planned to start in 2021, but first data from prototypes should already be available in 2019. Due to its very high...
With the accumulation of large datasets at an energy of 13 TeV, the LHC experiments can search for rare processes, where the extraction of the signal from the copious and varying Standard Model backgrounds poses increasing challenges. Techniques based on machine learning promise to achieve optimal search sensitivity and signal-to-background ratios for such searches. Taking the search for the...
Data Quality Assurance (QA) is an important aspect of every High-Energy Physics experiment, especially in the case of the ALICE Experiment at the Large Hadron Collider (LHC), whose detectors are extremely sophisticated and complex devices. To avoid processing low-quality or redundant data, human experts are currently involved in assessing the detectors' health during the recording of collisions....
Data from B-physics experiments at the KEKB collider have a substantial background from $e^{+}e^{-}\to q \bar{q}$ events. To suppress this we employ deep neural network algorithms, which provide improved discrimination of signal from background. However, the neural network develops a substantial correlation with the $\Delta E$ kinematic variable used to distinguish signal from background in the...
Experimental science often has to cope with systematic errors that coherently bias data. We analyze this issue in the analysis of data produced by the experiments at the Large Hadron Collider at CERN, treating it as a case of supervised domain adaptation. The dataset used is a representative Higgs to tau tau analysis from ATLAS, released as part of the Kaggle Higgs ML challenge. Perturbations have been...
This presentation discusses some of the metrics used in HEP and other scientific domains for evaluating the relative quality of binary classifiers that are built using modern machine learning techniques. The use of the area under the ROC curve, which is common practice in the evaluation of diagnostic accuracy in the medical field and has now become widespread in many HEP applications, is...
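For concreteness, the standard computation of the metric under discussion for a binary classifier score, here with scikit-learn on toy labels and scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = np.random.randint(0, 2, size=10_000)                     # toy truth labels
y_score = y_true * 0.3 + np.random.normal(0.5, 0.25, size=10_000)  # toy classifier output

auc = roc_auc_score(y_true, y_score)                # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)   # the curve itself, for plotting
print(f"ROC AUC = {auc:.3f}")
```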
The application of deep learning techniques using convolutional neural networks to the classification of particle collisions in High Energy Physics is explored. An intuitive approach to transform physical variables, like momenta of particles and jets, into a single image that captures the relevant information is proposed. The idea is tested using a well known deep learning framework on a...
An essential part of new physics searches at the Large Hadron Collider (LHC) at CERN involves event classification, or distinguishing signal events from the background. Current machine learning techniques accomplish this using traditional hand-engineered features like particle 4-momenta, motivated by our understanding of particle decay phenomenology. While such techniques have proven useful...
Jet flavour identification is a fundamental component for the physics program of the LHC-based experiments. The presence of multiple flavours to be identified leads to a multiclass classification problem. We present results from a realistic simulation of the CMS detector, one of two multi-purpose detectors at the LHC, and the respective performance measured on data. Our tagger, named DeepJet,...
Measurements of time-dependent CP violation and of $B$-meson mixing at B-factories require a determination of the flavor of one of the two exclusively produced $B^0$ mesons. The predecessors of Belle II, the Belle and BaBar experiments, developed so-called flavor tagging algorithms for this task. However, due to the novel high-luminosity conditions and the increased beam-backgrounds at Belle...
Reconstruction and identification in calorimeters of modern High Energy Physics experiments is a complicated task. Solutions are usually driven by a priori knowledge about the expected properties of reconstructed objects. Such an approach is also used to distinguish single photons in the electromagnetic calorimeter of the LHCb detector at the LHC from overlapping photons produced from high momentum...
The BESIII detector is a general purpose spectrometer located at BEPCII. BEPCII is a double-ring $e^+e^-$ collider running at center-of-mass energies between 2.0 and 4.6 GeV, which has reached a peak luminosity of $1\times 10^{33}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$ at $\sqrt{s} = 3770$ MeV.
As an experiment at the high-precision frontier of hadron physics, BESIII has since 2009 collected the world's largest data samples...
We show how a novel network architecture based on Lorentz Invariance (and not much else) can be used to identify hadronically decaying top quarks. We compare its performance to alternative approaches, including convolutional neural networks, and find it to be very competitive.
We also demonstrate how this architecture can be extended to include tracking information and show its application to...
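A hedged illustration of the underlying idea, not the actual architecture: network inputs can be built from Minkowski inner products of the jet constituents' four-momenta, which are Lorentz invariant by construction.

```python
import numpy as np

def minkowski_gram(p):
    """p: (n_constituents, 4) array of (E, px, py, pz); returns the n x n matrix of
    Lorentz-invariant inner products p_i . p_j with metric signature (+,-,-,-)."""
    metric = np.diag([1.0, -1.0, -1.0, -1.0])
    return p @ metric @ p.T

constituents = np.random.rand(20, 4)          # stand-in for one jet's constituents
invariants = minkowski_gram(constituents)     # boost- and rotation-invariant features
```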
In recent years, several studies have demonstrated the benefit of using deep learning to solve typical tasks related to high energy physics data taking and analysis. Building on these proofs of principle, many HEP experiments are now working on integrating deep learning into their workflows. The computational need for inference of a model, once trained, is rather modest and does not usually...
In the field of High Energy Physics, the simulation of the interaction of particles in the material of calorimeters is a computing-intensive task, even more so with complex and fine-grained detectors. The complete and most accurate simulation of particle-matter interactions is essential when calibrating and understanding the detector at the very low level, but is seldom required at physics...
Measurements in LArTPC neutrino detectors feature high fidelity and result in large event images. Deep learning techniques have been extremely successful in classification tasks of photographs, but their application to LArTPC event images is challenging, due to the large size of the events; two orders of magnitude larger than images found in classical challenges like MNIST or ImageNet. This...
The ROOT Mathematical and Statistical libraries have been recently improved to facilitate the modelling of parametric functions that can be used for performing maximum likelihood fits to data sets to estimate parameters and their uncertainties.
We report here on the new functionality of the ROOT TFormula and TF1 classes to build these models in a convenient way for the users. We show how...
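A short PyROOT sketch of this parametric-modelling workflow, with an illustrative Gaussian-plus-exponential model fitted to a toy histogram:

```python
import ROOT

# Gaussian signal on an exponential background via TFormula shorthands
model = ROOT.TF1("model", "gaus(0) + expo(3)", 0.0, 10.0)
model.SetParameters(100.0, 5.0, 0.5, 2.0, -0.3)

h = ROOT.TH1D("h", "toy data", 100, 0.0, 10.0)
h.FillRandom("model", 10000)                 # toy dataset drawn from the same shape

h.Fit(model, "L")                            # binned likelihood fit
mean, mean_err = model.GetParameter(1), model.GetParError(1)
print(f"fitted peak position: {mean:.3f} +/- {mean_err:.3f}")
```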
Analyses of multi-million event datasets are natural candidates to exploit the massive parallelisation available on GPUs. This contribution presents two such approaches to measure CP violation and the corresponding user experience.
The first is the energy test, which is used to search for CP violation in the phase-space distribution of multi-body hadron decays. The method relies on a...
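For concreteness, a NumPy sketch of the energy-test statistic in a commonly used Gaussian-kernel form (the exact normalisation convention here is our assumption); the O(n^2) pair sums are what make a GPU implementation attractive for multi-million-event samples.

```python
import numpy as np

def energy_test(sample1, sample2, sigma=0.3):
    """sample1, sample2: (n, d) arrays of phase-space coordinates for the two samples
    being compared; returns the T statistic. Prefactors follow one common convention."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    n1, n2 = len(sample1), len(sample2)
    k11, k22, k12 = kernel(sample1, sample1), kernel(sample2, sample2), kernel(sample1, sample2)
    # exclude self-pairs in the same-sample terms
    t1 = (k11.sum() - np.trace(k11)) / (2.0 * n1 * (n1 - 1))
    t2 = (k22.sum() - np.trace(k22)) / (2.0 * n2 * (n2 - 1))
    t12 = k12.sum() / (n1 * n2)
    return t1 + t2 - t12
```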
In proton-proton collisions at the LHC, the associated production of the Higgs boson with two top quarks has not been observed yet. This ttH channel allows directly probing the coupling of the Higgs boson to the top quark. The observation of this process could be a highlight of the ongoing Run 2 data taking.
Unlike supervised methods (neural networks, decision trees, support vector...
In the High Luminosity Large Hadron Collider (HL-LHC) phase, each proton bunch crossing will bring up to 200 simultaneous proton collisions. Performing charged-particle trajectory reconstruction in such a dense environment will be computationally challenging because of the nature of the traditional algorithms used. The common combinatorial Kalman Filter state-of-the-art...
In high-energy and nuclear physics (HENP) experiments, the reconstruction of registered charged-particle tracks, known as tracking, is a very important stage of the physics analysis. It consists of joining into clusters a great number of so-called hits produced on sequential coordinate planes of tracking detectors, where each cluster contains all hits belonging to the same track, one of many...
SHiP is a new proposed fixed-target experiment at the CERN SPS accelerator. The goal of the experiment is to search for hidden particles predicted by models of Hidden Sectors. Track pattern recognition is an early step of data processing at SHiP. It is used to reconstruct tracks of charged particles from the decay of neutral New Physics objects. Several artificial neural networks and boosting...
Beginning with Run II, successive development projects for the Large Hadron Collider will steadily increase the nominal luminosity, with the ultimate goal of reaching a peak luminosity of $5\times10^{34}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$ for the ATLAS and CMS experiments, planned for the High Luminosity LHC (HL-LHC) upgrade. This rise in luminosity will directly result in an increased number of simultaneous proton collisions...
The High-Luminosity LHC will see pileup levels reaching 200, which will greatly increase the complexity of the tracking component of the event reconstruction.
To reach out to Computer Science specialists, a Tracking Machine Learning challenge (TrackML) is being set up on Kaggle for the first semester of 2018 by a team of ATLAS, CMS and LHCb tracking experts and computer scientists,...
The LHCb experiment will undergo a major upgrade for LHC Run-III, scheduled to start taking data in 2021. The upgrade of the LHCb detector introduces a radically new data-taking strategy: the current multi-level event filter will be replaced by a trigger-less readout system, feeding data into a software event filter at a rate of 40 MHz.
In particular, a new Vertex Locator (VELO) will be...