The goal of this study is to understand the observed differences in ATLAS software performance, when comparing results measured under ideal laboratory conditions with those from ATLAS computing resources on the Worldwide LHC Computing Grid (WLCG). The laboratory results are based on the full simulation of a single ttbar event and use dedicated, local hardware. In order to have a common and...
Ionization of matter by charged particles is the main mechanism for particle identification in gaseous detectors. Traditionally, the ionization is measured by the total energy loss (dE/dx). The concept of cluster counting, which measures the number of clusters per track length (dN/dx), was proposed in the 1970s. The dN/dx measurement can avoid many sources of fluctuations from the dE/dx...
The challenges expected for the HL-LHC era, both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of re-thinking their computing models at many levels. In fact, a large part of the R&D effort of the CMS experiment has been focused on optimizing computing and storage resource utilization for data analysis, and Run 3 could...
The High Energy Physics world will face challenging trigger requirements in the next decade. In particular, the luminosity increase to $5-7.5 \times 10^{34} cm^{-2} s^{-1}$ at the LHC will push major experiments such as ATLAS to exploit online tracking in their inner detectors to reach 10 kHz of events from the 1 MHz Calorimeter and Muon Spectrometer trigger. The project described here is a proposal for a tuned...
Hydra is an AI system employing off-the-shelf computer vision technologies aimed at autonomously monitoring data quality. Data quality monitoring is an essential step in modern experimentation and Nuclear Physics is no exception. Certain failures can be identified through alarms (e.g. electrical heartbeats) while others are more subtle and often require expert knowledge to identify and...
High energy physics experiments are pushing forward precision measurements and searches for new physics beyond the Standard Model. This makes it urgent to simulate and generate massive amounts of data to meet the requirements of physics analyses, and it is one of the most promising areas in which to exploit the existing power of supercomputers for high energy physics computing. Taking the BESIII experiment as an illustration, we...
AtlFast3 is the next generation of high precision fast simulation in ATLAS that is being deployed by the collaboration and was successfully used for the simulation of 7 billion events in Run 2 data taking conditions. AtlFast3 combines a parametrization-based approach known as FastCaloSimV2 and a machine-learning based tool that exploits Generative Adversarial Networks (FastCaloGAN) for the...
The inner tracking system of the CMS experiment, consisting of the silicon pixel and strip detectors, is designed to provide a precise measurement of the momentum of charged particles and to perform the primary and secondary vertex reconstruction. The movements of the individual substructures of the tracker detectors are driven by changes in the operating conditions during data taking....
Accurate reconstruction of charged particle trajectories and measurement of their parameters (tracking) is one of the major challenges of the CMS experiment. Precise and efficient tracking is one of the critical components of the CMS physics program, as it impacts the ability to reconstruct the physics objects needed to understand proton-proton collisions at the LHC. In this work, we present...
Building on top of the multithreading functionality that was introduced in Run-2, the CMS software framework (CMSSW) has been extended in Run-3 to offload part of the physics reconstruction to NVIDIA GPUs. The first application of this new feature is the High Level Trigger (HLT): the new computing farm installed at the beginning of Run-3 is composed of 200 nodes, and for the first time each...
High Energy Physics (HEP) has been using column-wise data stored in synchronized containers, such as most prominently ROOT’s TTree, for decades. These containers have proven to be very powerful as they combine row-wise association capabilities needed by most HEP event processing frameworks (e.g. Athena) with column-wise storage, which typically results in better compression and more efficient...
The Belle II experiment has been collecting data since 2019 at the second generation e+/e- B-factory SuperKEKB in Tsukuba, Japan. The goal of the experiment is to explore new physics via high precision measurement in flavor physics. This is achieved by collecting a large amount of data that needs to be calibrated promptly for fast reconstruction and recalibrated thoroughly for the final...
Computing in high energy physics is a typical data-intensive application, especially data analysis, which requires access to large amounts of data. Traditional computing systems adopt a "computing-storage" separation mode, which leads to large data movements during processing and also increases transmission delay and network load. Therefore, it can...
The outstanding performance obtained by the CMS experiment during Run 1 and Run 2 represents a great achievement of seamless hardware and software integration. Among the different software parts, the CMS offline reconstruction software is essential for translating the data acquired by the detectors into concrete objects that can be easily handled by the analyzers. The CMS offline reconstruction...
The landscape of computing power available for the CMS experiment is rapidly evolving, from a scenario dominated by x86 processors deployed at WLCG sites, towards a more diverse mixture of Grid, HPC, and Cloud facilities incorporating a higher fraction of non-CPU components, such as GPUs. Using these facilities’ heterogeneous resources efficiently to process the vast amounts of data to be...
During ATLAS Run 2, in the online track reconstruction algorithm of the Inner Detector (ID), a large proportion of the CPU time was dedicated to fast track finding. With the proposed HL-LHC upgrade, where the event pile-up is predicted to reach $\langle\mu\rangle = 200$, track finding will see a further large increase in CPU usage. Moreover, only a small subset of Pixel-only seeds is accepted after the...
The production of simulated datasets for use by physics analyses consumes a large fraction of ATLAS computing resources, a problem that will only get worse as increases in the instantaneous luminosity provided by the LHC lead to more collisions per bunch crossing (pile-up). One of the more resource-intensive steps in the Monte Carlo production is reconstructing the tracks in the ATLAS Inner...
As the scale and complexity of High Energy Physics (HEP) experiments increase, researchers face the challenge of large-scale data processing. In terms of storage, HDFS, a distributed file system that supports the "data-centric" processing model, has been widely used in academia and industry. This file system can support Spark and other distributed, data-localized computations,...
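As a hedged illustration of the "data-centric" pattern described above, the sketch below ships a simple selection to the data stored in HDFS using PySpark, instead of moving the data to the compute nodes. The HDFS path, column names, and selection are placeholders, not taken from the original work.

```python
# Minimal PySpark sketch of data-localized processing on HDFS.
# Paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hep-localized-analysis").getOrCreate()

# Read event summaries previously exported to HDFS (hypothetical path/schema).
events = spark.read.parquet("hdfs:///hep/events/summary.parquet")

# A simple selection and aggregation executed close to the data.
result = (events
          .filter(F.col("n_tracks") > 2)
          .groupBy("run_number")
          .agg(F.avg("missing_et").alias("mean_met")))

result.show()
spark.stop()
```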
When measuring rare processes at Belle II, a huge luminosity is required, which means a large number of simulations are necessary to determine signal efficiencies and background contributions. However, this process incurs high computational costs, while most of the simulated data, in particular in the case of backgrounds, are discarded by the event selection. Thus, filters using graph neural networks...
The ATLAS detector at CERN measures proton-proton collisions at the Large Hadron Collider (LHC), which allows us to test the limits of the Standard Model (SM) of particle physics. Forward-moving electrons produced in these collisions are promising candidates for finding physics beyond the SM. However, the ATLAS detector is not constructed to measure forward leptons with pseudorapidity $\eta$ of...
As CMS starts the Run 3 data taking, the experiment’s data management software tools, along with the monitoring infrastructure, have undergone significant upgrades to cope with the conditions expected in the coming years. The challenges of efficient, real-time monitoring of the performance of the computing infrastructure and of data distribution are being met using state-of-the-art...
PARSIFAL (PARametrized SImulation) is a software tool originally implemented to reproduce the complete response of a triple-GEM detector to the passage of a charged particle, taking into account the physical processes involved through a simple parametrization, and thus in a very fast way.
Robust and reliable software, such as GARFIELD++, is widely used to simulate the transport of electrons...
The particle-flow (PF) algorithm is of central importance to event reconstruction at the CMS detector, and has been a focus of developments in light of planned Phase-2 running conditions with an increased pileup and detector granularity. Current rule-based implementations rely on extrapolating tracks to the calorimeters, correlating them with calorimeter clusters, subtracting charged energy...
Secrets management is the process of handling secrets, such as certificates, database credentials, tokens, and API keys, in a secure and centralized way. In the present CMSWEB (the portfolio of CMS internal IT services) infrastructure, only the operators maintain all service and cluster secrets in a secure place. However, if all relevant persons with secrets are away, then we are left with no...
The CMS Submission Infrastructure is the main computing resource provisioning system for CMS workflows, including data processing, simulation and analysis. It currently aggregates nearly 400k CPU cores distributed worldwide from Grid, HPC and cloud providers. CMS Tier-0 tasks, such as data repacking and prompt reconstruction, critical for data-taking operations, are executed on a collection of...
Over the past several years, a deep learning model based on convolutional neural networks has been developed to find proton-proton collision points (also known as primary vertices, or PVs) in Run 3 LHCb data. By converting the three-dimensional space of particle hits and tracks into a one-dimensional kernel density estimator (KDE) along the direction of the beamline and using the KDE as an...
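The sketch below is a toy illustration, not the LHCb pv-finder code, of the KDE step described above: track positions along the beamline are turned into a one-dimensional kernel density estimate whose peaks hint at primary-vertex candidates. The vertex positions, resolutions, and peak-finding threshold are invented for the example.

```python
# Toy KDE-based primary-vertex hinting along the beamline (illustrative only).
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
# Fake z positions (mm) of tracks from three pile-up vertices.
z_tracks = np.concatenate([
    rng.normal(-35.0, 0.5, 40),
    rng.normal(  2.0, 0.5, 60),
    rng.normal( 51.0, 0.5, 25),
])

kde = gaussian_kde(z_tracks, bw_method=0.05)
z_grid = np.linspace(-150, 150, 3001)
density = kde(z_grid)

# Peaks in the KDE are primary-vertex candidates.
peaks, _ = find_peaks(density, height=0.2 * density.max())
print("PV candidates at z =", z_grid[peaks])
```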
With the LHC restarting after more than 3 years of shutdown, unprecedented amounts of data are expected to be recorded. Even with the WLCG providing a tremendous amount of compute resources to process this data, local resources will have to be used for additional compute power. This, however, makes the landscape in which computing takes place more heterogeneous.
In this contribution, we...
The INFN-CNAF Tier-1 has been engaged for years in a continuous effort to integrate its computing centre with more types of computing resources. In particular, the challenge of providing opportunistic access to non-standard CPU architectures, such as PowerPC, or hardware accelerators (GPUs) has been actively pursued. In this work, we describe a solution to transparently integrate access to...
The Phase-II upgrade of the LHC will increase its instantaneous luminosity by a factor of 7, leading to the High Luminosity LHC (HL-LHC). At the HL-LHC, the number of proton-proton collisions in one bunch crossing (called pileup) increases significantly, putting more stringent requirements on the LHC detectors' electronics and real-time data processing capabilities.
The ATLAS Liquid Argon...
Over the past few years, intriguing deviations from the Standard Model predictions have been reported in measurements of angular observables and branching fractions of $B$ meson decays, suggesting the existence of a new interaction that acts differently on the three lepton families. The Belle II experiment has unique features that allow the study of $B$ meson decays with invisible particles in the...
Detector modeling and visualization are essential in the life cycle of a High Energy Physics (HEP) experiment. Unity is professional multimedia creation software with the advantages of rich visualization effects and easy deployment on various platforms. In this work, we applied the method of detector transformation to convert the BESIII detector description from the offline software...
Awkward Arrays and RDataFrame provide two very different ways of performing calculations at scale. By adding the ability to zero-copy convert between them, users get the best of both. It gives users greater flexibility in mixing different packages and languages in their analysis.
In Awkward Array version 2, the ak.to_rdataframe function presents a view of an Awkward Array as an RDataFrame...
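A minimal usage sketch of the conversion described above is shown below, using the ak.to_rdataframe function from Awkward Array version 2 together with PyROOT. The column names and selection are arbitrary examples; this uses flat numeric columns for simplicity, although nested arrays are also supported as the text explains.

```python
# Present Awkward Arrays as RDataFrame columns without copying (requires
# ROOT with RDataFrame and awkward >= 2.0; columns here are illustrative).
import awkward as ak
import ROOT

met  = ak.Array([12.5, 48.1, 7.3, 103.0])
njet = ak.Array([2, 5, 0, 3])

df = ak.to_rdataframe({"met": met, "njet": njet})

# Use them like any other RDataFrame columns.
count = df.Filter("met > 30 && njet >= 2").Count()
print("selected events:", count.GetValue())
```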
We present a revived version of the CERNLIB, the basis for software ecosystems of most of the pre-LHC HEP experiments. The efforts to consolidate the CERNLIB are part of the activities of the Data Preservation for High Energy Physics collaboration to preserve data and software of the past HEP experiments.
The presented version is based on the CERNLIB version 2006 with numerous...
Identifying and locating proton-proton collisions in LHC experiments (known as primary vertices or PVs) has been the topic of numerous conference talks in the past few years (2019-2021). A search over a variety of architectures has yielded promising candidates for PV-finder. The UNet model, for example, has achieved an efficiency of 98% with a low false-positive rate. These...
The FairRoot software stack is a toolset for the simulation, reconstruction, and analysis of high energy particle physics experiments (currently used, e.g., at FAIR/GSI and CERN). In this work, we give insight into recent improvements of Continuous Integration (CI) for this software stack. CI is a modern software engineering method to efficiently assure software quality. We discuss relevant...
After the successful adoption of Rucio as the new data management system following its inception in 2018, a subsequent step is to advertise it to users and other stakeholders. In this perspective, one of the objectives is to keep improving the tooling around Rucio. As Rucio introduces a new data management paradigm w.r.t. the previous model, we begin by tackling the challenges arising...
In High Energy Physics (HEP) experiments, the Data Quality Monitoring (DQM) system is crucial to ensure the correct and smooth operation of the experimental apparatus during data taking. DQM at the Jiangmen Underground Neutrino Observatory (JUNO) will reconstruct raw data directly from the JUNO Data Acquisition (DAQ) system and use event visualization tools to show the detector performance for high...
The common ALICE-FAIR software framework ALFA offers a platform for simulation, reconstruction and analysis of particle physics experiments. FairMQ is a module of ALFA that provides building blocks for distributed data processing pipelines, composed of components communicating via message passing. FairMQ integrates and efficiently utilizes standard industry data transport technologies,...
We evaluate two Generative Adversarial Network (GAN) models developed by the COherent Muon to Electron Transition (COMET) collaboration to generate sequences of particle hits in a Cylindrical Drift Chamber (CDC). The models are first evaluated by measuring the similarity between distributions of particle-level, physical features. We then measure the Effectively Unbiased Fréchet Inception...
The [Mu2e][1] experiment will search for the CLFV neutrinoless coherent conversion of muon to electron, in the field of an Aluminium nucleus. A custom offline event display has been developed for Mu2e using [TEve][2], a ROOT based 3-D event visualisation framework. Event displays are crucial for monitoring and debugging during live data taking as well as for public outreach. A custom GUI...
The CMS software framework (CMSSW) has recently been extended to perform part of the physics reconstruction with NVIDIA GPUs. To avoid writing a different implementation of the code for each back-end, the decision was made to use a performance portability library, and Alpaka has been chosen as the solution for Run 3.
In the meantime different studies have been performed to test the track...
The CMS collaboration has a growing interest in the use of heterogeneous computing and accelerators to reduce the costs and improve the efficiency of the online and offline data processing: online, the High Level Trigger is fully equipped with NVIDIA GPUs; offline, a growing fraction of the computing power is coming from GPU-equipped HPC centres. One of the topics where accelerators could be...
The description of the development of particle cascades in a calorimeter of a high energy physics experiment relies on precise simulation of particle interactions with matter. This simulation is inherently slow and constitutes a challenge for HEP experiments. Furthermore, with the upcoming high-luminosity upgrade of the Large Hadron Collider and a much increased data production rate, the amount of required...
GPU applications require a structure of array (SoA) layout for the data to achieve good memory access performance. During the development of the CMS Pixel reconstruction for GPUs, the Patatrack developers crafted various techniques to optimise the data placement in memory and its access inside GPU kernels. The work presented here gathers, automates and extends those patterns, and offers a...
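As an aside, the snippet below illustrates the array-of-structs versus struct-of-arrays distinction the paragraph above refers to, using NumPy rather than the Patatrack C++/CUDA code; the field names and sizes are invented for the example.

```python
# AoS vs SoA illustration (NumPy, not the CMS GPU code): the SoA layout keeps
# each field contiguous in memory, which is what coalesced GPU access wants.
import numpy as np

n = 1_000_000

# Array of structs: x, y, z, charge interleaved per hit.
aos = np.zeros(n, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4"), ("q", "f4")])

# Struct of arrays: one contiguous buffer per field.
soa = {name: np.zeros(n, dtype="f4") for name in ("x", "y", "z", "q")}

# A field-wise operation touches a strided view in the AoS case ...
r_aos = np.hypot(aos["x"], aos["y"])
# ... but fully contiguous memory in the SoA case.
r_soa = np.hypot(soa["x"], soa["y"])
```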
In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst’s perspective, obtaining the highest possible performance is desirable, but recently some focus has been laid on studying the robustness of models to investigate...
In this study, jets with up to 30 particles are modelled using Normalizing Flows with Rational Quadratic Spline coupling layers. The invariant mass of the jet is a powerful global feature for checking whether the flow-generated data contain the same high-level correlations as the training data. The use of normalizing flows without conditioning shows that they lack the expressive power to do...
The CMS experiment employs an extensive data quality monitoring (DQM) and data certification (DC) procedure. Currently, this approach consists mainly of the visual inspection of reference histograms which summarize the status and performance of the detector. Recent developments in several of the CMS subsystems have shown the potential of computer-assisted DQM and DC using autoencoders,...
The Jiangmen Underground Neutrino Observatory (JUNO), located in the southern part of China, will be the world’s largest liquid scintillator (LS) detector. Equipped with 20 kton of LS, 17623 20-inch PMTs and 25600 3-inch PMTs in the central detector, JUNO will provide a unique apparatus to probe the mysteries of neutrinos, particularly the neutrino mass ordering puzzle. One of the challenges for JUNO...
The Particle Flow (PF) algorithm, used for the majority of CMS data analyses for event reconstruction, provides a comprehensive list of final-state particle candidates and enables efficient identification and mitigation methods for simultaneous proton-proton collisions (pileup). The higher instantaneous luminosity expected during the upcoming LHC Run 3 will impose challenges for CMS event...
Density Functional Theory (DFT) is a widely used ab initio method for calculating the electronic properties of molecules. Compared with Hartree-Fock methods, DFT offers appropriate approximations with respect to calculation time. Recently, the DFT method has been used for discovering and analyzing protein interactions by means of calculating the free energies of these macro-molecules from...
Computing resources in the Worldwide LHC Computing Grid (WLCG) have been based entirely on the x86 architecture for more than two decades. In the near future, however, heterogeneous non-x86 resources, such as ARM, POWER and RISC-V, will become a substantial fraction of the resources that will be provided to the LHC experiments, due to their presence in existing and planned world-class HPC...
The Phase-2 upgrade of CMS, coupled with the projected performance of the HL-LHC, shows great promise in terms of discovery potential. However, the increased granularity of the CMS detector and the higher complexity of the collision events generated by the accelerator pose challenges in the areas of data acquisition, processing, simulation, and analysis. These challenges cannot be solved...
The CMS Level-1 Trigger, for its operation during Phase-2 of the LHC, will undergo a significant upgrade and redesign. The new trigger system, based on multiple families of custom boards, equipped with Xilinx UltraScale+ FPGAs and interconnected with high-speed optical links at 25 Gb/s, will exploit more detailed information from the detector subsystems (calorimeter, muon systems, tracker). In...
With the start of Run 3 in 2022, the LHC has entered a new period, now delivering higher-energy and higher-luminosity proton beams to the Compact Muon Solenoid (CMS) experiment. These increases make it critical to maintain and upgrade the tools and methods used to monitor the rate at which data is collected (the trigger rate). Software tools have been developed to allow for automated rate monitoring,...
Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data structures is ideally decoupled from the rest of a program.
The low-level abstraction of memory access (LLAMA) is a C++ library that provides a...
We present a machine-learning based method to detect deviations from a reference model, in an almost independent way with respect to the theory assumed to describe the new physics responsible for the discrepancies.
The analysis is based on an Effective Field Theory (EFT) approach: under this hypothesis the Lagrangian of the system can be written as an infinite expansion of terms, where the...
The Belle II experiment at the second generation e+/e- B-factory SuperKEKB has been collecting data since 2019 and aims to accumulate 50 times more data than the first generation experiment, Belle.
To efficiently process these steadily growing datasets of recorded and simulated data that end up on the order of 100 PB and to support Grid-based analysis workflows using the DIRAC Workload...
RooFit is a toolkit for statistical modeling and fitting used by most experiments in particle physics. As data sets from next-generation experiments grow, processing requirements for physics analysis become more computationally demanding, necessitating performance optimizations for RooFit. One possibility to speed up minimization and add stability is the use of automatic differentiation...
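For orientation, the sketch below shows a minimal RooFit likelihood fit in PyROOT, i.e. the kind of minimization the developments above aim to speed up; it uses only standard RooFit calls and does not show the automatic-differentiation interfaces themselves. Variable names and ranges are arbitrary.

```python
# Minimal RooFit maximum-likelihood fit (PyROOT), illustrative only.
import ROOT

x     = ROOT.RooRealVar("x", "observable", -10, 10)
mean  = ROOT.RooRealVar("mean", "mean", 0, -5, 5)
sigma = ROOT.RooRealVar("sigma", "sigma", 1, 0.1, 5)
gauss = ROOT.RooGaussian("gauss", "gaussian pdf", x, mean, sigma)

data = gauss.generate(ROOT.RooArgSet(x), 10000)      # toy dataset
gauss.fitTo(data, ROOT.RooFit.PrintLevel(-1))        # maximum-likelihood fit

print("fitted mean  =", mean.getVal())
print("fitted sigma =", sigma.getVal())
```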
High-performance fourth-generation synchrotron radiation light sources, such as the High Energy Photon Source (HEPS), have been proposed and are being built. The advent of beamlines at fourth-generation synchrotron sources, together with advanced detectors, has pushed the demand for computing resources to the edge of current workstation capabilities. On the other hand, the...
ROOT's TTree has been widely used in the analysis and storage of data from various high-energy physics experiments. The event data generated by an experiment are stored in the TTree's branches and further compressed and archived into a standard ROOT format file. At present, ROOT supports compressed storage of TBaskets, the buffers of TBranches, using compression algorithms such as zlib, lzma, lz4, and zstd,...
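A short PyROOT sketch of how a compression algorithm and level are selected when writing a TTree is given below; the file name, branch, and chosen setting are illustrative, and which setting is best is precisely what studies like the one above evaluate.

```python
# PyROOT sketch: choose the compression used for the TBaskets when writing
# a TTree. The setting is encoded as 100 * algorithm + level
# (1=zlib, 2=lzma, 4=lz4, 5=zstd), e.g. 404 = LZ4 at level 4.
import ROOT
from array import array

f = ROOT.TFile("compressed.root", "RECREATE")
f.SetCompressionSettings(404)  # LZ4 level 4; 101 ~ zlib-1, 207 ~ lzma-7, ...

tree = ROOT.TTree("events", "toy events")
energy = array("f", [0.0])
tree.Branch("energy", energy, "energy/F")

for i in range(10000):
    energy[0] = 0.1 * i
    tree.Fill()

tree.Write()
f.Close()
```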
Modern high energy physics experiments and similar compute intensive fields are pushing the limits of dedicated grid and cloud infrastructure. In the past years research into augmenting this dedicated infrastructure by integrating opportunistic resources, i.e. compute resources temporarily acquired from third party resource providers, has yielded various strategies to approach this challenge....
A precise measurement of the polarizability of the charged pion provides an important experimental test of our understanding of low-energy QCD. The goal of the Charged Pion Polarizability (CPP) experiment in Hall D at JLab, currently underway, is to make a precision measurement of this quantity through a high statistics study of the γγ → π+π− reaction near 2π threshold. The production of...
The reconstruction of particle trajectories is a key challenge of particle physics experiments, as it directly impacts particle reconstruction and physics performance. To reconstruct these trajectories, different reconstruction algorithms are used sequentially. Each of these algorithms uses many configuration parameters that need to be fine-tuned to properly account for the...
One way to improve the position and energy resolution in neutrino experiments is to provide the reconstruction method with high-resolution parameters. These parameters, the photoelectron (PE) hit time and the expectation of the PE count, can be extracted from the waveforms. We developed a new waveform analysis method called Fast Stochastic Matching Pursuit (FSMP). It is based on Bayesian...
Track reconstruction (or tracking) plays an essential role in the offline data processing of collider experiments. For the BESIII detector, working in the tau-charm energy region, plenty of effort was previously made to improve the tracking performance with traditional methods, such as pattern recognition and the Hough transform. However, for challenging tasks, such as the tracking of low...
In particle physics, precise simulations are necessary to enable scientific progress. However, accurate simulations of the interaction processes in calorimeters are complex and computationally very expensive, demanding a large fraction of the available computing resources in particle physics at present. Various generative models have been proposed to reduce this computational cost. Usually,...
The XRootD protocol is used by the CMS experiment at the LHC to access, transfer, and store data within Worldwide LHC Computing Grid (WLCG) sites running different kinds of jobs on their compute nodes. Its redirector system allows execution tasks to run by accessing input data stored at any WLCG site. In 2029 the Large Hadron Collider (LHC) will start the High-Luminosity LHC (HL-LHC)...
The Jiangmen Underground Neutrino Observatory (JUNO) has a very rich physics program, which primarily aims at the determination of the neutrino mass ordering and the precise measurement of oscillation parameters. It is under construction in South China at a depth of about 700 m underground. As data taking will start in 2023, a complete data processing chain is being developed before the data...
Awkward Array is a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms. Auto-differentiation (also known as “autograd” and “autodiff”) is a technique for computing the derivative of a function defined by an algorithm, which requires the derivative of all operations used in that algorithm to be known.
The...
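To make the auto-differentiation idea above concrete, the toy below differentiates a small array computation with JAX; it is only an illustration of autodiff on flat arrays, not the Awkward-specific integration discussed in the contribution, and the loss function is invented.

```python
# Tiny autodiff illustration with JAX (not the Awkward Array integration).
import jax
import jax.numpy as jnp

def loss(params, x):
    # A toy "reconstruction" whose derivative w.r.t. params we want.
    pred = params[0] * x + params[1] * jnp.sin(x)
    return jnp.sum(pred ** 2)

params = jnp.array([0.5, 2.0])
x = jnp.linspace(0.0, 3.0, 100)

grad_fn = jax.grad(loss)      # derivative of loss w.r.t. params
print(grad_fn(params, x))     # evaluated by autodiff, not finite differences
```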
A broad range of particle physics data can be naturally represented as graphs. As a result, Graph Neural Networks (GNNs) have gained prominence in HEP and have increasingly been adopted for a wide array of particle physics tasks, including particle track reconstruction. Most problems in physics involve data that have some underlying compatibility with symmetries. These problems may either...
High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced...
The simplest and often most effective way of parallelizing the training of complex Machine Learning models is to execute several training instances on multiple machines, possibly scanning the hyperparameter space to optimize the underlying statistical model and the learning procedure.
Often, such a meta-learning procedure is limited by the ability to securely access a common database...
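The sketch below is a hedged, minimal version of the simplest parallelization described above: several independent training instances, one per hyperparameter point, run in parallel worker processes. The train() function, the grid, and the scoring are placeholders, not the actual infrastructure.

```python
# Run several independent training configurations in parallel processes
# (illustrative placeholders throughout).
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train(learning_rate, batch_size):
    # Placeholder for a real training job; returns a validation score.
    return 1.0 / (1.0 + abs(learning_rate - 1e-3) * batch_size)

grid = list(product([1e-2, 1e-3, 1e-4], [64, 256]))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        scores = list(pool.map(train, *zip(*grid)))
    best = max(zip(scores, grid))
    print("best score %.3f for (lr, batch) = %s" % best)
```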
In the European Center of Excellence in Exascale Computing "Research on AI- and Simulation-Based Engineering at Exascale" (CoE RAISE), researchers from science and industry develop novel, scalable Artificial Intelligence technologies towards Exascale. In this work, we leverage European High Performance Computing (HPC) resources to perform large-scale hyperparameter optimization (HPO),...
The Jiangmen Underground Neutrino Observatory (JUNO) is under construction in South China and will start data taking in 2023. It has a central detector with 20 kt of liquid scintillator, equipped with 17,612 20-inch PMTs (photo-multiplier tubes) and 25,600 3-inch PMTs. The requirement of an energy resolution of 3\% at 1 MeV makes the offline data processing challenging, so several machine learning...
CLUE is a fast and innovative density-based clustering algorithm to group digitized energy deposits (hits) left by a particle traversing the active sensors of a high-granularity calorimeter in clusters with a well-defined seed hit. Outliers, i.e. hits which do not belong to any clusters, are also identified. Its outstanding performance has been proven in the context of the CMS Phase-2 upgrade...
About 90% of the computing resources available to the LHCb experiment have been spent producing simulated data samples for Run 2 of the Large Hadron Collider. The upgraded LHCb detector will operate at much-increased luminosity, requiring many more simulated events for Run 3. Simulation is a key necessity for analyses to interpret data in terms of signal and background and to estimate relevant...
The Jiangmen Underground Neutrino Observatory (JUNO) is under construction in South China at a depth of about 700 m underground; data taking is expected to start in late 2023. JUNO has a very rich physics program, which primarily aims at the determination of the neutrino mass ordering and the precise measurement of oscillation parameters.
The JUNO average raw data volume is expected...
The podio event data model (EDM) toolkit provides an easy way to generate a performant implementation of an EDM from a high level description in yaml format. We present the most recent developments in podio, most importantly the inclusion of a schema evolution mechanism for generated EDMs as well as the "Frame", a thread safe, generalized event data container. For the former we discuss some of...
Machine Learning (ML) applications, which have become quite common tools for many High Energy Physics (HEP) analyses, benefit significantly from GPU resources. GPU clusters are important to fulfill the rapidly increasing demand for GPU resources in HEP. Therefore, the Karlsruhe Institute of Technology (KIT) provides a GPU cluster for HEP accessible from the physics institute via its batch...
The reconstruction of electrons and photons in CMS depends on topological clustering of the energy deposited by an incident particle in different crystals of the electromagnetic calorimeter (ECAL). These clusters are formed by aggregating neighbouring crystals according to the expected topology of an electromagnetic shower in the ECAL. The presence of upstream material (beampipe, tracker and...
The future development projects for the Large Hadron Collider will steadily increase its nominal luminosity, with the ultimate goal of reaching a peak luminosity of $5 \times 10^{34} cm^{-2} s^{-1}$. This would result in up to 200 simultaneous proton collisions (pileup), posing significant challenges for the CMS detector reconstruction.
The CMS primary vertex (PV) reconstruction is a...
The pyrate framework provides a dynamic, versatile, and memory-efficient approach to data format transformations, object reconstruction and data analysis in particle physics. Developed within the context of the SABRE experiment for dark matter direct detection, pyrate relies on a blackboard design pattern where algorithms are dynamically evaluated throughout a run and scheduled by a central...
The LHCb detector at the LHC is a general purpose detector in the forward region with a focus on studying decays of c- and b-hadrons. For Run 3 of the LHC (data taking from 2022), LHCb will take data at an instantaneous luminosity of $2 \times 10^{33} cm^{-2} s^{-1}$, five times higher than in Run 2 (2015-2018). To cope with the harsher data taking conditions, LHCb will deploy a purely software based...
Quantum Computing and Machine Learning are both significant and appealing research fields. In particular, the combination of the two has led to the emergence of the research field of quantum machine learning, which has recently gained enormous popularity. We investigate the potential advantages of this synergy for applications in high energy physics, more precisely in the reconstruction of...
One of the most challenging computational problems in Run 3 of the Large Hadron Collider (LHC), and more so in the High-Luminosity LHC (HL-LHC), is expected to be finding and fitting charged-particle tracks during event reconstruction. The methods used so far at the LHC, and in particular at the CMS experiment, are based on the Kalman filter technique. Such methods have been shown to be robust and...
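To illustrate the predict/update structure at the heart of the Kalman filter technique mentioned above, the toy below filters a one-dimensional noisy measurement sequence; it is a didactic sketch with invented noise parameters, not the CMS track-fitting code.

```python
# Minimal one-dimensional Kalman filter (toy, not the CMS implementation).
import numpy as np

def kalman_1d(measurements, q=1e-3, r=0.5):
    x, p = 0.0, 1.0               # initial state estimate and variance
    states = []
    for z in measurements:
        # Predict: constant model, process noise q inflates the variance.
        p = p + q
        # Update: blend prediction and measurement with the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        states.append(x)
    return np.array(states)

rng = np.random.default_rng(1)
truth = 2.5
meas = truth + rng.normal(0.0, 0.7, size=50)
print("final estimate:", kalman_1d(meas)[-1])
```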
The Key4hep project aims to provide a turnkey software solution for the full experiment life-cycle, based on established community tools. Several future collider communities (CEPC, CLIC, EIC, FCC, and ILC) have joined to develop and adapt their workflows to use the common data model EDM4hep and common framework. Besides sharing of existing experiment workflows, one focus of the Key4hep project...
LUXE (Laser Und XFEL Experiment) is a proposed experiment at DESY using the electron beam of the European XFEL and a high-intensity laser. LUXE will study Quantum Electrodynamics (QED) in the strong-field regime, where QED becomes non-perturbative. One of the key measurements is the positron rate from electron-positron pair creation, which is enabled by the use of a silicon tracking detector....
The Belle II experiment has been taking data at the SuperKEKB collider since 2018. Particle identification is a key component of the reconstruction, and several detector upgrades from Belle to Belle II were designed to maintain performance with the higher background rates.
We present a method for a data-driven calibration that improves the overall particle identification performance and is...
The size, complexity, and duration of telescope surveys are growing beyond the capacity of traditional methods for scheduling observations. Scheduling algorithms must have the capacity to balance multiple (often competing) observational and scientific goals, address both short-term and long-term considerations, and adapt to rapidly changing stochastic elements (e.g., weather). Reinforcement...
In this work we present the adaptation of the popular clustering algorithm DBSCAN to reconstruct the primary vertex (PV) at the hardware trigger level in collisions at the High-Luminosity LHC. Nominally, PV reconstruction is performed by a simple histogram-based algorithm. The main challenge in PV reconstruction is that the particle tracks need to be processed in a low-latency environment...
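Below is a small offline illustration of DBSCAN grouping track z0 positions into primary-vertex candidates with scikit-learn; the eps/min_samples values and the toy track distribution are assumptions, and this is not the low-latency hardware-trigger adaptation the work describes.

```python
# Offline DBSCAN clustering of track z0 positions into PV candidates
# (illustrative parameters; not the trigger-level implementation).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
z0 = np.concatenate([
    rng.normal(-12.0, 0.08, 30),   # tracks from vertex 1 (cm)
    rng.normal(  3.5, 0.08, 45),   # tracks from vertex 2
    rng.uniform(-15, 15, 10),      # fake / poorly measured tracks
]).reshape(-1, 1)

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(z0)
for lab in sorted(set(labels) - {-1}):
    print(f"PV candidate at z0 = {z0[labels == lab].mean():.2f} cm")
```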
Template Bayesian inference via Automatic Differentiation in JuliaLang
Binned template fitting is one of the most important tools in the High Energy Physics (HEP) statistics toolbox. Statistical models based on combinations of histograms are often the last step in a HEP physics analysis. Both model and data can be represented in a standardized format - HistFactory (C++/XML) and more...
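Since the contribution itself is implemented in JuliaLang, the sketch below only illustrates the underlying statistical model in NumPy/SciPy: a binned Poisson template fit where a signal-strength parameter is estimated by minimizing the negative log-likelihood. The templates, data, and use of a gradient-free minimizer (rather than automatic differentiation) are assumptions for the example.

```python
# Binned Poisson template fit: expected counts = mu * signal + background.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

signal = np.array([2.0, 8.0, 15.0, 7.0, 2.0])           # signal template
background = np.array([30.0, 28.0, 25.0, 22.0, 20.0])   # background template
data = np.array([33, 37, 44, 28, 23])                   # observed counts

def nll(mu):
    lam = mu * signal + background
    return np.sum(lam - data * np.log(lam) + gammaln(data + 1.0))

fit = minimize_scalar(nll, bounds=(0.0, 10.0), method="bounded")
print("fitted signal strength mu =", round(fit.x, 3))
```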
The usage of Deep Neural Networks (DNNs) as multi-classifiers is widespread in modern HEP analyses. In standard categorisation methods, the high-dimensional output of the DNN is often reduced to a one-dimensional distribution by exclusively passing the information about the highest class score to the statistical inference method. Correlations with other classes are thereby omitted.
Moreover, in...
To support the needs of novel collider analyses such as long-lived particle searches, considerable computing resources are spent forward-copying data products from low-level data tiers like CMS AOD and MiniAOD to reduced data formats for end-user analysis tasks. In the HL-LHC era, it will be increasingly difficult to ensure online access to low-level data formats. In this talk, we present a...
The large statistical fluctuations in the ionization energy loss of charged particles in gaseous detectors imply that many measurements are needed along the particle track to get a precise mean; this represents a limit to the particle separation capabilities that should be overcome in the design of future colliders. The cluster counting technique (dN/dx)...
Due to the massive nature of HEP data, performance has always been a factor in its analysis and processing. Languages like C++ would be fast enough but are often challenging for beginners to grasp, and can be difficult to iterate on quickly in an interactive environment. On the other hand, the ease of writing code and the extensive library ecosystem make Python an enticing choice for data analysis....
In the past years the CMS software framework (CMSSW) has been extended to offload part of the physics reconstruction to NVIDIA GPUs. This can achieve a higher computational efficiency, but it adds extra complexity to the design of dedicated data centres and the use of opportunistic resources, like HPC centres. A possible solution to increase the flexibility of heterogeneous clusters is to...
HEPD-02 is a new, upgraded version of the High Energy Particle Detector, part of the suite of instruments for the second mission of the China Seismo-Electromagnetic Satellite (CSES-02), to be launched in 2023. Designed and realized by the Italian LIMADOU Collaboration of the CSES program, it is optimized to identify fluxes of charged particles (mostly electrons and protons) and determine their...
In real-time computing facilities, system, network, and security monitoring are core components of efficient and effective operation. Since many diverse functions can go awry, such as load, network, process, and power issues, having a well-functioning monitoring system is imperative. In many facilities you will see the standard set of tools such as Ganglia, Grafana, Nagios, etc....
We hold these truths to be self-evident: that all physics problems are created unequal, that they are endowed with their unique data structures and symmetries, that among these are tensor transformation laws, Lorentz symmetry, and permutation equivariance. A lot of attention has been paid to the applications of common machine learning methods in physics experiments and theory. However, much...
The ever-growing computing power required for the storage and data analysis of the high-energy physics experiments at CERN calls for performance optimization of the existing and planned IT resources.
One of the main consumers of computing capacity in the HEP software workflow is data analysis. To optimize the resource usage, the concept of an Analysis Facility (AF) for Run 3 has...
Particle physics experiments spend large amounts of computational effort on Monte Carlo simulations. Due to the computational expense of simulations, they are often executed and stored in large distributed computing clusters. To lessen the computational cost, physicists have introduced alternatives to speed up the simulation. Generative Adversarial Networks (GANs) are an excellent...
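As a reminder of the underlying mechanism, the deliberately tiny Keras sketch below alternates discriminator and generator updates on a one-dimensional toy distribution standing in for a detector response; layer sizes, latent dimension and data are placeholders with no relation to any production fast-simulation GAN.

```python
# Minimal GAN sketch (illustrative only): a generator learns to mimic a
# 1D toy "detector response" distribution. Sizes and data are placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 8

generator = models.Sequential([
    layers.Dense(16, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(1),
])
discriminator = models.Sequential([
    layers.Dense(16, activation="relu", input_shape=(1,)),
    layers.Dense(1, activation="sigmoid"),
])
# Compile the discriminator before freezing it inside the combined model,
# so that its own training step keeps updating its weights.
discriminator.compile(optimizer="adam", loss="binary_crossentropy")
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

real = np.random.normal(5.0, 1.0, size=(256, 1)).astype("float32")  # toy target samples
for step in range(100):
    noise = np.random.normal(size=(256, latent_dim)).astype("float32")
    fake = generator.predict(noise, verbose=0)
    # Train the discriminator on real vs generated samples
    discriminator.train_on_batch(real, np.ones((256, 1)))
    discriminator.train_on_batch(fake, np.zeros((256, 1)))
    # Train the generator to fool the (frozen) discriminator
    gan.train_on_batch(noise, np.ones((256, 1)))
```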
The unprecedented volume of data and Monte Carlo simulations at the HL-LHC will pose increasing challenges for data analysis both in terms of computing resource requirements as well as "time to insight". Precision measurements with present LHC data already face many of these challenges today. We will discuss performance scaling and optimization of RDataFrame for complex physics analyses,...
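For orientation, a minimal RDataFrame analysis in Python is sketched below; the tree, file and branch names are invented for illustration, and EnableImplicitMT switches on the implicit multi-threading whose scaling is part of such performance studies.

```python
# Minimal RDataFrame sketch. Tree, file and branch names are placeholders.
import ROOT

ROOT.EnableImplicitMT()  # use all available cores for the event loop

df = ROOT.RDataFrame("Events", "data.root")
h = (df.Filter("nMuon == 2", "exactly two muons")
       .Define("pt_lead", "Muon_pt[0]")
       .Histo1D(("pt_lead", "Leading muon p_{T};p_{T} [GeV];Events",
                 50, 0.0, 200.0), "pt_lead"))

# The event loop runs lazily, only when a result is first accessed.
print("entries in histogram:", h.GetEntries())
```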
Neutrino experiments that use liquid argon time projection chamber (LArTPC) detectors are growing bigger and expect to see more neutrinos with next generation beams, and therefore will require more computing resources to reach their physics goals of measuring CP violation in the neutrino sector and exploring anomalies. These resources can be used to their full capacity by incorporating...
Ultra-low-mass, high-granularity drift chambers fulfill the requirements for the tracking systems of modern High Energy Physics experiments at future high-luminosity facilities (FCC-ee or CEPC).
We present how, in helium-based gas mixtures, by measuring the arrival times of each individual ionization cluster and by using proper statistical tools, it is possible to perform a bias...
RooFit is a toolkit for statistical modeling and fitting, and together with RooStats it is used for measurements and statistical tests by most experiments in particle physics, particularly the LHC experiments. As the LHC program progresses, physics analyses become more computationally demanding. Therefore, recent RooFit developments were focused on performance optimization, in particular to...
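As a reference point for the kind of workload being optimised, a minimal unbinned RooFit fit in Python is sketched below; the model and dataset are toy placeholders, and the comment only hints at the newer, vectorised evaluation backends since their exact configuration is version dependent.

```python
# Minimal RooFit sketch: unbinned maximum-likelihood fit of a Gaussian.
# Model, ranges and sample size are toy placeholders.
import ROOT

x = ROOT.RooRealVar("x", "observable", -10, 10)
mean = ROOT.RooRealVar("mean", "mean", 0, -5, 5)
sigma = ROOT.RooRealVar("sigma", "width", 1, 0.1, 5)
model = ROOT.RooGaussian("model", "Gaussian PDF", x, mean, sigma)

data = model.generate(ROOT.RooArgSet(x), 10000)  # toy dataset
# Recent ROOT releases additionally allow selecting a faster, vectorised
# evaluation backend for this step (option name is version dependent).
result = model.fitTo(data, ROOT.RooFit.Save(True), ROOT.RooFit.PrintLevel(-1))
result.Print()
```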
There are established classical methods to reconstruct particle tracks from hits recorded in particle detectors. Current algorithms do this either by cuts on some features, such as the recorded time of the hits, or by a fitting procedure. This is potentially error-prone and resource-consuming. For high-noise events these issues become more critical, and such methods might even fail. We have been...
The alpaka library is a header-only C++17 abstraction library for development across hardware accelerators (CPUs, GPUs, FPGAs). Its aim is to provide performance portability across accelerators through the abstraction (not hiding!) of the underlying levels of parallelism. In this talk we will show the concepts behind alpaka, how it is mapped to the various underlying hardware models, and show...
In recent years, new technologies and new approaches have been developed in academia and industry to face the necessity to both handle and easily visualize huge amounts of data, the so-called “big data”. The increasing volume and complexity of HEP data challenge the HEP community to develop simpler and yet powerful interfaces based on parallel computing on heterogeneous platforms. Good...
Monte Carlo simulation is a vital tool for all physics programmes of particle physics experiments. Its accuracy and reliability in reproducing the detector response are of the utmost importance. For the LHCb experiment, which is embarking on a new data-taking era with an upgraded detector, a full suite of verifications has been put in place for its simulation software to ensure the quality of the...
We developed supervised and unsupervised quantum machine learning models for anomaly detection tasks at the Large Hadron Collider at CERN. Current Noisy Intermediate Scale Quantum (NISQ) devices have a limited number of qubits and qubit coherence. We designed dimensionality reduction models based on Autoencoders to accommodate the constraints dictated by the quantum hardware. Different designs...
Compared to LHC Run 1 and Run 2, future HEP experiments, e.g. at the HL-LHC, will increase the volume of generated data by an order of magnitude. In order to sustain the expected analysis throughput, ROOT's RNTuple I/O subsystem has been engineered to overcome the bottlenecks of the TTree I/O subsystem, focusing also on a compact data format, asynchronous and parallel requests, and a layered...
Through its TMVA package, ROOT provides and connects to machine learning tools for data analysis at HEP experiments and beyond. In addition, through its powerful I/O system and its RDataFrame analysis tools, ROOT offers the capability to efficiently select and query input data from large data sets as typically used in HEP analyses. At the same time, several other Machine Learning tools exist...
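One concrete bridge between the two ecosystems is RDataFrame's AsNumpy, which hands selected columns to Python ML libraries as NumPy arrays; the tree, file and column names in the sketch below are placeholders.

```python
# Sketch: exporting selected RDataFrame columns to NumPy for use with an
# external ML library. Tree, file and column names are placeholders.
import ROOT
import numpy as np

df = ROOT.RDataFrame("Events", "training_sample.root")
cols = df.Filter("nJet > 0").AsNumpy(["lead_jet_pt", "lead_jet_eta", "label"])

X = np.stack([cols["lead_jet_pt"], cols["lead_jet_eta"]], axis=1)
y = cols["label"]
# X and y can now be fed to scikit-learn, XGBoost, Keras, ... for training.
print(X.shape, y.shape)
```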
MadMiner is a Python module that implements a powerful family of multivariate inference techniques that leverage both matrix element information and machine learning.
This multivariate approach neither requires the reduction of high-dimensional data to summary statistics nor any simplifications of the underlying physics or detector response.
In this paper, we address some of the...
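The central quantity behind this family of techniques is the likelihood ratio between parameter points, which is intractable to compute directly but can be approximated by networks trained on simulations augmented with matrix-element information; schematically,

\[
r(x \mid \theta_0, \theta_1) \;=\; \frac{p(x \mid \theta_0)}{p(x \mid \theta_1)},
\]

while related estimators target the score \(t(x\mid\theta) = \nabla_\theta \log p(x\mid\theta)\), so that no reduction to hand-picked summary statistics is needed.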
The feature complexity of data recorded by particle detectors combined with the availability of large simulated datasets presents a unique environment for applying state-of-the-art machine learning (ML) architectures to physics problems. We present the Simplified Cylindrical Detector (SCD): a fully configurable GEANT4 calorimeter simulation which mimics the granularity and response...
To withstand the harsher conditions of the high-luminosity LHC, the CMS Collaboration is designing a novel endcap calorimeter system. The new calorimeter will predominantly use silicon sensors to achieve sufficient radiation tolerance and will maintain highly granular information in the readout to help mitigate the effects of pileup. In regions characterized by lower radiation levels, small...
Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent...
The classification of events, i.e. separating signal events from the background, is one of the most important analysis tasks in High Energy Physics (HEP) and a foundational task in the search for new phenomena. Complex deep-learning-based models have been fundamental for achieving accurate and outstanding performance in this classification task. However, the quantification of the...
Cryogenic phonon detectors are used by direct detection dark matter experiments to achieve sensitivity to light dark matter particle interactions. Such detectors consist of a target crystal equipped with a superconducting thermometer. The temperature of the thermometer and the bias current in its readout circuit need careful optimization to achieve optimal sensitivity of the detector. This...
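A drastically simplified version of such an operating-point optimisation is sketched below: a toy figure of merit is scanned over a grid of thermometer temperatures and bias currents and the best point is reported. The figure-of-merit function and all values are invented for illustration and bear no relation to the actual detector model or to the method described in the abstract.

```python
# Toy sketch: grid scan of a made-up sensitivity figure of merit over
# thermometer temperature and bias current. Purely illustrative.
import numpy as np

def toy_figure_of_merit(T_mK, I_uA):
    """Invented smooth function peaking near T = 15 mK, I = 2 uA."""
    return np.exp(-((T_mK - 15.0) / 5.0) ** 2) * np.exp(-((I_uA - 2.0) / 1.0) ** 2)

temperatures = np.linspace(10.0, 30.0, 41)   # mK
currents = np.linspace(0.5, 5.0, 46)         # uA

T_grid, I_grid = np.meshgrid(temperatures, currents, indexing="ij")
fom = toy_figure_of_merit(T_grid, I_grid)

best = np.unravel_index(fom.argmax(), fom.shape)
print(f"best operating point: T = {T_grid[best]:.1f} mK, I = {I_grid[best]:.2f} uA")
```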
In particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual, complex workflows manually, frequently involving job submission in several stages and interaction with distributed storage systems by hand. This process is not...
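As a flavour of what such declarative workflow definitions look like, the sketch below uses the open-source luigi package (named here only as a generic example; the abstract does not specify a particular tool) to chain a selection step and a histogramming step, with outputs tracked as targets so that completed stages are not rerun.

```python
# Sketch of a two-step analysis workflow with luigi (generic example,
# not necessarily the tool discussed in the abstract). File names are placeholders.
import luigi


class SelectEvents(luigi.Task):
    """First stage: produce a reduced event selection."""

    def output(self):
        return luigi.LocalTarget("selected_events.txt")

    def run(self):
        with self.output().open("w") as f:
            f.write("event selection placeholder\n")


class MakeHistograms(luigi.Task):
    """Second stage: runs only after SelectEvents has produced its output."""

    def requires(self):
        return SelectEvents()

    def output(self):
        return luigi.LocalTarget("histograms.txt")

    def run(self):
        with self.input().open() as fin, self.output().open("w") as fout:
            fout.write("histograms built from: " + fin.read())


if __name__ == "__main__":
    luigi.build([MakeHistograms()], local_scheduler=True)
```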