The 21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2022) will take place from Monday, 24 October to Friday, 28 October 2022 at the Villa Romanazzi Carducci in Bari, Italy.
The 21st edition of ACAT will once again bring together computational experts from a wide range of disciplines, including particle, nuclear, astro- and accelerator physics as well as high performance computing. Through this unique forum, we will explore the areas where these disciplines overlap with computer science, fostering the exchange of ideas related to cutting-edge computing, data-analysis, and theoretical-calculation technologies.
Information will be provided before August 15th.
The theme of ACAT 2022 will reflect the increasing adoption of AI and ML techniques as standard tools in science and beyond. This use in real production workflows shows both successes and new challenges.
You can sign up for email notifications from the acat-info@cern.ch list by sending an email to acat-loc2022@cern.ch. This list is low traffic and will only bring you ACAT conference announcements and general information (for this and future conferences in the ACAT series).
Many people are working together to bring you this conference! The organization page has some details. David Britton is the chair of the International Advisory Committee and Axel Naumann is the chair of the Scientific Program Committee. Lucia Silvestris is the chair of the Local Organizing Committee.
Banner, backgrounds and poster photos by Francesco Pepe ©.
The European Processor Initiative (EPI) is an EU-funded project that aims to develop and implement a new family of European processors for high performance computing, artificial intelligence, and a range of emerging application domains. A variety of processor technologies are being implemented as part of EPI. They are divided into two main development lines: the General Purpose Processor (GPP) and the European Processor Accelerator (EPAC).
The first CPU from the GPP line, Rhea1, a multi-core processor using the Arm Neoverse V1 architecture, will be commercialised by SiPEARL SAS. The Rhea1 architectural specifications have been determined via co-design using typical HPC applications and benchmarks. Rhea1 will integrate core technologies from several EPI partners and offers unique features in terms of memory architecture, memory bandwidth optimisation, security and power management. Among other features, it includes High Bandwidth Memory (HBM2) and a scalable network-on-chip (NoC) that enables high-frequency, high-bandwidth data transfers between cores, accelerators, input/output (IO) and shared memory resources.
The EPI accelerator line uses the open-source RISC-V Instruction Set Architecture (ISA) to deliver energy-efficient acceleration for HPC and AI workloads. The EPAC v1.0 test chip is the first proof-of-concept of the EPI accelerator stream, which has fully embraced the open-source philosophy by contributing to the expansion of the RISC-V ecosystem, extending the LLVM compiler codebase and providing new patches, drivers and features for the Linux operating system, OpenMP and MPI. In addition, parts of the accelerator hardware such as the STX (Stencil/Tensor accelerator) have been developed using an open source approach with free licensing on the PULP platform.
The GPP and EPAC streams are complemented by a number of joint activities, including a co-design process to design the EPI processors. Simulations and models of varying levels of detail and precision have been produced to determine the impact of design decisions on the performance of future applications. A benchmark suite containing over 40 applications is used in support of co-design and subsequent evaluation of the EPI processors. The applications are also prepared for use on future EPI systems by adapting and testing them on comparable hardware platforms and emulators.
This talk will describe the main developments of the EPI project and present their current status and roadmap.
Transport phenomena remain among the most challenging unsolved problems in computational physics, owing to the inherent nature of the Navier-Stokes equations. As a revolutionary technology, quantum computing opens a new perspective for numerical simulations such as computational fluid dynamics (CFD). This plenary talk starts with an overview of quantum computing, including basic concepts such as qubits, quantum gates and circuits, and then focuses on how to translate algorithms from classical to quantum computing systems. Possible quantum algorithms for fluid dynamics (e.g. partial differential equation solvers and eigenvalue solvers) are reviewed. Two concrete examples are presented in detail: the first is based on the lattice Boltzmann method, the second on a quantum Navier-Stokes algorithm. For the latter, the key step of reducing partial differential equations to ordinary differential equations is explained. Finally, quantum computing is compared with classical computation, indicating that a large application area for simulating fluids on quantum systems is yet to come.
The talk provides a short overview of QT history leading up to the present. Let's take a hard look at where we are in terms of QT and what major pitfalls to expect. The presentation will focus particularly on the issue of the growing talent gap.
The goal of this study is to understand the observed differences in ATLAS software performance when comparing results measured under ideal laboratory conditions with those from ATLAS computing resources on the Worldwide LHC Computing Grid (WLCG). The laboratory results are based on the full simulation of a single ttbar event and use dedicated, local hardware. In order to have a common and reproducible base for comparison, thousands of identical ttbar full simulation benchmark jobs were submitted to hundreds of Grid sites using the HammerCloud infrastructure. The impact of the heterogeneous hardware of the Grid sites and the performance differences between hardware generations are analysed in detail, and a direct, in-depth comparison of jobs performed on identical CPU types is also made. The choice of the physics sample used in the benchmark is validated by comparing the performance on each Grid site measured with HammerCloud, weighted by its contribution to the total ATLAS full simulation production output.
Ionization of matter by charged particles is the main mechanism for particle identification in gaseous detectors. Traditionally, the ionization is measured by the total energy loss (dE/dx). The concept of cluster counting, which measures the number of clusters per track length (dN/dx), was proposed in the 1970s. The dN/dx measurement avoids many sources of fluctuation that affect the dE/dx measurement and can potentially achieve a resolution two times better than dE/dx.
The dN/dx measurement requires a highly efficient reconstruction algorithm. One needs to determine the number of peaks associated with the primary electrons in the induced current waveform of a single detection unit. The main challenge for the algorithm is to handle heavy pile-up of single peaks and to discriminate the primary peaks from secondary electrons and noise. A machine-learning-based algorithm has been developed for the cluster counting problem. It consists of a peak-finding algorithm based on a Recurrent Neural Network (RNN), which aims to find all peaks in the waveform, and a clustering algorithm based on a Convolutional Neural Network (CNN), which determines the number of primary peaks.
In the talk, the basic idea of cluster counting and the reconstruction algorithm based on machine learning will be presented.
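To make the peak-finding task concrete, the sketch below counts peaks in a toy waveform with a simple threshold and local-maximum rule. This is a classical baseline written for illustration only, not the RNN/CNN algorithm described in the abstract; the function name, the threshold and the sample values are assumptions.

```python
def find_peaks(waveform, threshold):
    """Return indices of samples that are local maxima above `threshold`."""
    peaks = []
    for i in range(1, len(waveform) - 1):
        above = waveform[i] > threshold
        rising = waveform[i] > waveform[i - 1]       # strictly higher than the left neighbour
        not_lower = waveform[i] >= waveform[i + 1]   # flat tops are counted once
        if above and rising and not_lower:
            peaks.append(i)
    return peaks


# Hypothetical digitised current waveform (arbitrary units)
waveform = [0.0, 0.1, 0.9, 0.3, 0.2, 1.4, 0.5, 0.1, 0.05, 0.7, 0.2]
peaks = find_peaks(waveform, threshold=0.5)
print(peaks)  # [2, 5, 9]
```

A rule like this finds every peak above threshold but cannot tell primary from secondary electrons, and it fails when peaks pile up, which is the part of the problem the machine-learning approach targets.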
The challenges expected for the HL-LHC era, both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of re-thinking their computing models at many levels. In fact, a large part of the R&D effort of the CMS experiment has been focused on optimizing the utilization of computing and storage resources for data analysis, and Run3 could provide a perfect benchmark for studying new solutions in a realistic scenario. The work shown here focuses on the integration and validation phase of an interactive environment for data analysis, with the distinctive feature of scaling seamlessly over grid resources at Italian T2s and possibly over opportunistic providers such as HPC. In this approach, integrating new resources has proved to be exceptionally easy in terms of requirements, so computing power can be included dynamically in a very effective way. The presentation will first give an overview of the architectural pillars and the integration challenges. Then the results of a first set of performance measurements will be presented, obtained with a first real CMS user analysis built on top of the ROOT RDataFrame ecosystem and successfully executed on this infrastructure.
The High Energy Physics world will face challenging trigger requirements in the next decade. In particular, the luminosity increase to $5$-$7.5 \times 10^{34}$ cm$^{-2}$ s$^{-1}$ at the LHC will push major experiments such as ATLAS to exploit online tracking for their inner detectors in order to reach 10 kHz of events from 1 MHz of Calorimeter and Muon Spectrometer trigger. The project described here is a proposal for a tuned Hough Transform algorithm implemented on high-end FPGA technology, versatile enough to adapt to different tracking situations. The platform developed allows different datasets to be studied using software that emulates the firmware, and hence the hardware performance, and allows input datasets to be generated from ATLAS simulation. Xilinx FPGAs have been chosen for this implementation, so far exploiting the VC709 commercial board and its PCI Express Generation 3 technology. The system is designed to process a 200 pile-up event of ATLAS Run 4 in about 10 µs on average, with the possibility of running two events at a time. The best simulated efficiency exceeds 95% for single-muon tracking. The project is planned to be proposed for the Event Filter of the ATLAS Phase-II TDAQ upgrade.
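The voting idea behind a Hough Transform can be sketched in a few lines. The example below uses the textbook straight-line form, $\rho = x\cos\theta + y\sin\theta$, on hypothetical hit coordinates. The accumulator space, binning and tuning of the FPGA implementation are not given in the abstract, so everything here is an assumption made for illustration.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rho_step=0.1):
    """Fill a (theta bin, rho bin) accumulator: each hit votes for every line through it."""
    accumulator = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            accumulator[(t, round(rho / rho_step))] += 1
    return accumulator


# Five hypothetical hits on the line y = x plus two noise hits
hits = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (1, 4), (4, 0.5)]
accumulator = hough_lines(hits)
(best_t, best_rho_bin), votes = accumulator.most_common(1)[0]
print(best_t, best_rho_bin, votes)  # 135 0 5
```

The five aligned hits all vote for the same cell (normal angle 135 degrees, distance 0), while the noise hits scatter their votes. Because each hit votes independently of the others, the accumulation step maps naturally onto parallel hardware.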
Hydra is an AI system employing off-the-shelf computer vision technologies aimed at autonomously monitoring data quality. Data quality monitoring is an essential step in modern experimentation, and Nuclear Physics is no exception. Certain failures can be identified through alarms (e.g. electrical heartbeats), while others are more subtle and often require expert knowledge to identify and diagnose. In the GlueX experiment at Jefferson Laboratory, data quality monitoring is a multistep, human-in-the-loop process that begins with shift crews looking at a litany of plots (e.g. occupancy plots) which indicate the performance of detector subsystems. With the sheer complexity of the systems and the number of plots to be monitored, subtle issues can be, and are, missed. During its time in production (over 2 years), Hydra has lightened the load of GlueX shift takers by autonomously monitoring detector systems. This talk will describe the construction, training, and operation of the Hydra system in GlueX as well as the ongoing work to develop and deploy the system with other experiments at Jefferson Laboratory and beyond.
High energy physics experiments are pushing forward precision measurements and searching for new physics beyond the Standard Model. Large volumes of simulated data are urgently needed to meet the physics requirements, and making good use of the existing power of supercomputers is one of the most active areas in high energy physics computing. Taking the BESIII experiment as an illustration, we deploy the offline software BOSS on the top-tier supercomputer "Tianhe-II" with the help of Singularity. With very limited internet bandwidth and without root privileges, we successfully synchronize the simulation software and keep it up to date through CVMFS, and a speed-up of HPC over HTC is achieved for the same large-scale task. There are two ideas to share with the community. First, ordinary users constantly run into problems with the real-time internet connection and with lock conflicts when loading software; we solve both problems by deploying a Squid server and using FUSE in memory on each computing node. Second, we provide an MPI Python interface for high-throughput parallel computation on Tianhe-II. In addition, the program that handles the data output is specially aligned so that there is no queueing in the I/O tasks. The acceleration rate in simulation reaches 80% so far, in simulation tests with up to 15 K parallel processes.
AtlFast3 is the next generation of high precision fast simulation in ATLAS that is being deployed by the collaboration and was successfully used for the simulation of 7 billion events in Run 2 data taking conditions. AtlFast3 combines a parametrization-based approach known as FastCaloSimV2 and a machine-learning based tool that exploits Generative Adversarial Networks (FastCaloGAN) for the simulation of hadrons.
For Run 3, the parametrization of AtlFast3 was fully reworked, and many developments are ongoing to further enhance the quality of fast simulation in ATLAS. This talk will give a brief overview of AtlFast3 with a focus on FastCaloSimV2 and outline several improvements with respect to the previous simulator tool AFII. Furthermore, recent advancements in the parametrised simulation, such as the development of a dedicated tune of electromagnetic shower shapes to data, are presented.
The inner tracking system of the CMS experiment, consisting of the silicon pixel and strip detectors, is designed to provide a precise measurement of the momentum of charged particles and to perform the primary and secondary vertex reconstruction. The movements of the individual substructures of the tracker detectors are driven by the change in the operating conditions during data taking. Frequent updates in the detector geometry are therefore needed to describe accurately the position, orientation, and curvature of the tracker modules.
The procedure in which new parameters of the tracker geometry are determined is referred to as the alignment of the tracker. The alignment is performed regularly during data taking using reconstructed tracks from both collision and cosmic-ray data, and it is further refined after the end of data taking. The tracker alignment performance corresponding to the ultimate accuracy of the alignment calibration for the legacy reprocessing of the CMS Run 2 data will be presented. The data-driven methods used to derive the alignment parameters and the set of validations that monitor the performance of the physics observables will be reviewed. The first results obtained with the data taken during the year 2021 and the most recent set of results from LHC Run 3 will be presented.
Accurate reconstruction of charged particle trajectories and measurement of their parameters (tracking) is one of the major challenges of the CMS experiment. Precise and efficient tracking is one of the critical components of the CMS physics program, as it impacts the ability to reconstruct the physics objects needed to understand proton-proton collisions at the LHC. In this work, we present the tracking performance measured in data, where the tag-and-probe technique was applied to $Z \to \mu^{+}\mu^{-}$ di-muon resonances for all reconstructed muon trajectories and for the subset of trajectories in which the CMS Tracker is used to seed the measurement. The performance is assessed using LHC Run 2 data at $\sqrt{s}$ = 13 TeV and early LHC Run 3 data at $\sqrt{s}$ = 13.6 TeV.
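As a rough illustration of how a tag-and-probe efficiency is quoted, the sketch below divides the number of probes that pass the tracking requirement by the total number of probes and attaches a simple binomial uncertainty. The counts are invented, and the sketch deliberately ignores background: a real measurement would typically extract the yields from fits to the di-muon mass spectrum.

```python
import math


def tag_and_probe_efficiency(n_pass, n_total):
    """Efficiency = passing probes / all probes, with a binomial uncertainty."""
    efficiency = n_pass / n_total
    uncertainty = math.sqrt(efficiency * (1.0 - efficiency) / n_total)
    return efficiency, uncertainty


# Hypothetical probe counts, for illustration only
efficiency, uncertainty = tag_and_probe_efficiency(n_pass=9900, n_total=10000)
print(f"{efficiency:.4f} +/- {uncertainty:.4f}")  # 0.9900 +/- 0.0010
```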
Building on top of the multithreading functionality that was introduced in Run-2, the CMS software framework (CMSSW) has been extended in Run-3 to offload part of the physics reconstruction to NVIDIA GPUs. The first application of this new feature is the High Level Trigger (HLT): the new computing farm installed at the beginning of Run-3 is composed of 200 nodes, and for the first time each one is equipped with two AMD Milan CPUs and two NVIDIA T4 GPUs. In order to guarantee that the HLT can run on machines without any GPU accelerators - for example as part of the large scale Monte Carlo production running on the grid - the HLT reconstruction has been implemented both for NVIDIA GPUs and for traditional CPUs.
CMS has undertaken a comprehensive validation and commissioning activity to ensure the successful operation of the new HLT farm and the reproducibility of the physics results with either of the two implementations. Some activities have taken place offline, on dedicated Tier-2 centres equipped with NVIDIA GPUs; others ran online during the LHC commissioning period, after GPUs were installed on a few of the nodes from the Run-2 HLT farm. The final step was the optimisation of the HLT configuration, after the installation of the new HLT farm.
This contribution will describe the steps taken to validate the GPU-based reconstruction and commission the new HLT farm, leading to the successful data taking activities after the LHC Run-3 start up.
High Energy Physics (HEP) has been using column-wise data stored in synchronized containers, such as most prominently ROOT’s TTree, for decades. These containers have proven to be very powerful as they combine row-wise association capabilities needed by most HEP event processing frameworks (e.g. Athena) with column-wise storage, which typically results in better compression and more efficient support for many analysis use-cases. The downside, however, is that all events (rows) need to contain the same attributes and therefore extending the list of items to be stored, even if needed only for a subsample of events, can be costly in storage and lead to data duplication.
The ATLAS experiment has developed navigational infrastructure that allows custom data extensions for subsamples of events to be stored in separate but synchronized containers. These extensions can easily be added to ATLAS standard data products (such as DAOD-PHYS or PHYSLITE), avoiding duplication of those core data products while limiting their size increase. As a proof of principle, a prototype based on the Long Lived Particle search has been implemented. Preliminary results on the event size and on the reading and writing performance implications of this prototype will be presented.
Augmented data as described above are stored within the same file as the core data. Storing them in dedicated files will be investigated in future, as this could provide more flexibility to store augmentations separate from core data, e.g. certain sites may only want a subset of several augmentations or augmentations can be archived to disk once their analysis is complete.
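The storage idea can be pictured with plain Python containers: core columns hold one value per event, while an augmentation holds values only for the selected events together with an index back to the core row. This is a toy model, not the ATLAS navigational infrastructure; all column names and values are hypothetical.

```python
def read_event(row, core, augmentation):
    """Assemble one event from the core columns plus any augmentation for that row."""
    event = {name: column[row] for name, column in core.items()}
    if row in augmentation["core_row"]:
        j = augmentation["core_row"].index(row)
        for name, column in augmentation.items():
            if name != "core_row":
                event[name] = column[j]
    return event


# Core data: every event (row) has every attribute
core = {
    "event_number": [101, 102, 103, 104, 105],
    "n_jets": [2, 4, 3, 5, 1],
}
# Augmentation: stored only for the subsample, linked by core row index
augmentation = {
    "core_row": [1, 3],
    "llp_vertex_radius": [12.5, 48.0],
}

print(read_event(1, core, augmentation))
# {'event_number': 102, 'n_jets': 4, 'llp_vertex_radius': 12.5}
print(read_event(0, core, augmentation))
# {'event_number': 101, 'n_jets': 2}
```

Here the extra attribute costs two stored values instead of five, and the core columns are neither rewritten nor duplicated.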
The Belle II experiment has been collecting data since 2019 at the second-generation e+/e- B-factory SuperKEKB in Tsukuba, Japan. The goal of the experiment is to explore new physics via high-precision measurements in flavor physics. This is achieved by collecting a large amount of data that needs to be calibrated promptly for fast reconstruction and recalibrated thoroughly for the final reprocessing. To fully automate the calibration process, a Python plugin package, b2cal, has been developed based on the open-source Apache Airflow package, using Directed Acyclic Graphs (DAGs) to describe the ordering of processes and Flask to provide administration and job submission web pages. Prompt processing and reprocessing are performed at different calibration centers (BNL and DESY, respectively). After calibration, the raw data are reconstructed on the GRID to an analysis-oriented format (mDST), also stored on the GRID, and delivered to the collaborations. This talk will describe the whole procedure, from raw data calibration to mDST production.
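How a DAG fixes the ordering of processes can be shown with the Python standard library alone. The sketch below does not use the Airflow or b2cal APIs, and the task names are hypothetical; it only shows that declaring dependencies is enough to derive a valid execution order.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical task names)
calibration_dag = {
    "collect_raw_data": set(),
    "calibrate_tracking": {"collect_raw_data"},
    "calibrate_calorimeter": {"collect_raw_data"},
    "reconstruct_mdst": {"calibrate_tracking", "calibrate_calorimeter"},
}

# static_order() yields the tasks so that every dependency comes first
order = list(TopologicalSorter(calibration_dag).static_order())
print(order)
```

The two calibration tasks have no dependency on each other, so a scheduler is free to run them in parallel; only their position relative to the first and last task is fixed.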
Computing in high energy physics is a typical data-intensive application, especially data analysis, which requires access to large amounts of data. Traditional computing systems adopt a separation of computing and storage, which leads to large data movements during processing and also increases transmission delay and network load. Pushing some data-intensive tasks down from the computing nodes to the storage nodes can effectively alleviate this situation. The philosophy is to bring computing as close to the source of the data as possible in order to reduce latency and bandwidth use. Storage nodes generally have computing resources such as CPUs, which are necessary for deploying a distributed file system, but the computing power of storage nodes is often ignored. This paper presents the design and implementation of a computational storage system based on CERN Open Storage (EOS). The system presents the computational storage functions transparently through the standard POSIX file system interface, such as open, read and write. A plugin implemented in the EOS storage node (FST) executes the specified algorithm or program when it finds special arguments in the filename, for example "&CSS=decode". The plugin can read and write files locally on the FST and then register the newly generated file with the EOS name node (MGM). Finally, test results show that the computational storage mode runs faster and supports more parallel computing tasks than the traditional mode in some applications, such as raw data decoding for the LHAASO experiment. Compared with the traditional mode, the computational storage mode reduces computation time by 37% for a single task and by 72% for 40 tasks in parallel.
The outstanding performance obtained by the CMS experiment during Run1 and Run2 represents a great achievement of seamless hardware and software integration. Among the different software parts, the CMS offline reconstruction software is essential for translating the data acquired by the detectors into concrete objects that can be easily handled by the analyzers. The CMS offline reconstruction software needs to be reliable and fast. Long Shutdown 2 (LS2), between LHC Run2 and Run3, has been instrumental in optimizing the CMS offline reconstruction software and in introducing new algorithms, yielding a continuous CPU speedup. To reach these goals, a continuous benchmarking pipeline has been implemented; CPU timing and memory profiling, using the igprof tool, are performed on a regular basis to monitor the footprint of new developments and identify possible areas of performance improvement. The current status and the achievements obtained through continuous benchmarking of the CMS offline reconstruction software are described here.
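The filename convention can be illustrated with a small sketch that splits a request such as `...&CSS=decode` into the real path and the requested operation, then applies that operation where the data live. Only the `&CSS=decode` marker comes from the abstract. The file path, the in-memory `storage` dictionary and the toy handler are assumptions, and nothing here reflects the actual EOS plugin interface.

```python
def split_css_directive(filename, marker="&CSS="):
    """Split 'path&CSS=op' into (path, op); op is None when no marker is present."""
    path, found, operation = filename.partition(marker)
    return (path, operation) if found else (filename, None)


def read_file(filename, storage, handlers):
    """Return file content, processed on the 'storage side' if a directive is given."""
    path, operation = split_css_directive(filename)
    data = storage[path]
    if operation is None:
        return data                      # plain read, no computation requested
    return handlers[operation](data)     # run the requested algorithm locally


# Hypothetical storage content and a toy stand-in for a raw-data decoder
storage = {"/eos/example/raw/run001.dat": b"abc"}
handlers = {"decode": lambda raw: raw[::-1]}

print(split_css_directive("/eos/example/raw/run001.dat&CSS=decode"))
# ('/eos/example/raw/run001.dat', 'decode')
print(read_file("/eos/example/raw/run001.dat&CSS=decode", storage, handlers))
# b'cba'
```

Because the directive travels inside the filename, a client can request the computation through an ordinary open or read call without any new API.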
The landscape of computing power available for the CMS experiment is rapidly evolving, from a scenario dominated by x86 processors deployed at WLCG sites, towards a more diverse mixture of Grid, HPC, and Cloud facilities incorporating a higher fraction of non-CPU components, such as GPUs. Using these facilities’ heterogeneous resources efficiently to process the vast amounts of data to be collected in the LHC Run3 and beyond, in the HL-LHC era, is key to CMS’s achieving its scientific goals.
The CMS Submission Infrastructure is the main computing resource provisioning system for CMS workflows, including data processing, simulation and analysis. It currently aggregates nearly 400k CPU cores distributed worldwide from Grid, HPC and cloud providers. The Submission Infrastructure, together with other elements in the CMS workload management, has been modified in its strategies and enlarged in its scope to make use of these new resources.
In this evolution, key questions such as the optimal level of granularity in the description of the resources, or how to prioritize workflows in this new resource mix, must be taken into consideration. In addition, access to many of these resources is considered opportunistic by CMS, so each resource provider may also play a key role in defining particular allocation policies, different from the up-to-now dominant system of pledges. All these matters must be addressed in order to ensure the efficient allocation of resources and their matchmaking to tasks, maximizing their use by CMS.
This contribution will describe the evolution of the CMS Submission Infrastructure towards a full integration and support of heterogeneous resources according to CMS needs. In addition, a study of the pool of GPUs already available to CMS Offline Computing will be presented, including a survey of their diversity in relation to CMS workloads, and the scalability reach of the infrastructure to support them.
During ATLAS Run 2, a large proportion of the CPU time in the online track reconstruction algorithm of the Inner Detector (ID) was dedicated to fast track finding. With the proposed HL-LHC upgrade, where the event pile-up is predicted to reach <μ>=200, track finding will see a further large increase in CPU usage. Moreover, only a small subset of Pixel-only seeds is accepted after the fast track finding procedure, so the CPU time spent on rejected seeds is essentially wasted. Therefore, a computationally cheap pre-selection procedure for track candidate seeds, based on approximate track following, was designed and is described in this report. The algorithm uses a parabolic track approximation in the plane perpendicular to the beamline and a combinatorial Kalman filter, simplified by a reference-related coordinate system, to find the best track candidates. For such candidates, a set of numerical features is created to classify seeds using machine learning techniques, such as Support Vector Machines (SVM) or kernel-based methods. The algorithm was tuned for high identification and rejection of bad seeds while ensuring no significant loss of track finding efficiency. Current studies focus on implementing the algorithm in the Athena framework for online seed pre-selection, which could be used during Run 3 or potentially be adapted to the ITk geometry for Run 4 of the HL-LHC.
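The parabolic approximation can be sketched as follows: the three hits of a seed define a parabola, which is extrapolated to the next layer and compared with a candidate hit. This is a simplified illustration in a hypothetical local frame (x along the seed direction, y transverse). The Kalman filter and the machine-learning classification from the abstract are omitted, and the tolerance is an arbitrary choice.

```python
def parabola_through(p0, p1, p2):
    """Return y(x) for the parabola through three points (Lagrange interpolation)."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2

    def predict(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))

    return predict


def seed_is_compatible(seed, next_hit, tolerance):
    """Check whether a hit on the next layer lies close to the extrapolated parabola."""
    predict = parabola_through(*seed)
    x, y = next_hit
    return abs(predict(x) - y) <= tolerance


# Hypothetical seed lying on y = 0.01 * x**2
seed = [(1.0, 0.01), (2.0, 0.04), (3.0, 0.09)]
print(seed_is_compatible(seed, (4.0, 0.162), tolerance=0.01))  # True
print(seed_is_compatible(seed, (4.0, 0.300), tolerance=0.01))  # False
```

A check of this kind costs only a handful of multiplications per seed, which is what makes it attractive as a pre-selection before the full track finding.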
The production of simulated datasets for use by physics analyses consumes a large fraction of ATLAS computing resources, a problem that will only get worse as increases in the instantaneous luminosity provided by the LHC lead to more collisions per bunch crossing (pile-up). One of the more resource-intensive steps in the Monte Carlo production is reconstructing the tracks in the ATLAS Inner Detector (ID), which takes up about 60% of the total detector reconstruction time [1]. This talk discusses a novel technique called track overlay, which substantially speeds up the ID reconstruction. In track overlay the pile-up ID tracks are reconstructed ahead of time and overlaid onto the ID tracks from the simulated hard-scatter event. We present our implementation of this track overlay approach as part of the ATLAS Fast Chain simulation, as well as a method for deciding in which cases it is possible to use track overlay in the reconstruction of simulated data without performance degradation.
[1] ATL-PHYS-PUB-2021-012 (60% refers to Run3, mu=50, including large-radius tracking, p11)
As the scale and complexity of High Energy Physics (HEP) experiments increase, researchers face the challenge of large-scale data processing. In terms of storage, the Hadoop Distributed File System (HDFS), a distributed file system that supports the "data-centric" processing model, has been widely used in academia and industry. It supports Spark and other distributed computations that exploit data locality, so studying the application of HDFS in the field of HEP is the basis for enabling upper-layer computing in this field. However, HDFS expands cluster capacity by adding cluster nodes, which cannot meet the cost-effectiveness requirements for the persistence and backup of massive HEP experimental data. In response, we study the Hadoop Distributed & Tiered File System (HDTFS), which supports disk-tape storage and takes full advantage of the fast access speed of disk and of the large capacity, low price and long storage period of tape, to address the high cost of horizontally expanding HDFS clusters. The system provides users with a single global namespace and avoids dependence on external metadata servers to access the data stored on tape. In addition, tape-layer resources are managed internally so that users do not have to deal with complex tape storage. The experimental results show that this method can effectively address massive data storage for HEP Hadoop clusters.
Measuring rare processes at Belle II requires a huge luminosity, which means that a large number of simulations are necessary to determine signal efficiencies and background contributions. However, this process incurs high computation costs, while most of the simulated data, in particular in the case of background, are discarded by the event selection. Filters using graph neural networks are therefore introduced at an early stage to save the resources spent on the detector simulation and reconstruction of events that would be discarded at analysis level. In our work, we improved the performance of the filters using graph attention and investigated statistical methods, including sampling and reweighting, to deal with biases introduced by the filtering.
The ATLAS detector at CERN measures proton-proton collisions at the Large Hadron Collider (LHC), which allows us to test the limits of the Standard Model (SM) of particle physics. Forward-moving electrons produced in these collisions are promising candidates for finding physics beyond the SM. However, the ATLAS detector is not designed to measure forward leptons with pseudorapidity $\eta$ of more than 2.5 with high precision. The ATLAS performance for forward leptons can be improved by enhancing the trigger system. This system selects events of interest so as not to overwhelm the data storage with the information from around 1.7 billion collisions per second. First studies using the Neural Ringer algorithm for selecting forward electrons with $2.5<\eta<3.2$ show promising results. The Neural Ringer, which uses machine learning to analyse detector information and distinguish electromagnetic from hadronic signatures, will be presented. Additionally, its performance in improving the high level trigger for forward electrons, evaluated on simulated ATLAS Monte Carlo samples, will be shown.
As CMS starts the Run 3 data taking, the experiment’s data management software tools along with the monitoring infrastructure have undergone significant upgrades to cope with the conditions expected in the coming years. The challenges of efficient, real-time monitoring of the performance of the computing infrastructure and of data distribution are being met using state-of-the-art technologies that are continuously evolving. In this talk, we describe how we set up monitoring pipelines based on a combination of technologies, such as Kubernetes, Spark/Hadoop and other open-source software stacks. We show how the choice of these components is critical for this new generation of services and infrastructure for CMS data management and monitoring. We also discuss how some of the developed monitoring services, such as data management monitoring, CPU efficiency monitoring, and dataset access and transfer metrics, have been instrumental in taking strategic decisions and increasing the physics harvest through maximal utilization of the computing resources available to us.
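One standard way in which sampling and reweighting can remove the bias of a hard cut is sketched below: each event is kept with a probability p and, if kept, carries the weight 1/p, so that weighted sums remain unbiased estimates of the unfiltered sample. This is a generic illustration and not necessarily the method studied by the authors; the fixed scores stand in for a filter output and the event labels are hypothetical.

```python
import random


def sample_and_reweight(events, keep_probability, rng):
    """Keep each event with probability p and give survivors the weight 1/p."""
    kept = []
    for event in events:
        p = keep_probability(event)  # stand-in for a filter score in (0, 1]
        if rng.random() < p:
            kept.append((event, 1.0 / p))
    return kept


# Toy sample: "likely" events are kept often, "unlikely" ones rarely
events = ["likely"] * 10000 + ["unlikely"] * 10000
scores = {"likely": 0.9, "unlikely": 0.2}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
kept = sample_and_reweight(events, scores.get, rng)

n_simulated = len(kept)                         # events still passed to full simulation
n_effective = sum(weight for _, weight in kept) # weighted count, estimates len(events)
print(n_simulated, round(n_effective))
```

Only about half of the events need the expensive simulation, while the weighted count stays close to the original 20000. The price is a larger statistical uncertainty from the high-weight events, which is why the choice of sampling probabilities matters.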
PARSIFAL (PARametrized SImulation) is a software tool originally implemented to reproduce the complete response of a triple-GEM detector to the passage of a charged particle, taking into account the involved physical processes by their simple parametrization and thus in a very fast way.
Robust and reliable software, such as GARFIELD++, is widely used to simulate the transport of electrons and ions in the gas and all their interactions step by step, but it is CPU-time consuming. The implementation of PARSIFAL code was driven by the need to reduce the processing time, while maintaining the precision of a full simulation.
The software must be initialized with some parameters that can be extracted from the GARFIELD++ simulation, which must be run once-and-for-all. Then it can be run independently to provide a reliable simulation, from the ionization, to diffusion, multiplication, signal induction and electronics, only by sampling from a set of functions which describe the physical effects and depend on the input parameters.
The code has been thoroughly tested on triple-GEM detectors and the simulation was finely tuned to experimental data collected at testbeam.
Recently, PARSIFAL has been extended to another detector in the MPGD family, the micro-RWELL, thanks to the modular structure of the code. The main difference in the treatment of the physical processes is the introduction of the resistive plane and its effect on the formation of the signal. For this purpose, the charge spread on the resistive layer has been described following the work of M. S. Dixit and A. Rankin (NIM A518 (2004) 721-727, NIM A566 (2006) 281-285) and the electronics readout (APV-25) was added to the description.
A fine tuning of the simulation is ongoing to reproduce the experimental data collected during testbeams. A similar strategy already validated for the triple-GEM case is used: the variables of interest for the comparison of the experimental data with simulated results are the cluster charge, cluster size and the position resolution obtained by charge centroid and micro-TPC reconstruction algorithms. In this case, special attention must be paid to the tuning of the resistivity of the resistive layer.
An illustration of the general code, setting the focus on this latest implementation and the first comparison with experimental data from testbeam are the subject of this contribution.
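The charge-centroid reconstruction mentioned above estimates the hit position as the charge-weighted mean of the fired strip positions, $x = \sum_i q_i x_i / \sum_i q_i$. The following is a minimal Python sketch of that formula together with the cluster charge and cluster size. The strip pitch and charges are made-up numbers, and the function names are hypothetical and not taken from PARSIFAL.

```python
def cluster_charge(strip_charges):
    """Total charge collected by the strips of one cluster."""
    return sum(strip_charges)

def cluster_size(strip_charges):
    """Number of strips in the cluster carrying non-zero charge."""
    return sum(1 for q in strip_charges if q > 0)

def charge_centroid(strip_positions, strip_charges):
    """Charge-weighted mean position of the cluster."""
    total = cluster_charge(strip_charges)
    if total <= 0:
        raise ValueError("cluster has no charge")
    return sum(x * q for x, q in zip(strip_positions, strip_charges)) / total

# Hypothetical three-strip cluster (positions in mm, charges in fC).
positions_mm = [0.00, 0.65, 1.30]
charges_fc = [10.0, 30.0, 20.0]

print(cluster_size(charges_fc), cluster_charge(charges_fc),
      charge_centroid(positions_mm, charges_fc))
```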
The particle-flow (PF) algorithm is of central importance to event reconstruction at the CMS detector, and has been a focus of developments in light of planned Phase-2 running conditions with an increased pileup and detector granularity. Current rule-based implementations rely on extrapolating tracks to the calorimeters, correlating them with calorimeter clusters, subtracting charged energy and creating neutral particles from significant energy deposits. Such rule-based algorithms can be difficult to extend and may be computationally inefficient under high detector occupancy, while also being challenging to port to heterogeneous architectures in full detail.
In recent years, end-to-end machine learning approaches for event reconstruction have been proposed, including for PF at CMS, with the possible advantage of directly optimising for the physical quantities of interest, being highly reconfigurable to new conditions, while also being a natural fit for deployment on heterogeneous accelerators.
One of the proposed approaches for machine-learned particle-flow (MLPF) reconstruction relies on graph neural networks to infer the full particle content of an event from the tracks and calorimeter clusters based on a training on simulated samples, and has been recently implemented in CMS as a possible future reconstruction R&D direction to fully map out the characteristics of such an approach in a realistic setting.
We discuss progress in CMS towards an improved implementation of the MLPF reconstruction, now optimised on generator-level particle information for the first time to our knowledge, thus paving the way to potentially improving the detector response in terms of physical quantities of interest. We show detailed physics validation with respect to the current PF algorithm in terms of high-level physical quantities such as jet and MET resolution. Furthermore, we discuss progress towards deploying the MLPF algorithm in the CMS software framework on heterogeneous platforms, performing large-scale hyperparameter optimization using HPC systems, as well as the possibilities of making use of explainable artificial intelligence (XAI) to interpret the output.
Secrets management is the process of managing secrets, such as certificates, database credentials, tokens, and API keys, in a secure and centralized way. In the present CMSWEB (the portfolio of CMS internal IT services) infrastructure, only the operators maintain all service and cluster secrets in a secure place. However, if all the people who hold the secrets are away, we are left with no choice but to contact them to obtain the secrets in an emergency.
To overcome this issue, we performed an R&D study on the management of secrets and explored various strategies such as HashiCorp Vault, GitHub credential manager, and SOPS/age. In this talk, we will discuss the process by which CMS investigated these strategies and performed a feasibility analysis of them. We will also underline why CMS chose SOPS as a solution, reviewing how the features of SOPS with age satisfy our needs. We will also discuss how other experiments could adopt our solution.
The CMS Submission Infrastructure is the main computing resource provisioning system for CMS workflows, including data processing, simulation and analysis. It currently aggregates nearly 400k CPU cores distributed worldwide from Grid, HPC and cloud providers. CMS Tier-0 tasks, such as data repacking and prompt reconstruction, critical for data-taking operations, are executed on a collection of computing resources at CERN, also managed by the CMS Submission Infrastructure.
All this computing power is harnessed via a number of federated resource pools, supervised by HTCondor and GlideinWMS services. Elements such as pilot factories, job schedulers and connection brokers are deployed in HA mode across several “availability zones”, providing stability to our services via hardware redundancy and numerous failover mechanisms.
Given the upcoming start of the LHC Run 3, the Submission Infrastructure stability has been recently tested in a series of controlled exercises, performed without interruption of our services. These tests have demonstrated the resilience of our systems, and additionally provided useful information in order to further refine our monitoring and alarming system.
This contribution will describe the main elements in the CMS Submission Infrastructure design and deployment, along with the performed failover exercises, proving that our systems are ready to serve their critical role in support of CMS activities.
Over the past several years, a deep learning model based on convolutional neural networks has been developed to find proton-proton collision points (also known as primary vertices, or PVs) in Run 3 LHCb data. By converting the three-dimensional space of particle hits and tracks into a one-dimensional kernel density estimator (KDE) along the direction of the beamline and using the KDE as an input feature into a neural network, the model has achieved an efficiency of 98% with a low false positive rate. The success of this method motivates its extension to other experiments, including ATLAS. Although LHCb is a forward spectrometer and ATLAS is a central detector, ATLAS has the necessary characteristics to compute KDEs analogous to the LHCb detector. While the ATLAS detector will benefit from higher precision, the expected number of visible PVs per event will be approximately 10 times that for LHCb, resulting in only slightly altered KDEs. The KDE and a few related input features are fed into the same neural network architectures used to achieve the results for LHCb. We present the development of the input feature and initial results across different network architectures. The results serve as a proof-of-principle that a deep neural network can achieve high efficiency and low false positive rates for finding vertices in ATLAS data.
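To illustrate the idea of collapsing track information into a one-dimensional density along the beamline, the Python sketch below sums one Gaussian per track, centred on the track's z position at closest approach to the beamline. This is a deliberately simplified stand-in: the KDE actually used for LHCb and ATLAS is built from fuller track information, and the function names, toy tracks and binning here are assumptions for illustration only.

```python
import math

def gaussian(z, mean, sigma):
    """Normalised Gaussian density evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def beamline_kde(track_z, track_sigma, z_min, z_max, n_bins):
    """Sum of per-track Gaussians evaluated at the bin centres along z."""
    width = (z_max - z_min) / n_bins
    centres = [z_min + (i + 0.5) * width for i in range(n_bins)]
    density = [
        sum(gaussian(zc, z0, s) for z0, s in zip(track_z, track_sigma))
        for zc in centres
    ]
    return centres, density

# Toy event: three tracks from a vertex near z = -20 mm, two from one near z = 35 mm.
track_z = [-20.1, -19.9, -20.0, 35.2, 34.9]
track_sigma = [0.2, 0.3, 0.25, 0.3, 0.2]

centres, density = beamline_kde(track_z, track_sigma, -100.0, 100.0, 2000)
peak_z = centres[density.index(max(density))]
print(f"highest KDE peak at z = {peak_z:.2f} mm")
```

In the approach described above, an array like `density` would be the input feature passed to the neural network, which then predicts where the primary vertices lie.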
With the LHC restarting after more than three years of shutdown, unprecedented amounts of data are expected to be recorded. Even with the WLCG providing a tremendous amount of compute resources to process this data, local resources will have to be used for additional compute power. This, however, makes the landscape in which computing takes place more heterogeneous.
In this contribution, we present a solution for dynamically integrating non-HEP resources into existing infrastructures using the COBalD/TARDIS resource manager. By providing all resources through conventional CEs as single point-of-entry, the use of these external resources becomes completely transparent for experiments and users.
In addition, we will discuss experience with an existing setup, in production for more than a year, that extends the German Tier 2 WLCG site operated at RWTH Aachen University with a local HPC cluster.
The INFN-CNAF Tier-1 has been engaged for years in a continuous effort to integrate its computing centre with more types of computing resources. In particular, the challenge of providing opportunistic access to nonstandard CPU architectures, such as PowerPC, or to hardware accelerators (GPUs) has been actively addressed. In this work, we describe a solution to transparently integrate access to ppc64 CPUs as well as GPUs. This solution has been tested to transparently extend the INFN-T1 Grid computing centre with Power9-based machines and V100 GPUs from the Marconi 100 HPC cluster managed by CINECA. We also discuss further possible improvements and how this will meet the requirements and future plans for the new Tecnopolo centre, where the CNAF Tier-1 will soon be hosted.
Simulation in High Energy Physics (HEP) places a heavy burden on the available computing resources and is expected to become a major bottleneck for the upcoming high luminosity phase of the LHC and for future Higgs factories, motivating a concerted effort to develop computationally efficient solutions. Methods based on generative machine learning hold promise to alleviate the computational strain produced by simulation while providing the physical accuracy required of a surrogate simulator.
In this contribution, an overview of a growing body of work focused on simulating showers in highly granular calorimeters will be reported, which is making significant steps towards realistic fast simulation tools based on deep generative models. Progress on the simulation of both electromagnetic and hadronic showers will be presented, with a focus on the high degree of physical fidelity and computational performance achieved. Additional steps taken to address the challenges faced when broadening the scope of these simulators, such as those posed by multi-parameter conditioning, will also be discussed.
A bright future awaits particle physics. The LHC Run 3 has just started, characterised by the most energetic beams ever created by humankind and the most sophisticated detectors. In the next few years we will accomplish the most precise measurements to challenge our present understanding of nature that will, potentially, lead us to prestigious discoveries. However, Run 3 is just the beginning. A rich programme is ahead of us at the HL-LHC, the EIC, and at future colliders, like the FCC. These programmes imply a large effort and substantial funding, for example to develop future detector and accelerator technologies, to construct new experiments and facilities, or to expand the scope of the existing ones. This contribution is about the software and computing that will lead us to the full exploitation of such infrastructure, the software and computing that will empower us to make important strides in humanity's understanding of the universe. The HL-LHC, EIC and FCC eras will be taken into consideration in this contribution. We will discuss the role of education, innovation and technology in our preparation for the future. We will also review the current state of the art, discuss ongoing technology evolutions, for instance in hardware and programming languages, and extrapolate the most relevant trends into the next decades. Moreover, we will identify the areas where our efforts could be focussed to boost the progression of particle physics software and computing, as well as the steps we can take to take advantage of veritable revolutions.
Agent-based modeling is a versatile methodology to model complex systems and gain insights into fields as diverse as biology, sociology, economics, finance, and more. However, existing simulation platforms do not always take full advantage of modern hardware and therefore limit the size and complexity of the models that can be simulated.
This talk presents the BioDynaMo platform designed to alleviate these issues, enable large-scale agent-based simulations, and reduce time-to-insight. We will examine BioDynaMo's modular software design and underlying performance optimizations that enable simulations with billions of agents in various research fields.
The ATLAS experiment at the LHC relies critically on simulated event samples produced by the full Geant4 detector simulation software (FullSim). FullSim was the major CPU consumer during the last data-taking year in 2018 and is expected to remain significant in the HL-LHC era [1, 2]. In September 2020 ATLAS formed a Geant4 Optimization Task Force to optimize the computational performance of FullSim for the Run 3 Monte Carlo campaign. This contribution summarizes the already implemented and upcoming improvements. These include improved features from the core Geant4 software, optimal options in the simulation configuration, simplifications in the geometry and magnetic field description, and technical improvements in the way the ATLAS simulation code interfaces with Geant4. Overall, more than 50% higher throughput is achieved compared to the baseline simulation configuration used during Run 2.
[1]: ATLAS Collaboration, “ATLAS HL-LHC Computing Conceptual Design Report”, CERN-LHCC-2020-015.
[2]: ATLAS Collaboration, “ATLAS Software and Computing HL-LHC Roadmap”, CERN-LHCC-2022-005.
The ASTRI Mini-Array is a gamma-ray experiment led by Istituto Nazionale di Astrofisica with the partnership of the Instituto de Astrofisica de Canarias, Fundacion Galileo Galilei, Universidade de Sao Paulo (Brazil) and North-West University (South Africa). The ASTRI Mini-Array will consist of nine innovative Imaging Atmospheric Cherenkov Telescopes that are being installed at the Teide Astronomical Observatory (~2400 m a.s.l.) in Tenerife (Canary Islands, Spain). The ASTRI Mini-Array software will cover the entire life cycle of the experiment, including scheduling, operations and data dissemination. The on-site control software will allow the operator to communicate remotely with the array (including automated reaction to critical environmental conditions). Thanks to the high-speed (10 Gbit/s) network connection available between the Canary Islands and Italy, all data will be delivered every night to the dedicated ASTRI Data Center in Rome for processing and dissemination. The ASTRI team gained experience with ASTRI-Horn, the first Italian dual-mirror Cherenkov telescope and the prototype of the ASTRI Mini-Array telescopes. Exploiting the lessons learned from ASTRI-Horn, we decided to adopt an iterative incremental model for the software in order to provide more software releases according to the project schedule. Because of this peculiarity of the software, we have implemented a Quality Assurance (QA) programme specific to the software, which defines the strategy and the organization for the management of quality control. In this contribution we present the layout and the contents of the ASTRI Mini-Array software QA programme, describing the organization adopted for its management and reporting some examples of how it has been applied so far.
Experiments at the CERN High-Luminosity Large Hadron Collider (HL-LHC) will produce hundreds of Petabytes of data per year. Efficient processing of this dataset represents a significant human resource and technical challenge. Today, ATLAS data processing applications run in multi-threaded mode, using Intel TBB for thread management, which allows efficient utilization of all available CPU cores on the computing resources. However, modern HPC systems and high-end computing clusters are increasingly based on heterogeneous architectures, usually a combination of CPU and accelerators (e.g., GPU, FPGA). To run ATLAS software on these machines efficiently, we started developing a distributed, fine-grained, vertically integrated task scheduling software system. A first simplified implementation of such a system called Raythena was developed in late 2019. It is based on Ray - a high-performance distributed execution platform developed by Riselab at UC Berkeley. Raythena leverages the ATLAS event-service architecture for efficient utilization of CPU resources on HPC systems by dynamically assigning fine-grained workloads (individual events or event ranges) to ATLAS data-processing applications running simultaneously on multiple HPC compute nodes.
The main purpose of the Raythena project was to gain the experience of developing real-life applications with the Ray platform. However, in order to achieve our main objective, we need to design a new system capable of utilizing heterogeneous computing resources in a distributed environment. To accomplish this, we have started to evaluate HPX as an alternative to TBB/Ray. HPX is a C++ library for concurrency and parallelism developed by the Stellar group, which exposes a uniform, standards-oriented API for programming parallel, distributed, and heterogeneous applications.
This presentation will describe the preliminary results of the evaluation of HPX for implementation of the task scheduler for ATLAS data-processing applications aimed to enable cross-node scheduling in heterogeneous systems that offer a mixture of CPU and GPU architectures. We present the prototype applications implemented using HPX and the preliminary results of performance studies of these applications.
GPU acceleration has been successfully utilised in particle physics for real-time analysis and simulation. In this study, we investigate the potential benefits for medical physics applications by analysing performance, development effort, and availability. We selected a software developer with no high performance computing experience to parallelise and accelerate a stand-alone Monte Carlo simulation of electron single Coulomb scattering. Such simulations contribute to real-time dose estimation for real-time adaptive radiotherapy, a new and emerging cancer treatment that relies heavily on high performance computing. As a proof of principle, we implement a single scattering process of electrons in a homogeneous material with a pencil beam at constant initial energy. We compared the performance gain offered by GPU acceleration against an optimised CPU implementation and evaluated it by computing 100M histories of a 128 keV electron interacting in water. We also evaluated 1B histories to measure the scalability. Compared with the multi-core CPU implementation running on 24 cores, the GPU implementation achieves a speedup of 808x (100M) and 1727x (1B), which corresponds to a 320x and 648x cost-equivalent speedup. The results on both architectures were statistically equivalent. The successful implementation and measured acceleration, combined with the low level of expertise needed to obtain such a speedup, are a promising first step for the use of GPU acceleration in a context such as real-time adaptive radiotherapy, where there are strict performance and time requirements.
The LHCb experiment underwent a major upgrade for data taking with higher luminosity in Run 3 of the LHC. New software that exploits modern technologies in the underlying LHCb core software framework is part of this upgrade. The LHCb simulation framework, Gauss, is adapted accordingly to cope with the increase in the amount of simulated data required for Run 3 analyses. An additional constraint arises from the fact that Gauss also relies on external simulation libraries.
The new version of Gauss, based on a newly-developed, experiment-agnostic core framework where the generic simulation components have been encapsulated, is called Gaussino. This simulation framework allows easier prototyping and testing of new technologies where only the core elements are affected. Gaussino provides a plug&play mechanism for modelling collisions and interfacing generators like Pythia and EvtGen. It relies on Gaudi for general functionalities and the Geant4 toolkit for particle transport, combining their specific multi-threaded approaches. A fast simulation interface to replace the Geant4 physics processes with a palette of fast simulation models for a given sub-detector, including new deep learning based options, is the most recent addition. Geometry layouts can be provided through DD4Hep or experiment-specific software. A new, built-in mechanism to define simple volumes at configuration time can ease the development cycle.
In this contribution, we will describe the structure and functionality of Gaussino, as well as its more recent developments and performance. We will also show how the new version of Gauss exploits the Gaussino infrastructure to match the requirements of the simulation(s) of the LHCb experiment.
Over the last decade, the so-called Fourth Industrial Revolution has been under way. It is a profound transformation in industry, where new technologies such as smart automation, large-scale machine-to-machine communication, and the internet of things are largely changing traditional manufacturing and industrial practices. The analysis of the huge amount of data collected in all modern industrial plants has not only greatly benefited from modern tools of artificial intelligence, but has also spurred the development of new ones. In this context, we present a new approach, based on the combined use of a Long Short-Term Memory (LSTM) neural network and Bayesian inference, for the predictive maintenance of an industrial plant. SPE and Hotelling metrics, assessing the degree of compatibility between the time-evolving industrial data and the output of the LSTM, trained on a reference period of good working condition, are used to update the Bayesian probability of a failure of the plant. This method has successfully been applied to a real industrial case and the results are presented and discussed. Finally, it is important to highlight that, although developed to tackle a precise industrial need, the presented approach is general and can be applied to a plethora of other scenarios.
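The Bayesian update described above can be sketched in a few lines of Python. The example computes the squared prediction error (SPE) between the observed data and a model prediction, and uses whether it exceeds a threshold to update the failure probability with Bayes' theorem. The threshold, the two exceedance probabilities, and all function names are illustrative assumptions and not the values or code used by the authors. A Hotelling statistic could enter the update in the same way.

```python
def spe(observed, predicted):
    """Squared prediction error between observed data and model output."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def bayes_update(prior_failure, like_failure, like_healthy):
    """Posterior failure probability from Bayes' theorem."""
    numerator = like_failure * prior_failure
    return numerator / (numerator + like_healthy * (1.0 - prior_failure))

def exceedance_likelihoods(spe_value, threshold,
                           p_exceed_failure=0.8, p_exceed_healthy=0.05):
    """Likelihood of the observed outcome (SPE above or below the threshold)
    under the 'failure' and 'healthy' hypotheses; the probabilities are assumed."""
    if spe_value > threshold:
        return p_exceed_failure, p_exceed_healthy
    return 1.0 - p_exceed_failure, 1.0 - p_exceed_healthy

# Toy sequence of (observed, predicted) sensor readings, updated step by step.
steps = [([1.0, 2.0], [1.0, 2.1]), ([1.5, 2.9], [1.0, 2.0]), ([2.0, 3.5], [1.0, 2.0])]
probability = 0.01  # assumed prior probability of failure
for observed, predicted in steps:
    like_f, like_h = exceedance_likelihoods(spe(observed, predicted), threshold=0.5)
    probability = bayes_update(probability, like_f, like_h)
    print(f"SPE = {spe(observed, predicted):.2f}, P(failure) = {probability:.3f}")
```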
Signal-background classification is a central problem in High-Energy Physics (HEP) that plays a major role in the discovery of new fundamental particles. The recent Parametric Neural Network (pNN) is able to leverage multiple signal mass hypotheses as an additional input feature to effectively replace a whole set of individual neural classifiers, each providing (in principle) the best response for the corresponding mass hypothesis. In this work we aim at deepening the understanding of pNNs in light of real-world usage. We discovered several peculiarities of parametric networks, and provide intuition, metrics, and guidelines for them. We further propose the affine parametrization scheme, resulting in a new parameterized architecture, the affine parametric neural network (AffinePNN), along with many other generally applicable improvements, such as the balanced training procedure and the background's mass distribution. Finally, we extensively and empirically evaluate our models on the HEPMASS dataset, along with its imbalanced version (HEPMASS-IMB) provided by us, to further validate our approach. The results are presented in terms of the impact of the proposed design decisions, classification performance, and interpolation capability.
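As a rough illustration of how a parametric network receives the mass hypothesis as an extra input, the Python sketch below builds the input rows for training. Signal events carry their generated mass, while background events, which have no true signal mass, are assigned one drawn uniformly from the set of hypotheses. This uniform choice is only one possible background mass distribution, used here as an assumption, and the function names are hypothetical rather than the authors' code.

```python
import random

def build_pnn_inputs(features, labels, generated_masses, mass_hypotheses, rng):
    """Append a mass column to every event's feature vector."""
    inputs = []
    for x, y, m in zip(features, labels, generated_masses):
        # Signal (label 1) keeps its generated mass; background gets a sampled one.
        mass = m if y == 1 else rng.choice(mass_hypotheses)
        inputs.append(list(x) + [mass])
    return inputs

def with_mass_hypothesis(x, mass):
    """Input row used at inference time to test one specific mass hypothesis."""
    return list(x) + [mass]

rng = random.Random(0)
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
labels = [1, 0, 1]
generated_masses = [500.0, None, 1000.0]  # background has no generated signal mass
mass_hypotheses = [500.0, 750.0, 1000.0]

inputs = build_pnn_inputs(features, labels, generated_masses, mass_hypotheses, rng)
print(inputs)
```

At inference time the same event can be scored under several hypotheses by calling `with_mass_hypothesis` with different masses, which is what lets a single network replace a set of per-mass classifiers.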
The publication of full likelihood functions (LFs) of LHC results is vital for a long-lasting and profitable legacy of the LHC. Although major steps have been taken in this direction, the systematic publication of LFs remains a big challenge in High Energy Physics (HEP), as such distributions are usually quite complex and high-dimensional. Thus, we propose to describe LFs with Normalizing Flows (NFs), a powerful class of expressive generative networks that provide density estimation by construction. In this talk, we show that NFs are able to accurately model the complex high-dimensional LFs found in HEP, in some cases even with relatively small training samples. This approach opens the possibility of compact and efficient characterisations of the LFs derived from LHC searches, SM measurements, phenomenological studies, etc.
We present a novel computational approach for extracting weak signals, whose exact location and width may be unknown, from complex background distributions with an arbitrary functional form. We focus on datasets that can be naturally presented as binned integer counts, demonstrating our approach on the datasets from the Large Hadron Collider. Our approach is based on Gaussian Process (GP) regression - a powerful and flexible machine learning technique that allowed us to model the background without specifying its functional form explicitly, and to separate the background and signal contributions in a robust and reproducible manner. Unlike functional fits, our GP-regression-based approach does not need to be constantly updated as more data becomes available. We discuss how to select the GP kernel type, considering trade-offs between kernel complexity and its ability to capture the features of the background distribution. We show that our GP framework can be used to detect the Higgs boson resonance in the data with more statistical significance than a polynomial fit specifically tailored to the dataset. Finally, we use Markov Chain Monte Carlo (MCMC) sampling to confirm the statistical significance of the extracted Higgs signature.
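The background-modelling step described above can be illustrated with a small NumPy sketch of GP regression on binned counts. It uses a squared-exponential (RBF) kernel with fixed hyperparameters and approximates the Poisson variance in each bin by the observed count. The kernel choice, the hyperparameter values, the constant mean, and the toy falling spectrum are all assumptions made for illustration, so this is not the authors' framework, which also addresses kernel selection and signal extraction.

```python
import numpy as np

def rbf_kernel(x1, x2, amplitude, length_scale):
    """Squared-exponential covariance between two sets of bin centres."""
    d = x1[:, None] - x2[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_background(x, counts, amplitude, length_scale):
    """GP posterior mean at the bin centres, used as the background estimate."""
    y = counts.astype(float)
    mean = y.mean()                        # constant prior mean
    noise = np.diag(np.maximum(y, 1.0))    # Poisson variance ~ counts, floored at 1
    K = rbf_kernel(x, x, amplitude, length_scale)
    L = np.linalg.cholesky(K + noise)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean))
    return mean + K @ alpha

# Toy smoothly falling spectrum with Poisson fluctuations (no signal injected).
rng = np.random.default_rng(0)
x = np.linspace(100.0, 160.0, 60)
truth = 1000.0 * np.exp(-(x - 100.0) / 30.0)
counts = rng.poisson(truth)

background = gp_background(x, counts, amplitude=500.0, length_scale=20.0)
residual = counts - background  # a localised excess here would hint at a signal
print(np.round(residual[:5], 1))
```

No functional form for the background is specified anywhere in the sketch. The smoothness of the estimate is controlled only by the kernel, which is the trade-off between kernel complexity and flexibility that the abstract discusses.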
We present an end-to-end reconstruction algorithm to build particle candidates from detector hits in next-generation granular calorimeters similar to that foreseen for the high-luminosity upgrade of the CMS detector. The algorithm exploits a distance-weighted graph neural network, trained with object condensation, a graph segmentation technique. Through a single-shot approach, the reconstruction task is paired with energy regression. We describe the reconstruction performance in terms of efficiency as well as in terms of energy resolution. In addition, we show the jet reconstruction performance of our method and discuss its inference computational cost. To our knowledge, this work is the first-ever example of single-shot calorimetric reconstruction of $\mathcal{O}(1000)$ particles in high-luminosity conditions with 200 pileup.
The matrix element (ME) calculation in any Monte Carlo physics event generator is an ideal fit for implementing data parallelism with lockstep processing on GPUs and on CPU vector registers. For complex physics processes where the ME calculation is the computational bottleneck of event generation workflows, this can lead to very large overall speedups by efficiently exploiting these hardware architectures, which are now largely underutilized in HEP. In this contribution, we will present the latest status of our work on the reengineering of the Madgraph5_aMC@NLO event generator for these architectures. The new implementations of the ME calculation in vectorized C++, in CUDA and in the ALPAKA, KOKKOS and SYCL portability frameworks will be described in detail, as well as their integration into the existing MadEvent framework to keep the same overall look-and-feel of the user interface. Performance numbers will be reported both for the ME calculation alone and for the overall production workflow for unweighted event generation. First experience with an alpha release of the software supporting LHC LO processes, which is expected by the time of the ACAT2022 conference, will also be discussed.
For more than a decade Monte Carlo (MC) event generators with the current matrix element algorithms have been used for generating hard scattering events on CPU platforms, with excellent flexibility and good efficiency.
As the HL-LHC approaches and precision requirements become more demanding, many studies have been made to solve the bottleneck in the current MC event generator toolchains. The novel family of fast matrix element algorithms (BlockGen) shown in this report is one of the new developments that are more suitable for GPU acceleration.
We report the development experience of porting BlockGen using Kokkos. Moreover, we discuss the performance of the Kokkos version in comparison with the dedicated GPU version in CUDA.
For more than a decade the current generation of fully automated matrix element generators has provided hard scattering events with excellent flexibility and good efficiency.
However, as recent studies have shown, they are a major bottleneck in the established Monte Carlo event generator toolchains. With the advent of the HL-LHC and ever rising precision requirements, future developments will need to focus on computational performance, especially at intermediate to large jet multiplicities.
We present the novel BlockGen family of fast matrix element algorithms that are amenable to GPU acceleration, making use of modern, minimal color decompositions. Moreover, we discuss the performance achieved for standard candle processes such as V+jets and $t\bar{t}$+jets production.
High-precision calculations are an indispensable ingredient to the success of the LHC physics programme, yet their poor computing efficiency has been a growing cause for concern, threatening to become a paralysing bottleneck in the coming years. We present solutions to eliminate the apprehension by focussing on two major components of general purpose Monte Carlo event generators: The evaluation of parton-distribution functions along with the generation of perturbative matrix elements. We show that for the cost-driving event samples employed by the ATLAS experiment to model omnipresent irreducible Standard Model backgrounds, such as weak boson+jets as well as top-quark-pair production, these components dominate the overall run time by up to 80%. We demonstrate that a reduction of the computing footprint of LHAPDF and SHERPA by factors of around 50 can be achieved for multi-leg NLO event generation, thereby smashing one of the major milestones set by the HSF event generator working group whilst paving the way towards affordable state-of-the-art event simulation in the HL-LHC era.
The goal of this study is to understand the observed differences in ATLAS software performance when comparing results measured under ideal laboratory conditions with those from ATLAS computing resources on the Worldwide LHC Computing Grid (WLCG). The laboratory results are based on the full simulation of a single ttbar event and use dedicated, local hardware. In order to have a common and reproducible base for comparison, thousands of identical ttbar full simulation benchmark jobs were submitted to hundreds of Grid sites using the HammerCloud infrastructure. The impact of the heterogeneous hardware of the Grid sites and the performance difference between hardware generations are analysed in detail, and a direct, in-depth comparison of jobs performed on identical CPU types is also made. The choice of the physics sample used in the benchmark is validated by comparing the performance on each Grid site measured with HammerCloud, weighted by its contribution to the total ATLAS full simulation production output.
Ionization of matter by charged particles is the main mechanism for particle identification in gaseous detectors. Traditionally, the ionization is measured by the total energy loss (dE/dx). The concept of cluster counting, which measures the number of clusters per track length (dN/dx), was proposed in the 1970s. The dN/dx measurement can avoid many sources of fluctuations that affect the dE/dx measurement, and can therefore potentially achieve a resolution two times better than dE/dx.
The dN/dx measurement requires a highly efficient reconstruction algorithm. One needs to determine the number of peaks associated with the primary electrons in the induced current waveform in a single detection unit. The main challenges for the algorithm are to handle heavy pile-up of the single peaks and to discriminate the primary peaks from secondary electrons and noise. A machine-learning-based algorithm has been developed for the cluster counting problem. It consists of a peak-finding algorithm based on a Recurrent Neural Network (RNN), which aims to find all peaks in the waveform, and a clustering algorithm based on a Convolutional Neural Network (CNN), which determines the number of primary peaks.
In the talk, the basic idea of cluster counting and the reconstruction algorithm based on machine learning will be presented.
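As a point of reference for the peak-finding step, the sketch below shows a classical threshold-based local-maximum finder. It is not the RNN-based algorithm of the abstract; the function name `find_peaks_simple`, the threshold, and the toy waveform are illustrative assumptions.

```python
def find_peaks_simple(waveform, threshold):
    """Return indices of samples above threshold that exceed both neighbours."""
    peaks = []
    for i in range(1, len(waveform) - 1):
        sample = waveform[i]
        if (sample > threshold
                and sample > waveform[i - 1]
                and sample > waveform[i + 1]):
            peaks.append(i)
    return peaks


# Toy waveform (arbitrary units), not detector data.
toy_waveform = [0.0, 0.1, 0.9, 0.2, 0.1, 0.6, 0.7, 0.3, 0.0]
peak_indices = find_peaks_simple(toy_waveform, threshold=0.5)
```

A simple finder of this kind merges peaks that overlap within one sample, which is the pile-up situation that motivates a learned approach.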
The challenges expected for the HL-LHC era, both in terms of storage and computing resources, provide LHC experiments with a strong motivation for re-thinking their computing models at many levels. A large part of the R&D effort of the CMS experiment has been focused on optimizing the utilization of computing and storage resources for data analysis, and Run 3 could provide a perfect benchmark for studying new solutions in a realistic scenario. The work shown here focuses on the integration and validation phase of an interactive environment for data analysis, whose distinguishing feature is seamless scaling over grid resources at Italian T2s and possibly over opportunistic providers such as HPC. In this approach the integration of new resources has proved to be exceptionally easy in terms of requirements, so computing power can be included dynamically in a very effective way. The presentation will first give an overview of the architectural pillars and the integration challenges. Then the results of a first set of performance measurements will be presented, obtained with a first real CMS user analysis built on top of the ROOT RDataFrame ecosystem that has been successfully executed on this infrastructure.
The High Energy Physics world will face challenging trigger requirements in the next decade. In particular, the luminosity increase to $5$-$7.5 \times 10^{34}$ cm$^{-2}$ s$^{-1}$ at the LHC will push major experiments such as ATLAS to exploit online tracking for their inner detector in order to reach 10 kHz of events from 1 MHz of Calorimeter and Muon Spectrometer trigger. The project described here is a proposal for a tuned Hough Transform algorithm implemented on high-end FPGA technology and versatile enough to adapt to different tracking situations. The platform developed allows different datasets to be studied with software that emulates the firmware, and consequently the hardware performance, and allows input datasets to be generated from ATLAS simulation. Xilinx FPGAs have been chosen for this implementation, so far using the VC709 commercial board and its PCI Express Generation 3 technology. The system is designed to process an ATLAS Run 4 event with a pile-up of 200 in about 10 µs on average, with the possibility of running two events at a time. The best simulated efficiency exceeds 95% for single-muon tracking. The project is planned to be proposed for the Event Filter of the ATLAS Phase-II TDAQ upgrade.
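To illustrate the voting idea behind a Hough Transform, here is a minimal sketch of the textbook straight-line version in (theta, rho) space. It is not the tuned track-parameter transform or the FPGA firmware of the project; the binning, the function names, and the toy hits are assumptions chosen for illustration.

```python
import math


def hough_accumulate(hits, n_theta=180, n_rho=200, rho_max=10.0):
    """Fill a (theta, rho) vote table using rho = x*cos(theta) + y*sin(theta)."""
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in hits:
        for it in range(n_theta):
            theta = math.pi * it / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            ir = math.floor((rho + rho_max) / (2.0 * rho_max) * n_rho)
            if 0 <= ir < n_rho:
                acc[it][ir] += 1
    return acc


def best_cell(acc):
    """Return (votes, theta_index, rho_index) of the most populated cell."""
    return max((votes, it, ir)
               for it, row in enumerate(acc)
               for ir, votes in enumerate(row))


# Four toy hits on the line y = 2.05 plus one unrelated hit.
hits = [(1.0, 2.05), (2.0, 2.05), (3.0, 2.05), (4.0, 2.05), (1.0, 3.5)]
votes, theta_index, rho_index = best_cell(hough_accumulate(hits))
```

Each hit votes independently for every cell compatible with it, which is why the method maps naturally onto parallel hardware.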
Hydra is an AI system employing off-the-shelf computer vision technologies aimed at autonomously monitoring data quality. Data quality monitoring is an essential step in modern experimentation and Nuclear Physics is no exception. Certain failures can be identified through alarms (e.g. electrical heartbeats) while others are more subtle and often require expert knowledge to identify and diagnose. In the GlueX experiment at Jefferson Laboratory data quality monitoring is a multistep, human in the loop process that begins with shift crews looking at a litany of plots (e.g. occupancy plots) which indicate the performance of detector subsystems. With the sheer complexity of the systems and number of plots needing to be monitored subtle issues can be, and are, missed. During its time in production (over 2 years) Hydra has lightened the load of shift takers of GlueX by autonomously monitoring detector systems. This talk will describe the construction, training, and operation of the Hydra system in GlueX as well as the ongoing work to develop and deploy the system with other experiments at Jefferson Laboratory and beyond.
High energy physics experiments are pushing forward precision measurements and searches for new physics beyond the Standard Model. Large volumes of simulated data are urgently needed to meet the requirements of the physics programme, and making good use of the existing power of supercomputers is one of the most active areas in high energy physics computing. Taking the BESIII experiment as an illustration, we deploy the offline software BOSS on the top-tier supercomputer "Tianhe-II" with the help of Singularity. With very limited internet bandwidth and without root privileges, we successfully synchronize the simulation software and keep it up to date through CVMFS, and we achieve a speed-up when comparing HPC with HTC for the same large-scale task. There are two ideas to share with the community. First, ordinary users constantly run into problems with the real-time internet connection and with lock conflicts when loading software. We solve these two problems by deploying a Squid server and using FUSE in memory on each computing node. Second, we provide an MPI Python interface for high-throughput parallel computation on Tianhe-II. In addition, the program that handles the data output is specially arranged so that I/O tasks do not queue. The acceleration rate in simulation reaches 80% so far, in simulation tests with up to 15 K parallel processes.
AtlFast3 is the next generation of high precision fast simulation in ATLAS that is being deployed by the collaboration and was successfully used for the simulation of 7 billion events in Run 2 data taking conditions. AtlFast3 combines a parametrization-based approach known as FastCaloSimV2 and a machine-learning based tool that exploits Generative Adversarial Networks (FastCaloGAN) for the simulation of hadrons.
For the purpose of Run 3, the parametrization of AtlFast3 was fully reworked and many active developments are ongoing to further enhance the quality of fast simulation in ATLAS. This talk will give a brief overview of AtlFast3 with focus on FastCaloSimV2 and outline several improvements with respect to the previous simulator tool AFII. Furthermore, recent advancements in the parametrised simulation, such as the development of a dedicated tune of electromagnetic shower shapes to data, will be presented.
The inner tracking system of the CMS experiment, consisting of the silicon pixel and strip detectors, is designed to provide a precise measurement of the momentum of charged particles and to perform the primary and secondary vertex reconstruction. The movements of the individual substructures of the tracker detectors are driven by the change in the operating conditions during data taking. Frequent updates in the detector geometry are therefore needed to describe accurately the position, orientation, and curvature of the tracker modules.
The procedure in which new parameters of the tracker geometry are determined is referred to as the alignment of the tracker. The alignment is performed regularly during data taking using reconstructed tracks from both collision and cosmic-ray data, and it is further refined after the end of data taking. The tracker alignment performance corresponding to the ultimate accuracy of the alignment calibration for the legacy reprocessing of the CMS Run 2 data will be presented. The data-driven methods used to derive the alignment parameters and the set of validations that monitor the performance of the physics observables will be reviewed. The first results obtained with the data taken during the year 2021 and the most recent set of results from LHC Run 3 will be presented.
Accurate reconstruction of charged particle trajectories and measurement of their parameters (tracking) is one of the major challenges of the CMS experiment. Precise and efficient tracking is one of the critical components of the CMS physics program, as it impacts the ability to reconstruct the physics objects needed to understand proton-proton collisions at the LHC. In this work, we present the tracking performance measured in data, where the tag-and-probe technique was applied to $Z\to \mu^{+}\mu^{-}$ di-muon resonances for all reconstructed muon trajectories and for the subset of trajectories in which the CMS Tracker is used to seed the measurement. The performance is assessed using LHC Run 2 data at $\sqrt{s}$ = 13 TeV and early LHC Run 3 data at $\sqrt{s}$ = 13.6 TeV.
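The arithmetic at the core of a tag-and-probe efficiency can be sketched as follows. This is a simplified counting version with a plain binomial uncertainty; an actual measurement would typically extract the passing and failing yields from fits to the di-muon invariant mass. The function name and the numbers are hypothetical.

```python
import math


def tag_and_probe_efficiency(n_pass, n_fail):
    """Efficiency = passing probes / all probes, with a binomial uncertainty."""
    n_total = n_pass + n_fail
    if n_total == 0:
        raise ValueError("no probes available")
    efficiency = n_pass / n_total
    uncertainty = math.sqrt(efficiency * (1.0 - efficiency) / n_total)
    return efficiency, uncertainty


# Hypothetical counts, for illustration only.
eff, err = tag_and_probe_efficiency(n_pass=990, n_fail=10)
```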
Building on top of the multithreading functionality that was introduced in Run-2, the CMS software framework (CMSSW) has been extended in Run-3 to offload part of the physics reconstruction to NVIDIA GPUs. The first application of this new feature is the High Level Trigger (HLT): the new computing farm installed at the beginning of Run-3 is composed of 200 nodes, and for the first time each one is equipped with two AMD Milan CPUs and two NVIDIA T4 GPUs. In order to guarantee that the HLT can run on machines without any GPU accelerators - for example as part of the large scale Monte Carlo production running on the grid - the HLT reconstruction has been implemented both for NVIDIA GPUs and for traditional CPUs.
CMS has undertaken a comprehensive validation and commissioning activity to ensure the successful operation of the new HLT farm and the reproducibility of the physics results while using either of the two implementations: some activities took place offline, on dedicated Tier-2 centres equipped with NVIDIA GPUs; others ran online during the LHC commissioning period, after installing GPUs on a few of the nodes from the Run-2 HLT farm. The final step was the optimisation of the HLT configuration, after the installation of the new HLT farm.
This contribution will describe the steps taken to validate the GPU-based reconstruction and commission the new HLT farm, leading to the successful data taking activities after the LHC Run-3 start up.
High Energy Physics (HEP) has been using column-wise data stored in synchronized containers, such as most prominently ROOT’s TTree, for decades. These containers have proven to be very powerful as they combine row-wise association capabilities needed by most HEP event processing frameworks (e.g. Athena) with column-wise storage, which typically results in better compression and more efficient support for many analysis use-cases. The downside, however, is that all events (rows) need to contain the same attributes and therefore extending the list of items to be stored, even if needed only for a subsample of events, can be costly in storage and lead to data duplication.
The ATLAS experiment has developed navigational infrastructure that allows custom data extensions for a subsample of events to be stored in separate but synchronized containers. These extensions can easily be added to ATLAS standard data products (such as DAOD-PHYS or PHYSLITE), avoiding duplication of those core data products while limiting their size increase. As a proof of principle, a prototype based on the Long Lived Particle search is implemented. Preliminary results concerning the event size as well as the reading/writing performance implications associated with this prototype will be presented.
Augmented data as described above are stored within the same file as the core data. Storing them in dedicated files will be investigated in future, as this could provide more flexibility to store augmentations separate from core data, e.g. certain sites may only want a subset of several augmentations or augmentations can be archived to disk once their analysis is complete.
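The idea of attaching extra attributes to only a subsample of events can be sketched with plain Python containers. This is a toy model, not the ATLAS navigational infrastructure or ROOT's TTree; the column names (`n_jets`, `llp_vertex_r`) and the `rows` index are hypothetical.

```python
# Core columns: one entry per event (row).
core = {"event_id": [101, 102, 103, 104], "n_jets": [2, 4, 3, 5]}

# Extension: stored only for selected rows, linked to the core by row index.
extension = {"rows": [1, 3], "llp_vertex_r": [12.5, 48.0]}


def iterate_with_extension(core, extension):
    """Yield one dict per event, adding extension values where they exist."""
    position = {row: j for j, row in enumerate(extension["rows"])}
    n_events = len(core["event_id"])
    for row in range(n_events):
        event = {name: column[row] for name, column in core.items()}
        j = position.get(row)
        if j is not None:
            for name, column in extension.items():
                if name != "rows":
                    event[name] = column[j]
        yield event


events = list(iterate_with_extension(core, extension))
```

Only two extension values are stored for four events, so the core columns are neither duplicated nor padded.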
The Belle II experiment has been collecting data since 2019 at the second-generation e+/e- B-factory SuperKEKB in Tsukuba, Japan. The goal of the experiment is to explore new physics via high-precision measurements in flavor physics. This is achieved by collecting a large amount of data that needs to be calibrated promptly for fast reconstruction and recalibrated thoroughly for the final reprocessing. To fully automate the calibration process, a Python plugin package, b2cal, has been developed based on the open-source Apache Airflow package, using Directed Acyclic Graphs (DAGs) to describe the ordering of processes and Flask to provide administration and job submission web pages. Prompt processing and reprocessing are performed at different calibration centers (BNL and DESY, respectively). After calibration, the raw data are reconstructed on the GRID to an analysis-oriented format (mDST), also stored on the GRID, and delivered to the collaborations. This talk will describe the whole procedure, from raw data calibration to mDST production.
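How a DAG fixes the ordering of processes can be sketched with Python's standard-library `graphlib`. This is not the Airflow API or the b2cal package; the step names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical calibration steps, each mapped to the steps it depends on.
dependencies = {
    "alignment": {"raw_data"},
    "energy_calibration": {"alignment"},
    "timing_calibration": {"alignment"},
    "reconstruction": {"energy_calibration", "timing_calibration"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
```

The two calibrations that depend only on `alignment` have no ordering between them, which is what lets a scheduler run such steps in parallel.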
Computing in high energy physics is a typical data-intensive application, especially data analysis, which requires access to large amounts of data. The traditional computing system adopts a mode in which computing and storage are separated, which leads to large data movements during the computing process and also increases transmission delay and network load. This situation can be effectively alleviated by pushing some data-intensive tasks down from the computing nodes to the storage nodes. The philosophy is to bring computing as close to the source of the data as possible in order to reduce latency and bandwidth use. Storage nodes generally have computing resources such as CPUs, which are necessary for deploying a distributed file system, but the computing power of the storage nodes is often ignored. This paper presents the design and implementation of a computational storage system based on CERN Open Storage (EOS). The system exposes the computational storage functions transparently through the standard POSIX file system interface, such as open, read and write. A plugin implemented in the EOS storage node (FST) executes the specified algorithm or program when it finds special arguments in the filename, for example "&CSS=decode". The plugin can read and write files locally on the FST and then register the newly generated file in the EOS name node (MGM). Finally, the paper gives test results showing that the computational storage mode is faster and supports more parallel computing tasks than the traditional mode in some applications, such as raw data decoding for the LHAASO experiment. Compared with the traditional mode, the computational storage mode reduces computation time by 37% for a single task and by 72% for 40 tasks running in parallel.
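The filename convention can be sketched as follows. This is a toy model of the dispatch logic only, not the EOS plugin; the function names, the handler table, the stand-in "decoder", and the file path are hypothetical, and only the `&CSS=decode` marker comes from the abstract.

```python
def split_css_request(filename, marker="&CSS="):
    """Split 'path&CSS=op' into (path, op); op is None if no marker is present."""
    path, separator, operation = filename.partition(marker)
    if not separator:
        return filename, None
    return path, operation


# Stand-in handler table; a real system would run a decoding program here.
HANDLERS = {"decode": lambda data: data.upper()}


def open_with_css(filename, read_bytes):
    """Read the file and, if requested, apply the operation next to the data."""
    path, operation = split_css_request(filename)
    data = read_bytes(path)
    if operation is None:
        return data
    if operation not in HANDLERS:
        raise ValueError(f"unknown operation: {operation}")
    return HANDLERS[operation](data)


def fake_reader(path):
    """Stand-in for local file access on the storage node."""
    return b"raw:" + path.encode()


plain = open_with_css("/data/run1.raw", fake_reader)
decoded = open_with_css("/data/run1.raw&CSS=decode", fake_reader)
```

Because the request travels inside the filename, a client needs nothing beyond an ordinary `open` call, which is what keeps the interface POSIX-compatible.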
The outstanding performance obtained by the CMS experiment during Run1 and Run2 represents a great achievement of seamless hardware and software integration. Among the different software parts, the CMS offline reconstruction software is essential for translating the data acquired by the detectors into concrete objects that can be easily handled by the analyzers. The CMS offline reconstruction software needs to be reliable and fast. The Long Shutdown 2 (LS2) between LHC Run2 and Run3 has been instrumental in optimizing the CMS offline reconstruction software and in introducing new algorithms, yielding a continuous CPU speed-up. In order to reach these goals, a continuous benchmarking pipeline has been implemented; CPU timing and memory profiling, using the igprof tool, are performed on a regular basis to monitor the footprint of new developments and to identify possible areas of performance improvement. The current status and the achievements obtained through continuous benchmarking of the CMS offline reconstruction software are described here.
The landscape of computing power available for the CMS experiment is rapidly evolving, from a scenario dominated by x86 processors deployed at WLCG sites, towards a more diverse mixture of Grid, HPC, and Cloud facilities incorporating a higher fraction of non-CPU components, such as GPUs. Using these facilities’ heterogeneous resources efficiently to process the vast amounts of data to be collected in the LHC Run3 and beyond, in the HL-LHC era, is key to CMS’s achieving its scientific goals.
The CMS Submission Infrastructure is the main computing resource provisioning system for CMS workflows, including data processing, simulation and analysis. It currently aggregates nearly 400k CPU cores distributed worldwide from Grid, HPC and cloud providers. The Submission Infrastructure, together with other elements in the CMS workload management, has been modified in its strategies and enlarged in its scope to make use of these new resources.
In this evolution, key questions such as the optimal level of granularity in the description of the resources, or how to prioritize workflows in this new resource mix must be taken into consideration. In addition, access to many of these resources is considered opportunistic by CMS, thus each resource provider may also play a key role in defining particular allocation policies, diverse from the up-to-now dominant system of pledges. All these matters must be addressed in order to ensure the efficient allocation of resources and matchmaking to tasks to maximize their use by CMS.
This contribution will describe the evolution of the CMS Submission Infrastructure towards a full integration and support of heterogeneous resources according to CMS needs. In addition, a study of the pool of GPUs already available to CMS Offline Computing will be presented, including a survey of their diversity in relation to CMS workloads, and the scalability reach of the infrastructure to support them.
During ATLAS Run 2, a large proportion of the CPU time in the online track reconstruction algorithm of the Inner Detector (ID) was dedicated to the fast track finding. With the proposed HL-LHC upgrade, where the event pile-up is predicted to reach <μ>=200, track finding will see a further large increase in CPU usage. Moreover, only a small subset of Pixel-only seeds is accepted after the fast track finding procedure, so the CPU time spent on rejected seeds is essentially wasted. Therefore, a computationally cheap seed pre-selection procedure for track candidates, based on approximate track following, was designed and is described in this report. The algorithm uses a parabolic track approximation in the plane perpendicular to the beamline and a combinatorial Kalman filter, simplified by a reference-related coordinate system, to find the best track candidates. For such candidates, a set of numerical features is created to classify seeds using machine learning techniques, such as Support Vector Machines (SVM) or kernel-based methods. The algorithm was tuned for high efficiency in identifying and rejecting bad seeds, while ensuring no significant loss of track finding efficiency. Current studies focus on implementing the algorithm in the Athena framework for online seed pre-selection, which could be used during Run 3 or potentially be adapted to the ITk geometry for Run 4 of the HL-LHC.
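A parabolic approximation can be sketched by passing a parabola through three seed hits and extrapolating it to the next layer. This is a generic Lagrange-interpolation sketch in an assumed local (x, y) frame, not the ATLAS implementation; the coordinates, the search window, and the function name are hypothetical.

```python
def parabola_through(p0, p1, p2):
    """Return f(x) for the parabola through three (x, y) points (Lagrange form)."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2

    def f(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))

    return f


# Three toy seed hits lying on y = 0.01 * x**2.
track = parabola_through((1.0, 0.01), (2.0, 0.04), (3.0, 0.09))
predicted_y = track(5.0)

# A hit on the next layer is kept if it falls inside an assumed search window.
hit_is_compatible = abs(0.26 - predicted_y) < 0.05
```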
The production of simulated datasets for use by physics analyses consumes a large fraction of ATLAS computing resources, a problem that will only get worse as increases in the instantaneous luminosity provided by the LHC lead to more collisions per bunch crossing (pile-up). One of the more resource-intensive steps in the Monte Carlo production is reconstructing the tracks in the ATLAS Inner Detector (ID), which takes up about 60% of the total detector reconstruction time [1]. This talk discusses a novel technique called track overlay, which substantially speeds up the ID reconstruction. In track overlay the pile-up ID tracks are reconstructed ahead of time and overlaid onto the ID tracks from the simulated hard-scatter event. We present our implementation of this track overlay approach as part of the ATLAS Fast Chain simulation, as well as a method for deciding in which cases it is possible to use track overlay in the reconstruction of simulated data without performance degradation.
[1] ATL-PHYS-PUB-2021-012 (60% refers to Run3, mu=50, including large-radius tracking, p11)
As the scale and complexity of High Energy Physics (HEP) experiments increase, researchers face the challenge of large-scale data processing. In terms of storage, the Hadoop Distributed File System (HDFS), a distributed file system that supports the "data-centric" processing model, has been widely used in academia and industry. HDFS supports Spark and other distributed computing frameworks that exploit data locality, so studying its application in HEP is the basis for enabling upper-layer computing in this field. However, HDFS expands cluster capacity by adding cluster nodes, an approach that cannot meet the cost-effectiveness requirements for the persistence and backup of massive HEP experimental data. In response to these problems, we study a Hadoop Distributed & Tiered File System (HDTFS) that supports disk-tape storage and takes full advantage of the fast access speed of disk and of the large capacity, low price, and long storage period of tape, in order to avoid the high cost of horizontally expanding HDFS clusters. The system provides users with a single global namespace and avoids dependence on external metadata servers to access the data stored on tape. In addition, tape-layer resources are managed internally so that users do not have to deal with complex tape storage. The experimental results show that this method can effectively address the storage of massive data in HEP Hadoop clusters.
When measuring rare processes at Belle II, a huge luminosity is required, which means that a large number of simulated events is necessary to determine signal efficiencies and background contributions. However, this process is computationally expensive, while most of the simulated data, in particular in the case of background, are discarded by the event selection. Filters using graph neural networks are therefore introduced at an early stage to save the resources spent on the detector simulation and reconstruction of events that are discarded at analysis level. In our work, we improved the performance of the filters using graph attention and investigated statistical methods, including sampling and reweighting, to deal with the biases introduced by the filtering.
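One standard way to combine sampling with reweighting is inverse-probability weighting: an event is kept with probability p and, if kept, carries weight 1/p, so weighted sums remain unbiased. The abstract does not specify the scheme used, so the sketch below is an assumption; the keep probability would come from the filter score in practice and is a hypothetical function here.

```python
import random


def filter_with_weights(events, keep_probability, rng):
    """Keep each event with probability p and attach the weight 1/p."""
    kept = []
    for event in events:
        p = keep_probability(event)  # must be > 0 for every event
        if rng.random() < p:
            kept.append((event, 1.0 / p))
    return kept


def toy_probability(event):
    """Hypothetical stand-in for a probability derived from a filter score."""
    return 0.9 if event % 2 == 0 else 0.2


events = list(range(20000))
kept = filter_with_weights(events, toy_probability, random.Random(0))
weighted_total = sum(weight for _, weight in kept)
```

The sum of weights estimates the original number of events even though far fewer events are passed on to the expensive simulation steps.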
The Phase-II upgrade of the LHC will increase its instantaneous luminosity by a factor of 7, leading to the High Luminosity LHC (HL-LHC). At the HL-LHC, the number of proton-proton collisions in one bunch crossing (called pileup) increases significantly, putting more stringent requirements on the electronics and real-time data processing capabilities of the LHC detectors.
The ATLAS Liquid Argon (LAr) calorimeter measures the energy of particles produced in LHC collisions. This calorimeter also has trigger capabilities to identify interesting events. In order to enhance the physics discovery potential of the ATLAS detector in the blurred environment created by the pileup, an excellent resolution of the deposited energy and an accurate determination of the deposition time are crucial.
The computation of the deposited energy is performed in real time using dedicated data acquisition electronic boards based on FPGAs. FPGAs are chosen for their capacity to treat large amounts of data with very low latency. The computation of the deposited energy is currently done using optimal filtering algorithms that assume a nominal pulse shape of the electronic signal. These filter algorithms are adapted to the ideal situation with very limited pileup and no overlap of the electronic pulses in the detector. However, with the increased luminosity and pileup, the performance of the optimal filter algorithms decreases significantly, and no further extension or tuning of these algorithms can recover the lost performance.
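The principle of an optimal filter, and its sensitivity to overlapping pulses, can be sketched in a simplified form. The sketch assumes white noise and no timing constraint, in which case the least-squares amplitude weights are a_i = g_i / sum_j g_j^2 for a nominal pulse shape g; the actual LAr filters also use the noise autocorrelation and a timing estimate. The pulse shape and amplitudes are toy values.

```python
def amplitude_coefficients(pulse_shape):
    """Least-squares amplitude weights for white noise: a_i = g_i / sum(g_j^2)."""
    norm = sum(g * g for g in pulse_shape)
    return [g / norm for g in pulse_shape]


def estimate_amplitude(samples, coefficients):
    """Amplitude estimate as a weighted sum of the digitised samples."""
    return sum(a * s for a, s in zip(coefficients, samples))


shape = [0.0, 0.6, 1.0, 0.7, 0.3]  # toy nominal pulse shape
coefficients = amplitude_coefficients(shape)

# A single pulse of amplitude 50 is recovered.
clean = estimate_amplitude([50.0 * g for g in shape], coefficients)

# A second pulse of amplitude 20, delayed by two samples, biases the estimate.
overlapped = [50.0 * g for g in shape]
for i in range(2, len(shape)):
    overlapped[i] += 20.0 * shape[i - 2]
piled_up = estimate_amplitude(overlapped, coefficients)
```

The filter has no way to tell the second pulse from the first, so the overlap is absorbed into the amplitude estimate.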
The back-end electronic boards for the Phase-II upgrade of the LAr calorimeter will use the next high-end generation of INTEL FPGAs with increased processing power and memory. This is a unique opportunity to develop the necessary tools, enabling the use of more complex algorithms on these boards. We developed several neural networks (NNs) with significant performance improvements with respect to the optimal filtering algorithms. The main challenge is to efficiently implement these NNs in the dedicated data acquisition electronics. Special effort was dedicated to minimising the needed computational power while optimising the NN architectures.
Five NN algorithms based on CNN, RNN, and LSTM architectures will be presented. The improvement in the energy resolution and in the accuracy of the deposition time compared to the legacy filter algorithms, especially for overlapping pulses, will be discussed. The implementation of these networks in firmware will be shown. Two implementation categories, in VHDL and in Quartus HLS code, are considered. The implementation results on Stratix 10 INTEL FPGAs, including the resource usage, the latency, and the operation frequency, will be reported. Approximations in the firmware implementations, including the use of fixed-point arithmetic and lookup tables for activation functions, will be discussed. Implementations including time multiplexing to reduce resource usage will be presented. We will show that two of these NN implementations are viable solutions that fit the stringent data processing requirements on the latency (O(100ns)) and bandwidth (O(1Tb/s) per FPGA) needed for the ATLAS detector operation.
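The two approximations named above, fixed-point arithmetic and lookup tables for activation functions, can be emulated in software as follows. This is a Python sketch under assumed bit widths (8 fractional bits, a 128-entry sigmoid table covering [-8, 8)), not the VHDL or HLS code of the project.

```python
import math

FRAC_BITS = 8              # assumed fixed-point format: 8 fractional bits
SCALE = 1 << FRAC_BITS     # 256 fixed-point units per 1.0
LUT_MIN, LUT_MAX = -8, 8   # assumed input range covered by the table
STEP_BITS = 3              # table step of 1/8


def to_fixed(x):
    """Quantise a float to the assumed fixed-point integer representation."""
    return int(round(x * SCALE))


# Precomputed sigmoid values, stored as fixed-point integers.
SIGMOID_LUT = [
    to_fixed(1.0 / (1.0 + math.exp(-(LUT_MIN + i / (1 << STEP_BITS)))))
    for i in range((LUT_MAX - LUT_MIN) << STEP_BITS)
]


def sigmoid_lut(x_fixed):
    """Look up sigmoid(x) for a fixed-point input, saturating outside the table."""
    index = (x_fixed - LUT_MIN * SCALE) >> (FRAC_BITS - STEP_BITS)
    index = max(0, min(len(SIGMOID_LUT) - 1, index))
    return SIGMOID_LUT[index]


half = sigmoid_lut(to_fixed(0.0))
at_one = sigmoid_lut(to_fixed(1.0)) / SCALE
saturated = sigmoid_lut(to_fixed(100.0))
```

The lookup replaces an exponential and a division with a shift and a memory read, at the cost of a quantisation error set by the table step and the number of fractional bits.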
The ATLAS detector at CERN measures proton-proton collisions at the Large Hadron Collider (LHC), which allows us to test the limits of the Standard Model (SM) of particle physics. Forward-moving electrons produced in these collisions are promising candidates for finding physics beyond the SM. However, the ATLAS detector is not designed to measure forward leptons with pseudorapidity $\eta$ of more than 2.5 with high precision. The ATLAS performance for forward leptons can be improved by enhancing the trigger system. This system selects events of interest in order not to overwhelm the data storage with the information from around 1.7 billion collisions per second. First studies using the Neural Ringer algorithm for selecting forward electrons with $2.5<\eta<3.2$ show promising results. The Neural Ringer, which uses machine learning to analyse detector information and distinguish electromagnetic from hadronic signatures, will be presented. Additionally, its performance on simulated ATLAS Monte Carlo samples in improving the high level trigger for forward electrons will be shown.
As CMS starts the Run 3 data taking, the experiment’s data management software tools, along with the monitoring infrastructure, have undergone significant upgrades to cope with the conditions expected in the coming years. The challenges of efficient, real-time monitoring of the performance of the computing infrastructure and of data distribution are being met using state-of-the-art technologies that are continuously evolving. In this talk, we describe how we set up monitoring pipelines based on a combination of technologies, such as Kubernetes, Spark/Hadoop and other open-source software stacks. We show how the choice of these components is critical for this new generation of services and infrastructure for CMS data management and monitoring. We also discuss how some of the developed monitoring services, such as data management monitoring, CPU efficiency monitoring, and dataset access and transfer metrics, have been instrumental in taking strategic decisions and increasing the physics harvest through maximal utilization of the computing resources available to us.
PARSIFAL (PARametrized SImulation) is a software tool originally implemented to reproduce the complete response of a triple-GEM detector to the passage of a charged particle, taking into account the involved physical processes by their simple parametrization and thus in a very fast way.
Robust and reliable software, such as GARFIELD++, is widely used to simulate the transport of electrons and ions in the gas and all their interactions step by step, but it is CPU-time consuming. The implementation of PARSIFAL code was driven by the need to reduce the processing time, while maintaining the precision of a full simulation.
The software must be initialized with some parameters that can be extracted from the GARFIELD++ simulation, which must be run once-and-for-all. Then it can be run independently to provide a reliable simulation, from the ionization, to diffusion, multiplication, signal induction and electronics, only by sampling from a set of functions which describe the physical effects and depend on the input parameters.
The code has been thoroughly tested on triple-GEM detectors and the simulation was finely tuned to experimental data collected at testbeam.
Recently, PARSIFAL has been extended to another detector in the MPGD family, the micro-RWELL, thanks to the modular structure of the code. The main difference in the treatment of the physical processes is the introduction of the resistive plane and its effect on the formation of the signal. For this purpose, the charge spread on the resistive layer has been described following the work of M. S. Dixit and A. Rankin (NIM A518 (2004) 721-727, NIM A566 (2006) 281-285) and the electronics readout (APV-25) was added to the description.
A fine tuning of the simulation is ongoing to reproduce the experimental data collected during testbeams. A similar strategy already validated for the triple-GEM case is used: the variables of interest for the comparison of the experimental data with simulated results are the cluster charge, cluster size and the position resolution obtained by charge centroid and micro-TPC reconstruction algorithms. In this case, special attention must be paid to the tuning of the resistivity of the resistive layer.
An illustration of the general code, setting the focus on this latest implementation and the first comparison with experimental data from testbeam are the subject of this contribution.
The particle-flow (PF) algorithm is of central importance to event reconstruction at the CMS detector, and has been a focus of developments in light of planned Phase-2 running conditions with an increased pileup and detector granularity. Current rule-based implementations rely on extrapolating tracks to the calorimeters, correlating them with calorimeter clusters, subtracting charged energy and creating neutral particles from significant energy deposits. Such rule-based algorithms can be difficult to extend and may be computationally inefficient under high detector occupancy, while also being challenging to port to heterogeneous architectures in full detail.
In recent years, end-to-end machine learning approaches for event reconstruction have been proposed, including for PF at CMS, with the possible advantage of directly optimising for the physical quantities of interest, being highly reconfigurable to new conditions, while also being a natural fit for deployment on heterogeneous accelerators.
One of the proposed approaches for machine-learned particle-flow (MLPF) reconstruction relies on graph neural networks to infer the full particle content of an event from the tracks and calorimeter clusters based on a training on simulated samples, and has been recently implemented in CMS as a possible future reconstruction R&D direction to fully map out the characteristics of such an approach in a realistic setting.
We discuss progress in CMS towards an improved implementation of the MLPF reconstruction, now optimised on generator-level particle information for the first time to our knowledge, thus paving the way to potentially improving the detector response in terms of physical quantities of interest. We show detailed physics validation with respect to the current PF algorithm in terms of high-level physical quantities such as jet and MET resolution. Furthermore, we discuss progress towards deploying the MLPF algorithm in the CMS software framework on heterogeneous platforms, performing large-scale hyperparameter optimization using HPC systems, as well as the possibilities of making use of explainable artificial intelligence (XAI) to interpret the output.
Secrets management is the process of managing secrets, such as certificates, database credentials, tokens, and API keys, in a secure and centralized way. In the present CMSWEB (the portfolio of CMS internal IT services) infrastructure, only the operators maintain all service and cluster secrets in a secure place. However, if all the people who hold the secrets are away, we have no choice but to contact them to obtain the secrets in an emergency.
To overcome this issue, we performed an R&D study on secrets management and explored various strategies such as HashiCorp Vault, GitHub credential manager, and SOPS/age. In this talk, we’ll discuss the process by which CMS investigated these strategies and performed a feasibility analysis of them. We will also underline why CMS chose SOPS as a solution, reviewing how the features of SOPS with age satisfy our needs. We will also discuss how other experiments could adopt our solution.
The CMS Submission Infrastructure is the main computing resource provisioning system for CMS workflows, including data processing, simulation and analysis. It currently aggregates nearly 400k CPU cores distributed worldwide from Grid, HPC and cloud providers. CMS Tier-0 tasks, such as data repacking and prompt reconstruction, critical for data-taking operations, are executed on a collection of computing resources at CERN, also managed by the CMS Submission Infrastructure.
All this computing power is harnessed via a number of federated resource pools, supervised by HTCondor and GlideinWMS services. Elements such as pilot factories, job schedulers and connection brokers are deployed in HA mode across several “availability zones”, providing stability to our services via hardware redundancy and numerous failover mechanisms.
Given the upcoming start of the LHC Run 3, the Submission Infrastructure stability has been recently tested in a series of controlled exercises, performed without interruption of our services. These tests have demonstrated the resilience of our systems, and additionally provided useful information in order to further refine our monitoring and alarming system.
This contribution will describe the main elements in the CMS Submission Infrastructure design and deployment, along with the performed failover exercises, proving that our systems are ready to serve their critical role in support of CMS activities.
Over the past several years, a deep learning model based on convolutional neural networks has been developed to find proton-proton collision points (also known as primary vertices, or PVs) in Run 3 LHCb data. By converting the three-dimensional space of particle hits and tracks into a one-dimensional kernel density estimator (KDE) along the direction of the beamline and using the KDE as an input feature into a neural network, the model has achieved an efficiency of 98% with a low false positive rate. The success of this method motivates its extension to other experiments, including ATLAS and CMS. Although LHCb is a forward spectrometer and ATLAS and CMS are central detectors, both ATLAS and CMS have the necessary characteristics to compute KDEs analogous to the LHCb detector. While the ATLAS and CMS detectors will benefit from higher precision, the expected number of visible PVs per event will be approximately 10 times that for LHCb, resulting in only slightly altered KDEs. The KDE and a few related input features are fed into the same neural network architectures used to achieve the results for LHCb. We present the development of the input feature and initial results across different network architectures. The results serve as a proof-of-principle that a deep neural network can achieve high efficiency and low false positive rates for finding vertices in ATLAS and CMS data.
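As a rough illustration of the input feature described above, the sketch below builds a one-dimensional KDE along the beamline by summing one Gaussian per track. It is a simplified stand-in, not the experiment's actual kernel: the function name `beamline_kde`, the per-track resolution `track_sigma`, the binning and the toy event are all assumptions made for this example.

```python
import math

def beamline_kde(track_z, track_sigma, z_min, z_max, n_bins):
    """Sum one Gaussian per track, evaluated at bin centres along the beamline (z)."""
    width = (z_max - z_min) / n_bins
    centres = [z_min + (i + 0.5) * width for i in range(n_bins)]
    kde = [0.0] * n_bins
    for z0, sigma in zip(track_z, track_sigma):
        norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
        for i, z in enumerate(centres):
            kde[i] += norm * math.exp(-0.5 * ((z - z0) / sigma) ** 2)
    return centres, kde

# Toy event (made-up numbers): two vertices near z = -20 mm and z = +35 mm
track_z = [-20.1, -19.9, -20.0, 35.2, 34.9, 35.0, 35.1]
track_sigma = [0.5] * len(track_z)  # assumed per-track z resolution in mm
centres, kde = beamline_kde(track_z, track_sigma, -100.0, 100.0, 400)
peak_z = centres[kde.index(max(kde))]
print(f"highest KDE bin is centred at z = {peak_z:.2f} mm")
```

The resulting histogram is the kind of one-dimensional array a convolutional network can take as input; peaks in it mark candidate vertex positions.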
With the LHC restarting after more than 3 years of shutdown, unprecedented amounts of data are expected to be recorded. Even with the WLCG providing a tremendous amount of compute resources to process this data, local resources will have to be used for additional compute power. This, however, makes the landscape in which computing takes place more heterogeneous.
In this contribution, we present a solution for dynamically integrating non-HEP resources into existing infrastructures using the COBalD/TARDIS resource manager. By providing all resources through conventional CEs as single point-of-entry, the use of these external resources becomes completely transparent for experiments and users.
In addition, we discuss experience with an existing setup, in production for more than a year, that extends the German Tier 2 WLCG site operated at RWTH Aachen University with a local HPC cluster.
The INFN-CNAF Tier-1 has been engaged for years in a continuous effort to integrate its computing centre with more types of computing resources. In particular, the challenge of providing opportunistic access to nonstandard CPU architectures, such as PowerPC, or hardware accelerators (GPUs) has been actively pursued. In this work, we describe a solution to transparently integrate access to ppc64 CPUs as well as GPUs. This solution has been tested to transparently extend the INFN-T1 Grid computing centre with Power9-based machines and V100 GPUs from the Marconi 100 HPC cluster managed by CINECA. We also discuss further possible improvements and how this will meet requirements and future plans for the new tecnopolo centre, where the CNAF Tier-1 will be hosted soon.
HEPscore is a CPU benchmark, based on HEP applications, that the HEPiX Working Group is proposing as a replacement of the currently used HEPSpec06 benchmark, adopted in WLCG for procurement, computing resource pledges and performance studies.
In 2019, we presented at ACAT the motivations for building a benchmark for the HEP community based on HEP applications. The process from the conception to the implementation and validation of this objective has been inspiring and challenging. In the spirit of the HEP community, it has involved many contributions from software developers, data analysts, experts of the experiments, representatives of several WLCG computing centres, as well as the WLCG HEPscore Deployment Task Force.
In this contribution, we review this long journey and in particular the technological solutions selected, such as containerization of the HEP applications and cvmfs snapshotting. We update the community on the readiness status of HEPscore, the HEP application mix selected to build HEPscore and the deployment plans for 2023. We describe the current campaign of measurements performed on multiple WLCG sites, intended to study the performance of eleven HEP applications on more than 50 different computer systems.
Finally, we also cover how to extend the HEPscore adoption to the benchmarking of heterogeneous resources, and how it can include workloads for physics analysis and Machine Learning algorithms.
During the LHC LS2, the ALICE experiment has undergone a major upgrade of the data acquisition model, evolving from a trigger-based model to a continuous readout. The upgrade allows for an increase in the number of recorded events by a factor of 100 and in the volume of generated data by a factor of 10. The entire experiment software stack has been completely redesigned and rewritten to adapt to the new requirements and to make optimal use of storage and CPU resources. The architecture of the new processing software relies on running parallel processes on multiple processor cores and using large shared memory areas for exchanging data between them.
Without mechanisms that guarantee job resource isolation, the deployment of multi-process jobs can result in resource usage that exceeds what was originally requested and allocated. Internally, jobs may launch as many processes as defined in their workflow, which can be significantly higher than the number of allocated CPU cores. This freedom of execution can be limited by mechanisms like cgroups, already employed by some Grid sites; however, these are a minority. If jobs are allowed to run unconstrained, they may interfere with each other in terms of the simultaneous utilization of the resources. Constraint mechanisms in this context improve the fairness of resource utilization, both between ALICE jobs and towards other users in general.
The efficient use of the worker nodes' cache memory is closely related to the CPU cores executing the job. An important aspect to consider is the host architecture and the cache topology, i.e. cache levels, size and hierarchical connection to individual cores. The memory usage patterns of running tasks, the memory and cache topologies, and the CPU cores chosen to constrain the job to all influence the overall efficiency of the execution, in terms of useful work done per unit of time.
This paper presents an analysis of the impact of different CPU pinning strategies on the efficiency of the execution of simulation tasks. The evaluation of the different configurations is performed by extracting a set of metrics tightly related to job turnaround and efficient resource utilization. The results are presented both for the execution of a single job on an idle machine and for whole node saturation, analyzing the interference between jobs. Different host architectures are studied for a global and robust assessment.
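To make the idea of CPU pinning concrete, here is a minimal sketch that restricts the current process to a fixed set of cores using the Linux scheduler affinity calls exposed in Python's `os` module. The helper `pin_to_cores` and its "first N allowed cores" policy are assumptions for illustration only. A topology-aware strategy of the kind studied here would instead choose cores that share a cache level or NUMA domain.

```python
import os

def pin_to_cores(n_cores):
    """Restrict the calling process to the first n_cores CPUs it may currently use.

    Returns the resulting affinity set, or None where the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):  # Linux-only API
        return None
    allowed = sorted(os.sched_getaffinity(0))
    chosen = set(allowed[:max(1, n_cores)])
    os.sched_setaffinity(0, chosen)  # pid 0 means "this process"
    return os.sched_getaffinity(0)

# Remember the starting affinity so that it can be restored afterwards
original = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
pinned = pin_to_cores(2)
print("pinned to:", pinned)
if original is not None:
    os.sched_setaffinity(0, original)  # undo the pinning
```

Child processes inherit the affinity mask, so pinning a job wrapper in this way also constrains the processes it launches.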
The CMS experiment has 1056 Resistive Plate Chambers (RPCs) in its muon system. Monitoring their currents is the first essential step towards maintaining the stability of the CMS RPC detector performance. An automated monitoring tool to carry out this task has been developed. It utilises the ability of Machine Learning (ML) methods in the modelling of the behavior of the current of these chambers. Two types of ML approaches are used: Generalized Linear Models and Autoencoders. In the GLM case, a set of parameters such as environmental conditions, LHC parameters and working point are used to characterize the behavior of the current. In the autoencoder case, the set of currents for all of the high-voltage channels of the RPC system are used as input and the autoencoder network is trained to reproduce these inputs on the output neurons. Both approaches show very good predictive capabilities, with accuracy of the order of 1-2 μA. These predictive capabilities are the basis for the monitoring tool, which is going to be tested during Run 3. All the developed tools are integrated in a framework that can be easily accessed and controlled by a specially developed Web User Interface that allows the end user to work with the monitoring tool in a simple manner.
To increase the science rate for high data rates/volumes, JLab is partnering with ESnet for development of an AI/ML directed dynamic Compute Work Load Balancer (CWLB) of UDP streamed data. The CWLB is an FPGA featuring dynamically configurable, low fixed latency, destination switching and high throughput. The CWLB effectively provides seamless integration of edge / core computing to support direct experimental data processing for immediate use by JLab science programs and others such as the EIC as well as data centers of the future. The ESnet/JLab FPGA Accelerated Transport (EJFAT) project is targeting near future projects requiring high throughput and low latency for both hot and cooled data for both running experiment data acquisition systems and data center use cases.
The essential function of the CWLB data plane is to redirect designated data channel streams that share a common data event designation to selectable destination hosts as a function of data event id, and to target host ports as a function of data channel id. This gives a form of hierarchical horizontal scaling at two levels: first across compute host machines, event by event, for pipelined processing of a series of events; and second across ports on a compute host, so that different data channel ids may be assigned to different processors for further parallelized processing, e.g., reassembly, event reconstruction, physics harvesting, etc.
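The two-level redirection can be sketched in a few lines. The modulo mapping, the host names and the port numbers below are assumptions chosen for illustration; the actual CWLB is configured dynamically and its lookup scheme is not described here.

```python
def route(event_id, channel_id, hosts, base_port, ports_per_host):
    """Pick (host, port): the host from the event id, the port from the channel id."""
    host = hosts[event_id % len(hosts)]               # level 1: spread events over hosts
    port = base_port + (channel_id % ports_per_host)  # level 2: spread channels over ports
    return host, port

hosts = ["node-a", "node-b", "node-c"]  # hypothetical compute hosts
for event_id in range(4):
    for channel_id in range(2):
        print(event_id, channel_id, route(event_id, channel_id, hosts, 19500, 2))
```

All channels of one event land on the same host, while different channels of that event go to different ports and can be handled by different processors.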
An EJFAT control plane running external to the CWLB, using both network and compute farm telemetry, effects AI-directed and predictive resource allocation, capacity assessment, and scheduling of compute farm resources in order to dynamically reconfigure the CWLB in situ as the operating context and conditions require.
CLUE (CLUsters of Energy) is a fast, fully-parallelizable clustering algorithm developed to optimize such a crucial step in the event reconstruction chain of future high granularity calorimeters. The main drawback of having an unprecedentedly high segmentation in this kind of detector is a huge computation load that, in the case of CMS, must be reduced to fit the harsh requirements of the Phase-2 High Level Trigger.
With the adoption of alpaka as the performance portability library in CMSSW, the CLUE algorithm has been tested on multiple accelerators and hybrid platforms. This work presents the latest results obtained with the alpaka implementation of CLUE, which can fully exploit the available hardware on each machine and fulfill the task with high performance.
We propose a novel neural architecture that enforces an upper bound on the Lipschitz constant of the neural network (by constraining the norm of its gradient with respect to the inputs). This architecture was useful in developing new algorithms for the LHCb trigger which have robustness guarantees as well as powerful inductive biases leveraging the neural network’s ability to be monotonic in any subset of features. A new and interesting direction for this architecture is that it can also be used in the estimation of the Wasserstein metric (or the Earth Mover’s Distance) in optimal transport using the Kantorovich-Rubinstein duality. In this talk, I will describe how such architectures can be leveraged for developing new clustering algorithms using the Energy Mover’s Distance. Clustering using optimal transport generalizes all previous well-known clustering algorithms in HEP (anti-kt, Cambridge-Aachen, etc.) to arbitrary geometries and offers new flexibility in dealing with effects such as pile-up and unconventional topologies. I will also talk in detail about how this flexibility can be used to develop new algorithms which are more suitable for the Electron-Ion Collider setting than conventional ones.
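For reference, the Kantorovich-Rubinstein duality mentioned above can be written in its standard form as follows. The notation ($\mu$, $\nu$ for the two distributions, $\|f\|_L$ for the Lipschitz constant of $f$) is chosen here and is not taken from the talk.

$$
W_1(\mu, \nu) = \sup_{\|f\|_L \le 1} \left( \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{y \sim \nu}[f(y)] \right)
$$

The supremum runs over 1-Lipschitz functions, which is why a network whose Lipschitz constant is bounded by construction is a natural candidate for $f$.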
Jet tagging is a critical yet challenging classification task in particle physics. While deep learning has transformed jet tagging and significantly improved performance, the lack of a large-scale public dataset impedes further enhancement. In this work, we present JetClass, a new comprehensive dataset for jet tagging. The JetClass dataset consists of 100 M jets, about two orders of magnitude larger than existing public datasets. A total of 10 types of jets are simulated, including several types unexplored for tagging so far. Based on the large dataset, we propose a new Transformer-based architecture for jet tagging, called Particle Transformer (ParT). By incorporating pairwise particle interactions in the attention mechanism, ParT achieves higher tagging performance than a plain Transformer and surpasses the previous state-of-the-art, ParticleNet, by a large margin. The pre-trained ParT models, once fine-tuned, also substantially enhance the performance on two widely adopted jet tagging benchmarks.
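One way to incorporate pairwise information into attention is to add a per-pair term to the attention scores before the softmax. The sketch below shows that mechanism in plain Python with made-up numbers. The function names, the embeddings and the bias values are all hypothetical, and the abstract does not specify which pairwise features are used or how they are computed.

```python
import math

def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(q, k, pair_bias):
    """Row-wise softmax(q . k^T / sqrt(d) + pair_bias)."""
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + pair_bias[i][j]
                  for j, kj in enumerate(k)]
        weights.append(softmax(scores))
    return weights

# Three particles with 2-d embeddings (made-up numbers)
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
no_bias = [[0.0] * 3 for _ in range(3)]
# Hypothetical pairwise term favouring the (0, 1) pair
bias = [[0.0, 2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
plain = attention_weights(q, k, no_bias)
biased = attention_weights(q, k, bias)
print("particle 0 -> 1 weight without bias:", round(plain[0][1], 3))
print("particle 0 -> 1 weight with bias:   ", round(biased[0][1], 3))
```

With an all-zero bias this reduces to ordinary scaled dot-product attention weights, so the pairwise term acts purely as a correction on top of a plain Transformer.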
https://arxiv.org/abs/2202.03772
Although modern architectures designed via geometric deep learning achieve high accuracies via Lorentz group invariance, this process involves large amounts of computation. Moreover, the framework is restricted to a particular classification scheme and lacks interpretability.
To tackle this issue, we present BIP, an efficient and computationally cheap framework to build invariance under rotations, permutations, and boosts along the jet mean axis. Moreover, we show the versatility of our approach by obtaining state-of-the-art range accuracies in both supervised and unsupervised jet tagging using several out-of-the-box classifiers.
Hadronization is a non-perturbative process whose theoretical description cannot be deduced from first principles. Modeling hadron formation requires several assumptions and various phenomenological approaches. Utilizing state-of-the-art Computer Vision and Deep Learning algorithms, it is eventually possible to train neural networks to learn non-linear and non-perturbative features of the physical processes.
Here, I would like to present the latest results of two deep neural networks, investigating global and kinematical quantities, namely jet- and event-shape variables. The widely used Lund string fragmentation model is applied as a baseline in √s=7 TeV proton-proton collisions to predict the most relevant observables at further LHC energies. Non-linear QCD scaling properties were also identified and validated by experimental data.
[1] G. Bíró, B. Tankó-Bartalis, G.G. Barnaföldi; arXiv:2111.15655
For many years, the matrix element method has been considered the perfect approach to LHC inference. We show how conditional invertible neural networks can be used to unfold detector effects and initial-state QCD radiation, to provide the hard-scattering information for this method. We illustrate our approach for the CP-violating phase of the top Yukawa coupling in associated Higgs and single-top production.
Learning tasks are implemented via mappings of the sampled data set, including both the classical and the quantum framework. The quantum-inspired approach mimics the support vector machine mapping in a high-dimensional feature space, yielded by the qubit encoding. In our application, such a scheme is framed in the formulation of a least-squares problem for the minimization of the mean squared error cost function, implemented by means of measurements. The ability of quantum algorithms to manage a high number of parameters will characterize their analysis capability for complex systems, like the targeted biomedical framework.
As the search for new fundamental phenomena at modern particle colliders is a complex and multifaceted task dealing with high-dimensional data, it is not surprising that machine learning based techniques are quickly becoming a widely used tool for many aspects of searches. On the one hand, classical strategies are being supercharged by ever more sophisticated tagging algorithms; on the other hand, new paradigms — such as searching for anomalies in a data-driven way — are being proposed. This talk will review some key developments and consider which steps might be needed to maximise the discovery potential of particle physics experiments.
Trapped ions are the leading candidate for realizing practically useful quantum computers, as the system features the highest-performance quantum computational operations. The introduction of advanced integration technologies has provided an opportunity to convert a complex atomic physics experiment into a stand-alone programmable quantum computer. In this talk, I will discuss recent technological progress that changed the perception of a trapped-ion system as a scalable quantum computer and enabled commercially viable quantum computers. I will also discuss several application areas where quantum computers can make a practical contribution to the computational frontier in scientific applications.
Today, we live in a data-driven society. For decades, we wanted fast storage devices that can quickly deliver data, and storage technologies evolved to meet this requirement. As data-driven decision making becomes an integral part of enterprises, we are increasingly faced with a new need: one for cheap, long-term storage devices that can safely store the data we generate for tens or hundreds of years to meet legal and regulatory compliance requirements.
In this talk, we will first explore recent trends in the storage hardware landscape that show that all current storage media face fundamental limitations that threaten our ability to store, much less process, the data we generate over long time frames. We will then focus on unconventional biological and analog media that have received quite some attention recently: synthetic deoxyribonucleic acid (DNA) and film. After highlighting the pros and cons of using each as a digital storage medium, I will present our recent work in the EU-funded Future and Emerging Technologies (FET) project OligoArchive, which focuses on overcoming challenges in using such media to build a deep archival tier for data management systems.
Over the past few years, intriguing deviations from the Standard Model predictions have been reported in measurements of angular observables and branching fractions of $B$ meson decays, suggesting the existence of a new interaction that acts differently on the three lepton families. The Belle II experiment has unique features that allow the study of $B$ meson decays with invisible particles in the final state, in particular neutrinos. It is possible to deduce the presence of such particles from the energy-momentum imbalance obtained after reconstructing the companion $B$ meson produced in the event. This task is complicated by the thousands of possible final states $B$ mesons can decay into, and is currently performed at Belle II by the Full Event Interpretation (FEI) software, an algorithm based on Boosted Decision Trees and limited to specific, hard-coded decay processes.
In recent years, graph neural networks have proven to be very effective tools to describe relations in physical systems, with applications in a range of fields. Particle decays can be naturally represented in the form of rooted, acyclic tree graphs, with nodes corresponding to particles and edges representing the parent-child relations between them. In this work, we present a graph neural network approach to generically reconstruct $B$ decays at Belle II by exploiting the information from the detected final state particles, without formulating any prior assumption about the nature of the decay. This task is performed by reconstructing the Lowest Common Ancestor matrix, a novel representation, equivalent to the adjacency matrix, that allows reconstruction of the decay from the final state particles alone. Preliminary results show that the graph neural network approach outperforms the FEI by a factor of at least 3.
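The idea of a Lowest Common Ancestor matrix can be illustrated on a toy decay tree. In the sketch below each entry holds the label of the lowest common ancestor of two final-state particles, and each diagonal entry is the particle itself. This is an assumption made for readability and may differ from the encoding used in the actual work. The decay chain and particle labels are invented for the example.

```python
def lca_matrix(parent, leaves):
    """Return M where M[i][j] is the lowest common ancestor of leaves i and j."""
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain  # from the node itself up to the root

    matrix = []
    for a in leaves:
        up_a = ancestors(a)
        row = []
        for b in leaves:
            up_b = set(ancestors(b))
            # first ancestor of a (walking upwards) that is also an ancestor of b
            row.append(next(n for n in up_a if n in up_b))
        matrix.append(row)
    return matrix

# Toy decay: B -> D pi_1, D -> K pi_2 (labels are illustrative)
parent = {"D": "B", "pi_1": "B", "K": "D", "pi_2": "D"}
leaves = ["K", "pi_2", "pi_1"]
m = lca_matrix(parent, leaves)
for row in m:
    print(row)
```

The matrix is indexed only by final-state particles, yet it records that `K` and `pi_2` share the intermediate `D` while `pi_1` joins them only at the `B`, which is enough to rebuild the tree.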
Detector modeling and visualization are essential in the life cycle of a High Energy Physics (HEP) experiment. Unity is a professional multi-media creation software that has the advantages of rich visualization effects and easy deployment on various platforms. In this work, we applied the method of detector transformation to convert the BESIII detector description from the offline software framework into the 3D detector modeling in Unity. By matching the geometric units with detector identifiers, the new event display system based on Unity can be developed for BESIII. The potential for further application development into virtual reality will also be introduced.
Awkward Arrays and RDataFrame provide two very different ways of performing calculations at scale. By adding the ability to zero-copy convert between them, users get the best of both. This gives users greater flexibility in mixing different packages and languages in their analysis.
In Awkward Array version 2, the ak.to_rdataframe function presents a view of an Awkward Array as an RDataFrame source. This view is generated on demand and the data is not copied. The column readers are generated based on the run-time type of the views. The readers are passed to a generated source derived from ROOT::RDF::RDataSource.
The ak.from_rdataframe function converts the selected columns into native Awkward Arrays.
We discuss the details of the implementation exploiting JIT techniques. We present examples of analysis of data stored in Awkward Arrays via a high-level interface of an RDataFrame.
We show a few examples of the column definition, applying user-defined filters written in C++, and plotting or extracting the columnar data as Awkward Arrays.
We discuss current limitations and future plans.
We present a revived version of CERNLIB, the basis for the software ecosystems of most of the pre-LHC HEP experiments. The efforts to consolidate CERNLIB are part of the activities of the Data Preservation for High Energy Physics collaboration to preserve data and software of past HEP experiments. The presented version is based on CERNLIB version 2006 with numerous patches made for compatibility with modern compilers and operating systems. The code is available publicly in the CERN GitLab repository with all the development history starting from the early 1990s. The updates also include a re-implementation of the build system in CMake to make CERNLIB compliant with current best practices and to increase the chances of preserving the code in a compilable state for decades to come. The revived CERNLIB project also includes updated documentation, which we believe is a cornerstone for any preserved software depending on it.
Identifying and locating proton-proton collisions in LHC experiments (known as primary vertices or PVs) has been the topic of numerous conference talks in the past few years (2019-2021). Efforts to search for a variety of potential architectures have yielded potential candidates for PV-finder. The UNet model, for example, has achieved an efficiency of 98% with a low false-positive rate. These results can be obtained with numerous other neural network architectures. It also converges faster than any previous model. While this does not answer the question of how the algorithm learns, it does provide some useful insights into the open question. We present the results from this architectural study of different algorithms and their performance in locating PVs for LHCb data. The goal is to demonstrate progress in developing a performant architecture and evaluate different algorithms' learning.
The FairRoot software stack is a toolset for the simulation, reconstruction, and analysis of high energy particle physics experiments (currently used e.g. at FAIR/GSI and CERN). In this work we give insight into recent improvements of Continuous Integration (CI) for this software stack. CI is a modern software engineering method to efficiently assure software quality. We discuss relevant development workflows and how they were improved through automation. Furthermore, we present our infrastructure detailing its hardware and software design choices. The entire toolchain is composed of free and open source software. Finally, this work concludes with lessons learned from an operational as well as a user perspective and outlines ideas for future improvements.
After a successful adoption of Rucio following its inception in 2018 as the new data management system, a subsequent step is to advertise this to the users among other stakeholders. In this perspective, one of the objectives is to keep improving the tooling around Rucio. As Rucio introduces a new data management paradigm w.r.t. the previous model, we begin by tackling the challenges arising from such a shift in the data model, while trying to alleviate the impact on users. Thus we focus on building a monitoring system capable of answering questions that do not naturally fit the current paradigm while also providing new features and services for the users to naturally push further the adoption and the benefits of the new implementation. In this regard, we present the development process and evolution path of a set of new interfaces dedicated to the extension of the current monitoring infrastructure and the integration of a user-dedicated CLI capable of granting users an almost seamless transition and enhancement for their daily data management activity. We try to keep dependencies to a minimum and to keep these tools decoupled, making them of potential use for other experiments. These will form a set of extensions to the Rucio API intended to automate a series of the most frequent use cases, eventually enhancing the user experience and lowering the barriers for newcomers.
In High Energy Physics (HEP) experiments, a Data Quality Monitoring (DQM) system is crucial to ensure the correct and smooth operation of the experimental apparatus during data taking. The DQM at the Jiangmen Underground Neutrino Observatory (JUNO) will reconstruct raw data directly from the JUNO Data Acquisition (DAQ) system and use event visualization tools to show the detector performance for high-quality data taking. The strategy of the JUNO DQM, as well as its design and performance, will be presented.
The common ALICE-FAIR software framework ALFA offers a platform for simulation, reconstruction and analysis of particle physics experiments. FairMQ is a module of ALFA that provides building blocks for distributed data processing pipelines, composed out of components communicating via message passing. FairMQ integrates and efficiently utilizes standard industry data transport technologies, while hiding the transport details behind an abstract interface. In this work we present the latest developments in FairMQ, focusing on the new and improved features of the transport layer, primarily the shared memory transport and the generic interface features. Furthermore, we present the new control and configuration facilities, that allow programmatically controlling a group of FairMQ components. Additionally, new debugging and monitoring tools are highlighted. Finally, we outline how these tools are used by the ALICE experiment.
We evaluate two Generative Adversarial Network (GAN) models developed by the COherent Muon to Electron Transition (COMET) collaboration to generate sequences of particle hits in a Cylindrical Drift Chamber (CDC). The models are first evaluated by measuring the similarity between distributions of particle-level, physical features. We then measure the Effectively Unbiased Fréchet Inception Distance (FID) between distributions of high-dimensional representations obtained with: InceptionV3; then a version of InceptionV3 fine-tuned for event classification; and a 3D Convolutional Neural Network that has been specifically designed for event classification. We also normalize the obtained FID values by the FID for two sets of real samples, setting the scores for different representations on the same scale. This novel relative FID metric is used to compare our GAN models to state-of-the-art natural image generative models.
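For reference, the Fréchet distance between two sets of representations, each summarised by a mean $\mu$ and covariance $\Sigma$ under a Gaussian assumption, has the standard form below. The second line writes the normalisation described above as a ratio, which is an assumed reading of "normalize"; the symbols $G$ (generated samples) and $R$, $R_1$, $R_2$ (real samples) are introduced here for illustration.

$$
\mathrm{FID}(P, Q) = \|\mu_P - \mu_Q\|_2^2 + \mathrm{Tr}\left(\Sigma_P + \Sigma_Q - 2\,(\Sigma_P \Sigma_Q)^{1/2}\right)
$$

$$
\mathrm{FID}_{\mathrm{rel}} = \frac{\mathrm{FID}(G, R)}{\mathrm{FID}(R_1, R_2)}
$$

Because the denominator uses the same representation as the numerator, the ratio puts scores from different feature extractors on a comparable scale.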
The Mu2e experiment will search for the CLFV neutrinoless coherent conversion of a muon to an electron in the field of an aluminium nucleus. A custom offline event display has been developed for Mu2e using TEve, a ROOT-based 3-D event visualisation framework. Event displays are crucial for monitoring and debugging during live data taking as well as for public outreach. A custom GUI allows event selection and navigation. Reconstructed data like the tracks, hits and clusters can be displayed within the detector geometries upon GUI request. True Monte Carlo trajectories of particles traversing the muon beam line, obtained directly from Geant4, can also be displayed. Tracks are coloured according to their particle ID and users can select the trajectories to be displayed. Reconstructed tracks are refined using a Kalman filter. The resulting tracks can be displayed alongside truth information, allowing visualisation of the track resolution. The user can remove/add data based on energy deposited in a detector or arrival time. This is a prototype; an online event display is currently under development using Eve-7, which allows remote access for live data taking and lets multiple users simultaneously view and interact with the display.
The CMS software framework (CMSSW) has recently been extended to perform part of the physics reconstruction with NVIDIA GPUs. To avoid writing a different implementation of the code for each back-end, the decision was made to use a performance portability library, and Alpaka has been chosen as the solution for Run-3.
In the meantime different studies have been performed to test the track reconstruction and clustering algorithms on different back-ends like CUDA and Alpaka.
With the aim of exploring new solutions, Intel GPUs have been considered as a possible new back-end, and their implementation is currently under development.
This is achieved using SYCL, a cross-platform abstraction layer and C++ programming model for heterogeneous computing. It allows developers to reuse code across different hardware and to perform custom tuning for a specific accelerator. The SYCL implementation used is the Data Parallel C++ library (DPC++) in the Intel oneAPI Toolkit.
In this work, we will present the performance of physics reconstruction algorithms on different hardware. Strengths and weaknesses of this heterogeneous programming model will also be presented.
The CMS collaboration has a growing interest in the use of heterogeneous computing and accelerators to reduce the costs and improve the efficiency of the online and offline data processing: online, the High Level Trigger is fully equipped with NVIDIA GPUs; offline, a growing fraction of the computing power is coming from GPU-equipped HPC centres. One of the topics where accelerators could be used for both online and offline processing is data compression.
In the past decade a number of research papers exploring the use of GPUs for lossless data compression have appeared in the academic literature, but very few practical applications have emerged. In industry, NVIDIA has recently published the nvcomp GPU-accelerated data compression library, based on closed-source implementations of standard and dedicated algorithms. Other platforms, like the IBM Power 9 processors, offer dedicated hardware for the acceleration of data compression tasks.
In this work we review the recent developments on the use of accelerators for data compression. After summarising the recent academic research, we will measure the performance of representative open- and closed-source algorithms over CMS data, and compare it with the CPU-only algorithms currently used by ROOT and CMS (lz4, zlib, zstd).
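The two quantities such a comparison typically reports, compression ratio and throughput, can be measured as in the sketch below. It uses only the zlib binding from the Python standard library as a CPU baseline; the helper name `benchmark_zlib` and the synthetic payload are assumptions made for illustration and have no connection to CMS data or to nvcomp.

```python
import struct
import time
import zlib


def benchmark_zlib(payload, level=6):
    """Return (compression ratio, compression throughput in MB/s) for zlib."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # A lossless codec must reproduce the input exactly.
    if zlib.decompress(compressed) != payload:
        raise ValueError("round trip failed")
    ratio = len(payload) / len(compressed)
    throughput = len(payload) / max(elapsed, 1e-9) / 1e6
    return ratio, throughput


# Synthetic payload: 100k little-endian 32-bit integers with small values,
# a stand-in for detector data (assumption, not CMS data).
payload = struct.pack("<100000i", *(i % 256 for i in range(100000)))
ratio, throughput = benchmark_zlib(payload)
print(f"ratio {ratio:.1f}, throughput {throughput:.0f} MB/s")
```

The same harness could wrap any other codec that exposes a compress/decompress pair, which keeps the comparison between algorithms on an equal footing.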
Describing the development of particle cascades in the calorimeter of a high energy physics experiment relies on precise simulation of particle interactions with matter. This simulation is inherently slow and constitutes a challenge for HEP experiments. Furthermore, with the upcoming high-luminosity upgrade of the Large Hadron Collider and a much increased data production rate, the number of required simulated events will increase accordingly. Several research directions have investigated the use of Machine Learning (ML) based models to accelerate the simulation of the response of particular calorimeters. These models typically require a large amount of data and time for training, and the result is a specifically tuned simulation. Meanwhile, meta-learning has emerged in the ML community as a fast learning approach using small training datasets. In this contribution, we present MetaHEP, a meta-learning approach to accelerate shower simulation in different calorimeters using very highly granular data. We show its application using a calorimeter proposed for the Future Circular Collider (FCC-ee) and its integration into the key4hep framework.
GPU applications require a structure of arrays (SoA) layout for the data to achieve good memory access performance. During the development of the CMS Pixel reconstruction for GPUs, the Patatrack developers crafted various techniques to optimise the data placement in memory and its access inside GPU kernels. The work presented here gathers, automates and extends those patterns, and offers a simplified and consistent programming interface.
The work automates the creation of SoA structures, fulfilling technical requirements like cache line alignment, while optionally providing alignment and cache hinting to the compiler and range checking. Protection of read-only products of the CMS software framework (CMSSW) is also ensured with constant versions of the SoA. A compact description of the SoA is provided to minimize the size of data passed to GPU kernels. Finally, the user interface is designed to be as simple as possible, providing AoS-like semantics that allow compact and readable notation in the code.
The results of porting CMSSW to SoA will be presented, along with performance measurements.
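The difference between the two layouts can be made concrete with NumPy, used here purely as an illustration and not as the CMSSW interface. The field names (`x`, `y`, `z`, `charge`) are hypothetical. In the array-of-structures (AoS) form, consecutive values of one field are separated by the full record size, whereas in the SoA form they are contiguous, which is what allows neighbouring GPU threads to read neighbouring addresses.

```python
import numpy as np

n = 1024

# AoS: one record per hit, fields interleaved in memory.
hit_dtype = np.dtype([("x", np.float32), ("y", np.float32),
                      ("z", np.float32), ("charge", np.float32)])
aos = np.zeros(n, dtype=hit_dtype)
aos["x"] = np.arange(n)

# SoA: one contiguous array per field.
soa = {name: np.ascontiguousarray(aos[name]) for name in hit_dtype.names}

aos_stride = aos["x"].strides[0]  # bytes between consecutive x values in AoS
soa_stride = soa["x"].strides[0]  # bytes between consecutive x values in SoA
print(aos_stride, soa_stride)
```

Both containers can be indexed as `container["x"][i]`, which hints at how an interface can keep AoS-like notation while storing the data as SoA.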
In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst’s perspective, obtaining the highest possible performance is desirable, but recently some attention has shifted to the robustness of models, investigating how well they perform under slight distortions of the input features. Especially for tasks that involve many (low-level) inputs, the application of deep neural networks brings new challenges. In the context of jet flavor tagging, adversarial attacks are used to probe a typical classifier’s vulnerability and can be understood as a model for systematic uncertainties. A corresponding defense strategy, adversarial training, improves robustness while maintaining high performance. This contribution presents different approaches using a set of attacks of varying complexity. Investigating the loss surface corresponding to the inputs and models in question reveals geometric interpretations of robustness, taking correlations into account. Additional cross-checks against other, physics-inspired mismodeling scenarios are performed and suggest that adversarially trained models can cope better with simulation artifacts or subtle detector effects.
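As an illustration of what a simple adversarial attack looks like, the sketch below applies the Fast Gradient Sign Method (FGSM) to a toy logistic-regression classifier. FGSM is chosen here only as a well-known example; the abstract does not state which attacks are used, and the model weights and inputs are hypothetical and unrelated to any jet tagger.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(x, y, w, b):
    """Per-sample binary cross-entropy of a logistic-regression classifier."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def fgsm(x, y, w, b, epsilon):
    """Shift every input feature by epsilon in the direction that raises the loss."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # analytic d(loss)/dx for this model
    return x + epsilon * np.sign(grad_x)


rng = np.random.default_rng(1)
x = rng.normal(size=(100, 3))            # toy inputs (assumption)
w, b = np.array([1.0, -2.0, 0.5]), 0.1   # hypothetical fixed model
y = (x @ w + b > 0).astype(float)        # labels that the model classifies correctly
x_adv = fgsm(x, y, w, b, epsilon=0.1)
print(loss(x, y, w, b).mean(), loss(x_adv, y, w, b).mean())
```

Adversarial training would then include such perturbed inputs, with their original labels, in the training set.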
In this study, jets with up to 30 particles are modelled using Normalizing Flows with Rational Quadratic Spline coupling layers. The invariant mass of the jet is a powerful global feature for checking whether the flow-generated data contains the same high-level correlations as the training data. Normalizing flows used without conditioning turn out to lack the expressive power to reproduce these correlations. Using the mass as a condition for the coupling transformation enhances the model's performance on all tracked metrics. In addition, we demonstrate how to sample the original mass distribution using the empirical cumulative distribution function, and we study the usefulness of including an additional mass constraint in the loss term. On the JetNet dataset, our model shows state-of-the-art performance combined with a general model and stable training.
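Sampling a conditioning variable from its empirical cumulative distribution function amounts to inverse transform sampling, sketched below. The function name `sample_from_ecdf` and the gamma-distributed toy masses are assumptions for illustration; the details of the authors' procedure may differ.

```python
import numpy as np


def sample_from_ecdf(values, size, rng):
    """Draw new values by inverting the empirical CDF of `values`."""
    sorted_vals = np.sort(values)
    # Empirical CDF evaluated at the sorted points (mid-point convention).
    cdf = (np.arange(sorted_vals.size) + 0.5) / sorted_vals.size
    u = rng.uniform(0.0, 1.0, size)
    # Linear interpolation of the inverse CDF; values outside the CDF range
    # are clamped to the smallest or largest training value.
    return np.interp(u, cdf, sorted_vals)


rng = np.random.default_rng(2)
training_mass = rng.gamma(shape=4.0, scale=1.0, size=5000)  # toy masses (assumption)
sampled_mass = sample_from_ecdf(training_mass, 5000, rng)
print(training_mass.mean(), sampled_mass.mean())
```

The sampled masses can then be passed to the conditional flow as the condition when generating new jets.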
The CMS experiment employs an extensive data quality monitoring (DQM) and data certification (DC) procedure. Currently, this approach consists mainly of the visual inspection of reference histograms which summarize the status and performance of the detector. Recent developments in several of the CMS subsystems have shown the potential of computer-assisted DQM and DC using autoencoders, spotting detector anomalies with high accuracy and a much finer time granularity than previously accessible. We will discuss a case study for the CMS pixel tracker, as well as the development of a common infrastructure to host computer-assisted DQM and DC workflows. This infrastructure facilitates accessing the input histograms, provides tools for preprocessing, training and validating, and generates an overview of potential detector anomalies.
The Jiangmen Underground Neutrino Observatory (JUNO), located in southern China, will be the world’s largest liquid scintillator (LS) detector. Equipped with 20 kton of LS, 17623 20-inch PMTs and 25600 3-inch PMTs in the central detector, JUNO will provide a unique apparatus to probe the mysteries of neutrinos, particularly the neutrino mass ordering puzzle. One of the challenges for JUNO is the high-precision vertex reconstruction for reactor neutrino events. This talk will present machine learning-based vertex reconstruction in JUNO, in particular the comparison of different machine learning models as well as the optimization of the model inputs for better reconstruction performance.
The Particle Flow (PF) algorithm, used for event reconstruction in the majority of CMS data analyses, provides a comprehensive list of final-state particle candidates and enables efficient identification and mitigation methods for simultaneous proton-proton collisions (pileup). The higher instantaneous luminosity expected during the upcoming LHC Run 3 will impose challenges for CMS event reconstruction. This will be amplified in the HL-LHC era, where luminosity and pileup rates are expected to be significantly higher. One of the approaches CMS is investigating to cope with this challenge is to adopt heterogeneous computing architectures and accelerate event reconstruction. In this talk, we will discuss the effort to adapt the PF reconstruction to take advantage of GPU accelerators.
We will discuss the design and implementation of PF clustering for the CMS Electromagnetic and Hadronic Calorimeters using CUDA, including optimizations of the PF algorithm. The physics validation and performance of the GPU-accelerated algorithms will be demonstrated by comparing them to the CPU-based implementation.
Density Functional Theory (DFT) is a widely used ab initio method for calculating the electronic properties of molecules. Compared with Hartree-Fock methods, DFT offers suitable approximations with respect to computation time. Recently, DFT has been used for discovering and analyzing protein interactions by calculating the free energies of these macromolecules from short to large scales. However, calculating the ground-state energy with DFT for many-body molecular systems such as proteins, in a reasonable time and with sufficient accuracy, is still a very challenging and CPU-intensive task.
On the other hand, Geant4 is a toolkit for simulating the passage of particles through matter and the resulting effects on materials, with a wide range of specialized methods that include DNA and protein exploration. Unfortunately, the execution time needed to obtain an effective protein analysis is still a strong limitation on CPUs. In this context, the GeantV project seeks to exploit CPU vectorization to tackle the heavy computational load on CPU cores. In this work, we present preliminary results of a partial implementation of DFT in the Geant4 framework and the vectorized GeantV project. We show the advantages and the methods used for vectorizing several subroutines in the calculation of the ground-state energy for some amino acids and other molecules.
Computing resources in the Worldwide LHC Computing Grid (WLCG) have been based entirely on the x86 architecture for more than two decades. In the near future, however, heterogeneous non-x86 resources, such as ARM, POWER and RISC-V, will become a substantial fraction of the resources that will be provided to the LHC experiments, due to their presence in existing and planned world-class HPC installations. The CMS experiment, one of the four large detectors at the LHC, has started to prepare for this situation, with the CMS software stack (CMSSW) already compiled for multiple architectures. In order to allow production use, the tools for workload management and job distribution need to be extended to be able to exploit heterogeneous architectures.
Profiting from the opportunity to exploit the first sizable IBM Power9 allocation available on the Marconi100 HPC system at CINECA, CMS developed all the needed modifications to the CMS workload management system. After a successful proof of concept, a full physics validation has been performed in order to bring the system into production. This experience is of very high value for the commissioning of the similar (even larger) Summit HPC system at Oak Ridge, where CMS is also expecting a resource allocation. Moreover, the compute power of those systems is also provided via GPUs, which represents an extremely valuable opportunity to exploit the offloading capability already implemented in CMSSW.
The status of the current integration including the exploitation of the GPUs, the results of the validation as well as the future plans will be shown and discussed.
The Phase-2 upgrade of CMS, coupled with the projected performance of the HL-LHC, shows great promise in terms of discovery potential. However, the increased granularity of the CMS detector and the higher complexity of the collision events generated by the accelerator pose challenges in the areas of data acquisition, processing, simulation, and analysis. These challenges cannot be solved solely by increments in the computing resources available to CMS, but must be accompanied by major improvements of the computing model and computing software tools, as well as data processing software and common software tools. We present aspects of our roadmap for those improvements, focusing on the plans to reduce storage and CPU needs as well as take advantage of heterogeneous platforms, such as the ones equipped with GPUs, and High Performance Computing Centers. We describe the most prominent research and development activities being carried out in the experiment, demonstrating their potential effectiveness in either mitigating risks or quantitatively reducing computing resource needs on the road to the HL-LHC.
The CMS Level-1 Trigger, for its operation during Phase-2 of LHC, will undergo a significant upgrade and redesign. The new trigger system, based on multiple families of custom boards, equipped with Xilinx Ultrascale Plus FPGAs and interconnected with high speed optical links at 25 Gb/s, will exploit more detailed information from the detector subsystems (calorimeter, muon systems, tracker). In contrast to its implementation during Phase-1, information from the CMS tracker is now also available at the Level-1 Trigger and can be used for particle flow algorithms. The final stage of the Level-1 Trigger, called Global Trigger (GT), will receive more than 20 different trigger object collections from upstream systems and will be able to evaluate a menu of more than 1000 cut-based algorithms distributed over 12 boards. These algorithms may not only apply conditions on parameters such as momentum or angle of a particle, but can also do arithmetic calculations, like the invariant mass of a suspected mother particle of interest or the angle between two particles. The Global Trigger is designed as a modular system, with an easily re-configurable algorithm unit, to meet the demand of high flexibility required for shifting trigger strategies during Phase-2 operation of the LHC. The algorithms themselves are kept highly configurable and tools are provided to allow their study from within the CMS offline software framework (CMSSW) without the need for knowledge of the underlying firmware implementation. To allow the reproducible translation of the physicist-designed trigger menu to VHDL for use in the hardware trigger, a tool has been developed that converts the Python-based configuration used by CMSSW to VHDL. In addition to cut-based algorithms, neural net algorithms are being developed and integrated into the Global Trigger framework. 
To make use of these algorithms in hardware, the HLS4ML framework is used, which transpiles pre-trained neural nets, generated in the most commonly used software frameworks, into firmware code. A prototype firmware for a single Global Trigger board has been developed, which includes the de-multiplexing logic, conversion to an internal common object format and distribution of the data over all Super Logic Regions. In this framework, 312 algorithms are implemented at a clock speed of 480 MHz. The prototype has been thoroughly tested and verified with the bit-wise compatible C++ emulator. In this contribution we present the Phase-2 Global Trigger with an emphasis on the Global Trigger algorithms, their implementation in hardware, configuration with Python and the novel integration within the CMS offline software framework (CMSSW).
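The kind of arithmetic condition mentioned above, a cut on the invariant mass of a pair of trigger objects, can be written down in floating point as follows. This is a sketch under the massless-particle approximation; the function names, the mass window and the object kinematics are hypothetical, and the firmware itself does not use floating-point code of this form.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal angle difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles from pT, eta and phi."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))


def passes_mass_window(mass, low, high):
    """Cut-based condition: accept if the mass lies inside [low, high]."""
    return low <= mass <= high


# Two back-to-back objects with pT = 45.6 GeV each (toy values).
mass = invariant_mass(45.6, 0.0, 0.0, 45.6, 0.0, math.pi)
print(mass, passes_mass_window(mass, 80.0, 100.0))
```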
With the start of Run 3 in 2022, the LHC has entered a new period, now delivering higher energy and luminosity proton beams to the Compact Muon Solenoid (CMS) experiment. These increases make it critical to maintain and upgrade the tools and methods used to monitor the rate at which data is collected (the trigger rate). Software tools have been developed to allow for automated rate monitoring, and we present several upgrades to these software tools, which maintain and expand on their functionality. These trigger rate monitoring tools allow for real-time monitoring, including alerts which go out to on-call experts in the case of abnormalities. Fits are produced from previously collected data and extrapolate the behavior of the triggers as a function of pile-up (the average number of particle interactions per bunch crossing). These fits allow for visualization and statistical analysis of the behavior of the triggers and are displayed on the online monitoring system (OMS). The rate monitoring code can also be used for offline data certification and more complex trigger analysis. This presentation will show some of the upgrades to this software, with an emphasis on the automation for easier and more consistent upgrades and fixes to the software, and on the increased interactivity with the users.
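The fit-and-compare idea can be sketched as follows. The quadratic model, the 20% tolerance and the toy reference data are assumptions chosen for illustration, and the function names are hypothetical; they are not taken from the CMS rate monitoring code.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_rate(pileup, rate, degree=2):
    """Fit the trigger rate as a polynomial function of pile-up."""
    return Polynomial.fit(pileup, rate, degree)


def is_abnormal(model, pileup, observed_rate, tolerance=0.2):
    """Flag a rate that deviates from the fit by more than a relative tolerance."""
    expected = model(pileup)
    return bool(abs(observed_rate - expected) > tolerance * abs(expected))


# Toy reference data: rate in Hz as a function of pile-up (assumption).
reference_pileup = np.linspace(20.0, 60.0, 41)
reference_rate = 2.0 + 0.5 * reference_pileup + 0.01 * reference_pileup**2
model = fit_rate(reference_pileup, reference_rate)

print(is_abnormal(model, 40.0, 39.0))  # close to the fitted expectation
print(is_abnormal(model, 40.0, 60.0))  # far above the fitted expectation
```

In a monitoring setting, a flagged deviation is the point at which an alert to the on-call expert would be raised.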
Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data structures is ideally decoupled from the rest of a program.
The low-level abstraction of memory access (LLAMA) is a C++ library that provides a zero-runtime-overhead abstraction layer, underneath which memory layouts can be freely exchanged, focusing on multidimensional arrays of nested, structured data.
It provides a framework for defining and switching custom memory mappings at compile time to define data layouts, data access and access instrumentation, making LLAMA an ideal tool to tackle memory-related optimization challenges in heterogeneous computing.
After its scientific debut, several improvements and extensions have been added to LLAMA. This includes compile-time array extents for zero memory overhead, support for computations during memory access, new mappings (e.g. int/float bit-packing or byte-swapping) and more. This contribution provides an overview of the LLAMA library, its recent development and an outlook of future activities.
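The idea of an exchangeable memory mapping can be illustrated without the library itself: a mapping is a function from a record index and a field to a byte offset, and the code that uses the data only ever goes through that function. The sketch below is plain Python, not LLAMA's C++ API; the record definition and function names are hypothetical, and the AoS variant is packed without alignment padding.

```python
# Hypothetical record: field name -> size in bytes.
FIELDS = {"x": 4, "y": 4, "z": 4, "mass": 8}


def aos_offset(index, field, n):
    """Byte offset in a packed array-of-structures layout (n is unused)."""
    offset_in_record = 0
    for name, size in FIELDS.items():
        if name == field:
            return index * sum(FIELDS.values()) + offset_in_record
        offset_in_record += size
    raise KeyError(field)


def soa_offset(index, field, n):
    """Byte offset in a structure-of-arrays layout holding n records."""
    base = 0
    for name, size in FIELDS.items():
        if name == field:
            return base + index * size
        base += n * size
    raise KeyError(field)


# The access code is identical for both layouts; only the mapping changes.
for mapping in (aos_offset, soa_offset):
    print(mapping.__name__, [mapping(i, "y", 10) for i in range(3)])
```

In LLAMA the mapping is resolved at compile time, which is what allows the abstraction to carry no runtime overhead; the Python version only shows the concept.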
We present a machine-learning based method to detect deviations from a reference model, in an almost independent way with respect to the theory assumed to describe the new physics responsible for the discrepancies.
The analysis is based on an Effective Field Theory (EFT) approach: under this hypothesis the Lagrangian of the system can be written as an infinite expansion of terms, where the first ones are those from the Standard Model (SM) Lagrangian and the following terms are higher dimension operators. The presence of the EFT operators impacts the distributions of the observables by producing deviations from the shapes expected when the SM Lagrangian alone is considered.
We use a Variational AutoEncoder (VAE) trained on SM processes to identify EFT contributions as anomalies. While SM events are expected to be reconstructed properly, events generated taking into account EFT contributions are expected to be poorly reconstructed, thus accumulating in the tails of the loss function distribution. Since the training of the model does not depend on any specific new physics signature, the proposed strategy does not make specific assumptions on its nature. In order to improve the discrimination performance, we introduced a DNN classifier that distinguishes between EFT and SM events based on the values of the reconstruction and regularization losses of the model. In this second model a cross entropy term is added to the usual loss of the VAE, optimizing at the same time the reconstruction of the input variables and the classification. This procedure ensures that the model is optimized for discrimination, at a small price in terms of model independence due to the use of one of the 15 operators from the EFT model in the training.
In this talk we will discuss in detail the above-mentioned methods using generator level VBS events produced at the LHC and assuming, in order to compute the significance of possible new physics contributions, an integrated luminosity of $350\,\mathrm{fb}^{-1}$.
The Belle II experiment at the second generation e+/e- B-factory SuperKEKB has been collecting data since 2019 and aims to accumulate 50 times more data than the first generation experiment, Belle.
To efficiently process these steadily growing datasets of recorded and simulated data that end up on the order of 100 PB, and to support Grid-based analysis workflows using the DIRAC Workload Management System, an XRootD-based caching architecture is presented.
The presented mechanism decreases job waiting time for often-used datasets by transparently adding copies of these files at smaller sites without managed storage.
The described architecture seamlessly integrates local storage services and supports the use of dynamic computing resources with minimal deployment effort.
This is especially useful in environments with many institutions providing comparatively small numbers of cores and limited personpower.
This talk will describe the implemented cache at GridKa, a main computing centre for Belle II, as well as its performance and upcoming opportunities for caching for Belle II.
Simulated event samples from Monte-Carlo event generators (MCEGs) are a backbone of the LHC physics programme. However, for Run 3, and in particular for the HL-LHC era, computing budgets are becoming increasingly constrained, while at the same time the push to higher accuracies is making event generation significantly more expensive.
Modern ML techniques can help with the effort of creating such costly samples in two ways. One way is to use inference models to try to learn the event distribution of the entire MCEG toolchain, or parts of it, such that events can then be generated with those "replacement models" in a fraction of the time a full MCEG would require. This ansatz is however intrinsically constrained by the available training data. Another way, and this is the one discussed in this talk, is to keep the MCEG, and to use ML "assistant models" to increase the efficiency of certain performance bottlenecks.
One of those bottlenecks is the sampling of the high-dimensional phase space of complex processes, for which a given distribution must be approximated as closely as possible. This is indeed a very generic problem, such that methods can be explored that have been developed in entirely different fields of physics or even outside of physics. In this talk I will discuss the potential to increase the phase space sampling efficiency using the methods of Neural Importance Sampling and Nested Sampling, and of neural network surrogates of the integrand to increase the efficiency of event unweighting. The application of these methods within the Sherpa generator framework is then reviewed.
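Why a better proposal distribution raises the unweighting efficiency can be seen in a one-dimensional toy example. The target function, the analytic proposal and all names below are assumptions for illustration; a fixed analytic proposal stands in for a learned one, and nothing here reflects the Sherpa interfaces.

```python
import numpy as np


def target(x):
    """Toy integrand on [0, 1]; its exact integral is 1."""
    return 5.0 * x**4


def unweighting_efficiency(weights):
    """Expected acceptance rate when unweighting against the maximum weight."""
    return float(weights.mean() / weights.max())


def unweight(events, weights, rng):
    """Accept each event with probability weight / max(weight)."""
    keep = rng.uniform(0.0, weights.max(), weights.size) < weights
    return events[keep]


rng = np.random.default_rng(3)
n = 100_000

# Flat sampling: the weight is the integrand itself.
x_flat = rng.uniform(0.0, 1.0, n)
w_flat = target(x_flat)

# Importance sampling with proposal g(x) = 4 x^3, drawn by inverting its
# CDF x^4; using 1 - u keeps x strictly positive so the ratio is defined.
x_imp = (1.0 - rng.uniform(0.0, 1.0, n)) ** 0.25
w_imp = target(x_imp) / (4.0 * x_imp**3)

print(unweighting_efficiency(w_flat), unweighting_efficiency(w_imp))
print(len(unweight(x_flat, w_flat, rng)), len(unweight(x_imp, w_imp, rng)))
```

Both weight sets estimate the same integral, but the proposal that follows the integrand more closely yields flatter weights and therefore keeps a larger fraction of events in the unweighting step.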
“Computation” has become a massive part of our daily lives; even more so in science, where many experiments and analyses rely on massive computation. Under the assumption that computation is cheap, and time-to-result is the only relevant metric for all of us, we currently use computational resources at record-low efficiency.
In this talk, I argue this approach is an unacceptable waste of computing resources. I further define the goal of zero-waste computing and discuss how performance engineering methods and techniques can facilitate this goal. By means of a couple of case-studies, I will also demonstrate performance engineering at work, proving how efficiency and time-to-result can co-exist.
Lattice QCD is an ab initio approach to QCD and plays an indispensable role in understanding the low-energy properties of the strong interaction. The last four decades have witnessed the rapid development of lattice QCD numerical calculations along with the progress of high performance computing (HPC) techniques. Lattice QCD has become one of the most resource-consuming HPC fields. China has built several native supercomputers with different hardware architectures, such as the Sunway series, the Tianhe series and Sunrising-1, which provide potentially massive HPC resources for lattice QCD studies.
This talk will give a brief introduction to the code development and the performance of lattice QCD software on these strikingly different computing systems.
The INFN Tier1 data center is currently located in the premises of the Physics Department of the University of Bologna, where CNAF is also located. Soon it will be moved to the “Tecnopolo”, the new facility for research, innovation, and technological development in the same city area; it will follow the installation of Leonardo, the pre-exascale supercomputing machine managed by CINECA, co-financed as part of the EuroHPC Joint Undertaking.
The construction of the new CNAF data center will consist of two phases, corresponding to the computing requirements of the LHC: Phase 1, starting from 2023, will involve an IT power of 3 MW, and Phase 2, starting from 2025, will involve an IT power of up to 10 MW.
The primary goal of the new data center is to cope with the computing requirements of the data taking of the HL-LHC experiments in the period from 2026 to 2040, while at the same time providing computing services for several other INFN experiments, projects, and activities of interest, whether currently in operation, under construction, in advanced design, or not yet defined. The co-location with Leonardo will also open new scenarios, with a close integration between the two systems, which will be able to share resources dynamically.
In this presentation we will describe the new center design, with a particular focus on the status of the migration, its schedule, and the technical challenges we have to face moving the data center without service interruption. On top of this, we will analyze the opportunities that the new infrastructure will open in the context of the PNRR (National Plan for Resilience and Recovery) funding and strategic plans, within and beyond the High Energy Physics domain.
The HERD experiment will perform direct cosmic-ray detection at the highest energies ever reached, thanks to an innovative design that maximizes the acceptance and to its placement on the future Chinese Space Station, which will allow for an extended observation period.
Significant computing and storage resources are foreseen to be needed in order to cope with the necessities of a large community driving a big experimental device with an energy reach above PeV for hadrons and multi-TeV for electrons and positrons. For example, at PeV energies Monte Carlo simulations require a massive amount of computing power, and very large simulated data sets are needed for detector performance studies like electron-proton rejection.
The HERD computing infrastructure is currently being investigated and prototyped in order to provide a flexible, robust and easy-to-use cloud-based computing and storage platform. It is based on technical solutions originally developed by the "Dynamic On Demand Analysis Service" (DODAS) framework in the context of projects such as INDIGO-DataCloud, EOSC-hub and XDC. It allows seamless access to both commercial and institutional cloud resources, in order to make efficient use of opportunistic resources to cope with high-demand periods (like full dataset reprocessings and specialized Monte Carlo productions), as well as transparent integration with on-premise computing resources managed by an HTCondor batch system. The cloud platform also allows for an easy and efficient deployment of services for the collaboration, such as a calendar, document server and code repository, making use of available, free open source solutions. Finally, an Indigo-IAM instance provides a Single-Sign-On service for access control for the whole infrastructure.
An overview of the current status and of the future perspectives will be presented.
The ReCaS-Bari datacenter enriches its service portfolio by providing a new HPC/GPU cluster for Bari University and INFN users. This new service is the best solution for complex applications requiring a massively parallel processing architecture. The cluster is equipped with cutting-edge Nvidia GPUs, such as the V100 and A100, suitable for applications able to use all the available parallel hardware. Artificial intelligence, complex model simulation (weather and earthquake forecasts, molecular dynamics and galaxy formation) and all high-precision floating-point applications are possible candidates to be executed on the new service. The cluster is composed of 10 machines with total computing resources of 1755 cores, 13.7 TB of RAM, 55 TB of local disk and 38 high-performance GPUs (18 Nvidia A100 and 20 Nvidia V100). Each node can access the 8.3 PB ReCaS-Bari distributed storage, based on GPFS. Applications are executed only within Docker containers, giving the HPC/GPU cluster features such as easy application configuration and execution, reliability, flexibility and security. Currently, users can choose among different ready-to-use services such as remote IDEs (Jupyter Notebook and RStudio), through which GPU-based applications can be executed, or a job orchestrator to which complex workflows represented as DAGs (Directed Acyclic Graphs) can be submitted. The user service portfolio is evolving. If the provided services do not cover the user's needs, user-defined Docker containers can be executed on the cluster. Long-running services and job submission are managed with Marathon and Chronos respectively, two frameworks running alongside Apache Mesos. These three tools add high availability, fault tolerance and security to the native capacity to manage all compute resources and user requests.
The implemented technological solution allows users to continue to access their own data both from the HTC cluster (based on HTCondor) and from the HPC/GPU cluster (based on Mesos).
The first phase, in which local beta-testers used the cluster, concluded successfully. The service is now ready to join the national INFN-Cloud federation. Leveraging the INDIGO PaaS orchestrator, the federation provides multiple ready-to-use frameworks and services (ML_INFN, Apache Spark, JupyterLab, …), a stable and secure authentication layer, and a simple web dashboard that can be used to deploy services on top of a heterogeneous set of resources. The evolution of the service, including a performance evaluation of Kubernetes as a replacement for Apache Mesos, is in the pipeline.
This contribution will present and discuss the resources and technological solutions related to the HPC/GPU cluster in the ReCaS-Bari data center and the most important applications running on the cluster.
The power consumption of computing is coming under intense scrutiny worldwide, driven both by concerns about the carbon footprint, and by rapidly rising energy costs.
ARM chips, widely used in mobile devices due to their power efficiency, are not currently in widespread use as capacity hardware on the Worldwide LHC Computing Grid.
However, the LHC experiments (e.g., ATLAS and CMS) are increasingly able to compile their workloads for the ARM architecture to take advantage of various HPC facilities.
The work described in this paper attempts to compare the energy consumption of various workloads on two almost identical machines, one with an arm64 CPU and the other with a standard AMD x86_64 CPU, operating in identical conditions.
This builds on our initial study of two rather dissimilar machines, located at different UK Universities, which produced some interesting, but at times contradictory, results, showing the need to control the comparison more closely.
The set of benchmarks used includes CPU-intensive, memory-intensive, and I/O-bound tasks, ranging from simple scripts, through compiled C programs, to typical HEP workloads (full ATLAS simulations).
We also plan to test the most recent HEPscore containerized jobs, which are actively being developed to match LHC Run3 conditions and can already target different architectures.
The results compare both the power consumption and execution time of the same workload on the two different architectures (arm64 and x86_64).
This will help inform Grid sites whether there are any scenarios where power efficiency can be improved for LHC computing by deploying ARM-based hardware.
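Comparing both power consumption and execution time amounts to comparing the energy used per workload, that is, power integrated over the run. The sketch below assumes power is sampled at known timestamps and integrates it with the trapezoidal rule. The function name and the readings are invented for illustration and are not measurements from this study.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (W) over time (s) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w) or len(power_w) < 2:
        raise ValueError("need at least two matching samples")
    total = 0.0
    for i in range(1, len(power_w)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Made-up readings for two hypothetical machines running the same workload:
# machine A draws less power but takes longer, machine B the opposite.
machine_a = energy_joules([0, 10, 20, 30], [80.0, 90.0, 90.0, 80.0])
machine_b = energy_joules([0, 10, 20], [150.0, 160.0, 150.0])
print(machine_a, machine_b)
```

A lower power draw alone does not decide the comparison: a slower machine can still use more energy for the same job.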
Track fitting and track hit classification are closely related, so the two tasks can benefit from each other. For example, if we know the underlying parameters of a track, then the hits associated with the track can be easily identified. Conversely, if we know the hits of a track, then we can obtain the underlying parameters by fitting them. Most existing works follow the second scheme, classifying track hits first and then estimating track parameters.
Inspired by the above observations and the success of multi-task training, we propose a unified framework to address track fitting and track hit classification simultaneously in an end-to-end fashion. The method takes hits from multiple tracks as inputs, where each hit holds 4-dimensional features: 2D position, hit time, and deposited charge. We feed these inputs to a backbone network to extract per-hit features. The network then splits into two branches. One is a reconstruction branch, which estimates the parameters of each track and whether it exists. The other is a track segmentation branch, which takes the features learned by PointNet++ together with the track features to determine a hit-wise track assignment. In essence, we classify track hits by assigning each hit to its potential track. This method allows us to predict the parameters of a track candidate while conducting per-track hit classification. This study uses simulated multi-track samples of the BESIII drift chamber. Preliminary results indicate that our framework is able to classify the hits of different tracks and estimate the candidate track parameters simultaneously.
Graph Neural Networks (GNN) have recently attained competitive particle track reconstruction performance compared to traditional approaches such as combinatorial Kalman filters. In this work, we implement a version of Hierarchical Graph Neural Networks (HGNN) for track reconstruction, which creates the hierarchy dynamically. The HGNN creates “supernodes” by pooling nodes into clusters, and builds a “supergraph” which enables message passing among supernodes. A new differentiable pooling algorithm that maintains sparsity and produces a variable number of supernodes is proposed to facilitate the hierarchy construction. We perform an apples-to-apples comparison between the Interaction Network (IN) and HGNN on track finding performance using node embedding metric learning, which shows that in general HGNNs are more robust against imperfectly constructed input graphs, and more powerful in recognizing long-distance patterns. Equipped with soft assignment, HGNN also allows assigning a given hit to multiple track candidates. The HGNN model can be used as a node-supernode pair classifier, where supernodes are considered to be track candidates. Under this regime, the pair-classifying HGNN is even more powerful than the node embedding HGNN. We show that the HGNN not only improves upon the performance of common GNN architectures on embedding and clustering problems but also opens up other approaches for GNNs in high energy physics.
Particle track reconstruction poses a key computing challenge for future collider experiments. Quantum computing carries the potential for exponential speedups and the rapid progress in quantum hardware might make it possible to address the problem of particle tracking in the near future. The solution of the tracking problem can be encoded in the ground state of a Quadratic Unconstrained Binary Optimization. In our study, sets of three hits in the detector are grouped into triplets. True triplets are part of trajectories of particles, while false triplets are random combinations of three hits. By approximating the ground state, the Variational Quantum Eigensolver algorithm aims at identifying true triplets. Different circuits and optimizers are tested for small instances of the tracking problem with up to 23 triplets. Precision and recall are determined in a noiseless simulation and the effects of readout errors are studied. It is planned to repeat the experiments on real hardware and to combine the solutions of small instances to address the full-scale tracking problem.
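To make the QUBO encoding above concrete, the sketch below builds a toy instance in which each binary variable says whether a triplet is kept, and finds the ground state by exhaustive search. All coefficients are invented for illustration. The exhaustive search stands in for the Variational Quantum Eigensolver, which approximates the same ground state, and it is only feasible for very small instances.

```python
from itertools import product


def qubo_energy(x, linear, quadratic):
    """E(x) = sum_i a_i x_i + sum_{i<j} b_ij x_i x_j for binary x."""
    energy = sum(linear[i] * x[i] for i in range(len(x)))
    energy += sum(w * x[i] * x[j] for (i, j), w in quadratic.items())
    return energy


def ground_state(linear, quadratic):
    """Exhaustive search over all 2**n assignments (toy sizes only)."""
    n = len(linear)
    return min(product((0, 1), repeat=n),
               key=lambda x: qubo_energy(x, linear, quadratic))


# Hypothetical instance with 4 triplets. Negative linear terms favour keeping
# a triplet, negative couplings reward compatible pairs, and positive
# couplings penalise conflicting pairs.
linear = [-1.0, -1.0, -1.0, 0.5]
quadratic = {(0, 1): -1.0, (1, 2): -1.0, (0, 3): 2.0, (2, 3): 2.0}

best = ground_state(linear, quadratic)
best_energy = qubo_energy(best, linear, quadratic)
print(best, best_energy)
```

In this toy case the minimum keeps the three mutually compatible triplets and rejects the conflicting one.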
As part of the Run 3 upgrade, the LHCb experiment has switched to a two-stage event trigger, fully implemented in software. The first stage of this trigger, running in real time at the collision rate of 30 MHz, is entirely implemented on commercial off-the-shelf GPUs and performs a partial reconstruction of the events.
We developed a novel strategy for this reconstruction, starting with two independent tracking algorithms, in the VELO and SciFi detectors, forming track segments which are then matched and merged to form full tracks, suitable for selecting events at LHCb. A key point enabling this sequence is the SciFi tracking algorithm, which was implemented for GPU with special care in order to meet the throughput requirements of a real time trigger.
Developing such an algorithm is challenging because of the large number of track hypotheses that need to be tested. We discuss how this challenge was overcome by using the GPU architecture efficiently, and how the efficiency of the new sequence compares to that of the current baseline reconstruction.
The use of hardware acceleration, particularly of GPGPUs, is one promising strategy for coping with the computing demands in the upcoming high luminosity era of the LHC and beyond. Track reconstruction, in particular, suffers from exploding combinatorics and thus could greatly profit from the massively parallel nature of GPGPUs and other accelerators. However, classical pattern recognition algorithms and their current implementations, albeit very successfully deployed in the CPU-based software of current LHC experiments, show several shortcomings when adapted to modern accelerator architectures; the geometry, for example, is often characterized by runtime-polymorphic shapes, which are incompatible with common heterogeneous programming platforms. In addition, field integration modules need efficient access to the magnetic field on a variety of devices, and adaptive Runge-Kutta methods may cause thread divergence.
In order to investigate whether state-of-the-art CPU-based track reconstruction software can be adapted to run efficiently on GPUs, the ACTS project has launched a dedicated R&D program aiming to develop a demonstrator that mirrors the current track reconstruction chain, based on seed finding followed by a combinatorial Kalman filter, available in the ACTS suite. We demonstrate the implementation and performance of a core component of this chain: the propagation of track parameters and their associated covariances through an inhomogeneous magnetic field, including the navigation through a highly complex geometry with different shapes and the application of material effects when passing through detector material. This demonstrator showcases the usage of the detray library for geometry description and navigation, the covfie library for an efficient description and interpolation of a complex magnetic field on different hardware backends, and a dedicated algebra plugin that allows using different math implementations. It is based on the vecmem library, which has been developed to handle memory resources on host and device. We demonstrate that it is possible to perform this task using single-source code across multiple devices, and we compare the performance of this heterogeneous reconstruction chain to existing CPU-based code in the ACTS project.
The computation of loop integrals is required in high energy physics to account for higher-order corrections of the interaction cross section in perturbative quantum field theory. Depending on internal masses and external momenta, loop integrals may suffer from singularities where the integrand denominator vanishes at the boundaries, and/or in the interior of the integration domain (for physical kinematics).
In previous work we implemented iterated integration numerically using one- or low-dimensional adaptive integration algorithms in subsequent coordinate directions, enabling intensive subdivision in the vicinity of singularities. To handle a threshold singularity originating from a vanishing denominator in the interior of the domain, we add a term (for example, $i\delta$) in the denominator, and perform a nonlinear extrapolation to a sequence of integrals obtained for a (geometrically) decreasing sequence of $\delta.$
In addition this may give rise to UV singularities, treated by dimensional regularization, where the space-time dimension $n = 4$ is replaced by $n = 4-2\varepsilon$ for a sequence of $\varepsilon$ values, and a linear extrapolation is applied as $\varepsilon$ tends to zero. Presence of both types of singularities may warrant a double extrapolation. In this paper we will devise and apply a strategy for loop integral computations by combining these methods as needed for a set of Feynman diagrams. In view of the compute-intensive nature, the code is further multi-threaded to run in a shared memory environment.
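The extrapolation in $\delta$ described above can be illustrated with a simple nonlinear sequence transformation. The sketch below uses Aitken's delta-squared process, chosen here only because it is compact. The abstract does not name the extrapolation algorithm actually used, and the model function `model_integral` is a hypothetical stand-in for a numerically computed integral.

```python
def aitken(seq):
    """Apply Aitken's delta-squared transform once to a sequence.

    Works for real or complex entries; a zero denominator means the
    sequence has already converged at that point.
    """
    out = []
    for s0, s1, s2 in zip(seq, seq[1:], seq[2:]):
        denom = s2 - 2 * s1 + s0
        out.append(s2 if denom == 0 else s2 - (s2 - s1) ** 2 / denom)
    return out


def model_integral(delta):
    """Hypothetical I(delta) = I_0 + c * delta, with I_0 = 1.0 the target."""
    return 1.0 + 0.3 * delta


# Geometrically decreasing sequence delta_k = 2**-k, as in the text.
deltas = [2.0 ** -k for k in range(5)]
values = [model_integral(d) for d in deltas]
extrapolated = aitken(values)
print(values[-1], extrapolated[-1])
```

For an error that decays geometrically, a single Aitken step recovers the limit, whereas the last raw value is still off by about 0.02 in this toy case.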
Evaluating loop amplitudes is a time-consuming part of LHC event generation. For di-photon production with jets we show that simple, Bayesian networks can learn such amplitudes and model their uncertainties reliably. A boosted training of the Bayesian network further improves the uncertainty estimate and the network precision in critical phase space regions. In general, boosted network training of Bayesian networks allows us to move between fit-like and interpolation-like regimes of network training.
Evaluation of one-loop matrix elements is computationally expensive and makes up a large proportion of the time spent during event generation. We present a neural network emulator that builds in the factorisation properties of matrix elements and accurately reproduces the NLO k-factors for electron-positron annihilation into up to 5 jets.
We show that our emulator retains good performance for high multiplicities and that there is a significant speed advantage over more traditional loop provider tools.
In this talk I will give an overview of our recent progress in developing anomaly detection methods for finding new physics at the LHC. I will discuss how we define anomalies in this context, and the deep learning tools that we can use to find them. I will also discuss how self-supervised representation learning techniques can be used to enhance anomaly detection methods.
Local Unitarity provides an order-by-order representation of perturbative cross-sections that realises at the local level the cancellation of final-state collinear and soft singularities predicted by the KLN theorem. The representation is obtained by manipulating the real and virtual interference diagrams contributing to transition probabilities using general local identities. As a consequence, the Local Unitarity representation can be directly integrated using Monte Carlo methods and without the need of infrared counter-terms. I will present first results from this new approach with examples up to N3LO accuracy. I will conclude by giving an outlook on future generalisations of the method applicable to hadronic collisions.
Over the past few years, intriguing deviations from the Standard Model predictions have been reported in measurements of angular observables and branching fractions of $B$ meson decays, suggesting the existence of a new interaction that acts differently on the three lepton families. The Belle II experiment has unique features that allow the study of $B$ meson decays with invisible particles in the final state, in particular neutrinos. It is possible to deduce the presence of such particles from the energy-momentum imbalance obtained after reconstructing the companion $B$ meson produced in the event. This task is complicated by the thousands of possible final states $B$ mesons can decay into, and is currently performed at Belle II by the Full Event Interpretation (FEI) software, an algorithm based on Boosted Decision Trees and limited to specific, hard-coded decay processes.
In recent years, graph neural networks have proven to be very effective tools to describe relations in physical systems, with applications in a range of fields. Particle decays can be naturally represented in the form of rooted, acyclic tree graphs, with nodes corresponding to particles and edges representing the parent-child relations between them. In this work, we present a graph neural network approach to generically reconstruct $B$ decays at Belle II by exploiting the information from the detected final state particles, without formulating any prior assumption about the nature of the decay. This task is performed by reconstructing the Lowest Common Ancestor matrix, a novel representation, equivalent to the adjacency matrix, that allows reconstruction of the decay from the final state particles alone. Preliminary results show that the graph neural network approach outperforms the FEI by a factor of at least 3.
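To illustrate the Lowest Common Ancestor matrix, the sketch below computes it for a small decay tree given as child-to-parent links. The decay chain and the choice of storing the ancestor's label in each entry are assumptions made for illustration. The encoding used in the actual work may differ, and in the reconstruction task the matrix is predicted by the network, not computed from a known tree.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, node included."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def lca_matrix(leaves, parent):
    """Entry [i][j] is the lowest common ancestor of leaves i and j.

    The diagonal holds the leaf itself.
    """
    paths = {leaf: ancestors(leaf, parent) for leaf in leaves}
    matrix = []
    for a in leaves:
        on_path = set(paths[a])
        # Walking up from b, the first node also above a is the LCA.
        matrix.append([next(n for n in paths[b] if n in on_path)
                       for b in leaves])
    return matrix


# Hypothetical decay tree: B -> D pi_1, followed by D -> K pi_2.
parent = {"B": None, "D": "B", "pi_1": "B", "K": "D", "pi_2": "D"}
leaves = ["K", "pi_2", "pi_1"]  # detected final-state particles

matrix = lca_matrix(leaves, parent)
for row in matrix:
    print(row)
```

The matrix is indexed by final-state particles only, yet it fixes the intermediate structure: K and pi_2 share the ancestor D, while pi_1 joins them only at B.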
Detector modeling and visualization are essential in the life cycle of a High Energy Physics (HEP) experiment. Unity is a professional multi-media creation software that has the advantages of rich visualization effects and easy deployment on various platforms. In this work, we applied the method of detector transformation to convert the BESIII detector description from the offline software framework into the 3D detector modeling in Unity. By matching the geometric units with detector identifiers, the new event display system based on Unity can be developed for BESIII. The potential for further application development into virtual reality will also be introduced.
RooFit is a toolkit for statistical modeling and fitting used by most experiments in particle physics. As data sets from next-generation experiments grow, processing requirements for physics analysis become more computationally demanding, necessitating performance optimizations for RooFit. One possibility to speed up minimization and add stability is the use of automatic differentiation (AD). Unlike for numerical differentiation, the computation cost scales linearly with the number of parameters, making AD particularly appealing for statistical models with many parameters. In this talk, we report on one possible way to implement AD in RooFit. Our approach is to add a facility to generate C++ code for a full RooFit model automatically. Unlike the original RooFit model, this generated code is free of virtual function calls and other RooFit-specific overhead. In particular, this code is then used to produce the gradient automatically with Clad. Clad is a source transformation AD tool implemented as a plugin to the clang compiler, which automatically generates the derivative code for input C++ functions. We show results demonstrating the improvements observed when applying this code generation strategy to HistFactory and other commonly used RooFit models. HistFactory is the subcomponent of RooFit that implements binned likelihood models with probability densities based on histogram templates. These models frequently have a very large number of free parameters, and are thus an interesting first target for AD support in RooFit.
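The difference between automatic and numerical differentiation can be shown on a small likelihood. The sketch below is not Clad and not RooFit: it uses forward-mode AD with dual numbers in Python, a different technique from Clad's source transformation of C++, applied to a hypothetical Gaussian negative log-likelihood. It only illustrates that AD propagates exact derivative rules through the code, while a finite difference approximates the derivative from extra function evaluations.

```python
import math


class Dual:
    """Number val + der*eps with eps**2 == 0; der carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._wrap(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = self._wrap(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return self._wrap(other) - self

    def __mul__(self, other):
        o = self._wrap(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = self._wrap(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / (o.val * o.val))

    def __rtruediv__(self, other):
        return self._wrap(other) / self


def log(x):
    """Logarithm that also propagates the derivative for Dual inputs."""
    if isinstance(x, Dual):
        return Dual(math.log(x.val), x.der / x.val)
    return math.log(x)


def nll(mu, sigma, data):
    """Gaussian negative log-likelihood, up to an additive constant."""
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        total = total + 0.5 * z * z + log(sigma)
    return total


data = [1.0, 2.0, 4.0]  # made-up observations
mu, sigma = 2.0, 1.5

# AD: seed one parameter with derivative 1 and read off d(nll)/d(parameter).
grad_mu = nll(Dual(mu, 1.0), sigma, data).der
grad_sigma = nll(mu, Dual(sigma, 1.0), data).der

# Central finite difference in mu for comparison (two extra evaluations).
h = 1e-5
fd_mu = (nll(mu + h, sigma, data) - nll(mu - h, sigma, data)) / (2 * h)
print(grad_mu, fd_mu, grad_sigma)
```

The AD results agree with the analytic derivatives to rounding error, whereas the finite-difference value depends on the step size `h`.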
Awkward Arrays and RDataFrame provide two very different ways of performing calculations at scale. By adding the ability to zero-copy convert between them, users get the best of both. It gives users greater flexibility in mixing different packages and languages in their analysis.
In Awkward Array version 2, the ak.to_rdataframe function presents a view of an Awkward Array as an RDataFrame source. This view is generated on demand and the data is not copied. The column readers are generated based on the run-time type of the views. The readers are passed to a generated source derived from ROOT::RDF::RDataSource.
The ak.from_rdataframe function converts the selected columns to native Awkward Arrays.
We discuss the details of the implementation exploiting JIT techniques. We present examples of analysis of data stored in Awkward Arrays via a high-level interface of an RDataFrame.
We show a few examples of the column definition, applying user-defined filters written in C++, and plotting or extracting the columnar data as Awkward Arrays.
We discuss current limitations and future plans.
We present a revived version of CERNLIB, the basis for the software ecosystems of most of the pre-LHC HEP experiments. The efforts to consolidate CERNLIB are part of the activities of the Data Preservation for High Energy Physics collaboration to preserve the data and software of past HEP experiments.
The presented version is based on CERNLIB version 2006, with numerous patches made for compatibility with modern compilers and operating systems. The code is publicly available in the CERN GitLab repository with all the development history starting from the early 1990s. The updates also include a re-implementation of the build system in cmake to make CERNLIB compliant with current best practices and to increase the chances of preserving the code in a compilable state for decades to come.
The revived CERNLIB project also includes updated documentation, which we believe is a cornerstone for any preserved software depending on it.
Identifying and locating proton-proton collisions in LHC experiments (known as primary vertices or PVs) has been the topic of numerous conference talks in the past few years (2019-2021). Efforts to search for a variety of potential architectures have yielded potential candidates for PV-finder. The UNet model, for example, has achieved an efficiency of 98% with a low false-positive rate. These results can be obtained with numerous other neural network architectures. It also converges faster than any previous model. While this does not answer the question of how the algorithm learns, it does provide some useful insights into the open question. We present the results from this architectural study of different algorithms and their performance in locating PVs for LHCb data. The goal is to demonstrate progress in developing a performant architecture and evaluate different algorithms' learning.
The FairRoot software stack is a toolset for the simulation, reconstruction, and analysis of high energy particle physics experiments (currently used e.g. at FAIR/GSI and CERN). In this work we give insight into recent improvements of Continuous Integration (CI) for this software stack. CI is a modern software engineering method to efficiently assure software quality. We discuss relevant development workflows and how they were improved through automation. Furthermore, we present our infrastructure, detailing its hardware and software design choices. The entire toolchain is composed of free and open source software. Finally, this work concludes with lessons learned from an operational as well as a user perspective and outlines ideas for future improvements.
After the successful adoption of Rucio as the new data management system, following its inception in 2018, a subsequent step is to advertise it to users and other stakeholders. To this end, one of the objectives is to keep improving the tooling around Rucio. As Rucio introduces a new data management paradigm with respect to the previous model, we begin by tackling the challenges arising from this shift in the data model, while trying to limit the impact on users. We therefore focus on building a monitoring system capable of answering questions that do not naturally fit the current paradigm, while also providing new features and services that encourage users to push further the adoption and the benefits of the new implementation. In this regard, we present the development process and evolution path of a set of new interfaces dedicated to extending the current monitoring infrastructure, and the integration of a user-dedicated CLI that gives users an almost seamless transition and an enhancement of their daily data management activity. We try to keep dependencies to a minimum and to keep these tools decoupled, making them potentially useful for other experiments. Together they will form a set of extensions to the Rucio API intended to automate the most frequent use cases, eventually enhancing the user experience and lowering the barriers for newcomers.
In a High Energy Physics (HEP) experiment, the Data Quality Monitoring (DQM) system is crucial to ensure the correct and smooth operation of the experimental apparatus during data taking. DQM at the Jiangmen Underground Neutrino Observatory (JUNO) will reconstruct raw data directly from the JUNO Data Acquisition (DAQ) system and use event visualization tools to show the detector performance for high quality data taking. The strategy of the JUNO DQM, as well as its design and performance, will be presented.
The common ALICE-FAIR software framework ALFA offers a platform for simulation, reconstruction and analysis of particle physics experiments. FairMQ is a module of ALFA that provides building blocks for distributed data processing pipelines, composed out of components communicating via message passing. FairMQ integrates and efficiently utilizes standard industry data transport technologies, while hiding the transport details behind an abstract interface. In this work we present the latest developments in FairMQ, focusing on the new and improved features of the transport layer, primarily the shared memory transport and the generic interface features. Furthermore, we present the new control and configuration facilities, that allow programmatically controlling a group of FairMQ components. Additionally, new debugging and monitoring tools are highlighted. Finally, we outline how these tools are used by the ALICE experiment.
We evaluate two Generative Adversarial Network (GAN) models developed by the COherent Muon to Electron Transition (COMET) collaboration to generate sequences of particle hits in a Cylindrical Drift Chamber (CDC). The models are first evaluated by measuring the similarity between distributions of particle-level, physical features. We then measure the Effectively Unbiased Fréchet Inception Distance (FID) between distributions of high-dimensional representations obtained with: InceptionV3; then a version of InceptionV3 fine-tuned for event classification; and a 3D Convolutional Neural Network that has been specifically designed for event classification. We also normalize the obtained FID values by the FID for two sets of real samples, setting the scores for different representations on the same scale. This novel relative FID metric is used to compare our GAN models to state-of-the-art natural image generative models.
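The normalization behind the relative FID can be sketched in one dimension, where the Fréchet distance between two Gaussians reduces to $(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$. The code below is a deliberate simplification with made-up numbers: the real metric works on high-dimensional network representations with full covariance matrices, and the effectively unbiased estimator mentioned in the abstract is not implemented here.

```python
from statistics import fmean, pstdev


def frechet_1d(a, b):
    """Squared Frechet distance between 1-D Gaussians fitted to a and b."""
    return (fmean(a) - fmean(b)) ** 2 + (pstdev(a) - pstdev(b)) ** 2


def relative_fid(generated, real_a, real_b):
    """Distance of generated vs. real data, in units of real vs. real."""
    baseline = frechet_1d(real_a, real_b)
    if baseline == 0:
        raise ZeroDivisionError("the two real sets are indistinguishable")
    return frechet_1d(generated, real_a) / baseline


# Made-up one-dimensional feature values, for illustration only.
real_a = [0.0, 2.0]
real_b = [0.5, 2.5]
generated = [1.0, 5.0]

score = relative_fid(generated, real_a, real_b)
print(score)
```

A score near 1 would mean the generated sample is about as far from the real data as two real samples are from each other, which is what puts different representations on a common scale.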
The Mu2e experiment will search for the CLFV neutrinoless coherent conversion of a muon to an electron in the field of an aluminium nucleus. A custom offline event display has been developed for Mu2e using TEve, a ROOT-based 3-D event visualisation framework. Event displays are crucial for monitoring and debugging during live data taking as well as for public outreach. A custom GUI allows event selection and navigation. Reconstructed data such as tracks, hits and clusters can be displayed within the detector geometries upon GUI request. True Monte Carlo trajectories of particles traversing the muon beam line, obtained directly from Geant4, can also be displayed. Tracks are coloured according to their particle ID and users can select the trajectories to be displayed. Reconstructed tracks are refined using a Kalman filter. The resulting tracks can be displayed alongside truth information, allowing visualisation of the track resolution. The user can remove or add data based on the energy deposited in a detector or the arrival time. This is a prototype; an online event display is currently under development using Eve-7, which allows remote access for live data taking and lets multiple users simultaneously view and interact with the display.
The CMS software framework (CMSSW) has recently been extended to perform part of the physics reconstruction with NVIDIA GPUs. To avoid writing a different implementation of the code for each back-end, the decision was made to use a performance portability library, and Alpaka has been chosen as the solution for Run-3.
In the meantime different studies have been performed to test the track reconstruction and clustering algorithms on different back-ends like CUDA and Alpaka.
With the idea of exploring new solutions, Intel GPUs have been considered as a new possible back-end and their implementation is currently under development.
This is achieved using SYCL, a cross-platform C++ abstraction layer and programming model for heterogeneous computing. It allows developers to reuse code across different hardware and also to perform custom tuning for a specific accelerator. The SYCL implementation used is the Data Parallel C++ library (DPC++) in the Intel oneAPI Toolkit.
In this work, we will present the performance of physics reconstruction algorithms on different hardware. Strengths and weaknesses of this heterogeneous programming model will also be presented.
The CMS collaboration has a growing interest in the use of heterogeneous computing and accelerators to reduce the costs and improve the efficiency of the online and offline data processing: online, the High Level Trigger is fully equipped with NVIDIA GPUs; offline, a growing fraction of the computing power is coming from GPU-equipped HPC centres. One of the topics where accelerators could be used for both online and offline processing is data compression.
In the past decade a number of research papers exploring the use of GPUs for lossless data compression have appeared in the academic literature, but very few practical applications have emerged. In industry, NVIDIA has recently published the nvcomp GPU-accelerated data compression library, based on closed-source implementations of standard and dedicated algorithms. Other platforms, like the IBM Power 9 processors, offer dedicated hardware for the acceleration of data compression tasks.
In this work we review the recent developments on the use of accelerators for data compression. After summarising the recent academic research, we will measure the performance of representative open- and closed-source algorithms over CMS data, and compare it with the CPU-only algorithms currently used by ROOT and CMS (lz4, zlib, zstd).
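A comparison of compression algorithms typically records two quantities per configuration: the compression ratio and the time taken. The sketch below shows that measurement for zlib only, since it is the one listed CPU algorithm available in the Python standard library (lz4 and zstd would need third-party bindings). The payload is synthetic and is not CMS data, so the numbers it prints say nothing about real performance.

```python
import time
import zlib


def benchmark(payload, level):
    """Return (compression ratio, seconds) for zlib at the given level."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # Lossless compression must round-trip exactly.
    if zlib.decompress(compressed) != payload:
        raise RuntimeError("round trip failed")
    return len(payload) / len(compressed), elapsed


# Synthetic, highly repetitive payload used only to exercise the code.
payload = bytes(i % 251 for i in range(200_000))

results = {level: benchmark(payload, level) for level in (1, 6, 9)}
for level, (ratio, seconds) in results.items():
    print(f"level {level}: ratio {ratio:.1f}, {seconds * 1e3:.2f} ms")
```

Timing a single call is noisy; a real measurement would repeat each configuration and use representative input.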
The description of the development of particle cascades in a calorimeter of a high energy physics experiment relies on precise simulation of particle interactions with matter. It is inherently slow and constitutes a challenge for HEP experiments. Furthermore, with the upcoming high luminosity upgrade of the Large Hadron Collider and a much increased data production rate, the number of required simulated events will increase accordingly. Several research directions have investigated the use of Machine Learning (ML) based models to accelerate the simulation of the response of a particular calorimeter. These models typically require a large amount of data and time for training, and the result is a specifically tuned simulation. Meanwhile, meta-learning has emerged in the ML community as a fast learning approach using small training datasets. In this contribution, we present MetaHEP, a meta-learning approach to accelerate shower simulation in different calorimeters using highly granular data. We show its application using a calorimeter proposed for the Future Circular Collider (FCC-ee) and its integration into the key4hep framework.
GPU applications require a structure of arrays (SoA) layout for the data to achieve good memory access performance. During the development of the CMS Pixel reconstruction for GPUs, the Patatrack developers crafted various techniques to optimise the data placement in memory and its access inside GPU kernels. The work presented here gathers, automates and extends those patterns, and offers a simplified and consistent programming interface.
The work automates the creation of SoA structures, fulfilling technical requirements like cache line alignment, while optionally providing alignment and cache hinting to the compiler and range checking. Protection of read-only products of the CMS software framework (CMSSW) is also ensured with constant versions of the SoA. A compact description of the SoA is provided to minimize the size of data passed to GPU kernels. Finally, the user interface is designed to be as simple as possible, providing AoS-like semantics that allow compact and readable notation in the code.
The results of porting CMSSW to SoA will be presented, along with performance measurements.
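The combination of SoA storage with AoS-like access described above can be sketched as follows. This is a conceptual illustration in Python with hypothetical class and field names. The actual work targets C++ and GPU kernels, where alignment and cache behaviour matter, and none of that is modelled here.

```python
from array import array


class HitView:
    """AoS-like proxy for element i; reads and writes go to the columns."""

    __slots__ = ("_soa", "_i")

    def __init__(self, soa, i):
        self._soa = soa
        self._i = i

    @property
    def x(self):
        return self._soa.x[self._i]

    @property
    def charge(self):
        return self._soa.charge[self._i]

    @charge.setter
    def charge(self, value):
        self._soa.charge[self._i] = value


class HitsSoA:
    """Structure of arrays: one contiguous typed column per field."""

    def __init__(self, xs, charges):
        self.x = array("f", xs)            # 32-bit floats, stored contiguously
        self.charge = array("f", charges)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        if not 0 <= i < len(self):         # simple range checking
            raise IndexError(i)
        return HitView(self, i)


hits = HitsSoA([0.5, 1.5, 2.5], [1.0, 4.0, 2.0])
hits[1].charge = 5.0               # element-style notation...
total_charge = sum(hits.charge)    # ...while each field stays one column
print(hits[1].x, total_charge)
```

The user writes `hits[i].charge` as if hits were an array of structs, while a loop over one field touches a single contiguous column.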
In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst's perspective, obtaining the highest possible performance is desirable, but recently some focus has been placed on studying the robustness of models, to investigate how well they perform under slight distortions of input features. Especially for tasks that involve many (low-level) inputs, the application of deep neural networks brings new challenges. In the context of jet flavor tagging, adversarial attacks are used to probe a typical classifier's vulnerability and can be understood as a model for systematic uncertainties. A corresponding defense strategy, adversarial training, improves robustness while maintaining high performance. This contribution presents different approaches using a set of attacks with varying complexity. Investigating the loss surface corresponding to the inputs and models in question reveals geometric interpretations of robustness, taking correlations into account. Additional cross-checks against other, physics-inspired mismodeling scenarios are performed and give rise to the presumption that adversarially trained models can cope better with simulation artifacts or subtle detector effects.
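As a concrete picture of an adversarial attack, the sketch below applies the fast gradient sign method (FGSM) to a toy logistic classifier. FGSM is an assumption chosen for its simplicity: the abstract does not name the attacks used, and the weights and inputs here are invented rather than taken from a jet tagger.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of the positive class for a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, label, eps):
    """Shift every input by eps in the direction that increases the loss.

    For binary cross-entropy with a logistic model, dL/dx_i = (p - y) * w_i.
    """
    p = predict(w, b, x)
    grad = [(p - label) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


# Hypothetical two-feature classifier and one correctly classified input.
w, b = [2.0, -1.0], 0.0
x, label = [1.0, 0.5], 1

x_adv = fgsm(w, b, x, label, eps=0.25)
p_clean = predict(w, b, x)
p_adv = predict(w, b, x_adv)
print(x_adv, p_clean, p_adv)
```

A small, bounded shift of each feature lowers the classifier's confidence in the true class, which is why such distortions can serve as a stress test for sensitivity to mismodelled inputs.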
In this study, jets with up to 30 particles are modelled using Normalizing Flows with Rational Quadratic Spline coupling layers. The invariant mass of the jet is a powerful global feature to control whether the flow-generated data contains the same high-level correlations as the training data. The use of normalizing flows without conditioning shows that they lack the expressive power to do this. Using the mass as a condition for the coupling transformation enhances the model's performance on all tracked metrics. In addition, we demonstrate how to sample the original mass distribution with the use of the empirical cumulative distribution function and we study the usefulness of including an additional mass constraint in the loss term. On the JetNet dataset, our model shows state-of-the-art performance combined with a general model and stable training.
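Sampling from an empirical cumulative distribution function can be done by inverse-transform sampling: draw a uniform number and map it back through the sorted training values. The sketch below is one simple way to do this, with linear interpolation between neighbouring values. The interpolation choice and the example masses are assumptions, presumably standing in for the step that supplies a mass value as the condition when generating jets.

```python
import random


def make_ecdf_sampler(values, rng=None):
    """Return a function drawing samples that follow the empirical CDF."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    rng = rng or random.Random()

    def sample():
        u = rng.random() * (n - 1)   # position along the sorted sample
        i = min(int(u), n - 2)       # index of the lower neighbour
        frac = u - i
        return ordered[i] + frac * (ordered[i + 1] - ordered[i])

    return sample


# Made-up jet masses (arbitrary units), for illustration only.
masses = [62.0, 75.5, 80.1, 81.3, 90.7, 104.2]
sample_mass = make_ecdf_sampler(masses, random.Random(0))
draws = [sample_mass() for _ in range(1000)]
print(min(draws), max(draws))
```

Because the draws interpolate between observed values, they stay inside the range of the training sample and reproduce its shape without assuming a parametric form.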
The CMS experiment employs an extensive data quality monitoring (DQM) and data certification (DC) procedure. Currently, this approach consists mainly of the visual inspection of reference histograms which summarize the status and performance of the detector. Recent developments in several of the CMS subsystems have shown the potential of computer-assisted DQM and DC using autoencoders, spotting detector anomalies with high accuracy and a much finer time granularity than previously accessible. We will discuss a case study for the CMS pixel tracker, as well as the development of a common infrastructure to host computer-assisted DQM and DC workflows. This infrastructure facilitates accessing the input histograms, provides tools for preprocessing, training and validating, and generates an overview of potential detector anomalies.
The Jiangmen Underground Neutrino Observatory (JUNO), located in southern China, will be the world’s largest liquid scintillator (LS) detector. Equipped with 20 kton of LS, 17623 20-inch PMTs and 25600 3-inch PMTs in the central detector, JUNO will provide a unique apparatus to probe the mysteries of neutrinos, particularly the neutrino mass ordering puzzle. One of the challenges for JUNO is the high-precision vertex reconstruction for reactor neutrino events. This talk will present machine learning-based vertex reconstruction in JUNO, particularly the comparison of different machine learning models as well as the optimization of the model inputs for better reconstruction performance.
The Particle Flow (PF) algorithm, used for a majority of CMS data analyses for event reconstruction, provides a comprehensive list of final-state particle candidates and enables efficient identification and mitigation methods for simultaneous proton-proton collisions (pileup). The higher instantaneous luminosity expected during the upcoming LHC Run 3 will impose challenges for CMS event reconstruction. This will be amplified in the HL-LHC era, where luminosity and pileup rates are expected to be significantly higher. One of the approaches CMS is investigating to cope with this challenge is to adopt heterogeneous computing architectures and accelerate event reconstruction. In this talk, we will discuss the effort to adapt the PF reconstruction to take advantage of GPU accelerators.
We will discuss the design and implementation of PF clustering for the CMS Electromagnetic and Hadronic Calorimeters using CUDA, including optimizations of the PF algorithm. The physics validation and performance of the GPU-accelerated algorithms will be demonstrated by comparing these to the CPU-based implementation.
Density Functional Theory (DFT) is an extended ab initio method used for calculating the electronic properties of molecules. Compared with Hartree-Fock methods, DFT offers suitable approximations with respect to computation time. Recently, DFT has been used for discovering and analyzing protein interactions by calculating the free energies of these macromolecules from short to large scales. However, calculating the ground-state energy with DFT for many-body systems of molecules such as proteins, in a reasonable time and with sufficient accuracy, is still a very challenging and CPU-intensive task.
On the other hand, Geant4 is a toolkit for simulating the passage of particles through matter and their effects on materials, with a wide range of specialized methods that include DNA and protein exploration. Unfortunately, the execution time needed to obtain an effective protein analysis is still a strong restriction on CPUs. In this sense, the GeantV project seeks to exploit CPU vectorization to tackle the heavy computational load on CPU cores. In this work, we present preliminary results of the partial implementation of DFT in the Geant4 framework and the vectorized GeantV project. We show the advantages and the partial methods used for vectorizing several subroutines in the calculation of the ground-state energy for some amino acids and some molecules.
Computing resources in the Worldwide LHC Computing Grid (WLCG) have been based entirely on the x86 architecture for more than two decades. In the near future, however, heterogeneous non-x86 resources, such as ARM, POWER and RISC-V, will become a substantial fraction of the resources that will be provided to the LHC experiments, due to their presence in existing and planned world-class HPC installations. The CMS experiment, one of the four large detectors at the LHC, has started to prepare for this situation, with the CMS software stack (CMSSW) already compiled for multiple architectures. In order to allow for production use, the tools for workload management and job distribution need to be extended to be able to exploit heterogeneous architectures.
Profiting from the opportunity to exploit the first sizable IBM Power9 allocation available on the Marconi100 HPC system at CINECA, CMS developed all the needed modifications to the CMS workload management system. After a successful proof of concept, a full physics validation has been performed in order to bring the system into production. This experience is of very high value for the commissioning of the similar (even larger) Summit HPC system at Oak Ridge, where CMS is also expecting a resource allocation. Moreover, the compute power of these systems is also provided via GPUs, which represents an extremely valuable opportunity to exploit the offloading capability already implemented in CMSSW.
The status of the current integration including the exploitation of the GPUs, the results of the validation as well as the future plans will be shown and discussed.
The CMS Level-1 Trigger, for its operation during Phase-2 of LHC, will undergo a significant upgrade and redesign. The new trigger system, based on multiple families of custom boards, equipped with Xilinx Ultrascale Plus FPGAs and interconnected with high speed optical links at 25 Gb/s, will exploit more detailed information from the detector subsystems (calorimeter, muon systems, tracker). In contrast to its implementation during Phase-1, information from the CMS tracker is now also available at the Level-1 Trigger and can be used for particle flow algorithms. The final stage of the Level-1 Trigger, called Global Trigger (GT), will receive more than 20 different trigger object collections from upstream systems and will be able to evaluate a menu of more than 1000 cut-based algorithms distributed over 12 boards. These algorithms may not only apply conditions on parameters such as momentum or angle of a particle, but can also do arithmetic calculations, like the invariant mass of a suspected mother particle of interest or the angle between two particles. The Global Trigger is designed as a modular system, with an easily re-configurable algorithm unit, to meet the demand of high flexibility required for shifting trigger strategies during Phase-2 operation of the LHC. The algorithms themselves are kept highly configurable and tools are provided to allow their study from within the CMS offline software framework (CMSSW) without the need for knowledge of the underlying firmware implementation. To allow the reproducible translation of the physicist-designed trigger menu to VHDL for use in the hardware trigger, a tool has been developed that converts the Python-based configuration used by CMSSW to VHDL. In addition to cut-based algorithms, neural net algorithms are being developed and integrated into the Global Trigger framework. 
To make use of these algorithms in hardware, the HLS4ML framework is used, which transpiles pre-trained neural nets, generated in the most commonly used software frameworks, into firmware code. A prototype firmware for a single Global Trigger board has been developed, which includes the de-multiplexing logic, conversion to an internal common object format and distribution of the data over all Super Logic Regions. In this framework 312 algorithms are implemented at a clock speed of 480 MHz. The prototype has been thoroughly tested and verified with the bit-wise compatible C++ emulator. In this contribution we present the Phase-2 Global Trigger with an emphasis on the Global Trigger algorithms, their implementation in hardware, configuration with Python and the novel integration within the CMS offline software framework (CMSSW).
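The arithmetic that such trigger algorithms perform can be illustrated with the two quantities named above: the invariant mass of a particle pair and the angle between two particles. This is a floating-point sketch assuming massless particles described by (pT, η, φ); the function names and thresholds are hypothetical, and the firmware and its emulator are not expected to use this form.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal angle difference wrapped into [-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles: m^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi))."""
    m_squared = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m_squared, 0.0))


def passes_mass_window(obj1, obj2, min_mass, max_mass):
    """Cut-based condition on a pair of (pt, eta, phi) tuples (hypothetical helper)."""
    mass = invariant_mass(*obj1, *obj2)
    return min_mass <= mass <= max_mass


# Two back-to-back objects in the transverse plane, pT in GeV.
muon_a = (45.6, 0.0, 0.0)
muon_b = (45.6, 0.0, math.pi)
print(invariant_mass(*muon_a, *muon_b))
print(passes_mass_window(muon_a, muon_b, 60.0, 120.0))
```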
With the start of Run 3 in 2022, the LHC has entered a new period, now delivering higher energy and luminosity proton beams to the Compact Muon Solenoid (CMS) experiment. These increases make it critical to maintain and upgrade the tools and methods used to monitor the rate at which data is collected (the trigger rate). Software tools have been developed to allow for automated rate monitoring, and we present several upgrades to these software tools, which maintain and expand on their functionality. These trigger rate monitoring tools allow for real-time monitoring, including alerts which go out to on-call experts in the case of abnormalities. Fits are produced from previously collected data and extrapolate the behavior of the triggers as a function of pile-up (the average number of particle interactions per bunch-crossing). These fits allow for visualization and statistical analysis of the behavior of the triggers and are displayed on the online monitoring system (OMS). The rate monitoring code can also be used for offline data certification and more complex trigger analysis. This presentation will show some of the upgrades to this software, with an emphasis on the automation for easier and more consistent upgrades and fixes to the software, and the increased interactivity with the users.
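The fit-and-alert idea can be sketched as follows: fit the trigger rate against pile-up on reference data, then compare a newly measured rate with the extrapolated expectation. The linear fit function, the 10% tolerance and all numbers below are assumptions chosen for illustration; the actual monitoring tools may use different fit functions and alert criteria.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_abnormal(rate, pileup, slope, intercept, tolerance=0.1):
    """Flag a rate that deviates from the fit prediction by more than `tolerance`."""
    expected = slope * pileup + intercept
    return abs(rate - expected) > tolerance * expected


# Hypothetical reference data: pile-up values and trigger rates in Hz.
reference_pileup = [20.0, 30.0, 40.0, 50.0]
reference_rate = [200.0, 300.0, 400.0, 500.0]
slope, intercept = fit_line(reference_pileup, reference_rate)

# Compare new measurements at pile-up 60 with the extrapolated fit.
for measured_rate in (610.0, 900.0):
    status = "ALERT" if is_abnormal(measured_rate, 60.0, slope, intercept) else "ok"
    print(measured_rate, status)
```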
Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data structures is ideally decoupled from the rest of a program.
The low-level abstraction of memory access (LLAMA) is a C++ library that provides a zero-runtime-overhead abstraction layer, underneath which memory layouts can be freely exchanged, focusing on multidimensional arrays of nested, structured data.
It provides a framework for defining and switching custom memory mappings at compile time to define data layouts, data access and access instrumentation, making LLAMA an ideal tool to tackle memory-related optimization challenges in heterogeneous computing.
After its scientific debut, several improvements and extensions have been added to LLAMA. This includes compile-time array extents for zero memory overhead, support for computations during memory access, new mappings (e.g. int/float bit-packing or byte-swapping) and more. This contribution provides an overview of the LLAMA library, its recent development and an outlook of future activities.
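The idea of decoupling the memory layout from the code that accesses the data can be illustrated with a small Python analogue. This is not the LLAMA API: LLAMA is a C++ library that resolves mappings at compile time, whereas the classes below (`AoSMapping`, `SoAMapping`, `View`) are hypothetical names and carry runtime overhead. The sketch only shows that the same kernel runs unchanged on an array-of-structs and a struct-of-arrays layout.

```python
from array import array

FIELDS = ("x", "y", "z")


class AoSMapping:
    """Array of structs: x0 y0 z0 x1 y1 z1 ..."""

    def __init__(self, n):
        self.n = n

    def offset(self, index, field):
        return index * len(FIELDS) + FIELDS.index(field)


class SoAMapping:
    """Struct of arrays: x0 x1 ... y0 y1 ... z0 z1 ..."""

    def __init__(self, n):
        self.n = n

    def offset(self, index, field):
        return FIELDS.index(field) * self.n + index


class View:
    """Flat buffer of doubles accessed only through the chosen mapping."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.buffer = array("d", [0.0]) * (mapping.n * len(FIELDS))

    def get(self, index, field):
        return self.buffer[self.mapping.offset(index, field)]

    def set(self, index, field, value):
        self.buffer[self.mapping.offset(index, field)] = value


def shift_x(view, dx):
    """Kernel written once, independent of the memory layout."""
    for i in range(view.mapping.n):
        view.set(i, "x", view.get(i, "x") + dx)


n = 4
aos, soa = View(AoSMapping(n)), View(SoAMapping(n))
for view in (aos, soa):
    for i in range(n):
        view.set(i, "x", float(i))
        view.set(i, "y", 10.0 + i)
        view.set(i, "z", 20.0 + i)
    shift_x(view, 0.5)
```

Swapping the mapping object changes where each value lives in the buffer, while `shift_x` stays untouched, which is the property the abstract describes.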
We present a machine-learning based method to detect deviations from a reference model, in an almost independent way with respect to the theory assumed to describe the new physics responsible for the discrepancies.
The analysis is based on an Effective Field Theory (EFT) approach: under this hypothesis the Lagrangian of the system can be written as an infinite expansion of terms, where the first ones are those from the Standard Model (SM) Lagrangian and the following terms are higher dimension operators. The presence of the EFT operators impacts the distributions of the observables by producing deviations from the shapes expected when the SM Lagrangian alone is considered.
We use a Variational AutoEncoder (VAE) trained on SM processes to identify EFT contributions as anomalies. While SM events are expected to be reconstructed properly, events generated taking into account EFT contributions are expected to be poorly reconstructed, thus accumulating in the tails of the loss function distribution. Since the training of the model does not depend on any specific new physics signature, the proposed strategy does not make specific assumptions on its nature. In order to improve the discrimination performance, we introduced a DNN classifier that distinguishes between EFT and SM events based on the values of the reconstruction and regularization losses of the model. In this second model a cross entropy term is added to the usual loss of the VAE, optimizing at the same time the reconstruction of the input variables and the classification. This procedure ensures that the model is optimized for discrimination, with a small price in terms of model independence due to the use of one of the 15 operators from the EFT model in the training.
In this talk we will discuss in detail the above-mentioned methods using generator level VBS events produced at the LHC and assuming, in order to compute the significance of possible new physics contributions, an integrated luminosity of $350\,\mathrm{fb}^{-1}$.
The Belle II experiment at the second generation e+/e- B-factory SuperKEKB has been collecting data since 2019 and aims to accumulate 50 times more data than the first generation experiment, Belle.
To efficiently process these steadily growing datasets of recorded and simulated data that end up on the order of 100 PB and to support Grid-based analysis workflows using the DIRAC Workload Management System, an XRootD-based caching architecture is presented.
The presented mechanism decreases job waiting time for often-used datasets by transparently adding copies of these files at smaller sites without managed storage.
The described architecture seamlessly integrates local storage services and supports the use of dynamic computing resources with minimal deployment effort.
This is especially useful in environments with many institutions providing comparatively small numbers of cores and limited personpower.
This talk will describe the implemented cache at GridKa, a main computing centre for Belle II, as well as its performance and upcoming opportunities for caching for Belle II.
With the continuous increase in the amount of data generated and stored in various scientific fields, such as cosmic ray detection, compression technology becomes more and more important in reducing the requirements for communication bandwidth and storage capacity. Zstandard, abbreviated as zstd, is a fast lossless compression algorithm. For zlib-level real-time compression scenarios, it offers a good compression ratio and a higher speed than similar algorithms. In this paper, we introduce the architecture of a new zstd compression kernel, combine it with the ROOT framework (an open-source data analysis framework used in high energy physics and other fields), and optimize the proposed architecture for the specific use case of LHAASO KM2A data decoding. The optimized kernel is implemented on a Xilinx Alveo U200 board.
Lossy compression algorithms are incredibly useful due to powerful compression results. However, lossy compression has historically presented a trade-off between the retained precision and the resulting size of data compressed with a lossy algorithm. Previously, we introduced BLAST, a state-of-the-art compression algorithm developed by Accelogic. We presented results that demonstrated BLAST can achieve a compression factor that undeniably surpasses compression algorithms currently available in the ROOT framework. However, the leading concern of utilizing the lossy compression technique is the delayed realization that more precision is necessary. This precision may have been irretrievably lost in an effort to decrease storage size. Thus, there is immense value in retaining higher precision data in reserve. Though, in the era of exabyte computing, it becomes extremely inefficient and costly to duplicate data stored at different compressive precision values. A tiered cascade of stored precision optimizes data storage and resolves these fundamental concerns.
Accelogic has developed a game-changing compression technique, known as “Precision Cascade”, which enables higher precision to be stored separately without duplicating information. With this novel method, varying levels of precision can be retrieved, potentially minimizing live storage space. Preliminary results from STAR and CMS demonstrate that multiple layers of precision can be stored and retrieved without significant penalty to the compression ratios and (de)compression speeds, when compared to the single-precision BLAST baseline.
In this contribution, we will present the integration of Accelogic’s “Precision Cascade” into the ROOT framework, with the principal purpose of enabling high-energy physics experiments to leverage this state-of-the-art algorithm with minimal friction. We also present our progress in exploring storage reduction and speed performance with this new compression tool in realistic examples from both STAR and CMS experiments and feel we are ready to deliver the compression algorithm to the wider community.
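The general idea of storing precision in tiers without duplicating information can be illustrated with a simple bit split. This is explicitly not Accelogic's algorithm, whose details are not given here: the sketch merely assumes that a 32-bit float is divided into a coarse tier (upper 16 bits) and a refinement tier (lower 16 bits) that can be stored separately. All names are hypothetical.

```python
import struct


def split_float32(value):
    """Split a float32 into a coarse tier (upper 16 bits) and a refinement tier (lower 16 bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits >> 16, bits & 0xFFFF


def join_tiers(coarse, refinement=0):
    """Rebuild a float from the coarse tier alone, or from both tiers for full float32 precision."""
    (value,) = struct.unpack("<f", struct.pack("<I", (coarse << 16) | refinement))
    return value


original = 3.14159265
coarse, refinement = split_float32(original)

low_precision = join_tiers(coarse)               # only the first tier is read
full_precision = join_tiers(coarse, refinement)  # both tiers are read
print(low_precision, full_precision)
```

Under this assumption the refinement tier adds precision to the coarse tier instead of repeating it, so keeping both tiers costs no more space than the original 32-bit value.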
The evolution of the computing landscape has resulted in the proliferation of diverse hardware architectures, with different flavors of GPUs and other compute accelerators becoming more widely available. To facilitate the efficient use of these architectures in a heterogeneous computing environment, several programming models are available to enable portability and performance across different computing systems, such as Kokkos, SYCL, OpenMP and others. As part of the High Energy Physics Center for Computational Excellence (HEP-CCE) project, we investigate if and how these different programming models may be suitable for experimental HEP workflows through a few representative use cases. One such use case is the Liquid Argon Time Projection Chamber (LArTPC) simulation, which is essential for LArTPC detector design, validation and data analysis. Following up on our previous investigations [1, 2] of using Kokkos to port LArTPC simulation in the Wire-Cell Toolkit (WCT) to GPUs, we have explored OpenMP and SYCL as potential portable programming models for WCT, with the goal of making diverse computing resources accessible to the LArTPC simulations. In this presentation, we will describe how we utilize relevant features of OpenMP and SYCL for the LArTPC simulation module in WCT. We will also show performance benchmark results on multi-core CPUs, NVIDIA and AMD GPUs for both the OpenMP and the SYCL implementations. Comparisons with different compilers will be given. Advantages and disadvantages of using OpenMP, SYCL and Kokkos in this particular use case will also be discussed.
The simplicity of Python and the power of C++ provide a hard choice for a scientific software stack. There have been multiple developments to mitigate the hard language boundaries by implementing language bindings. The static nature of C++ and the dynamic nature of Python are problematic for bindings provided by library authors and in particular features such as template instantiations with user-defined types or more advanced memory management.
The development of the C++ interpreter Cling has changed the way we can think of language bindings as it provides an incremental compilation infrastructure available at runtime. That is, Python can interrogate C++ on demand and fetch only the necessary information. This way of automatic binding provision requires no binding support by the library authors and offers better performance than Pybind11. This approach was pioneered in ROOT with PyROOT and was later enhanced with its successor, Cppyy. However, until now, Cppyy relied on the reflection layer of ROOT, which is limited in terms of provided features and performance.
In this talk we show how basing Cppyy purely on Cling yields better correctness, performance and installation simplicity. We illustrate more advanced language interoperability of Numba-accelerated Python code capable of calling C++ functionality via Cppyy. We outline a path forward for integrating the reflection layer in LLVM upstream which will contribute to the project sustainability and will foster greater user adoption. We demonstrate usage of Cppyy through Cling’s LLVM mainline version Clang-Repl.
The continuous growth in model complexity in high-energy physics (HEP) collider experiments demands increasingly time-consuming model fits. We show first results on the application of conditional invertible networks (cINNs) to this challenge. Specifically, we construct and train a cINN to learn the mapping from signal strength modifiers to observables and its inverse. The resulting network infers the posterior distribution of the signal strength modifiers rapidly and for low computational cost. We present performance indicators of such a setup including the treatment of systematic uncertainties and highlight the features of cINNs estimating a signal strength for HEP-data on simulations.
Constraining cosmological parameters, such as the amount of dark matter and dark energy, to high precision requires very large quantities of data. Modern survey experiments like DES, LSST, and JWST are acquiring these data sets. However, the volumes and complexities of these data (variety, systematics, etc.) show that traditional analysis methods are insufficient to exhaust the information contained in these survey data. Specifically, explicit likelihood-based inference as performed with MCMC likelihood fitting is prone to biases because the likelihoods are written as analytic expressions. This calls for a method that can simultaneously process large volumes of data and handle biases in an efficient manner. Simulation-based inference (SBI or likelihood-free inference) is rapidly gaining popularity for addressing diverse cosmological problems because of its ability to incorporate complex physical processes (statistical fluctuations of cluster properties) and observational effects (non-linear measurement errors) while generating the observables by forward simulations. In this work, we train a normalizing-flow-based machine learning algorithm embedded in the SBI framework on two datasets, generated by analytical forward models (via CosmoSIS) and N-body simulations (Quijote simulations suite). We use number counts and mean masses of dark matter halos to estimate posteriors of multiple cosmological parameters (e.g., Ωm, Ωb, h, ns, σ8). Our results show that the SBI method constrains the cosmological parameters within 2σ, which is comparable to the state-of-the-art MCMC-based inference methods, and results in a smaller bias for some parameters (h and ns) than MCMC. Furthermore, SBI trained on the Quijote simulations data permits a much shorter computational time when dealing with large datasets, compared to the MCMC method.
The IRIS-HEP Analysis Grand Challenge (AGC) is designed to be a realistic environment for investigating how analysis methods scale to the demands of the HL-LHC. The analysis task is based on publicly available Open Data and allows for comparing usability and performance of different approaches and implementations. It includes all relevant workflow aspects from data delivery to statistical inference.
The reference implementation for the AGC analysis task is heavily based on tools from the HEP Python ecosystem. It makes use of novel pieces of cyberinfrastructure and modern analysis facilities in order to address the data processing challenges of the HL-LHC.
This contribution compares multiple different analysis implementations and studies their performance. Differences between the implementations include the use of multiple data delivery mechanisms and caching setups for the analysis facilities under investigation.
The Federation is a new machine learning technique for handling large amounts of data in a typical high-energy physics analysis. It utilizes Uniform Manifold Approximation and Projection (UMAP) to create an initial low-dimensional representation of a given data set, which is clustered by using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). These clusters can then be used for a federated learning approach, in which we separately train a classifier on the data of each individual cluster. As a requirement for this approach, we need to apply an imbalanced learning method to the data in the found clusters before the training. By using a Dynamic Classifier Selection method, the Federation can then make predictions for the whole data set. As a proof of concept for this novel technique, open data from the Higgs Boson Machine Learning Challenge is used and comparisons to results from established methods will be presented. We also investigated the issue of handling missing values and the jet-count feature for this data.
Tensor Networks (TN) are approximations of high-dimensional tensors designed to represent locally entangled quantum many-body systems efficiently. In this talk, we will discuss how to use TN to connect quantum mechanical concepts to machine learning techniques, thereby facilitating the improved interpretability of neural networks. As an application, we will use top jet classification against QCD jets and compare performance against state-of-the-art machine learning applications. Finally, we will discuss how to convert these models into Quantum Circuits to be compiled on a quantum device and show that classical TNs require exponentially large bond dimensions and higher Hilbert-space mapping to perform comparably to their quantum counterparts.
The potential exponential speed-up of quantum computing compared to classical computing makes it a promising method for High Energy Physics (HEP) simulations at the LHC at CERN.
Generative modeling is a promising task for near-term quantum devices: the probabilistic nature of quantum mechanics allows us to exploit a new class of generative models, the quantum circuit Born machine (QCBM).
These models use the stochastic nature of quantum measurement as random-like sources and have no classical analog.
More specifically, they produce samples from the underlying distribution of a pure quantum state by measuring a parametrized quantum circuit with probability given by the Born rule.
This work presents an application of Born machines to Monte Carlo simulations and extends their reach to multivariate and conditional distributions.
Even if generating multivariate distributions with Born machines has already been explored, we propose an alternative circuit design with a reduced connectivity, better suited for NISQ devices.
Indeed, models are run on (noisy) simulators and IBM Quantum superconducting devices.
More specifically, Born machines are used to generate muonic force carrier (MFC) events resulting from scattering processes between muons and the detector material in high-energy physics collider experiments. MFCs are bosons appearing in beyond-the-Standard-Model theoretical frameworks and are candidates for dark matter. Empirical evidence suggests that Born machines can reproduce the underlying distribution of datasets coming from Monte Carlo simulations, and are competitive with classical machine learning-based generative models of similar complexity.
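The sampling step of a Born machine can be sketched classically: given the amplitudes of a pure state, each bitstring is drawn with probability equal to the squared modulus of its amplitude (the Born rule). In an actual QCBM the amplitudes result from a parametrized circuit and the samples come from measurements on hardware or a simulator; here a two-qubit Bell state is written down by hand purely as an assumed example.

```python
import math
import random


def born_probabilities(amplitudes):
    """Born rule: outcome probability is the squared modulus of the amplitude."""
    weights = [abs(a) ** 2 for a in amplitudes]
    norm = sum(weights)
    return [w / norm for w in weights]


def sample_bitstrings(amplitudes, shots, rng):
    """Draw measurement outcomes in the computational basis."""
    n_qubits = len(amplitudes).bit_length() - 1  # len(amplitudes) == 2 ** n_qubits
    probabilities = born_probabilities(amplitudes)
    outcomes = rng.choices(range(len(amplitudes)), weights=probabilities, k=shots)
    return [format(outcome, f"0{n_qubits}b") for outcome in outcomes]


# Hand-written Bell state (|00> + |11>) / sqrt(2), standing in for a circuit output.
bell_state = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]

rng = random.Random(1)
samples = sample_bitstrings(bell_state, shots=1000, rng=rng)
counts = {bits: samples.count(bits) for bits in sorted(set(samples))}
print(counts)
```

Training a Born machine then amounts to adjusting the circuit parameters until the histogram of such samples matches the target distribution.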
Accurate molecular force fields are of paramount importance for the efficient implementation of molecular dynamics techniques at large scales. In the last decade, machine learning methods have demonstrated impressive performances in predicting accurate values for energy and forces when trained on finite size ensembles generated with ab initio techniques. At the same time, quantum computers have recently started to offer new viable computational paradigms to tackle such problems. On the one hand, quantum algorithms may notably be used to extend the reach of electronic structure calculations. On the other hand, quantum machine learning is also emerging as an alternative and promising path to quantum advantage. Here we follow this second route and establish a direct connection between classical and quantum solutions for learning neural network potentials. To this end, we design a quantum neural network architecture and apply it successfully to different molecules of growing complexity. The quantum models exhibit larger effective dimension with respect to classical counterparts and can reach competitive performances, thus pointing towards potential quantum advantages in natural science applications via quantum machine learning.
The expected volume of data from the new generation of scientific facilities such as the Square Kilometre Array (SKA) radio telescope has motivated the expanded use of semi-automatic and automatic machine learning algorithms for scientific discovery in astronomy. In this field, the robust and systematic use of machine learning faces a number of specific challenges, including both a lack of labelled data for training (paradoxically although we have too much data we also don't have enough) and an inheritance of abstracted and sometimes subjective classification terminology. In this talk I will discuss our recent work using language models to derive semantic features that can be mapped to astrophysical target classes using non-technical language. This method is domain-agnostic and publicly available, and we hope that it may also prove useful for other scientific fields where expert data labelling is otherwise costly.
Strategies to detect data departures from a given reference model, with no prior bias on the nature of the new physical model responsible for the discrepancy, might play a vital role in experimental programs where, like at the LHC, increasingly rich experimental data are accompanied by an increasingly blurred theoretical guidance in their interpretation. I will describe one such strategy that employs neural networks, leveraging their virtues as flexible function approximants, but builds its foundations directly on the canonical likelihood-ratio approach to hypothesis testing. The algorithm compares observations with an auxiliary set of reference-distributed events, possibly obtained with a Monte Carlo event generator. It returns a p-value, which measures the compatibility of the reference model with the data. It also identifies the most discrepant phase-space region of the dataset, to be selected for further investigation. Imperfections due to mismodelling in the reference dataset can be taken into account straightforwardly as nuisance parameters.
The new concepts of future electron-positron colliders such as Future Circular Collider, International Linear Collider or Circular Electron-Positron Collider push the precision state-of-the-art in experimental measurements. The tremendous efforts of experimental physicists to test the immense predictive power of the Standard Model are limited by the intrinsic uncertainties in the currently available theoretical calculations.
The bottleneck is due to the increasing complexity of Feynman integral calculations. In recent years, modern methods for the reduction and calculation of the Feynman integral have been developed. We will present some of the tools available on the market and highlight the advances in Feynman integral calculations.
High-performance fourth-generation synchrotron radiation light sources, e.g. the High Energy Photon Source (HEPS), have been proposed and built in succession. The advent of beamlines at fourth-generation synchrotron sources and of advanced detectors has pushed the demand for computing resources to the edge of current workstation capabilities. On the other hand, the vast data volume produced by specific experiments makes it difficult for users to take data away. In this case, on-site data analysis services are necessary both during and after experiments. On top of this, most synchrotron light sources have shifted to prolonged remote operation because of the outbreak of a global pandemic, with the need for remote access to the online instrumental system during the experiments.
A data analysis platform with a graphical user interface (GUI) accessible via the browser-based Jupyter notebook framework was developed to address the above requirements. It aims to provide an interactive and user-friendly tool for the analysis of X-ray synchrotron radiation CT data collected during experiments. This platform allows remote access and quick reconstruction of large datasets from synchrotron radiation CT experiments. Various techniques to subtract background, normalize the signal, reconstruct slices, and post-process the image have been made available. Containerization and container orchestration techniques allow the platform to operate on heterogeneous computing resources of different scales.
This presentation will describe the design and status of the web-based data analysis platform for the CT imaging beamline of HEPS, as well as the future plan for this platform.
ROOT TTree has been widely used in the analysis and storage of data from various high-energy physics experiments. The event data generated by an experiment is stored in the TTree's branches and further compressed and archived into a standard ROOT format file. At present, ROOT supports compressed storage of TBasket, the buffer of TBranch, using compression algorithms such as zlib, lzma, lz4 and zstd, and maximizes performance by using different compression algorithms in different scenarios, which is of great significance given the increasing amount of high-energy physics data. With the continuous improvement of hardware technology, it is possible to accelerate specific commonly used algorithms at the underlying hardware layer. In this article, by using ISA-L (the Intel Intelligent Storage Acceleration Library), the set of compression algorithms in ROOT is extended on Intel x86 machines, enriching the options for ROOT data compression and further improving the overall performance of TTree data compression. Performance tests on Intel Xeon Silver 4215R CPUs indicate that the compression time using the ISA-L library is 25% higher than that of the ZSTD algorithm, and the compression ratio is slightly better than ZSTD, but the decompression speed is slower than ZSTD. Adding ISA-L support to ROOT gives users more compression methods to choose from and effectively reduces compression time.
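The quantities compared above (compression ratio, compression time and decompression time) can be measured with a small benchmark. The sketch below uses only the `zlib` and `lzma` modules from the Python standard library as stand-ins; zstd and ISA-L are not part of the standard library and are therefore not shown, and the synthetic payload is an assumption rather than TTree data.

```python
import lzma
import struct
import time
import zlib


def benchmark(name, compress, decompress, payload):
    """Measure compression ratio and (de)compression wall time for one codec."""
    start = time.perf_counter()
    packed = compress(payload)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_seconds = time.perf_counter() - start

    if restored != payload:
        raise ValueError(f"{name}: round trip did not restore the payload")
    return {
        "name": name,
        "ratio": len(payload) / len(packed),
        "compress_s": compress_seconds,
        "decompress_s": decompress_seconds,
    }


# Synthetic buffer of 10000 little-endian 32-bit integers (hypothetical data).
payload = struct.pack("<10000i", *[i % 97 for i in range(10000)])

results = [
    benchmark("zlib", zlib.compress, zlib.decompress, payload),
    benchmark("lzma", lzma.compress, lzma.decompress, payload),
]
for result in results:
    print(result)
```

Timings from such a toy buffer say little about real workloads; a meaningful comparison would run on representative basket data and repeat each measurement several times.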
Modern high energy physics experiments and similar compute-intensive fields are pushing the limits of dedicated grid and cloud infrastructure. In recent years, research into augmenting this dedicated infrastructure by integrating opportunistic resources, i.e. compute resources temporarily acquired from third-party resource providers, has yielded various strategies to approach this challenge. However, work on this topic is usually driven by practical needs to use specific resource providers for production workflows; in this context, research is ad hoc and relies on impressions gained during unique situations of resource providers, resource demand and opportunistic resource management. Replicating or even preparing a specific situation to investigate opportunistic resource management is extremely challenging or even impossible. As a consequence, research in the field of opportunistic resource management is extremely limited.
We propose to tackle this challenge using simulation and to this end present the simulation framework LAPIS, a general purpose scheduling simulator offering programmatic control of resources. We demonstrate this approach by integrating LAPIS with the COBalD/TARDIS resource manager to investigate the behaviour of this resource manager in a simulated environment.
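As a rough illustration of what a scheduling simulation makes reproducible, the following minimal Python sketch replays the same job list on a dedicated pool and on a pool extended by one opportunistic core that only becomes available later. It does not use LAPIS or COBalD/TARDIS; the function `simulate_makespan`, the job lengths and the availability times are all hypothetical.

```python
import heapq


def simulate_makespan(job_durations, core_available_from):
    """Time at which the last job finishes.

    Jobs are started in list order on whichever core becomes free first.
    core_available_from holds, per core, the time from which it can be used
    (0.0 for dedicated cores, later for opportunistic ones).
    """
    if not core_available_from:
        raise ValueError("need at least one core")
    free_at = list(core_available_from)   # min-heap of times a core is free
    heapq.heapify(free_at)
    makespan = 0.0
    for duration in job_durations:
        start = heapq.heappop(free_at)    # earliest available core
        end = start + duration
        makespan = max(makespan, end)
        heapq.heappush(free_at, end)
    return makespan


jobs = [4.0, 3.0, 2.0, 2.0, 1.0]          # hypothetical job lengths in hours
dedicated_only = simulate_makespan(jobs, [0.0, 0.0])
with_opportunistic = simulate_makespan(jobs, [0.0, 0.0, 1.0])
print(dedicated_only, with_opportunistic)
```

Because the whole situation is defined by two lists, the same scenario can be repeated exactly while varying only the resource management decision under study.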
A precise measurement of the polarizability of the charged pion provides an important experimental test of our understanding of low-energy QCD. The goal of the Charged Pion Polarizability (CPP) experiment in Hall D at JLab, currently underway, is to make a precision measurement of this quantity through a high-statistics study of the γγ → π+π− reaction near the 2π threshold. The production of Bethe-Heitler electron and muon pairs presents significant backgrounds, which demand high discrimination between e/π and μ/π to select a clean pion-pair signal. Two independent AI/ML projects were developed to classify μ/π and e/π respectively: a TensorFlow Lite model (training in Python, inference in C++) for μ/π, and the TMVA package from ROOT for e/π. A new detector, consisting of iron absorbers interspersed with multi-wire proportional chambers, was constructed to enhance the discrimination between muons and pions. Both models were deployed in real-time data monitoring to verify good experimental conditions.
The reconstruction of particle trajectories is a key challenge of particle physics experiments, as it directly impacts particle reconstruction and physics performance. To reconstruct these trajectories, different reconstruction algorithms are used sequentially. Each of these algorithms uses many configuration parameters that need to be fine-tuned to properly account for the detector/experimental setup, the available CPU budget and the desired physics performance. Examples of such parameters are cut values limiting the search space of the algorithm, approximations accounting for complex phenomena, and parameters controlling algorithm performance. Until now, these parameters had to be optimised by human experts, which is inefficient and raises issues for the long-term maintainability of such algorithms. Previous experience with machine learning for particle reconstruction (such as the TrackML challenge) has shown that machine learning methods can be easily adapted to different experiments by learning directly from the data. We propose to bring the same approach to the classic track reconstruction algorithms by connecting them to an agent-driven optimiser, which will allow us to find the best set of input parameters using an iterative tuning approach. We have so far demonstrated this method on different track reconstruction algorithms within the A Common Tracking Software (ACTS) framework using the Open Data Detector (ODD). These algorithms include the trajectory seed reconstruction and selection, the particle vertex reconstruction and the generation of the simplified material map used for trajectory reconstruction. Finally, we present a development plan for a flexible integration of tunable parameters within the ACTS framework to bring this approach to all aspects of trajectory reconstruction.
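The iterative tuning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the optimiser is a plain coordinate search rather than the agent-driven optimiser of the abstract, and `toy_tracking_score`, `seed_cut` and `vertex_cut` are hypothetical stand-ins for running a reconstruction algorithm and scoring its output.

```python
def toy_tracking_score(params):
    """Hypothetical figure of merit; a real one would run the reconstruction
    and combine efficiency, fake rate and CPU time."""
    return -(params["seed_cut"] - 3.0) ** 2 - (params["vertex_cut"] - 1.5) ** 2


def tune(score, start, bounds, step=1.0, n_iter=50):
    """Coordinate search: try each parameter one step up and down, keep any
    strict improvement, and halve the step when nothing improves."""
    best = dict(start)
    best_score = score(best)
    for _ in range(n_iter):
        improved = False
        for name in start:
            for delta in (-step, step):
                trial = dict(best)
                trial[name] += delta
                low, high = bounds[name]
                if not low <= trial[name] <= high:
                    continue  # stay inside the allowed parameter range
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score, improved = trial, trial_score, True
        if not improved:
            step /= 2.0
    return best, best_score


best_params, best_score = tune(
    toy_tracking_score,
    start={"seed_cut": 0.0, "vertex_cut": 0.0},
    bounds={"seed_cut": (0.0, 10.0), "vertex_cut": (0.0, 10.0)},
)
print(best_params, best_score)
```

The reconstruction algorithm only needs to expose its parameters and a score, so the optimiser can be swapped without touching the algorithm itself.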
One way to improve the position and energy resolution in neutrino experiments is to provide high-resolution parameters to the reconstruction method. These parameters, the photoelectron (PE) hit time and the expected PE count, can be extracted from the waveforms. We developed a new waveform analysis method called Fast Stochastic Matching Pursuit (FSMP). It is based on Bayesian principles, and the possible solutions are sampled with Markov Chain Monte Carlo (MCMC). To accelerate the method, we ported it to GPU and can analyze waveforms at 0.01 s per waveform. This method extracts all the information in the waveforms and will benefit high-resolution event reconstruction. With the improved resolution, we can make our way to our final physics goal.
Track reconstruction (or tracking) plays an essential role in the offline data processing of collider experiments. For the BESIII detector, which works in the tau-charm energy region, considerable effort has previously been made to improve the tracking performance with traditional methods such as pattern recognition and the Hough transform. However, for challenging tasks, such as the tracking of low-momentum tracks, tracks from secondary vertices and tracks with a high noise level, there is still much room for improvement.
In this contribution, we demonstrate a novel tracking algorithm based on machine learning. In this method, a hit pattern map representing the connectivity between drift cells is established using an enormous MC sample, based on which we design an optimal method of graph construction. An edge-classifying Graph Neural Network is then trained to distinguish hits on track from noise hits. Finally, a clustering method based on DBSCAN is developed to cluster hits from multiple tracks. A track fitting algorithm based on GENFIT is also studied to obtain the track parameters, where a deterministic annealing filter is implemented to deal with ambiguities and potential noise.
The preliminary results on the BESIII MC sample show promising performance and indicate the potential to apply this method to other drift-chamber-based trackers as well, such as the CEPC and STCF detectors under pre-study.
Keywords: machine learning, tracking, drift chamber, GNN
References:
1. Steven Farrell et al., Novel deep learning methods for track reconstruction. arXiv:1810.06111
2. A Generic Track-Fitting Toolkit. https://github.com/GenFit/GenFit
In particle physics, precise simulations are necessary to enable scientific progress. However, accurate simulations of the interaction processes in calorimeters are complex and computationally very expensive, demanding a large fraction of the available computing resources in particle physics at present. Various generative models have been proposed to reduce this computational cost. Usually, these models interpret calorimeter showers as 3D images in which each active cell of the detector is represented as a voxel. This approach becomes difficult for high-granularity calorimeters due to the larger sparsity of the data.
In this study, we use this sparseness to our advantage and interpret the calorimeter showers as point clouds. More precisely, we consider each hit as part of a hit distribution depending on a global latent calorimeter shower distribution.
Our model is based on PointFlow (Yang et al. 2019) and consists of a permutation invariant encoder and two normalizing flows. One flow models the global latent calorimeter shower distribution. The other flow models the distribution of individual hits conditioned on the calorimeter shower distribution.
We present first results and compare them with state-of-the-art voxel methods.
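The difference between the voxel and point-cloud views of a shower can be made concrete with a small Python sketch. The grid size and energies are invented for illustration, and `voxels_to_point_cloud` is a hypothetical helper, not part of the model described above.

```python
def voxels_to_point_cloud(grid):
    """Convert a dense [x][y][z] energy grid into a list of hits.

    Only cells with a positive energy deposit are kept, so the result
    scales with the number of hits rather than the number of cells.
    """
    return [
        (ix, iy, iz, energy)
        for ix, plane in enumerate(grid)
        for iy, row in enumerate(plane)
        for iz, energy in enumerate(row)
        if energy > 0.0
    ]


size = 4                                   # toy 4 x 4 x 4 calorimeter
grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
grid[0][1][2] = 0.8                        # three invented energy deposits
grid[1][1][2] = 2.5
grid[3][0][0] = 0.1

points = voxels_to_point_cloud(grid)
occupancy = len(points) / size ** 3
print(points, occupancy)
```

In this toy case 3 of 64 cells are active; a voxel model has to represent all 64 cells, while a point-cloud model only sees the 3 hits.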
The Xrootd protocol is used by the CMS experiment at the LHC to access, transfer, and store data within Worldwide LHC Computing Grid (WLCG) sites running different kinds of jobs on their compute nodes. Its redirector system allows some execution tasks to run by accessing input data that is stored on any WLCG site. In 2029 the Large Hadron Collider (LHC) will start the High-Luminosity LHC (HL-LHC) program, in which the luminosity will increase by a factor of 10 compared to current values. This scenario will also imply an unprecedented increase in the simulation and collision data to transfer, process and store on disk and tape systems. The Spanish WLCG sites that support CMS, the PIC Tier-1 and the CIEMAT Tier-2, have explored content-delivery-network-type solutions in the Spanish region. One of the possible solutions under development has been the deployment of caches between the two sites that store the data requested remotely by the jobs, bringing the data closer to the compute nodes to improve job efficiency and input data transfer latency. In this contribution, we analyze the impact of deploying physical caches in production in the CMS region between PIC and CIEMAT, as well as the impact they have on job efficiency, latency and bandwidth gains, and potential storage savings.
The Jiangmen Underground Neutrino Observatory (JUNO) has a very rich physics program, which primarily aims at determining the neutrino mass ordering and precisely measuring oscillation parameters. It is under construction in South China at a depth of about 700 m underground. As data taking will start in 2023, a complete data processing chain is being developed before data taking begins. Conditions and parameters data, as non-event data, are an important part of the data processing chain and are used by reconstruction and simulation. These data can be accessed via Frontier on JUNO-DCI (Distributed Computing Infrastructure), or via databases such as MySQL and SQLite in local clusters.
In this contribution, the latest development of a lightweight database interface (DBI) for the JUNO conditions and parameters data management system will be shown. This interface provides a unified method to access data from different backends, such as Frontier, MySQL and SQLite: production jobs can run on JUNO-DCI with Frontier; testing jobs can run in a local cluster with MySQL to validate the conditions and parameters data; fast reconstruction can run in a DAQ environment onsite using SQLite without any connection to a remote database. Modern C++ template techniques are used in DBI: a new backend is defined by a simple struct with two methods, doConnect and doQuery; result sets are bound to std::tuple, and the types of all the elements are known at compile time. Finally, DBI is used by high-level user interfaces: data models in the database are mapped to normal C++ classes, so that users can access these objects without knowing DBI.
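The backend abstraction described above can be sketched as a rough Python analogue. The real DBI is a C++ template library with compile-time typed tuples, whereas this sketch only converts types at run time. The class names, the `pmt_gain` table and its columns are hypothetical; only the two-method backend shape (connect and query) follows the abstract.

```python
import sqlite3
from typing import Protocol, Sequence


class Backend(Protocol):
    """Anything with these two methods can serve as a backend."""

    def do_connect(self) -> None: ...

    def do_query(self, sql: str, params: Sequence = ()) -> list: ...


class SQLiteBackend:
    """Local-file or in-memory backend built on the standard sqlite3 module."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.conn = None

    def do_connect(self):
        self.conn = sqlite3.connect(self.path)

    def do_query(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()


class DBI:
    """Uniform front end: the caller never sees which backend is in use."""

    def __init__(self, backend: Backend):
        self.backend = backend
        self.backend.do_connect()

    def query(self, row_types, sql, params=()):
        """Return rows as tuples converted to the requested element types."""
        rows = self.backend.do_query(sql, params)
        for row in rows:
            if len(row) != len(row_types):
                raise ValueError("row width does not match requested types")
        return [tuple(t(v) for t, v in zip(row_types, row)) for row in rows]


dbi = DBI(SQLiteBackend())
dbi.backend.do_query("CREATE TABLE pmt_gain (pmt_id INTEGER, gain REAL)")
dbi.backend.do_query("INSERT INTO pmt_gain VALUES (?, ?)", (7, 1.02))
rows = dbi.query((int, float), "SELECT pmt_id, gain FROM pmt_gain")
print(rows)
```

Supporting another database in this sketch would mean writing one more class with the same two methods, which mirrors the extension mechanism the abstract describes.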
Awkward Array is a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms. Auto-differentiation (also known as “autograd” and “autodiff”) is a technique for computing the derivative of a function defined by an algorithm, which requires the derivative of all operations used in that algorithm to be known.
The grad-hep group is primarily focused on end-to-end analysis and uses JAX as its primary library for auto-differentiation. As part of this effort, we developed an interoperability layer between JAX and Awkward Arrays using JAX’s pytrees API. JAX now differentiates most of the Awkward Array functions, including reducer algorithms. This allows investigators to differentiate through their functions if they are using Uproot with Awkward Arrays. However, extending JAX’s vectorized mapping APIs is not currently possible because of the fundamental differences between the two libraries.
Future work might involve testing a large subset of the most commonly used differentiable cases. Currently, testing is carried out on a relatively small number of cases, which were developed to catch edge cases.
We also developed a GPU backend for Awkward Arrays by leveraging CuPy’s CUDA capabilities. Awkward Array now has the entire infrastructure to support operations on a GPU. However, many low-level “C” kernels (115/204) are yet to be translated to CUDA. Once this is done, Awkward Array will have full GPU support, which would indirectly help make auto-differentiation fully deployable on GPUs too.
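The statement that auto-differentiation needs the derivative of every operation used can be illustrated with forward-mode dual numbers in plain Python. This sketch does not use JAX or Awkward Array; the class `Dual` and the helpers `sin` and `derivative` are written here purely for illustration.

```python
import math


class Dual:
    """A value together with its derivative with respect to the input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    # Chain rule for a primitive whose derivative is known: sin' = cos
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return x * x * x + 2 * x + sin(x)


print(derivative(f, 2.0))   # analytically 3*x**2 + 2 + cos(x) at x = 2
```

Any operation without such a rule (here, anything other than `+`, `*` and `sin`) would break the chain, which is why each library function has to be covered individually.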
A broad range of particle physics data can be naturally represented as graphs. As a result, Graph Neural Networks (GNNs) have gained prominence in HEP and have increasingly been adopted for a wide array of particle physics tasks, including particle track reconstruction. Most problems in physics involve data that have some underlying compatibility with symmetries. These problems may either require, or at the very least, benefit from models that perform computations and construct representations that reflect these symmetries. In this work, we explore the application of symmetry group equivariance to GNNs within the context of charged particle tracking in pileup conditions similar to those expected at the high-luminosity Large Hadron Collider. In particular, we investigate whether rotationally-equivariant GNNs can perform competitively and yield models that either contain fewer, more expressive learned parameters or are more efficient vis-à-vis data and computational requirements. To our knowledge, this is the first study exploring equivariant GNNs for a track reconstruction use case. Additionally, we perform a side-by-side comparison of equivariant and non-equivariant architectures over evaluation metrics that capture both outright tracking performance as well as the track-building power-to-weight ratio of physics-constrained GNNs.
High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple, architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues.
The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and Alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using an assortment of representative use cases from DUNE, LHC ATLAS and CMS experiments. Central to the project is the development of a list of metrics that evaluate the suitability of each portability layer for the various testbeds. This list includes both subjective ratings, such as the ease of learning the language, and objective criteria such as performance.
We report on the status of these projects, the development and evaluation of the metrics, as well as the current benchmarks and evaluations of the portability layers for the testbeds under study and recommendations for HEP experiments seeking forward looking portability solutions.
The simplest and often most effective way of parallelizing the training of complex Machine Learning models is to execute several training instances on multiple machines, possibly scanning the hyperparameter space to optimize the underlying statistical model and the learning procedure.
Often, such a meta-learning procedure is limited by the ability to securely access a common database organizing the knowledge of previous and ongoing trials. Exploiting opportunistic GPUs provided in different environments represents a further challenge when designing such optimization campaigns.
In this contribution we discuss how a set of REST APIs can be used to access a dedicated service based on INFN Cloud to monitor and possibly coordinate multiple training instances, with gradient-free optimization techniques, via simple HTTP requests. The service, named Hopaas (Hyperparameter OPtimization As A Service), consists of a web interface and sets of APIs implemented with a FastAPI back-end running through Uvicorn and NGINX in a virtual instance of INFN Cloud. The optimization algorithms are currently based on Bayesian techniques as provided by Optuna. A Python front-end is also made available for quick prototyping.
We present applications to hyperparameter optimization campaigns performed combining private, INFN Cloud and CINECA resources.
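The coordination pattern behind such a service, in which many workers ask a shared trial database for parameters and report results back, can be sketched in Python as follows. This is not the Hopaas API: the class `StudyServer` is an in-process stand-in for the remote service, a fixed candidate list replaces the Bayesian sampler, threads replace remote machines, and `toy_loss` replaces a real training run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StudyServer:
    """Hypothetical in-memory stand-in for a remote trial database."""

    def __init__(self, candidates):
        self._pending = list(candidates)
        self._results = []
        self._lock = threading.Lock()     # workers share one state

    def ask(self):
        """Hand out the next untried parameter value, or None when done."""
        with self._lock:
            return self._pending.pop() if self._pending else None

    def tell(self, params, loss):
        """Record the outcome of one finished trial."""
        with self._lock:
            self._results.append((loss, params))

    def n_trials(self):
        with self._lock:
            return len(self._results)

    def best(self):
        """Return (loss, params) of the best trial seen so far."""
        with self._lock:
            return min(self._results)


def toy_loss(learning_rate):
    """Stand-in for a full training run; minimal at 0.01."""
    return (learning_rate - 0.01) ** 2


def worker(server):
    """One training instance: keep asking for trials until none are left."""
    while (params := server.ask()) is not None:
        server.tell(params, toy_loss(params))


server = StudyServer([0.001, 0.003, 0.01, 0.03, 0.1])
with ThreadPoolExecutor(max_workers=3) as pool:
    for _ in range(3):
        pool.submit(worker, server)

best_loss, best_lr = server.best()
print(best_lr, best_loss, server.n_trials())
```

With a real service, `ask` and `tell` would become HTTP requests, which is what lets workers on unrelated sites take part in the same campaign.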
In the European Center of Excellence in Exascale Computing "Research on AI- and Simulation-Based Engineering at Exascale" (CoE RAISE), researchers from science and industry develop novel, scalable Artificial Intelligence technologies towards Exascale. In this work, we leverage European High Performance Computing (HPC) resources to perform large-scale hyperparameter optimization (HPO), multi-node distributed data-parallel training as well as benchmarking, using multiple compute nodes, each equipped with multiple GPUs.
Training and HPO of deep learning-based AI models is often compute resource intensive and calls for the use of large-scale distributed resources as well as scalable and resource efficient hyperparameter search algorithms. We evaluate the benefits of HPC for HPO by comparing different search algorithms and approaches, as well as performing scaling studies. Furthermore, the scaling and benefits of multi-node distributed data-parallel training using Horovod are presented, showing significant speed-up in model training. In addition, we present results from the development of a containerized benchmark based on an AI-model for event reconstruction that allows us to compare and assess the suitability of different hardware accelerators for training deep neural networks. A graph neural network (GNN) model known as MLPF, which has been developed for the task of Machine Learned Particle-Flow reconstruction in High Energy Physics (HEP), acts as the base model for which studies are performed.
Further developments of AI models in CoE RAISE have the potential to greatly impact the field of High Energy Physics by efficiently processing the very large amounts of data that will be produced by particle detectors in the coming decades. In order to do this efficiently, techniques that leverage modern HPC systems like multi-node training, large-scale distributed HPO as well as standardized benchmarking will be of great use.
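The principle of distributed data-parallel training can be shown with a toy Python example: each worker computes a gradient on its own shard of the batch, and the averaged result equals the full-batch gradient when the shards have equal size. The one-parameter model, the data and the function names are invented for illustration; in a framework such as Horovod the averaging is done by an allreduce across nodes rather than by a Python loop.

```python
def gradient(w, batch):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_gradient(w, batch, n_workers):
    """Split the batch into equal shards, compute local gradients,
    and average them (the step an allreduce would perform)."""
    if len(batch) % n_workers != 0:
        raise ValueError("this sketch assumes equal shard sizes")
    shards = [batch[i::n_workers] for i in range(n_workers)]
    local = [gradient(w, shard) for shard in shards]   # concurrent in practice
    return sum(local) / n_workers


batch = [(float(x), 2.0 * x) for x in range(1, 9)]     # toy data with y = 2x
w = 0.5
full = gradient(w, batch)
parallel = data_parallel_gradient(w, batch, n_workers=4)
print(full, parallel)
```

Since every worker applies the same averaged gradient, all model replicas stay identical while each one processes only a fraction of the data.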
The Jiangmen Underground Neutrino Observatory (JUNO) is under construction in South China and will start data taking in 2023. It has a central detector with a 20-kt liquid scintillator, equipped with 17,612 20-inch PMTs (photo-multiplier tubes) and 25,600 3-inch PMTs. The required energy resolution of 3% at 1 MeV makes the offline data processing challenging, so several machine learning based methods have been developed for reconstruction, particle identification, simulation, etc. These methods are implemented with machine learning libraries in Python; however, the offline software is based on a C++ framework called SNiPER. Therefore, how to integrate them and run the inference in the offline software is important.
In this contribution, the integration of trained machine learning models into JUNO's offline software will be presented. Three methods are explored: using SNiPER's Python binding to share data between C++ and Python; using native C/C++ APIs of the machine learning libraries, such as TensorFlow and PyTorch; using ONNX runtime. Even though SNiPER is implemented in C++, it provides a Python binding via Boost Python. In recent updates of SNiPER, a special data buffer is implemented to share data between C++ and Python, which makes it possible to run machine learning methods in the following way: a C++ algorithm reads event data and converts them to numpy arrays; a Python algorithm then accesses these numpy arrays and invokes machine learning libraries in Python; finally, the C++ algorithm puts the results into the event data. For the native C/C++ APIs of machine learning libraries and ONNX runtime, a C++ algorithm is used to convert the event data to the corresponding formats and invoke the C/C++ APIs. The deployment of the three methods is also studied: using SNiPER's Python binding is the most flexible method for users, as users can install any Python libraries using pip by themselves; using native C/C++ APIs requires users to use the same versions as in the official JUNO software release; using ONNX runtime only requires users to convert their own models to the ONNX format. Based on a comparison of the three methods, ONNX is recommended for most users in JUNO. For developing and testing machine learning models in the offline software, developers can choose the other two methods.
CLUE is a fast and innovative density-based clustering algorithm to group digitized energy deposits (hits) left by a particle traversing the active sensors of a high-granularity calorimeter in clusters with a well-defined seed hit. Outliers, i.e. hits which do not belong to any clusters, are also identified. Its outstanding performance has been proven in the context of the CMS Phase-2 upgrade using both simulated and test beam data.
Initially, CLUE was developed in a standalone repository to allow performance benchmarking with respect to its CPU and GPU implementations, demonstrating the power of algorithmic parallelization in the coming era of heterogeneous computing. In this contribution we will outline CLUE’s capabilities outside CMS and, more specifically, at experiments at future colliders. To this end, CLUE was adapted to run in the key4hep framework (k4Clue): it was integrated in the Gaudi software framework and now supports the EDM4hep data format for inputs and outputs.
Implementation details and physics performance will be shown not only for several options of highly granular calorimeters for e+e- linear and circular future colliders, but also for the new Open Data Calorimeter detector, a recent extension to the Open Data Tracking detector, whose aim is to build a simulation-on-the-fly testbed for future algorithm R&D.
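The notions of seed hits, followers and outliers can be illustrated with a simplified, brute-force density-based clustering in Python. This is a sketch in the spirit of CLUE, not the CMS or k4Clue implementation: it works in two dimensions, compares all pairs of hits instead of using a spatial index, and the parameter names `dc`, `rho_c`, `delta_c` and `delta_o` as well as the toy hits are assumptions made for this example.

```python
import math


def clue_like(hits, dc, rho_c, delta_c, delta_o):
    """hits: list of (x, y, energy). Returns one label per hit:
    a cluster id (0, 1, ...) or -1 for outliers."""
    n = len(hits)

    def dist(i, j):
        return math.hypot(hits[i][0] - hits[j][0], hits[i][1] - hits[j][1])

    # Local density: summed energy of all hits within distance dc.
    rho = [sum(hits[j][2] for j in range(n) if dist(i, j) <= dc)
           for i in range(n)]

    def rank(i):
        return (rho[i], -i)               # ties: lower index ranks higher

    labels = [-1] * n
    next_id = 0
    # Visit hits from highest to lowest density so that the nearest
    # higher-density hit has always been labelled already.
    for i in sorted(range(n), key=rank, reverse=True):
        higher = [j for j in range(n) if rank(j) > rank(i)]
        nearest = min(higher, key=lambda j: dist(i, j), default=None)
        delta = dist(i, nearest) if nearest is not None else math.inf
        if rho[i] >= rho_c and delta > delta_c:
            labels[i] = next_id           # dense and isolated: new seed
            next_id += 1
        elif rho[i] < rho_c and delta > delta_o:
            labels[i] = -1                # sparse and isolated: outlier
        else:
            labels[i] = labels[nearest]   # follower of a denser neighbour
    return labels


hits = [
    (0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (0.0, 0.5, 1.0),       # first blob
    (10.0, 10.0, 1.0), (10.5, 10.0, 1.0), (10.0, 10.5, 1.0), # second blob
    (5.0, 5.0, 0.2),                                         # isolated hit
]
labels = clue_like(hits, dc=1.0, rho_c=2.0, delta_c=2.0, delta_o=2.0)
print(labels)
```

Every per-hit quantity depends only on that hit and its neighbours, which is the property that makes this family of algorithms a good fit for GPUs.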
About 90% of the computing resources available to the LHCb experiment have been spent on producing simulated data samples for Run 2 of the Large Hadron Collider. The upgraded LHCb detector will operate at much-increased luminosity, requiring many more simulated events for Run 3. Simulation is a key necessity of analysis to interpret data in terms of signal and background and to estimate relevant efficiencies. The amount of simulation required will far exceed the pledged resources, requiring an evolution in technologies and techniques to produce simulated data samples. In this contribution, we discuss Lamarr, a Gaudi-based framework to speed up the simulation production by parametrizing both the detector response and the reconstruction algorithms of the LHCb experiment.
Deep Generative Models powered by several algorithms and strategies are employed to effectively parameterize the high-level response of the single components of the LHCb detector, encoding within neural networks the experimental errors and uncertainties introduced in the detection and reconstruction phases. Where possible, models are trained directly on real data, statistically subtracting any background components through the application of weights.
Embedding Lamarr in the general LHCb simulation framework (Gauss) allows its execution to be combined seamlessly with any of the available generators. The resulting software package enables a simulation process completely independent of the detailed simulation used to date.
The Jiangmen Underground Neutrino Observatory (JUNO) is under construction in South China at a depth of about 700 m underground; data taking is expected to start in late 2023. JUNO has a very rich physics program, which primarily aims at determining the neutrino mass ordering and precisely measuring oscillation parameters.
The JUNO average raw data volume is expected to be about 2 PB/year and will be transferred from the experimental site to the main computing center (IHEP, Beijing, China) using a dedicated link. When raw data arrive at IHEP, a Data Quality Monitoring (DQM) system will be used to monitor their quality. A so-called Keep-Up-Production (KUP) will reconstruct the data, and these processed data will be used for detector status studies and for some prompt physics analyses. In order to validate the complete data processing chain, a Mock Data Challenge is being performed and will produce a large-scale Monte Carlo data set for the JUNO experiment.
Due to the rare signals, most of the JUNO expected events are backgrounds, coming from natural radioactivity of rocks, cosmic muons and from the detector itself. There are 17 different components considered in this Mock Data Challenge, and the simulation of each component is performed using the JUNO Distributed Computing Infrastructure (JUNO-DCI). The Monte Carlo output can then be used for the electronics and digitization simulation. However, the electronics simulation needs to simultaneously read a huge amount of data for each background component, and that makes the production on JUNO-DCI really challenging. A pre-mixing method is implemented to mix the radioactivity events beforehand so that the number of required input files can be significantly reduced: a radioactivity background event is picked from the existing data files according to the event rates and then saved into a pre-mixed data file.
In this contribution, details on the Mock Data Challenge, on the JUNO data processing logic flow and on the practical challenges to be faced for a successful production will be reported.
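The pre-mixing step described above, picking background events according to their rates and writing them into one mixed sample, can be sketched in Python. The component names, rates and event labels below are invented, and the function `premix` is hypothetical; a real implementation would also have to assign event times and read from actual data files.

```python
import random


def premix(sources, rates_hz, n_events, seed=0):
    """Build a pre-mixed list of n_events background events.

    sources:  dict mapping component name -> list of available events
    rates_hz: dict mapping component name -> event rate
    Each output event is drawn from a component chosen with probability
    proportional to its rate.
    """
    rng = random.Random(seed)             # fixed seed for reproducibility
    names = list(sources)
    weights = [rates_hz[name] for name in names]
    mixed = []
    for _ in range(n_events):
        name = rng.choices(names, weights=weights)[0]
        mixed.append((name, rng.choice(sources[name])))
    return mixed


sources = {                               # hypothetical per-component inputs
    "rock": ["rock_evt_0", "rock_evt_1"],
    "pmt_glass": ["glass_evt_0", "glass_evt_1", "glass_evt_2"],
    "disabled": ["never_used"],
}
rates_hz = {"rock": 5.0, "pmt_glass": 20.0, "disabled": 0.0}

mixed = premix(sources, rates_hz, n_events=1000)
print(len(mixed), mixed[:3])
```

Downstream jobs then read a single pre-mixed file instead of opening one input per background component.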
The podio event data model (EDM) toolkit provides an easy way to generate a performant implementation of an EDM from a high-level description in YAML format. We present the most recent developments in podio, most importantly the inclusion of a schema evolution mechanism for generated EDMs as well as the "Frame", a thread-safe, generalized event data container. For the former we discuss some of the technical aspects in relation to supporting different I/O backends and leveraging potentially existing schema evolution mechanisms provided by them. Regarding the Frame, we introduce the basic concept and highlight some of the functionality as well as important aspects of its implementation. We also present some other, smaller new features, which have been inspired by the usage of podio for generating different EDMs for future collider projects, most importantly EDM4hep, the common EDM for the Key4hep project. We end with a brief overview of current developments towards a first stable version as well as an outlook on future developments beyond that.
Machine Learning (ML) applications, which have become quite common tools for many High Energy Physics (HEP) analyses, benefit significantly from GPU resources. GPU clusters are important to fulfill the rapidly increasing demand for GPU resources in HEP. Therefore, the Karlsruhe Institute of Technology (KIT) provides a GPU cluster for HEP, accessible from the physics institute via its batch system and the Grid. As the exact hardware needs of such applications depend heavily on the ML hyperparameters, a flexible resource setup is necessary to utilize the available resources as efficiently as possible. Therefore, the multi-instance GPU feature of the NVIDIA A100 GPUs was studied. Several neural network training scenarios performed on the GPU cluster at KIT are discussed to illustrate possible performance gains and the setup that has been used.
The reconstruction of electrons and photons in CMS depends on topological clustering of the energy deposited by an incident particle in different crystals of the electromagnetic calorimeter (ECAL). These clusters are formed by aggregating neighbouring crystals according to the expected topology of an electromagnetic shower in the ECAL. The presence of upstream material (beampipe, tracker and support structures) causes electrons and photons to start showering before reaching the calorimeter. This effect, combined with the 3.8T CMS magnetic field, leads to energy being spread in several clusters around the primary one. It is essential to recover the energy contained in these satellite clusters in order to achieve the best possible energy resolution for physics analyses.
Historically, satellite clusters have been associated with the primary cluster using a purely topological algorithm which does not attempt to remove spurious energy deposits from additional pileup interactions (PU). The performance of this algorithm is expected to degrade during LHC Run 3 (2022+) because of the larger average PU levels and the increasing levels of noise due to the ageing of the ECAL detector. New methods are being investigated that exploit state-of-the-art deep learning architectures like Graph Neural Networks (GNN) and self-attention algorithms. These more sophisticated models improve the energy collection and are more resilient to PU and noise.
This contribution covers the model optimization results and the steps to put it in production inside the realistic CMS reconstruction sequence. The impact on the electron and photon energy resolution and tests of the resiliency of the algorithm to the changing detector conditions are shown.
The future development projects for the Large Hadron Collider will steadily increase the nominal luminosity, with the ultimate goal of reaching a peak luminosity of $5 \times 10^{34}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$. This would result in up to 200 simultaneous proton collisions (pileup), posing significant challenges for the CMS detector reconstruction.
The CMS primary vertex (PV) reconstruction is a two-step procedure consisting of vertex finding and fitting. First, the Deterministic Annealing algorithm clusters tracks coming from the same interaction vertex. Second, an Adaptive Vertex Fit computes the best estimate of the vertex position. In High Luminosity LHC (HL-LHC) conditions, due to the high track density, the reconstruction of PVs is expected to be particularly time-consuming (up to 6% of reconstruction time).
This work presents a complete study of adapting the CMS primary vertex reconstruction algorithms to run on heterogeneous architectures, which allows us to exploit parallelization techniques to significantly reduce the processing time while retaining similar physics performance. Results obtained for both Run 3 and HL-LHC conditions will be discussed.
The pyrate framework provides a dynamic, versatile, and memory-efficient approach to data format transformations, object reconstruction and data analysis in particle physics. Developed within the context of the SABRE experiment for dark matter direct detection, pyrate relies on a blackboard design pattern where algorithms are dynamically evaluated throughout a run and scheduled by a central control unit. The system intends to improve the user experience, portability and scalability of offline software systems currently available in the particle physics community, with particular attention to medium- to small-scale experiments. Pyrate is implemented in the Python programming language, allowing easy access to the scientific Python ecosystem and commodity big data technologies. This presentation addresses the pyrate design and implementation.
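The blackboard idea, where algorithms are evaluated on demand and their results shared through a central store, can be sketched as follows. This is a generic illustration and not pyrate's actual API: the class `Blackboard`, its methods and the toy waveform algorithms are all hypothetical.

```python
class Blackboard:
    """Central store that runs an algorithm only when its output is requested."""

    def __init__(self):
        self._algorithms = {}   # name -> (function, names of required inputs)
        self._store = {}        # name -> result cached for the current event
        self.calls = []         # order in which algorithms actually ran

    def register(self, name, func, requires=()):
        self._algorithms[name] = (func, tuple(requires))

    def get(self, name):
        """Return the object called name, computing its inputs first if needed."""
        if name not in self._store:
            func, requires = self._algorithms[name]
            inputs = [self.get(dep) for dep in requires]
            self.calls.append(name)
            self._store[name] = func(*inputs)
        return self._store[name]

    def clear(self):
        """Drop cached results, e.g. before moving to the next event."""
        self._store.clear()


board = Blackboard()
board.register("waveform", lambda: [1, 4, 9, 4, 1])
board.register("baseline", lambda wf: min(wf), requires=["waveform"])
board.register("peak", lambda wf, base: max(wf) - base,
               requires=["waveform", "baseline"])
board.register("unused", lambda wf: sum(wf), requires=["waveform"])

peak = board.get("peak")
print(peak, board.calls)
```

Only the algorithms needed for the requested object run, and each runs once, which is one way such a design keeps memory and CPU use low.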
The LHCb detector at the LHC is a general purpose detector in the forward region with a focus on studying decays of c- and b-hadrons. For Run 3 of the LHC (data taking from 2022), LHCb will take data at an instantaneous luminosity of $2 \times 10^{33}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$, five times higher than in Run 2 (2015-2018). To cope with the harsher data taking conditions, LHCb will deploy a purely software based trigger with a 30 MHz input rate.
The software trigger at LHCb is composed of two stages: in the first stage the selection is based on a fast and simplified event reconstruction, while in the second stage a full event reconstruction is used. This gives room to perform a real-time alignment and calibration after the first trigger stage, which provides an offline-quality detector alignment in the second stage of the trigger. The detector alignment is an essential ingredient for achieving the best detector performance in the full event reconstruction. The alignment of the whole tracking system of LHCb is evaluated in real time by an automatic iterative procedure. This is particularly important for the vertex detector, which is retracted for LHC beam injection and centered around the primary vertex position with stable beam conditions in each fill. Hence it is sensitive to position changes on a fill-by-fill basis.
The real-time alignment is a fully automatic procedure in the online framework that uses a multi-core farm. It is executed as soon as the required data sample is collected. The alignment tasks are split into two parts: the event reconstruction is parallelized across multiple threads, while the evaluation of the alignment parameters is performed on a single thread after collecting all the needed information from the reconstruction processes of the first part. The execution of the alignment tasks is under the control of the LHCb Experiment Control System and is implemented as a finite state machine. The procedure is run at the beginning of each LHC fill and, for the alignment of the full tracking system (about 300 elements and about 1000 degrees of freedom), takes a few minutes. The parameters are updated immediately in the software trigger. This in turn makes it possible to achieve optimal performance in the trigger output data, which can be used for physics analysis without further offline event reconstruction.
The framework and the procedure for a real-time alignment of the LHCb detector developed for Run 3 data taking are discussed from both the technical and operational points of view. Specific challenges of this procedure and its performance are presented.
Quantum Computing and Machine Learning are both significant and appealing research fields. In particular, their combination has led to the emergence of the research field of quantum machine learning, which has recently gained enormous popularity. We investigate the potential advantages of this synergy for applications in high energy physics, more precisely in the reconstruction of particle decay trees in particle collision experiments. Due to the larger computational space of quantum computers, this highly complex combinatorial problem is well suited for investigating a potential quantum advantage over the classical scenario. However, current quantum devices are subject to noise and provide only a limited number of qubits. We therefore propose the utilization of a variational quantum circuit within a classical graph neural network, which has been shown before to be feasible for the reconstruction of particle decay trees. We evaluate our approach on artificially generated decay trees on a quantum simulator and a real quantum computer by IBM Quantum and compare our results to the purely classical approach. Our proposed approach not only enables the effective utilization of today's quantum devices, but also shows competitive results even in the presence of noise.
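To make the notion of a variational quantum circuit concrete, here is a minimal single-qubit sketch simulated classically: a rotation RY(theta) applied to |0>, the expectation value of Z as the circuit output, and a gradient obtained with the parameter-shift rule. The target value, learning rate and function names are assumptions for illustration. The circuit embedded in the graph neural network of the abstract is not specified here and will differ.

```python
import math

def ry_state(theta):
    """Amplitudes (a0, a1) of RY(theta)|0>, which are real for this gate."""
    return (math.cos(theta / 2.0), math.sin(theta / 2.0))

def expect_z(theta):
    """Expectation value <Z> of the one-parameter circuit (equals cos(theta))."""
    a0, a1 = ry_state(theta)
    return a0 * a0 - a1 * a1

def parameter_shift_grad(f, theta):
    """Gradient of f with respect to theta from two shifted circuit evaluations."""
    return 0.5 * (f(theta + math.pi / 2.0) - f(theta - math.pi / 2.0))

def train(target=0.0, theta=0.3, lr=0.4, steps=100):
    """Gradient descent on the loss (<Z> - target)^2; hyperparameters are illustrative."""
    for _ in range(steps):
        value = expect_z(theta)
        grad = 2.0 * (value - target) * parameter_shift_grad(expect_z, theta)
        theta -= lr * grad
    return theta

theta_opt = train()
```

The parameter-shift rule matters on hardware because it needs only circuit evaluations at shifted parameter values, not access to the quantum state.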
One of the most challenging computational problems in Run 3 of the Large Hadron Collider (LHC), and more so in the High-Luminosity LHC (HL-LHC), is expected to be finding and fitting charged-particle tracks during event reconstruction. The methods used so far at the LHC, and in particular at the CMS experiment, are based on the Kalman filter technique. Such methods have been shown to be robust and to provide good physics performance, both in the trigger and offline. In order to improve computational performance, we explored Kalman-filter-based methods for track finding and fitting, adapted for many-core SIMD architectures. This adapted Kalman-filter-based software, called “mkFit”, was shown to provide a significant speedup compared to the traditional algorithm, thanks to its parallelized and vectorized implementation. The mkFit software was recently integrated into the offline CMS software framework, in view of its exploitation during Run 3 of the LHC. At the start of the LHC Run 3, mkFit will be used for track finding in a subset of the CMS offline track reconstruction iterations, allowing for significant improvements over the existing framework in terms of computational performance, while retaining comparable physics performance. The performance of the CMS track reconstruction using mkFit at the start of the LHC Run 3 is presented, together with prospects of further improvement in the upcoming years of data taking.
The Key4hep project aims to provide a turnkey software solution for the full experiment life-cycle, based on established community tools. Several future collider communities (CEPC, CLIC, EIC, FCC, and ILC) have joined to develop and adapt their workflows to use the common data model EDM4hep and common framework. Besides sharing of existing experiment workflows, one focus of the Key4hep project is the development and integration of new experiment-independent software libraries. Ongoing collaborations with projects such as ACTS, CLUE, PandoraPFA and the OpenDataDetector show the potential of Key4hep as an experiment-independent testbed and development platform. In this talk, we present the challenges of an experiment-independent framework along with the lessons learned from discussions with interested communities (such as LUXE) and recent adopters of Key4hep, in order to discuss how Key4hep could be of interest to the wider HEP community while staying true to its goal of supporting future collider design studies.
LUXE (Laser Und XFEL Experiment) is a proposed experiment at DESY using the electron beam of the European XFEL and a high-intensity laser. LUXE will study Quantum Electrodynamics (QED) in the strong-field regime, where QED becomes non-perturbative. One of the key measurements is the positron rate from electron-positron pair creation, which is enabled by the use of a silicon tracking detector. Precision tracking of positrons becomes very challenging at high laser intensities due to the high rates, which can be computationally expensive for classical computers. The talk will present the latest progress of quantum algorithm-based tracking, which relies on Variational Quantum Eigensolver (VQE) or Quantum Approximate Optimisation Algorithm (QAOA) to reconstruct tracks, and compare the results with classical methods using Graph Neural Networks or a Combinatorial Kalman Filter.
One of the objectives of the EOSC (European Open Science Cloud) Future Project is to integrate diverse analysis workflows from Cosmology, Astrophysics and High Energy Physics in a common framework. The project’s development relies on the implementation of the Virtual Research Environment (VRE), a prototype platform supporting the goals of Dark Matter and Extreme Universe Science Projects in compliance with FAIR data policies, making use of a common AAI system, and leveraging experiment data via a reliable and scalable distributed storage infrastructure for multi-science: the Data Lake. The entry point of such a platform is a JupyterHub instance sitting on top of a complex K8s infrastructure, which provides an interactive graphical interface for researchers to access and share data, as well as to run notebooks. Data access and browsability are enabled through API calls to the high-level data management and storage orchestration software (Rucio).
The cluster’s functionality, currently allowing data injection, replication, storage and deletion, is being expanded to include a software repository plug-in enabling researchers to directly select computational environments from Docker images and to host a re-analysis platform (REANA) supporting various distributed computing backends (K8s, HTCondor, Slurm), which allows scientists to spawn and interact with complete re-analysis workflows.
The goal of the VRE project, bringing together data and software access, workflow reproducibility and enhanced user interface, is to facilitate scientific collaboration, ultimately accelerating research in various fields.
The LIGO, Virgo and KAGRA gravitational-wave interferometers are getting ready for their fourth observational period, scheduled to begin in March 2023, with improved sensitivities and higher event rates.
Data from the interferometers are exchanged between the three collaborations and processed by running search pipelines for a range of expected signals, from coalescing compact binaries to continuous waves and burst events, along with sky localisation and parameter estimation pipelines. One of the most important peculiarities of GW computing (and, more generally, of time-domain astrophysics) is that data processing happens both offline and on special low-latency infrastructures, in order to provide timely “event candidate alerts” to other observatories and make multi-messenger astronomy possible.
Significant efforts have been made in recent years to design and build a common computing infrastructure, both in terms of a common architecture and shared resources, to prepare for growing computing demand and increasingly exploit distributed computing resources. Many custom tools, difficult to maintain, have been replaced by more mainstream tools, more widely adopted in the physics community, in order to streamline workflows and reduce the burden of maintenance and operations.
We report on these activities, the status of the infrastructure and the plans for the upcoming observation period.
Since its inception, the minimal Linux image CernVM has provided a portable and reproducible runtime environment for developing and running scientific software. Its key ingredient is the tight coupling with the CernVM-FS client to provide access to the base platform (operating system and tools) as well as the experiment application software. Up to now, CernVM images have been designed to use full virtualization. The goal of CernVM 5 is to deliver all the benefits of the CernVM appliance and to be equally practical as a container and as a full VM. To this end, the CernVM 5 container image consists of a “Just Enough Operating System (JeOS)”, with its contents defined by the HEP_OSlibs meta-package commonly used as a base platform in HEP. CernVM 5 further aims at smooth integration of the CernVM-FS client in various container environments (such as Docker, Kubernetes, Podman, Apptainer). Lastly, CernVM 5 uses special build tools and post-build processing to ensure that experiment software stacks using their custom compilers and build chains can coexist with standard system application stacks. As a result, CernVM 5 aims at providing a single, minimal container image that can be used as a virtual appliance for mounting the CernVM-FS client and for running and developing HEP application software.
Searches for new physics set exclusion limits in parameter spaces of typically up to 2 dimensions. However, the relevant theory parameter space is usually of a higher dimension, and only a subspace is covered due to the computing time requirements of signal process simulations. An Active Learning approach is presented to address this limitation. Compared to the usual grid sampling, it reduces the number of parameter space points for which exclusion limits need to be determined. It thus allows interpretations of searches to be extended to higher-dimensional parameter spaces, raising their value, e.g. via the identification of barely excluded subspaces which motivate dedicated new searches.
In an iterative procedure, a Gaussian Process is fit to excluded signal cross-sections. Within the region close to the exclusion contour predicted by the Gaussian Process, Poisson disc sampling is used to determine further parameter space points for which the cross-section limits are determined. The procedure is aided by a warm-start phase based on computationally inexpensive, approximate limit estimates such as total signal cross-sections. A python package, excursion [1], provides the Gaussian Process routine. The procedure is applied to a Dark Matter search performed by the ATLAS experiment, extending its interpretation from a 2 to a 4-dimensional parameter space while keeping the computational effort at a low level.
[1] https://github.com/diana-hep/excursion
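The point-selection step described above can be sketched with a simple dart-throwing variant of Poisson disc sampling: random candidates are kept only if they lie in a band around the predicted contour and are at least a minimum distance away from all previously accepted points. The function `toy_score` is a hypothetical stand-in for the Gaussian Process prediction (zero on the exclusion contour), and the band width and minimum distance are illustrative assumptions. This sketch does not use the excursion package.

```python
import math
import random

def toy_score(point):
    """Hypothetical stand-in for a GP prediction: zero on a circle of radius 0.6."""
    return math.hypot(point[0], point[1]) - 0.6

def poisson_disc_near_contour(score, band, min_dist, n_candidates=5000, seed=1):
    """Dart-throwing Poisson disc sampling in the unit square near score == 0."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_candidates):
        candidate = (rng.random(), rng.random())
        # Keep only candidates close to the predicted exclusion contour.
        if abs(score(candidate)) > band:
            continue
        # Enforce the minimum distance to every previously accepted point.
        if all(math.dist(candidate, other) >= min_dist for other in accepted):
            accepted.append(candidate)
    return accepted

new_points = poisson_disc_near_contour(toy_score, band=0.05, min_dist=0.1)
```

The minimum-distance requirement spreads the expensive limit evaluations evenly along the contour instead of clustering them.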
The Hubble Tension presents a crisis for the canonical LCDM model of modern cosmology: it may originate in systematics in data processing pipelines or it may come from new physics related to dark matter and dark energy. The aforementioned crisis can be addressed by studies of time-delayed light curves of gravitationally lensed quasars, which have the capacity to constrain the Hubble constant ($H_0$). A critical task in this analysis is the interpolation of time series with varying duration and irregular time sampling. In this problem, the baseline approach is Gaussian processes (GPs), which have issues in converging on the maximum likelihood.
In this work, we compare the interpolation performance of multiple models: GPs inferred with maximum likelihood optimization, GPs inferred with neural density estimation (NDE), and heteroscedastic temporal neural networks. For the NDE approach, a normalizing flow infers the posteriors of GP’s parameters from time series’ encodings independent of duration or time sampling. Of the neural networks, we use spline-based convolutional variational autoencoders (VAEs) and multi-time attention VAEs.
We validate our methods on simulations of Gaussian processes, on the observed lensed quasar light curves as well as on real-world datasets that are baselines for irregularly sampled time series interpolation. Our analysis shows that the Gaussian processes inferred with neural density estimators outperform the other approaches in interpolation quality.
PAUS is a 40 narrow-band imaging survey using the PAUCam instrument installed at the William Herschel Telescope (WHT). Since the survey started in 2015, this instrument has acquired a unique dataset, performing a relatively deep and wide survey with, at the same time, excellent redshift accuracy. The survey is a compromise in performance between deep spectroscopic surveys and wide-field imaging, showing an order of magnitude better redshift resolution than typical broad-band surveys.
The survey data reduction was designed based on classical data reduction techniques. For example, the redshift template fitting needed a different algorithm to properly handle the PAUS data (Eriksen 2019). While the data reduction and redshift estimation worked, there was room for improvement.
In this talk, we detail the different efforts to replace steps in the PAUS data reduction with deep learning algorithms. First, deep learning techniques obtain a 50 per cent reduction in the photo-z scatter for the faintest galaxies. This is achieved through various techniques, including transfer learning from simulations to handle a small data set.
Furthermore, we have constructed multiple algorithms to improve the data reduction stage. Background noise estimation from a non-uniform background was handled in BKGNet (Cabayol-Garcia 2019), and the galaxy photometry (light measurement) was introduced with Lumus (Cabayol-Garcia 2021). Recent work includes the effort of directly estimating the galaxy distance from images. In this talk we also discuss the challenges arising from differences between the survey fields and recent advances in applying unsupervised denoising techniques.
RooFit is a toolkit for statistical modeling and fitting used by most experiments in particle physics. Just as data sets from next-generation experiments grow, processing requirements for physics analysis become more computationally demanding, necessitating performance optimizations for RooFit. One possibility to speed up minimization and add stability is the use of automatic differentiation (AD). Unlike for numerical differentiation, the computation cost scales linearly with the number of parameters, making AD particularly appealing for statistical models with many parameters. In this talk, we report on one possible way to implement AD in RooFit. Our approach is to add a facility to generate C++ code for a full RooFit model automatically. Unlike the original RooFit model, this generated code is free of virtual function calls and other RooFit-specific overhead. In particular, this code is then used to produce the gradient automatically with Clad. Clad is a source transformation AD tool implemented as a plugin to the clang compiler, which automatically generates the derivative code for input C++ functions. We show results demonstrating the improvements observed when applying this code generation strategy to HistFactory and other commonly used RooFit models. HistFactory is the subcomponent of RooFit that implements binned likelihood models with probability densities based on histogram templates. These models frequently have a very large number of free parameters, and are thus an interesting first target for AD support in RooFit.
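For readers unfamiliar with automatic differentiation, the following minimal sketch shows the idea with forward-mode dual numbers applied to a single-bin Poisson negative log-likelihood. This is a different AD technique from the source transformation performed by Clad, and the class `Dual`, the helper `dlog`, and the toy yields are assumptions introduced only for illustration.

```python
import math

class Dual:
    """Number carrying a value and its derivative with respect to one parameter."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def _lift(x):
    """Promote plain numbers to constants with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x))

def dlog(x):
    """Natural logarithm with the chain rule applied to the derivative part."""
    return Dual(math.log(x.value), x.deriv / x.value)

def poisson_nll(mu, n_obs=12, signal=10.0, background=5.0):
    """Toy single-bin NLL (constant terms dropped); the yields are made up."""
    expected = mu * signal + background
    return expected - n_obs * dlog(expected)

# Seed the derivative with 1 to differentiate with respect to mu at mu = 1.
result = poisson_nll(Dual(1.0, 1.0))
```

Here `result.deriv` holds the exact derivative of the toy likelihood with respect to the signal strength, with no step size to tune as in numerical differentiation.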
High-multiplicity loop-level amplitude computations involve significant algebraic complexity, which is usually sidestepped by employing numerical routines. Yet, when available, final analytical expressions can display improved numerical stability and reduced evaluation times. It has been shown that significant insights into the analytic structure of the results can be obtained by tailored numerical evaluations. I present new developments on the object-oriented python package lips (Lorentz invariant phase space) for the generation and manipulation of complex massless kinematics. Phase-space points can be defined at the spinor level over complex numbers ($\mathbb{C}$), finite fields ($\mathbb{F}_p$), and $p$-adic numbers ($\mathbb{Q}_p$). Facilities are also available for the evaluation of arbitrary spinor-helicity expressions in any of these fields. Through the algebraic-geometry submodule, which relies on Singular through the python interface syngular, one can define and manipulate ideals in spinor variables (either covariant components or invariant brackets). These make it possible to identify irreducible varieties, where amplitudes have well-defined zeros and poles, and to fine-tune numerical phase-space points to be on or close to such varieties. Explicit precision tracking in the $p$-adic implementation allows one to perform numerical computations in singular configurations while keeping track of the numerical uncertainty as an $\mathcal{O}(p^k)$ term. As an example application, I will show how to infer valid partial-fraction decompositions from $p$-adic evaluations.
I will discuss the analytic calculation of two-loop five-point helicity amplitudes in massless QCD. In our workflow, we perform the bulk of the computation using finite field arithmetic, avoiding the precision-loss problems of floating-point representation. The integrals are provided by the pentagon functions. We use numerical reconstruction techniques to bypass intermediate complexity and obtain compact forms for the rational coefficients. I will present results for NLO gluon-initiated diphoton-plus-jet production and NNLO trijet production.
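The role of finite-field arithmetic can be illustrated with a small sketch: rational numbers are mapped to residues modulo a prime, combined exactly, and the rational result is recovered with the extended Euclidean algorithm (rational reconstruction). The prime, the function names and the toy expression are assumptions for illustration. The reconstruction of full rational functions used in amplitude calculations is considerably more involved.

```python
from fractions import Fraction
from math import gcd, isqrt

P = 2**31 - 1  # a Mersenne prime, chosen here only for illustration

def to_field(q, p=P):
    """Map a rational number to its residue modulo p."""
    return q.numerator * pow(q.denominator, -1, p) % p

def rational_reconstruct(a, p=P):
    """Find n/d with |n|, d <= sqrt(p/2) such that n = a * d (mod p)."""
    bound = isqrt(p // 2)
    r0, r1 = p, a % p
    t0, t1 = 0, 1
    # Extended Euclid, stopped at the first remainder within the bound.
    while r1 > bound:
        quotient = r0 // r1
        r0, r1 = r1, r0 - quotient * r1
        t0, t1 = t1, t0 - quotient * t1
    if t1 < 0:
        r1, t1 = -r1, -t1
    if t1 == 0 or t1 > bound or gcd(r1, t1) != 1:
        raise ValueError("no rational number within the bound maps to this residue")
    return Fraction(r1, t1)

# Evaluate 1/3 + 2/7 - 5/11 using integer arithmetic modulo P only.
residue = (to_field(Fraction(1, 3)) + to_field(Fraction(2, 7))
           - to_field(Fraction(5, 11))) % P
recovered = rational_reconstruct(residue)
```

All intermediate values are machine-size integers, so no rounding error accumulates. The price is that the result is only recoverable when its numerator and denominator are small compared with the prime.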
In front of the Federico II room
Vector fields are ubiquitous mathematical structures in many scientific domains including high-energy physics where, among other things, they are used to represent magnetic fields. Computational applications in these domains require methods for storing and accessing vector fields which are both highly performant and usable in heterogeneous environments. In this paper we present covfie, a co-processor-aware vector field library developed by the ACTS community which aims to flexibly and performantly represent vector fields for a wide variety of scientific domains and across a range of programming platforms. To this end, we employ a compositional design philosophy which enables us to meet domain requirements through the composition of simple structures we refer to as vector field transformers. In this work, we detail the design and implementation of our library, and enumerate the different kinds of vector fields that our library supports. Furthermore, we evaluate the performance of our library using a mini-application that renders vector magnitudes of a slice of the ATLAS magnetic field on both an x86-based CPU platform and a CUDA-compatible GPGPU platform. Through this mini-application, we demonstrate that different storage methods, all of which can be implemented using our library, can have a significant impact on the performance of client applications.
The CMS simulation, reconstruction, and HLT code have been used to deliver an enormous number of events for analysis during Runs 1 and 2 of the LHC at CERN, and these techniques are of fundamental importance for the CMS experiment. This contribution describes several ways to improve the efficiency of these procedures and shows that no particular conceptual or technical blocker has been identified in their implementation.
Particular attention is devoted to showing how CMS simulation, reconstruction and HLT gain a considerable increase in speed when several CMS sub-libraries are recompiled with advanced compiler options: the compiler is leveraged to obtain a speedup of up to 10%. The focus is on the LTO (Link Time Optimization) and PGO (Profile Guided Optimization) approaches. Results on the improvement of event loop time and event throughput obtained with these tools are presented, together with the differences between the profiles of the processes. Moreover, an important feature of the PGO approach is considered: profiles obtained by running events from one process are enough to speed up many other processes, and a profile obtained with the Phase 1 detector configuration also gives an improvement for Phase 2 processes.
Uproot reads ROOT TTrees using pure Python. For numerical and (singly) jagged arrays, this is fast because a whole block of data can be interpreted as an array without modifying the data. For other cases, such as arrays of std::vector<std::vector<float>>, numerical data are interleaved with structure, and the only way to deserialize them is with a sequential algorithm. When written in Python, such algorithms are very slow.
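The difference between the two cases can be sketched with a deliberately simplified byte layout, which is an assumption for illustration and not the actual ROOT serialization format. A flat block of floats is decoded in a single call, whereas a nested structure with interleaved element counts has to be walked step by step because each count determines where the next one sits.

```python
import struct

def deserialize_flat(buf):
    """A flat block of big-endian float32 values is decoded in one call."""
    return list(struct.unpack(">%df" % (len(buf) // 4), buf))

def serialize_nested(data):
    """Toy layout: outer count, then for each inner list its count and floats."""
    parts = [struct.pack(">I", len(data))]
    for inner in data:
        parts.append(struct.pack(">I", len(inner)))
        parts.append(struct.pack(">%df" % len(inner), *inner))
    return b"".join(parts)

def deserialize_nested(buf):
    """Sequential walk: the position of each count depends on the previous ones."""
    pos = 0
    (n_outer,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    result = []
    for _ in range(n_outer):
        (n_inner,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        result.append(list(struct.unpack_from(">%df" % n_inner, buf, pos)))
        pos += 4 * n_inner
    return result

nested = [[1.0, 2.5], [], [-0.5, 4.0, 8.0]]
round_trip = deserialize_nested(serialize_nested(nested))
```

The loop in `deserialize_nested` is the kind of per-element logic that is slow in an interpreted language and benefits from being run on a fast virtual machine.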
We solve this problem by writing the same logic in a language that can be executed quickly. AwkwardForth is a Domain Specific Language (DSL), based on Standard Forth with I/O extensions for making Awkward Arrays, and it JIT-compiles to a fast virtual machine without requiring LLVM as a dependency. We generate code as late as possible to take advantage of optimization opportunities. All ROOT types previously implemented with Python are being converted to AwkwardForth.
Double and triple-jagged arrays have already been implemented and are 400× faster in AwkwardForth than in Python, with multithreaded scaling up to 1 second/GB because AwkwardForth releases the Python GIL. In this talk, we describe design aspects, performance studies, and future directions in accelerating Uproot with AwkwardForth.
In the past few years, using Machine and Deep Learning techniques has become more and more viable, thanks to the availability of tools which allow people without specific knowledge in the realm of data science and complex networks to build AIs for a variety of research fields. This process has encouraged the adoption of such techniques: in the context of High Energy Physics, new algorithms based on ML are being tested for event selection in trigger operations, end-user physics analysis, computing metadata based optimizations, and more. Time-critical applications can benefit from implementing algorithms on low-latency hardware like specifically designed ASICs and programmable micro-electronics devices known as FPGAs. The latter offer a unique blend of the benefits of both hardware and software. Indeed, they implement circuits just like hardware, providing power, area and performance benefits over software, yet they can be reprogrammed cheaply and easily to implement a wide range of tasks, at the expense of performance with respect to ASICs.
In order to facilitate the translation of ML models to fit in the usual workflow for programming FPGAs, a variety of tools have been developed. One example is the hls4ml toolkit, developed by the HEP community, which allows the translation of Neural Networks built using tools like TensorFlow to a High-Level Synthesis description (e.g. C++) in order to implement these kinds of ML algorithms on FPGAs.
This paper presents and discusses the activity started at the Physics and Astronomy department of the University of Bologna and INFN-Bologna devoted to preliminary studies for the trigger systems of the Compact Muon Solenoid (CMS) experiment at the CERN LHC accelerator. A broader-purpose open-source project from Xilinx (a major FPGA producer) called PYNQ is being tested in combination with the hls4ml toolkit. The purpose of PYNQ is to let designers exploit the benefits of programmable logic and microprocessors using the Python language. This software environment can be deployed on a variety of Xilinx platforms, from IoT devices like the ZYNQ-Z1 board to high-performance ones like Alveo accelerator cards and AWS EC2 F1 cloud instances.
Even though rich documentation can be found on how to use hls4ml, a comprehensive description of the entire workflow from Python to FPGA is still hard to find. This work tries to fill this gap, presenting the hardware and software set-up, together with performance tests on various baseline models used as benchmarks. Whether any overhead causes an increase in latency will be investigated. Finally, the consistency of the NN predictions with respect to a more traditional way of interacting with the FPGA using C++ code will be verified.
The use of declarative languages for HEP data analysis is an emerging, promising approach. One highly developed example is ADL (Analysis Description Language), an external domain specific language that expresses the analysis physics algorithm in a standard and unambiguous way, independent of frameworks. The most advanced infrastructure that executes an analysis written in the formal ADL syntax is the CutLang (CL) runtime interpreter, based on traditional parsing tools. CL, which was previously presented at this conference, has been further developed in recent years to cope with most LHC analyses. The new additions include full-fledged histogramming and data-MC comparison facilities alongside an interface to a number of well-known limit setting tools.
The ADL/CL architecture was thus far prepared and built with a general-purpose programming language, without formal computing expertise and has grown into a complex monolithic structure. To facilitate maintenance and further development of CL, while making it reusable in other (non-scientific) domains, we designed a protocol called Dynamic Domain Specific eXtensible Language (DDSXL) that modularizes its monolithic structure. The DDSXL protocol provides a set of strict rules that allow each researcher to work in their area of expertise and understand the work done without any expertise in other areas, completely independent of the programming languages and frameworks used.
DDSXL integrates a domain ecosystem (such as CL) into the development environment with a completely abstract structure using various OOP design patterns and with a set of rules determined through communication over the network. This protocol also integrates numerous programming languages and frameworks, allowing each developer to integrate it into their own module without the need for expertise in technologies from other modules.
Here, we introduce the latest developments in ADL/CL focusing on the working principles of the DDSXL protocol and integration.
The growing amount of data generated by the LHC requires a shift in how HEP analysis tasks are approached. Usually, the workflow involves opening a dataset, selecting events, and computing relevant physics quantities to aggregate into histograms and summary statistics. The required processing power is often so high that the work needs to be distributed over multiple cores and multiple nodes. This contribution establishes ROOT RDataFrame as the single entry point for virtually all HEP data analysis use cases. In fact, the typical steps of an analysis workflow can be easily and flexibly written with RDataFrame. Data ingestion from multiple sources is streamlined through a single interface. Relevant metadata can be made available to the dataframe and used during analysis execution. A declarative API offers the most common operations to the users, while transparently taking care of data processing optimisations. For example, it is possible to inject user-defined code to compute complex quantities, gather them into histograms or other relevant statistics, include large sets of systematic variations and use machine-learning inference kernels. A Pythonic layer allows dynamic injection of Python functions in the main C++ event loop. Finally, any RDataFrame application can seamlessly scale out to hundreds of cores on the same machine or multiple distributed nodes by changing a single line of code. The latest performance validation studies are also included in this contribution to demonstrate the efficiency of the tool on both the computation complexity and the scalability spectra.
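The declarative style described above, in which operations are booked first and the data are traversed once, can be illustrated with a toy class. `LazyFrame` and its methods are hypothetical names written for this sketch and are not the ROOT RDataFrame API. Unlike RDataFrame, where results are also lazy, this toy runs its single event loop as soon as a histogram is requested.

```python
import math
from collections import Counter

class LazyFrame:
    """Toy declarative dataframe: steps are recorded, then run in one event loop."""

    def __init__(self, events, steps=()):
        self._events = events
        self._steps = tuple(steps)

    def define(self, name, func):
        """Book a new column computed from the existing ones."""
        return LazyFrame(self._events, self._steps + (("define", name, func),))

    def filter(self, predicate):
        """Book an event selection."""
        return LazyFrame(self._events, self._steps + (("filter", None, predicate),))

    def histogram(self, column, bin_width):
        """Run the event loop once and count entries per bin index."""
        counts = Counter()
        for event in self._events:
            row = dict(event)
            keep = True
            for kind, name, func in self._steps:
                if kind == "define":
                    row[name] = func(row)
                elif not func(row):
                    keep = False
                    break
            if keep:
                counts[int(row[column] // bin_width)] += 1
        return counts

events = [{"px": 3.0, "py": 4.0}, {"px": 0.5, "py": 0.5}, {"px": 6.0, "py": 8.0}]
hist = (LazyFrame(events)
        .define("pt", lambda row: math.hypot(row["px"], row["py"]))
        .filter(lambda row: row["pt"] > 1.0)
        .histogram("pt", bin_width=5.0))
```

Because the user only states what to compute, the framework is free to decide how to run the loop, which is what makes optimisations and distribution over cores or nodes possible without changing the analysis code.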
The Jiangmen Underground Neutrino Observatory (JUNO) experiment is designed to measure the neutrino mass ordering (NMO) using a 20-kton liquid scintillator detector, in order to solve one of the biggest remaining puzzles in neutrino physics. Regarding the sensitivity of JUNO’s NMO measurement, besides the precise measurement of reactor neutrinos, the independent measurement of the atmospheric neutrino oscillation has great potential to enhance the sensitivity in the combined analysis. This heavily relies on the event reconstruction performance at the high energy (GeV) level, including the angular resolution of the incident neutrino, the energy resolution, and the accuracy of the flavor identification.
In this contribution, we present a multi-purpose reconstruction algorithm for high energy particles in JUNO based on machine learning methods. This includes extracting effective features from tens of thousands of PMT waveforms, as well as the development of two types of machine learning models (spherical GNN and planar CNN/Transformer). Novel techniques, such as improving the model convergence speed and eliminating reconstruction bias by maintaining rotation invariance, are also discussed. Preliminary results based on JUNO simulation show reconstruction precision at an unprecedented level, indicating great application potential for other large liquid scintillator detectors as well.
The sPHENIX experiment at RHIC requires substantial computing power for its complex reconstruction algorithms. One class of these algorithms is tasked with processing signals collected from the sPHENIX calorimeter subsystems, in order to extract signal features such as the amplitude, the timing of the peak and the pedestal. These values, calculated for each channel, form the basis of event reconstruction in the calorimeter. The baseline technique used for signal feature extraction is fitting the signal waveforms in individual calorimeter channels with a parametrized function which optimally represents the signal shape. Due to the large channel count in the sPHENIX calorimeters, such a fitting procedure may consume a non-trivial fraction of the total reconstruction time in a given event. To solve this problem, an alternative technique is being explored, based on a Machine Learning algorithm utilizing a Neural Network, in which the training data sample is produced using the traditional fitting technique. Initial results demonstrate an order of magnitude improvement in the speed of signal processing while preserving an acceptable level of accuracy. A prototype of a Keras/TensorFlow-based inference application has been created, to be deployed on the worker nodes running sPHENIX event reconstruction software. A comparison with the standard fitting technique has been performed. We present our experience with the design and implementation of the ML-based algorithm for the sPHENIX calorimeter signal processing.
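A strongly simplified version of the baseline waveform fit can be written as a linear least-squares problem. The sketch below assumes that the pulse shape and the peak time are known and fixed, so that only the amplitude and the pedestal are fitted. The pulse shape `pulse_template`, its time constant and the sample values are hypothetical and are not the sPHENIX signal model, which also extracts the timing.

```python
import math

def pulse_template(t, tau=4.0):
    """Hypothetical unit-amplitude pulse shape peaking at t = tau."""
    return (t / tau) * math.exp(1.0 - t / tau) if t > 0 else 0.0

def fit_amplitude_pedestal(samples, template):
    """Least-squares fit of samples = amplitude * template + pedestal."""
    n = len(samples)
    s_f = sum(template)
    s_ff = sum(f * f for f in template)
    s_y = sum(samples)
    s_fy = sum(f * y for f, y in zip(template, samples))
    # Solve the 2x2 normal equations in closed form.
    det = n * s_ff - s_f * s_f
    amplitude = (n * s_fy - s_f * s_y) / det
    pedestal = (s_ff * s_y - s_f * s_fy) / det
    return amplitude, pedestal

# Noise-free toy waveform of 16 samples (ADC counts are made up).
template = [pulse_template(t) for t in range(16)]
waveform = [250.0 * f + 1500.0 for f in template]
amplitude, pedestal = fit_amplitude_pedestal(waveform, template)
```

When the timing is also free, the fit becomes non-linear and iterative, which is where the per-channel cost grows and a trained network becomes an attractive replacement.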
Nowadays, medical images play a central role in medical diagnosis, and computed tomography, nuclear magnetic resonance, ultrasound and other imaging technologies have become powerful means of in vitro imaging. Extracting lesion information from these images can enable doctors to observe and diagnose the lesion more effectively, so as to improve the accuracy of diagnosis. Therefore, the segmentation of medical images has important social value. The achievements of image semantic segmentation show the potential of the Convolutional Neural Network (CNN) for medical image analysis. However, applying an existing CNN model to video neglects the correlation between frames of the video. In this article we propose a video semantic segmentation framework based on U-Net in which the feature map of the previous frame is propagated to the next frame via an optical flow field. Segmentation accuracy is improved at the cost of a slight performance degradation. The framework includes three parts: 1) a segmentation sub-module using U-Net to segment the current frame; 2) an optical flow feature extraction module to perform feature extraction on the motion information of the current frame and the previous frame; 3) a correction module, which assigns weights to the segmentation results and optical flow features to achieve the correction effect. The effectiveness of our proposed method is demonstrated on two public datasets (Drosophila melanogaster electron micrographs, Chaos) and on private Digital Subtraction Angiography (DSA) video datasets.
Earth Observation (EO) has experienced promising progress in the modern era thanks to an impressive amount of research on establishing state-of-the-art Machine Learning (ML) techniques to learn from large datasets. Meanwhile, the scientific community has also extended the boundary of ML to quantum systems and opened a new research area, so-called Quantum Machine Learning (QML), to integrate advantages from both ML and Quantum Computing (QC). Recent papers investigated the application of QML in the EO domain mainly based on Parameterized Quantum Circuits (PQCs), which are regarded as a suitable architecture for quantum neural networks (QNNs) due to their potential to be efficiently simulated on near-term quantum hardware. However, more in-depth contributions are still required, and various challenges must be tackled, such as the large EO image size relative to current quantum simulators and the trainability of the quantum circuits.
This work introduces a hybrid Quantum-Classical model performing reconstruction and classification simultaneously and explores its application to EO image multi-class classification. Moreover, we investigate for the first time the correlation between different PQC descriptors and the training results in a realistic EO use case. The results demonstrate that the hybrid model successfully achieves up to 10-class classification, suggesting a potential usage of QNNs in a realistic context, and also hint at generic approaches for choosing a suitable PQC architecture for a given problem.
The search for New Physics through Dark Sectors is an exciting possibility to explain, among others, the origin of Dark Matter (DM). Within this context, the sensitivity study of a given experiment is a key point in estimating its potential for discovery. In this contribution we present the fully GEANT4-compatible Monte Carlo simulation package for production and propagation of DM particles, DMG4. In particular, we discuss the implementation of production cross-sections in its GEANT4-independent sub-package, DarkMatter, and the latest DMG4 release, including a finer application programming interface (API) to GEANT4. We also cover its recent developments with faster and more accurate cross-section computations, sampling methods, an extended energy range, as well as the expansion of the package to $B-L$ and semi-visible models. We finally discuss the improvements in the simulations of New Physics processes specific to muon beams.
Belle II is an experiment that has been taking data since 2019 at the asymmetric e+e- SuperKEKB collider, a second-generation B-factory, in Tsukuba, Japan. Its goal is to perform high-precision measurements of flavor physics observables. One of the many challenges of the experiment is to have a Monte Carlo simulation with very accurate modeling of the detector, including any variation occurring during data taking. To this end, a dedicated “run dependent” Monte Carlo has been developed, using the detector conditions during data taking, as well as beam-induced background collected with random triggers. In this talk, the procedure for the setup and processing of run-dependent Monte Carlo at Belle II will be shown.
The generation of unit-weight events for complex scattering processes presents a severe challenge to modern Monte Carlo event generators. Even when using sophisticated phase-space sampling techniques adapted to the underlying transition matrix elements, the efficiency for generating unit-weight events from weighted samples can become a limiting factor in practical applications. Here we present the combination of a two-staged unweighting procedure with a factorisation-aware matrix element emulator using neural networks which we make accessible in the Sherpa event generation framework. The algorithm can significantly accelerate the unweighting process, while it still guarantees unbiased sampling from the correct target distribution. We apply, validate and benchmark the approach in high-multiplicity LHC production processes, including Z/W+4 jets and $t\bar{t}$+3 jets, where we find speed-up factors up to 60.
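The two-stage idea can be illustrated with a minimal Python sketch on a toy one-dimensional problem. Everything named here is a hypothetical stand-in and not part of Sherpa: `true_weight` plays the role of the expensive matrix element, `surrogate_weight` the role of the cheap emulator, and `s_max` and `r_max` are assumed to be strict upper bounds on the surrogate weight and on the ratio of true to surrogate weight. A first rejection step uses only the surrogate; the true weight is evaluated only for survivors, and a second rejection step on the ratio restores the correct target distribution.

```python
import math
import random


def true_weight(x):
    # Hypothetical stand-in for an expensive matrix-element weight.
    return math.exp(-3.0 * x) + 0.05


def surrogate_weight(x):
    # Hypothetical stand-in for a cheap emulator: close to true_weight, not exact.
    return true_weight(x) * (1.0 + 0.2 * math.sin(8.0 * x))


def two_stage_unweight(n_trials, s_max, r_max, rng):
    """Return unit-weight events and the number of true-weight evaluations."""
    accepted = []
    n_true_evals = 0
    for _ in range(n_trials):
        x = rng.random()                      # flat phase-space point in [0, 1)
        s = surrogate_weight(x)
        if rng.random() * s_max > s:          # stage 1: reject using the surrogate only
            continue
        n_true_evals += 1
        ratio = true_weight(x) / s            # stage 2: correct with the true weight
        if rng.random() * r_max <= ratio:
            accepted.append(x)
    return accepted, n_true_evals
```

Because the acceptance probability is proportional to the surrogate weight times the ratio, the accepted points follow the true weight as long as both bounds hold. In this toy, `s_max = 1.26` and `r_max = 1.25` are valid bounds; a realistic setup would also need a treatment of events that exceed the chosen maxima, which this sketch leaves out.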
In Lattice Field Theory, one of the key drawbacks of the Markov Chain Monte Carlo (MCMC) simulation is the critical slowing down problem. Generative machine learning methods, such as normalizing flows, offer a promising solution to speed up MCMC simulations, especially in the critical region. However, training these models for different parameter values of the lattice theory is inefficient. We address this issue by interpolating or extrapolating the flow model in the critical region. We demonstrate the effectiveness of the proposed method for MCMC sampling in critical regions for multiple parameter values of $\phi^4$ scalar theory and U(1) gauge theory in 1+1 dimensions and compare its performance against HMC and flow-based methods.
The interpretation of detector data in terms of observables that we can use to perform our physics analyses is an essential part of modern experimental physics. It is also among the fields that have benefited most from the recent advances in machine learning. In this contribution we want to highlight our event reconstruction efforts using Graph Neural Networks in the IceCube experiment. Using a pulse-based approach, our network can adapt to the irregular architecture of our detector. We can show not only speed-ups of orders of magnitude but also increases in reconstruction resolution of up to 20% compared to our current baseline algorithms. Our goal is to provide an easy-to-use but effective entry into machine learning-based event reconstruction for any physics purpose: from neutrino oscillations, through beyond-the-standard-model searches, to neutrino astronomy. In addition, our software package is compatible not just with the current IceCube experiment, but also with future extensions, like the IceCube Upgrade or Gen2, as well as with any neutrino detector.
AI is making an enormous impact on scientific discovery. Growing volumes of data across scientific domains are enabling the use of machine learning at ever increasing scale to accelerate discovery. Examples include using knowledge extraction and reasoning over large repositories of scientific publications to quickly study scientific questions or even come up with new questions, applying AI surrogate models to speed up simulation campaigns and generate critical new data and knowledge, leveraging generative models to construct new hypotheses and make predictions about them, and automating experimentation through robotic labs to enable tighter loops of hypothesis-test cycles. At the same time, new machine learning techniques based on “foundation models” are gaining focus in AI. Foundation models aim to learn “universal representations” from enormous amounts of data, typically using self-supervised or unsupervised training, with the goal to effectively enable subsequent downstream tasks. Prominent examples are large-language models, which have been driving state-of-the-art performance for natural language processing tasks. In this talk, we review how foundation models work by learning representations at scale and show examples of how they can further accelerate scientific discovery. By targeting bottlenecks in the scientific method, we discuss the potential of foundational models to impact a broad set of scientific challenges.
The production, validation and revision of data analysis applications is an iterative process that occupies a large fraction of a researcher's time-to-publication.
Providing interfaces that are simpler to use correctly and more performant out-of-the-box not only reduces the community's average time-to-insight but it also unlocks completely novel approaches that were previously impractically slow or complex.
All of the above becomes especially true at the unprecedented integrated luminosity that will be achieved during LHC Run 3 and beyond, which further motivates the fast-paced evolution that has been taking place in the HEP analysis software ecosystem in recent years.
This talk analyzes the trends and challenges that characterize this evolution.
In particular we focus on the emerging pattern of strongly decoupling end-user analysis logic from low-level I/O and work scheduling by interposing high-level interfaces that gather semantic information on the particular analysis application.
We show how this pattern brings benefits to analysis ergonomics and reproducibility, as well as opportunities for performance optimizations.
We highlight potential issues in terms of extensibility and debugging experience, together with possible mitigations.
Finally, we explore the consequences of this convergent evolution towards smart, HEP-aware "middle-man analysis software" in the context of future analysis facilities and data formats:
both will have to support a bazaar of high-level solutions while optimizing for typical low-level data structures and access patterns.
Our goal is to provide novel insights that feed the ongoing, stimulating conversation that has always characterized the HEP software community.
The Belle II experiment has been taking data at the SuperKEKB collider since 2018. Particle identification is a key component of the reconstruction, and several detector upgrades from Belle to Belle II were designed to maintain performance with the higher background rates.
We present a method for a data-driven calibration that improves the overall particle identification performance and is resilient against imperfections in the calibration of individual detectors. Our framework also defines a “blame” metric that identifies the detectors with the largest contributions to correctly and incorrectly assigned particle hypotheses.
The size, complexity, and duration of telescope surveys are growing beyond the capacity of traditional methods for scheduling observations. Scheduling algorithms must have the capacity to balance multiple (often competing) observational and scientific goals, address both short-term and long-term considerations, and adapt to rapidly changing stochastic elements (e.g., weather). Reinforcement learning (RL) methods have the potential to significantly automate the scheduling and operation of telescope campaigns and greatly reduce the amount of human effort needed to vet schedules produced via costly simulation work.
In this work, we present the application of an RL-based scheduler, which uses a Markov decision process framework to construct scheduling policies in a way that is scalable, recoverable in the case of interruptions during observation, and computationally efficient for surveys that can include over a hundred observations.
We simulate surveys of objects in the Galactic equator, assuming the location and optics of Stone Edge Observatory. We present schedules generated by our RL technique. While initial results are not comparable to human-tuned schedules, we are encouraged by the technique’s scalable, automated approach. We examine how well an RL agent’s produced schedules compare to human-designed schedules by comparing different formulations of cumulative reward for these schedules. We also investigate the success of our model as we vary the complexity of the telescope environment and as we vary the reward function. We present this work as a motivation to explore more complex situations and surveys.
In this work we present the adaptation of the popular clustering algorithm DBSCAN to reconstruct the primary vertex (PV) at the hardware trigger level in collisions at the High-Luminosity LHC. Nominally, PV reconstruction is performed by a simple histogram-based algorithm. The main challenge in PV reconstruction is that the particle tracks need to be processed in a low-latency environment of $\mathcal{O}$(1 μs). To achieve this, an accelerated version of the DBSCAN algorithm was developed to run in a Field Programmable Gate Array (FPGA). A CPU-optimized version of DBSCAN was implemented in C++ to serve as a benchmark for comparison. The CPU version of DBSCAN resulted in an average PV reconstruction latency of 93 μs, while the FPGA firmware had a latency of only 0.73 μs, resulting in a 127x speedup. The speedup is a result of processing all the input tracks in parallel, which ultimately results in high resource consumption of up to 48.6% of the available logic. Most of the logic is attributed to the use of sorting networks, which allow for the parallel processing of the input tracks. To tune the firmware for specific latency and resource-usage constraints, the firmware has been parametrized by the number of input tracks to consider at a time. The accelerated DBSCAN method yielded a higher PV reconstruction efficiency when compared to the simpler histogram-based method. As clustering applications are prominent in High Energy Physics, we modified the accelerated DBSCAN algorithm for higher-dimensional datasets.
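For readers unfamiliar with DBSCAN, the following minimal Python sketch shows the algorithm restricted to one dimension, as one might apply it to track positions along the beam axis. It is a plain software illustration under stated assumptions, not the firmware or the C++ benchmark described above: the inputs `z` (track position) and `pt` (track transverse momentum), the parameters `eps` and `min_pts`, and the choice of the cluster with the largest summed pT as the primary vertex are all assumptions made for this example.

```python
def dbscan_1d(z, eps, min_pts):
    """Label points on a line with DBSCAN; -1 marks noise."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    zs = [z[i] for i in order]
    n = len(zs)

    # Count neighbours within eps (the point itself included) with two pointers.
    is_core = [False] * n
    lo = hi = 0
    for i in range(n):
        while zs[i] - zs[lo] > eps:
            lo += 1
        while hi + 1 < n and zs[hi + 1] - zs[i] <= eps:
            hi += 1
        is_core[i] = (hi - lo + 1) >= min_pts

    # Consecutive core points within eps of each other share a cluster.
    labels_sorted = [-1] * n
    cluster = -1
    last_core = None
    for i in range(n):
        if is_core[i]:
            if last_core is None or zs[i] - zs[last_core] > eps:
                cluster += 1
            labels_sorted[i] = cluster
            last_core = i

    # Border points join the nearest core point within eps, if there is one.
    core_idx = [i for i in range(n) if is_core[i]]
    for i in range(n):
        if not is_core[i]:
            near = [j for j in core_idx if abs(zs[j] - zs[i]) <= eps]
            if near:
                best = min(near, key=lambda j: abs(zs[j] - zs[i]))
                labels_sorted[i] = labels_sorted[best]

    # Map labels back to the original track order.
    labels = [-1] * n
    for pos, original_index in enumerate(order):
        labels[original_index] = labels_sorted[pos]
    return labels


def primary_vertex(z, pt, labels):
    """Return the pT-weighted z of the cluster with the largest summed pT."""
    sums = {}
    for zi, pti, lab in zip(z, pt, labels):
        if lab < 0:
            continue
        s_pt, s_zpt = sums.get(lab, (0.0, 0.0))
        sums[lab] = (s_pt + pti, s_zpt + zi * pti)
    if not sums:
        return None
    s_pt, s_zpt = max(sums.values(), key=lambda v: v[0])
    return s_zpt / s_pt
```

The initial sort is the step that makes the one-dimensional case simple in software; the neighbour counting that follows is independent per point, which is the kind of structure that lends itself to parallel processing.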
Binned template fitting is one of the most important tools in the high-energy physics (HEP) statistics toolbox. Statistical models based on combinations of histograms are often the last step in a HEP physics analysis. Both model and data can be represented in a standardized format; HistFactory (C++/XML) and, more recently, pyHF (Python/JSON) have taken advantage of that fact to make template fits both easy and reproducible.
We present a port of pyHF to the Julia programming language much like the way pyHF started out as a port of the C++ HistFactory. The new package, LiteHF.jl, provides an independent, fully compatible implementation of the pyHF JSON specification. Since Julia compiles to native code via LLVM and has a lower function-call overhead than Python, LiteHF.jl can outperform the original pyHF. We utilize Julia's meta-programming capabilities to keep the implementation simple and flexible, and the likelihood gradient is obtained for free via automatic differentiation. LiteHF.jl also makes it easy for the user to add custom template modifiers.
Models generated by LiteHF.jl can be used directly in BAT.jl (Bayesian Analysis Toolkit) in Julia and other Julia inference packages. This enables full Bayesian inference with a few simple commands. BAT.jl provides a full suite of analysis tools including MCMC, nested sampling, automatic re-parametrization, Bayesian evidence calculation, and plotting. A user-friendly likelihoodist inference path for LiteHF.jl is available as well.
The usage of Deep Neural Networks (DNNs) as multi-classifiers is widespread in modern HEP analyses. In standard categorisation methods, the high-dimensional output of the DNN is often reduced to a one-dimensional distribution by passing only the information about the highest class score to the statistical inference method. Correlations with other classes are thereby discarded.
Moreover, in common statistical inference tools, the classification values need to be binned, which relies on the researcher's expertise and is often non-trivial. To overcome the challenge of binning multiple dimensions and preserving the correlations of the event-related classification information, we perform K-means clustering on the high-dimensional DNN output to create bins without marginalising any axes.
We evaluate our method in the context of a simulated cross section measurement at the CMS experiment, showing an increased expected sensitivity over the standard binning approach.
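The clustering-based binning described above can be sketched in a few lines of Python. This is a toy illustration only: `toy_dnn_scores` is a hypothetical generator of softmax-like class scores, the number of clusters is chosen by hand, and a deterministic farthest-point initialisation is used for reproducibility. Each K-means centroid defines one bin, and an event is assigned to the bin of its nearest centroid using the full score vector rather than only the highest class score.

```python
import math
import random


def toy_dnn_scores(n_per_class, n_classes, rng):
    """Hypothetical softmax outputs: one tuple of class scores per event."""
    points = []
    for true_class in range(n_classes):
        for _ in range(n_per_class):
            logits = [rng.gauss(3.0 if c == true_class else 0.0, 0.2)
                      for c in range(n_classes)]
            norm = sum(math.exp(v) for v in logits)
            points.append(tuple(math.exp(v) / norm for v in logits))
    return points


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point, i.e. its bin number."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, n_iter=20):
    # Farthest-point initialisation: start from the first event, then keep
    # adding the event that is farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def bin_counts(points, centroids):
    """Histogram of events over the cluster-defined bins."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return counts
```

The resulting counts per bin can then be passed to a binned statistical model in place of a hand-tuned one-dimensional histogram.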
To support the needs of novel collider analyses such as long-lived particle searches, considerable computing resources are spent forward-copying data products from low-level data tiers like CMS AOD and MiniAOD to reduced data formats for end-user analysis tasks. In the HL-LHC era, it will be increasingly difficult to ensure online access to low-level data formats. In this talk, we present a novel online data storage mechanism that obviates the need for data tiers by storing individual data products in column objects using RadosGW, a Ceph object store technology. Benchmarks of the performance of storage and retrieval of the event data through the S3 protocol for a prototype of typical analysis workflows will be presented, and compared with traditional xrootd ROOT file access protocols.
The large statistical fluctuations in the ionization energy loss of charged particles in gaseous detectors imply that many measurements are needed along the particle track to obtain a precise mean. This limits the particle separation capabilities, a limitation that should be overcome in the design of future colliders. The cluster counting technique (dN/dx) represents a valid alternative which takes advantage of the Poisson nature of the primary ionization process and offers a more statistically robust method to infer mass information. Simulation studies using Garfield++ and Geant4 show that cluster counting can reach a resolution two times better than the traditional dE/dx method over a wide momentum range in the use case of a helium-based drift chamber. The technique consists in singling out, in every recorded detector signal, the electron peak structures related to the arrival on the anode wire of the electrons belonging to a single primary ionization act (cluster). However, searching for hundreds of electron peaks and recognizing clusters in real waveform signals is extremely challenging because the peaks overlap in time. State-of-the-art open-source algorithms fail to find the expected number even in low-noise conditions. In this talk, we present cutting-edge algorithms to search for electron peaks and identify ionization clusters in experimental data using the latest available computing tools and physics knowledge. To validate the algorithms and show the advantages of the cluster counting technique, two beam tests have been performed at the CERN/H8 facility, collecting data with different helium-based gas mixtures at different gas gains and angles between the wire direction and the ionizing tracks, using a muon beam ranging from 40 GeV/c to 180 GeV/c on a setup made of drift tubes of different sizes equipped with sense wires of different diameters.
We show the data analysis results concerning the ascertainment of the Poisson nature of the cluster counting technique, the establishment of the most efficient cluster counting and electron clustering algorithms among the various ones proposed, and the definition of the limiting effects for a fully efficient cluster counting, such as the cluster dimensions, the space charge density around the sense wire and the dependence of the counting efficiency on the beam particle impact parameter.
Due to the massive size of HEP data, performance has always been a factor in its analysis and processing. Languages like C++ are fast enough but are often challenging for beginners to grasp, and it can be difficult to iterate quickly with them in an interactive environment. On the other hand, the ease of writing code and the extensive library ecosystem make Python an enticing choice for data analysis. Increasing interoperability between Python and C++, as well as the introduction of libraries such as Numba, has been accelerating Python’s traction in the HEP community.
Vector is a Python library for 2D, 3D, and Lorentz vectors, especially arrays of vectors, designed to solve common physics problems in a NumPy-like way. Vector currently supports pure Python Object, NumPy, Awkward, and Numba-based (Numba-Object, Numba-Awkward) backends.
We introduce the library, with a focus on the Numba-based Awkward Lorentz vectors, which allow operations on HEP data without compromising on speed or ease of writing code. Awkward is one of the core libraries of the Scikit-HEP ecosystem and allows data analysis with jagged arrays. Numba, on the other hand, allows Python codebases to harness the power of Just-In-Time compilation, in which Python functions are compiled to machine code at run time.
The library seamlessly integrates with the existing Scikit-HEP libraries, especially with Awkward. Our talk will start with an introduction to this library, with the main agenda of compiling Awkward Lorentz vectors with Numba. Furthermore, Vector is still under active development and preparing for a 1.0 release; hence, we will also take in user feedback while discussing the overall development roadmap.
In the past years the CMS software framework (CMSSW) has been extended to offload part of the physics reconstruction to NVIDIA GPUs. This can achieve a higher computational efficiency, but it adds extra complexity to the design of dedicated data centres and the use of opportunistic resources, like HPC centres. A possible solution to increase the flexibility of heterogeneous clusters is to offload part of the computations to GPUs installed in external, dedicated nodes.
Our studies on this topic have achieved high-throughput, low-latency data transfers to and from a remote NVIDIA GPU across Mellanox NICs, using the Remote Direct Memory Access (RDMA) technology to access the GPU memory without involving either node's operating system.
In this work we present our approach based on the Open MPI framework, and compare the performance of data transfers of local and remote GPUs from different generations, using different communication libraries and network protocols.
HEPD-02 is a new, upgraded version of the High Energy Particle Detector as part of a suite of instruments for the second mission of the China Seismo-Electromagnetic Satellite (CSES-02) to be launched in 2023. Designed and realized by the Italian Collaboration LIMADOU of the CSES program, it is optimized to identify fluxes of charged particles (mostly electrons and protons) and determine their energy and incoming direction, providing new measurements of cosmic rays at low energies (up to 200 MeV for protons and up to 100 MeV for electrons). As already experienced in the previous version of the detector, i.e. HEPD-01 on board CSES-01, the reconstruction of the collected events will be performed using a strategy based entirely on deep learning (DL). This choice is motivated by the fact that deep learning models are very effective when working with particle detectors, in which a variety of electrical signals are produced and may be treated as low-level features. The new HEPD-02 DL-based event reconstruction will be trained on dedicated Monte Carlo simulation and tested on both simulated and test-beam data. Moreover, the collaboration is working on new deep-learning approaches to increase the robustness of the performance assessments, especially when passing from simulated samples to real data, and the interpretability of these algorithms to be used in future analysis.
In this contribution, the entire event reconstruction of the HEPD-02 detector will be described and the performance will be reported.
In real-time computing facilities, system, network, and security monitoring are core components of running efficiently and effectively. As there are many diverse things that can go awry, such as load, network, process, and power issues, having a well-functioning monitoring system is imperative. In many facilities you will see the standard set of tools such as Ganglia, Grafana, Nagios, etc. While these are noteworthy, the diversity of tools used clearly points to an adequacy gap (none is self-sufficient); furthermore, they are lacking in alerting and anomaly detection capabilities beyond binary events.
The ELK stack (Elasticsearch, Logstash, & Kibana) is the combination of three open-source projects to ingest, search, and visualize logs and data. The basic free license of ELK enables these features but is limited overall for use in a real-time facility. Leveraging the full capabilities of ELK adds significant features: single sign-on and authorization controls for security, alerting on unusual events, Machine Learning capabilities, and many other tools that are useful for advanced data analytics.
With the advanced set of Machine Learning techniques, the ELK toolbox adds features such as clustering, time series decomposition, and correlation analysis. For example, these Machine Learning techniques can be applied to alerts, providing you with the details of events for an unusual uptick in resource usage, if there is rare or high process activity, or unusual port activity. A standard monitoring tool would typically not have such capability.
In this report, we will discuss the details and features of how a facility could benefit from the open-source and premium versions of the ELK stack. We will provide procedures and details for configuring these tools, and show how they benefit the monitoring posture of compute facilities within a science-based environment.
We hold these truths to be self-evident: that all physics problems are created unequal, that they are endowed with their unique data structures and symmetries, that among these are tensor transformation laws, Lorentz symmetry, and permutation equivariance. A lot of attention has been paid to the applications of common machine learning methods in physics experiments and theory. However, much less attention is paid to the methods themselves and their viability as physics modeling tools. One of the most fundamental aspects of modeling physical phenomena is the identification of the symmetries that govern them. Incorporating symmetries into a model can reduce the risk of over-parameterization, and consequently improve a model's robustness and predictive power. As usage of neural networks continues to grow in the field of particle physics, more effort will need to be invested in narrowing the gap between the black-box models of ML and the analytic models of physics.
Building on previous work, we demonstrate how careful choices in the details of network design, creating a model both simpler and more grounded in physics than the traditional approaches, can yield state-of-the-art performance within the context of problems including jet tagging and particle four-momentum reconstruction. We present the Permutation-Equivariant and Lorentz-Invariant or Covariant Aggregator Network (PELICAN), which is based on three key ideas: symmetry under permutations of particles, Lorentz symmetry, and the ambiguity of the aggregation process in Graph Neural Networks. For the first, we use the most general permutation-equivariant layer acting on rank-2 tensors, which can be viewed as a maximal generalization of Message Passing. For the second, we use classical theorems of Invariant Theory to reduce the 4-vector inputs to a tensor of Lorentz-invariant latent quantities. Finally, the flexibility of the aggregation process commonly used in Graph Networks can be leveraged for improved accuracy, in particular to allow variable scaling with the size of the input.
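The two symmetry properties named above can be checked numerically on a small example. The sketch below assumes that the Lorentz-invariant quantities are the pairwise Minkowski dot products of the input 4-momenta, which the abstract does not state explicitly; the helper names and the metric convention (+, -, -, -) with components (E, px, py, pz) are likewise choices made for this illustration and are not taken from the PELICAN code.

```python
import math


def minkowski_dot(p, q):
    """Minkowski product with metric (+, -, -, -); p and q are (E, px, py, pz)."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]


def pairwise_invariants(momenta):
    """Rank-2 array d[i][j] = p_i . p_j built from a list of 4-momenta."""
    return [[minkowski_dot(p, q) for q in momenta] for p in momenta]


def boost_z(p, beta):
    """Lorentz boost along the z axis with velocity beta (in units of c)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    e, px, py, pz = p
    return (gamma * (e - beta * pz), px, py, gamma * (pz - beta * e))
```

Boosting every input leaves the array unchanged up to rounding, while reordering the particles reorders its rows and columns in the same way. A network acting on this array therefore only needs permutation-equivariant layers to respect both symmetries.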
The ever-growing computing power needed for the storage and analysis of data from the high-energy physics experiments at CERN requires performance optimization of existing and planned IT resources.
One of the main computing capacity consumers in the HEP software workflow is the data analysis. To optimize the resource usage, the concept of Analysis Facility (AF) for Run 3 has been introduced. The AFs are special computing centres with a combination of CPU and fast interconnected disk storage resources, allowing for rapid turnaround of analysis tasks on a subset of data. This in turn allows for optimization of the analysis process and the codes before the analysis is performed on the large data samples on the WLCG Grid.
In this paper, the structure and the first benchmark tests of the Wigner AF are presented.
Particle physics experiments spend large amounts of computational effort on Monte Carlo simulations. Due to the computational expense of simulations, they are often executed and stored in large distributed computing clusters. To lessen the computational cost, physicists have introduced alternatives to speed up the simulation. Generative Adversarial Networks (GANs) are an excellent Deep-Learning-based alternative due to their ability to imitate probability distributions. In particular, one of the most frequently tackled problems is calorimeter simulation, since it consumes a large portion of the computing power. GANs simulate calorimeter particle showers with good accuracy and reduced computational resources. Previous works have already explored the generation of calorimeter simulation data with GANs, but in most cases from a centralized perspective (i.e., where the dataset is present on the training node).
This separation creates a disparity between the training data generation (i.e., in distributed clusters) and training (i.e., centralized), introducing a limiting factor to the amount of data the centralized node can use to train. Federated Learning has arisen as a successful decentralized training solution where data are not necessarily balanced or independent and identically distributed (IID). Federated Learning is a training method where a group of collaborators trains a model by sharing training updates with an aggregator. The sparsity and distributed nature of the simulated data pair favorably with the features of Federated Learning. In this work, we introduce new federated learning-based approaches for GAN training and test them on the 2DGAN model. This work covers different training schemes for GANs with FL (e.g., centralized discriminator or centralized generator). Our work provides insights into the various architectures by performing model training and extracting performance metrics. The results permit the evaluation of the effectiveness of the different strategies.
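The collaborator and aggregator roles can be made concrete with a minimal Python sketch. It deliberately replaces the GAN with a toy model whose weights estimate the mean of the data, and it assumes a sample-count-weighted average as the aggregation rule (the scheme commonly called federated averaging); neither choice is taken from the work described above, and all function names are hypothetical.

```python
def local_update(weights, data, lr=0.25):
    """Collaborator step: one gradient step on the local loss mean((w - x)^2)."""
    n = len(data)
    local_means = [sum(x[i] for x in data) / n for i in range(len(weights))]
    return [w - lr * 2.0 * (w - m) for w, m in zip(weights, local_means)]


def aggregate(updates):
    """Aggregator step: average the collaborator weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def federated_training(shards, dim, n_rounds=30):
    """Run several rounds; the raw data in each shard never leaves its collaborator."""
    global_weights = [0.0] * dim
    for _ in range(n_rounds):
        updates = [(local_update(global_weights, shard), len(shard))
                   for shard in shards]
        global_weights = aggregate(updates)
    return global_weights
```

Only weights and sample counts are exchanged in each round. With shards of unequal size and different local distributions, the weighted average still converges to the value a centralized fit over all data would give in this toy case.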
The unprecedented volume of data and Monte Carlo simulations at the HL-LHC will pose increasing challenges for data analysis, both in terms of computing resource requirements and in terms of "time to insight". Precision measurements with present LHC data already face many of these challenges today. We will discuss performance scaling and optimization of RDataFrame for complex physics analyses, including interoperability with Eigen, Boost Histograms, and the Python ecosystem to enable this.
Neutrino experiments that use liquid argon time projection chamber (LArTPC) detectors are growing bigger and expect to see more neutrinos with next-generation beams, and will therefore require more computing resources to reach their physics goals of measuring CP violation in the neutrino sector and exploring anomalies. These resources can be used to their full capacity by incorporating parallelism through multi-threading and vectorization within algorithms, and by running these algorithms on High Performance Computers (HPCs). An HPC workflow is being developed for LArTPC experiments to take advantage of all levels of parallelism, within and across nodes. It will be used to enhance the statistics available for use in physics analysis and will also make it possible to efficiently incorporate AI algorithms. Additional opportunities to incorporate parallelism within LArTPC algorithms are also being explored.
Ultra-low mass and high granularity Drift Chambers fulfill the requirements for tracking systems of modern High Energy Physics experiments at the future high luminosity facilities (FCC-ee or CEPC).
We present how, in helium-based gas mixtures, by measuring the arrival times of each individual ionization cluster and by using proper statistical tools, it is possible to perform a bias-free estimate of the impact parameter and a precise PID. Typically, in a helium-based drift chamber, consecutive ionization clusters are separated in time by a few ns at small impact parameters and by up to a few tens of ns at large impact parameters. For an efficient application of the cluster timing technique, which consists of isolating the pulses due to different ionization clusters, it is therefore necessary to have read-out interfaces capable of processing high-speed signals. We present a full front-end chain able to treat the low-amplitude sense wire signals ($\sim$ a few mV), converted from analog to digital with FADCs, with a high bandwidth ($\sim$1 GHz). The requirement of a high sampling frequency, together with long drift times, usually of the order of several hundreds of ns, and a large number of readout channels, typically of the order of tens of thousands, imposes a sizable data reduction that must preserve all relevant information. Measuring both the amplitude and the arrival time of each peak in the signal associated with each ionization cluster is the minimum requirement on the data transfer for storage to prevent any significant data loss. An electronic board including a Fast ADC and an FPGA for real-time processing of the drift chamber signals is presented. Various peak finding algorithms, implemented and tested in real time with VHDL code, are also compared.
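The data reduction described above, keeping only the amplitude and arrival time of each peak, can be illustrated with a deliberately simple sketch. It is not one of the VHDL algorithms compared in this contribution: the threshold-plus-local-maximum rule, the sampling period, and the waveform values are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def find_peaks(samples: List[float], threshold: float,
               sample_period_ns: float) -> List[Tuple[float, float]]:
    """Return (arrival_time_ns, amplitude) for each local maximum above threshold."""
    peaks = []
    for i in range(1, len(samples) - 1):
        # A sample is a peak if it rises above its left neighbour and
        # is not exceeded by its right neighbour.
        is_local_max = samples[i - 1] < samples[i] >= samples[i + 1]
        if is_local_max and samples[i] > threshold:
            peaks.append((i * sample_period_ns, samples[i]))
    return peaks


# Hypothetical digitized waveform in mV, sampled every 0.5 ns.
waveform_mv = [0.0, 0.1, 0.2, 2.5, 0.4, 0.1, 0.3, 3.1, 1.0, 0.2, 0.0]
peaks = find_peaks(waveform_mv, threshold=1.0, sample_period_ns=0.5)
# peaks == [(1.5, 2.5), (3.5, 3.1)]
```

In this toy case eleven samples reduce to two (time, amplitude) pairs, which is the kind of reduction needed before data transfer for storage.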
RooFit is a toolkit for statistical modeling and fitting, and together with RooStats it is used for measurements and statistical tests by most experiments in particle physics, particularly the LHC experiments. As the LHC program progresses, physics analyses become more computationally demanding. Therefore, recent RooFit developments were focused on performance optimization, in particular to speed up the minimization of the negative log likelihood when fitting a model to a dataset.
Two such improvements will be discussed in this session: gradient-based CPU parallelization and batched computations. The former strategy parallelizes the calculation of the gradient in the line search approach (MIGRAD) used for maximum likelihood estimation in RooFit. Here, the parallelization approach and computational tools used will be discussed. The second strategy comprises a restructuring of the computational graph associated with a model and dataset in order to allow for batched computations. With batched computations, RooFit can evaluate batches of events simultaneously per computational graph node, rather than event by event. This simultaneous computation can be supported by either vectorization or GPU parallelization.
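The difference between event-by-event and batched evaluation can be sketched for a toy model. This is not RooFit code: the Gaussian model, the function names, and the data are assumptions, and plain Python lists stand in for the arrays that a vectorized or GPU backend would process.

```python
import math
from typing import List


def nll_event_by_event(data: List[float], mu: float, sigma: float) -> float:
    """Walk the whole computation once per event."""
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        pdf = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        total -= math.log(pdf)
    return total


def nll_batched(data: List[float], mu: float, sigma: float) -> float:
    """Evaluate each graph node once over the full batch of events."""
    # Node 1: standardized residuals for all events.
    z = [(x - mu) / sigma for x in data]
    # Node 2: Gaussian probability density for all events.
    norm = sigma * math.sqrt(2.0 * math.pi)
    pdf = [math.exp(-0.5 * zi * zi) / norm for zi in z]
    # Node 3: reduction to the negative log likelihood.
    return -sum(math.log(p) for p in pdf)


# Hypothetical dataset and parameter values.
events = [0.5, 1.2, -0.3, 2.1]
nll_a = nll_event_by_event(events, mu=1.0, sigma=0.8)
nll_b = nll_batched(events, mu=1.0, sigma=0.8)
```

Both functions return the same value; the batched form exposes one loop per node, and each such loop is the unit that could be mapped onto SIMD instructions or a GPU kernel.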
Throughout this session, there will be an emphasis on detailed benchmarking and how it was used to optimize various parts of the developed performance improvements, including load balancing and the reduction of communication overhead. Benchmarks are primarily shown for cutting-edge Higgs combination fits, where the developed improvements were intended to achieve order-of-magnitude improvements in execution wall time.
There are established classical methods to reconstruct particle tracks from recorded hits on the particle detectors. Current algorithms do this either by cutting on some features, such as the recorded time of the hits, or through the fitting process. This is potentially error-prone and resource-consuming. For high-noise events, these issues are more critical and these methods might even fail. We have been developing artificial neural networks that can learn to separate noise from signal in simulated data. The data sample we use for this purpose consists of Monte Carlo simulated Bhabha events generated by the BESIII offline software system. We study different types of deep neural networks and their effectiveness in removing the noise that arises in the main drift chamber of BESIII from various origins.
The fully connected networks that we try first find sophisticated cuts on the hit features of each cell of the detector. These features include the raw time of a hit and the recorded charge associated with it. This leads to about 85 percent efficiency and purity of the signal separation. This sets a lower limit for us, since such a network judges every hit only by its own features. Next, we develop a CNN and show that, with information from only four neighboring cells, the noise removal reaches 99 percent purity and efficiency at the same time. We discuss the effectiveness of the network for events with different noise levels.
The main drift chamber consists of 6796 sense wires arranged in 43 layers. The structure of the wire system is known, and therefore we also examine the idea of treating the main drift chamber structure as a graph. We build a model based on graph convolutional layers and choose a node classification approach. We include a message passing process in three of the hidden layers and obtain 95 percent efficiency and purity for the noise removal. We then describe the results of our network for other events such as $J/\psi \rightarrow p \bar{p} \pi^+ \pi^-$. Finally, we compare all of these results with the classical methods.
The alpaka library is a header-only C++17 abstraction library for development across hardware accelerators (CPUs, GPUs, FPGAs). Its aim is to provide performance portability across accelerators through the abstraction (not hiding!) of the underlying levels of parallelism. In this talk we will present the concepts behind alpaka, explain how it is mapped to the various underlying hardware models, and show the features introduced over the last year. In addition, we will also (briefly) present the software ecosystem surrounding alpaka.
In recent years, new technologies and new approaches have been developed in academia and industry to face the necessity to both handle and easily visualize huge amounts of data, the so-called “big data”. The increasing volume and complexity of HEP data challenge the HEP community to develop simpler and yet powerful interfaces based on parallel computing on heterogeneous platforms. Good examples are 1) the pandas framework, which is an open source set of data analysis tools allowing the configuration and fast manipulation of data structures, and 2) the Jupyter Notebook, which is a web application that allows users to create and share documents that contain live executable code. Similarly to the python-based pandas, ROOT::RDataFrame offers another parallel data analysis tool also providing a C++ interface as well as Python bindings (thus compatible with the Jupyter Notebook).
In this contribution we aim to document our experience and performance studies in deploying an HEP analysis workflow, in a realtime analysis fashion, being developed within a Jupyter environment (from the selection criteria to extract the physical signal to the fitting tasks). For this purpose we exploit CMS Run1 Open Data to extract the signal associated with the decay of a beauty meson particle.
We will discuss how the combination of HEP specific tools and technologies coming from the much wider data analysis world may result in a powerful and easy-to-use tool for a HEP data analyst. Among these tools we will test the advantage of offloading some of the most compute intensive tasks on heterogeneous architectures through GooFit, a tool that exploits the computational capabilities of GPUs to perform maximum likelihood fits.
Monte Carlo simulation is a vital tool for all physics programmes of particle physics experiments. Its accuracy and reliability in reproducing detector response is of the utmost importance. For the LHCb experiment, which is embarking on a new data-taking era with an upgraded detector, a full suite of verifications has been put in place for its simulation software to ensure the quality of the samples produced. The chain of tests exploits the LHCb infrastructure for software quality control.
In this contribution we will describe the procedure and the tests that have been put in place. First-level verifications are performed as soon as new software is submitted for integration in the LHCb GitLab repository. They range from Continuous Integration (CI) tests to so-called 'nightlies': short jobs run overnight to verify the integrity of the software. More in-depth performance and regression tests are carried out with dedicated infrastructure (LHCbPR), which compares samples of O(1000) events. Simulation data quality shifters look for anomalies and alert the authors in the case of unexpected changes. Work is also in progress to enable the automatic verification of important variable distributions from a small number of simulated events before the whole production is launched.
We developed supervised and unsupervised quantum machine learning models for anomaly detection tasks at the Large Hadron Collider at CERN. Current Noisy Intermediate Scale Quantum (NISQ) devices have a limited number of qubits and qubit coherence. We designed dimensionality reduction models based on Autoencoders to accommodate the constraints dictated by the quantum hardware. Different designs were investigated, such as convolutional and Sinkhorn Autoencoder architectures, that can compress HEP data while preserving the class structure of the original dataset. The quantum algorithms are trained to identify anomalies in the latent spaces generated by the Autoencoders. A collection of results for a quantum classifier and a set of quantum anomaly detection algorithms is presented. Our study is supported by a performance comparison to the corresponding classical models.
Compared to LHC Run 1 and Run 2, future HEP experiments, e.g. at the HL-LHC, will increase the volume of generated data by an order of magnitude. In order to sustain the expected analysis throughput, ROOT's RNTuple I/O subsystem has been engineered to overcome the bottlenecks of the TTree I/O subsystem, focusing also on a compact data format, asynchronous and parallel requests, and a layered architecture that allows supporting distributed filesystem-less storage systems, e.g. HPC-oriented object stores.
In a previous publication, we introduced and evaluated RNTuple's native backend for Intel DAOS. Since its first prototype, we have carried out a number of improvements to both RNTuple and its DAOS backend with the aim of saturating the physical link, such as support for vector writes and an improved RNTuple-to-DAOS mapping, to name only a few. In parallel, the latest developments allow for better integration between RNTuple and RDataFrame, ROOT's storage-agnostic, declarative interface for writing HEP analyses.
In this work, we contribute with the following: (i) a redesign and evaluation of the RNTuple DAOS backend, including a mechanism for efficient population of the object store based on existing data; and (ii) an experimental evaluation of single-node and distributed analyses using RDataFrame as a proxy between the user and RNTuple, showing a significant increase in the analysis throughput for typical HEP workflows.
Through its TMVA package, ROOT provides and connects to machine learning tools for data analysis at HEP experiments and beyond. In addition, through its powerful I/O system and RDataFrame analysis tools, ROOT provides the capability to efficiently select and query input data from large data sets as typically used in HEP analysis. At the same time, several Machine Learning tools exist in a diversified landscape outside of ROOT.
In this talk, we present new developments in ROOT that bridge the gap between external tools and ROOT, by providing better interoperability in a common software ecosystem for Machine Learning in data analysis.
We present recently included features in TMVA that allow batches of events to be generated from ROOT I/O and RDataFrame in order to efficiently train machine learning models using Python tools such as Tensorflow and PyTorch. This will facilitate direct access to the ROOT input data when training using external tools. Another focus is put on fast machine learning inference, which enables analysts to deploy their machine learning models rapidly on large scale datasets. A new tool, SOFIE, has recently been developed in ROOT; it generates C++ code for the evaluation of deep learning models that were trained with external tools. This provides the capability to better integrate Machine Learning model evaluation in HEP data analysis.
The new developments are paired with newly designed C++ and Python interfaces for TMVA supporting modern C++ paradigms and providing full interoperability in the Python ecosystem.
MadMiner is a python module that implements a powerful family of multivariate inference techniques that leverage both matrix element information and machine learning.
This multivariate approach requires neither the reduction of high-dimensional data to summary statistics nor any simplifications to the underlying physics or detector response.
In this paper, we address some of the challenges arising from deploying MadMiner in a real scale HEP analysis with the goal of offering a new tool in HEP that is easily accessible.
The proposed approach streamlines a typical MadMiner pipeline into a parametrized yadage workflow in yaml files. The general workflow is split into two yadage subworkflows, one dealing with the physics dependencies and the other with the ML ones. After that, the workflow is deployed using REANA, a reproducible research data analysis platform that takes care of flexibility, scalability, reusability and reproducibility features.
To test the performance of our method, we performed scaling experiments for a MadMiner workflow on the National Energy Research Scientific Computing Center (NERSC) cluster with an HTCondor backend.
For all stages of the physics subworkflow, resource usage and walltime scaled linearly with the number of events generated. This trend allowed us to run a typical MadMiner workflow consisting of 1M events, in which the generation step used only 2930 MB of memory and a walltime of 2919 s.
The feature complexity of data recorded by particle detectors combined with the availability of large simulated datasets presents a unique environment for applying state-of-the-art machine learning (ML) architectures to physics problems. We present the Simplified Cylindrical Detector (SCD): a fully configurable GEANT4 calorimeter simulation which mimics the granularity and response characteristics of general purpose detectors at the LHC. The SCD will be released as public software to accelerate development of ML-based reconstruction and calorimeter models. Two use-cases based on data from the SCD are presented: first, an ML-based global particle reconstruction which shows potential to outperform traditional approaches; second, a fast simulation model transforming a set of truth particles into a set of reconstructed particles.
To sustain the harsher conditions of the high-luminosity LHC, the CMS Collaboration is designing a novel endcap calorimeter system. The new calorimeter will predominantly use silicon sensors to achieve sufficient radiation tolerance and will maintain highly granular information in the readout to help mitigate the effects of the pile up. In regions characterized by lower radiation levels, small scintillator tiles with individual SiPM on-tile readout are employed. A unique reconstruction framework (TICL: The Iterative CLustering) is being developed within the CMS Software CMSSW to fully exploit the granularity and other significant detector features, such as particle identification and precision timing, with a view to mitigating pile up in the very dense environment of HL-LHC. The TICL framework has been designed with heterogeneous computing in mind: the algorithms and their data structures are designed to be executed on GPUs. In addition, geometry-agnostic data structures have been designed to provide fast navigation and searching capabilities. Seeding capabilities (also exploiting information coming from other detectors), dynamic cluster masking, energy calibration, and particle identification are the main components of the framework. To allow for maximal flexibility, TICL allows the composition of different combinations of modules that can be chained together in an iterative fashion. The presenter will describe the design of TICL pattern recognition algorithms and advanced neural networks under development, as well as future plans.
Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers, long short-term memory and gated recurrent unit, within the hls4ml [1] framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.
[1] J. Duarte et al., “Fast inference of deep neural networks in FPGAs for particle physics”, JINST 13 (2018) P07027, arXiv:1804.06913
The classification of HEP events, or separating signal events from the background, is one of the most important analysis tasks in High Energy Physics (HEP), and a foundational task in the search for new phenomena. Complex deep learning-based models have been fundamental for achieving accurate and outstanding performance in this classification task. However, the quantification of the uncertainty has traditionally been neglected when deep learning-based methods are used, despite its critical importance in scientific applications [1], [2].
In this work, we propose a Bayesian deep learning-based method for measuring uncertainty when classification of HEP events is performed using a deep neural network classifier. The work is focused on the use of the Monte Carlo Dropout (MC-Dropout) method, a variational inference technique proposed in [3] that is based on Dropout [4], the well-known regularization technique used to overcome overfitting. The Monte Carlo Dropout method allows production of the posterior distribution of the network weights by training a dropout network that approximates Bayesian inference. Thus, a Bayesian deep neural network considers a distribution over network parameters instead of a single point. The traditional dropout method randomly toggles off some neurons, with probability $D_{rate}$ during the training stage. However, the MC-Dropout method toggles off neurons both during the training stage and also during the inference stage.
In this work, we use the publicly available Higgs dataset described in [5]. This is simulated data, and the problem is to distinguish the signal from the background, where the signal corresponds to a Higgs boson decaying to a pair of bottom quarks according to the process: $gg \rightarrow H^0 \rightarrow W^{\mp} H^{\pm} \rightarrow W^{\mp} W^{\pm} h^0 \rightarrow W^{\mp} W^{\pm} b \bar{b}$. Furthermore, we plan to apply the proposed method using simulated data of the $\omega$ meson production off nuclear targets. Here, the problem is that the $\omega$ meson decays into four final-state particles: $\pi^+$ $\pi^-$ $\gamma$ $\gamma$, and the pions can also decay into muons and neutrinos, especially at low momentum [6].
The methodology of this work includes (i) training Bayesian deep learning-based classifiers for the identification of signal and background (binary classification) using the Monte Carlo Dropout method; (ii) evaluating different values of $D_{rate}$; (iii) evaluating the classification performance; and (iv) computing three uncertainty measures: variance, mutual information, and predictive entropy. Preliminary results show on average 0.66 accuracy, 0.68 precision, 0.72 recall, and 0.70 F1 score when a Monte Carlo Dropout-based model is used with three hidden layers of 300 neurons each and $D_{rate}=0.5$. We expect to increase the classification performance through hyper-parameter optimization, by evaluating different network architectures, and by varying the $D_{rate}$ parameter.
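The three uncertainty measures listed above can be illustrated with a small sketch that keeps dropout active at inference time and repeats the forward pass. It is not the network used in this work: the one-hidden-layer architecture, the fixed weights, the input, and the number of passes are hypothetical, and the formulas follow the common definitions for a binary classifier.

```python
import math
import random


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with success probability p."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def forward_with_dropout(x, w_hidden, w_out, drop_rate, rng):
    """One stochastic pass: ReLU hidden layer with dropout, sigmoid output."""
    keep = 1.0 - drop_rate
    hidden = []
    for weights in w_hidden:
        activation = max(0.0, sum(w * xi for w, xi in zip(weights, x)))
        mask = 1.0 if rng.random() < keep else 0.0
        hidden.append(activation * mask / keep)  # inverted dropout scaling
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def mc_dropout_uncertainty(x, w_hidden, w_out, drop_rate=0.5, passes=200, seed=0):
    """Repeat the stochastic pass and summarize the spread of the predictions."""
    rng = random.Random(seed)
    probs = [forward_with_dropout(x, w_hidden, w_out, drop_rate, rng)
             for _ in range(passes)]
    mean = sum(probs) / passes
    variance = sum((p - mean) ** 2 for p in probs) / passes
    predictive_entropy = binary_entropy(mean)
    expected_entropy = sum(binary_entropy(p) for p in probs) / passes
    return {
        "mean": mean,
        "variance": variance,
        "predictive_entropy": predictive_entropy,
        # Mutual information: total uncertainty minus the average per-pass uncertainty.
        "mutual_information": predictive_entropy - expected_entropy,
    }


# Hypothetical weights and input, for illustration only.
w_hidden = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
w_out = [1.0, -1.5, 0.7]
result = mc_dropout_uncertainty([1.0, 2.0], w_hidden, w_out)
```

With `drop_rate=0.0` every pass is identical, so the variance and the mutual information vanish; this is a quick sanity check that the measured spread comes from the dropout masks alone.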
[1] Aishik Ghosh, Benjamin Nachman, and Daniel Whiteson. Uncertainty-aware machine learning for high energy physics. Phys. Rev. D, 104:056026, Sep 2021.
[2] Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul Fieguth, Xiaochun Cao, Abbas Khosravi, U. Rajendra Acharya, Vladimir Makarenkov, and Saeid Nahavandi. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion, 76:243–297, 2021.
[3] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning, pages 1050–1059. PMLR, 2016.
[4] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958, 2014.
[5] Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nature communications, 5(1):1–9, 2014.
[6] Andrés Bórquez. The $\omega$ hadronization studies in the nuclear medium with the CLAS spectrometer. Master’s thesis, UTFSM, Valparaíso, Chile, 2021.
Precision simulations for collider phenomenology require intensive evaluations of complicated scattering amplitudes. Uncovering hidden simplicity in these basic building blocks of quantum field theory can lead us to new, efficient methods to obtain the necessary theoretical predictions. In this talk I will explore some new approaches to multi-scale loop amplitudes that can overcome conventional bottlenecks in their evaluation. Computational techniques based on evaluations over finite fields are now being used to obtain analytic information from numerical evaluations and can lead to fast and efficient implementations that can be used directly in Monte Carlo simulations. In some cases even the most compact representations of amplitudes can still mean prohibitive evaluation times. Approximating these complicated functions with Machine Learning technology has the potential to provide an order of magnitude improvement in evaluation times yet it remains a challenge to keep deviations from the complete amplitude under quantitative control. I will present some advances in the use of Neural Networks to provide reliable amplitude evaluations.
In this talk I discuss how machine learning can be used for identifying underlying mathematical structures in physical systems. Geared towards relevant structures in Beyond the Standard Model Physics, I will focus on how we can use ML to discover symmetries. I discuss how standard ML pipelines have to be adapted to enable such discoveries and comment on further applications of these methods in physics beyond symmetries.
I will discuss fundamental particle physics intersections with quantum science and technology, including embedding challenging problems on quantum computation architectures.
PHASM is a software toolkit, currently under development, for creating AI-based surrogate models of scientific code. AI-based surrogate models are widely used for creating fast and inverse simulations. The project anticipates an additional, future use case: adapting legacy code to modern hardware. Data centers are investing in heterogeneous hardware such as GPUs and FPGAs; meanwhile, many important codebases are unable to take advantage of this hardware's superior parallelism without undergoing a costly rewrite. An alternative is to train a neural net surrogate model to mimic the computationally intensive functions in the code, and deploy the surrogate on the exotic hardware instead. PHASM addresses three specific challenges: (1) systematically discovering which functions can be effectively replaced with a surrogate, (2) automatically identifying, for a given function, the true space of inputs and outputs including those not apparent from the type signature, and (3) integrating a machine learning model into a legacy codebase cleanly and with a high level of abstraction. In the first year of development, a proof of concept has been developed for each challenge. A surrogate API makes it easy to bring PyTorch models into the C++ ecosystem and uses profunctor optics to establish a two-way data binding between C++ datatypes and tensors. A model variable discovery tool performs a dynamic binary analysis using Intel PIN in order to identify a target function's model variable space, including types, shapes, and ranges, and generate the optics code necessary to bind the model to the function. Future work may include exploring the limits of surrogate models for functions of increasing size and complexity, and adaptively generating synthetic training data based on uncertainty estimates.
A novel data collection system, known as Level-1 (L1) Scouting, is being introduced as part of the L1 trigger of the CMS experiment at the CERN Large Hadron Collider. The L1 trigger of CMS, implemented in FPGA-based hardware, selects events at 100 kHz for full read-out, within a short 3 microsecond latency window. The L1 Scouting system collects and stores the reconstructed particle primitives and intermediate information of the L1 trigger processing chain, at the full 40 MHz bunch crossing rate. This system will provide vast amounts of data for detector diagnostics, luminosity measurements, and the study of otherwise inaccessible signatures, either too common to fit in the L1 accept budget, or with requirements orthogonal to the standard physics triggers. Demonstrator systems consisting of PCIe-based FPGA stream-processing boards and associated host PCs have been deployed at CMS to capture data from both the Global Muon Trigger (GMT) and Calorimeter Trigger sub-systems. In addition, a neural-network based re-calibration and fake identification engine has been developed using the Micron Deep Learning Accelerator (MDLA) FPGA framework. An overview of the new system and the first results from 2022 data taking will be shown. Plans and development progress towards the continued expansion of the L1 Scouting system throughout LHC Run 3, and for Phase II of CMS at the High Luminosity LHC, will also be presented.
The data-taking conditions expected in Run 3 of the LHCb experiment will be unprecedented and challenging for the software and computing systems. Accordingly, the LHCb collaboration will pioneer the use of a software-only trigger system to cope with the increased event rate efficiently. The beauty physics programme of LHCb is heavily reliant on topological triggers. These are devoted to selecting beauty-hadron candidates inclusively, based on the characteristic decay topology and kinematic properties expected from beauty decays. We present the Run 3 implementation of the topological triggers using Lipschitz monotonic neural networks. This architecture offers robustness under varying detector conditions and sensitivity to long-lived candidates, opening the possibility of discovering New Physics at LHCb.
In the past four years, the LHCb experiment has been extensively upgraded, and it is now ready to start Run 3 performing a full real-time reconstruction of all collision events, at the LHC average rate of 30 MHz. At the same time, an even more ambitious upgrade is already being planned (LHCb "Upgrade-II"), and intense R&D is ongoing to boost the real-time processing capability of the experiment. The instantaneous luminosity will significantly increase (by a factor of 5 to 10), and the trigger system will have to deal with data coming from more granular and complex detectors. In an effort to move reconstruction and data reduction to the earliest possible stages of processing, heterogeneous computing solutions are being explored. Specialized coprocessors (computing accelerators) will take responsibility for the most intensive and parallelizable tasks, freeing the more flexible general-purpose processors for higher-level functions. In this talk we describe the results obtained with a life-size demonstrator for the reconstruction of pixel tracking detectors, implemented in commercial, PCIe-hosted FPGA cards. They are interconnected by fast optical links and they operate parasitically on live LHCb data from Run 3. This demonstrator is based on an extremely parallel 'artificial retina' architecture, and is intended as a first life-size test of the technology, to explore its potential for future larger-scale applications in real-time reconstruction at LHCb at high luminosity.
APEIRON is a framework encompassing the general architecture of a distributed heterogeneous processing platform and the corresponding software stack, from the low level device drivers up to the high level programming model.
The framework is designed to be efficiently used for studying, prototyping and deploying smart trigger and data acquisition (TDAQ) systems for high energy physics experiments.
The general architecture of such a distributed processing platform includes m data sources, corresponding to the detectors or sub-detectors, feeding a sequence of n stream processing layers, making up the whole data path from readout to trigger processor (or storage server).
The processing platform features a modular and scalable low-latency network infrastructure with configurable topology. This network system represents the key element of the architecture, enabling the low-latency recombination of the data streams arriving from the different input channels through the various processing layers.
Developers can define scalable applications using a dataflow programming model (inspired by Kahn Process Networks) that can be efficiently deployed on a multi-FPGA system: the APEIRON communication IPs allow low-latency communication between processing tasks deployed on FPGAs, even if hosted on different computing nodes.
Thanks to the use of High Level Synthesis tools in the workflow, tasks are described in a high-level language (C/C++), while communication between tasks is expressed through a lightweight API based on non-blocking send() and blocking receive() operations.
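The dataflow model with non-blocking send() and blocking receive() can be sketched in software as follows. This is only a conceptual illustration in Python threads, not the APEIRON API and not HLS C/C++: the `Channel` class, the end-of-stream marker, and the threshold task are hypothetical.

```python
import queue
import threading


class Channel:
    """Unbounded FIFO: send() never blocks, receive() blocks until data arrives."""

    def __init__(self):
        self._fifo = queue.Queue()  # maxsize=0 means unbounded

    def send(self, item):
        self._fifo.put_nowait(item)

    def receive(self):
        return self._fifo.get()


END = None  # hypothetical end-of-stream marker


def source(out_ch, values):
    """Data source task: emits every value, then the end marker."""
    for v in values:
        out_ch.send(v)
    out_ch.send(END)


def threshold_stage(in_ch, out_ch, threshold):
    """Processing task: forwards only the items above the threshold."""
    while True:
        item = in_ch.receive()
        if item is END:
            out_ch.send(END)
            return
        if item > threshold:
            out_ch.send(item)


def run_pipeline(values, threshold):
    """Wire source -> threshold stage -> collector and run the tasks concurrently."""
    a, b = Channel(), Channel()
    workers = [
        threading.Thread(target=source, args=(a, values)),
        threading.Thread(target=threshold_stage, args=(a, b, threshold)),
    ]
    for w in workers:
        w.start()
    results = []
    while True:
        item = b.receive()
        if item is END:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


selected = run_pipeline([3, 12, 7, 25, 1], threshold=5)
# selected == [12, 7, 25]
```

Because each task only reads from and writes to its channels, the result does not depend on thread scheduling, which is the property that makes Kahn-style networks attractive for mapping tasks onto separate devices.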
The mapping between the computational data flow graph and the underlying network of FPGAs is defined by the designer with a configuration tool, from which the framework produces all project files required for FPGA bitstream generation. The interconnection logic is therefore automatically built according to the application needs (in terms of input/output data channels), allowing the designer to focus on the processing tasks expressed in C/C++.
The aim of the APEIRON project was to develop a flexible framework that could be adopted in the design and implementation of both "traditional" low level trigger systems and of data reduction stages in trigger-less or streaming readout experimental setups characterized by high event rates.
For this purpose we studied and implemented algorithms capable of boosting the efficiency of these classes of online systems based on Neural Networks (NN), trained offline and leveraging the HLS4ML software package for deployment on FPGA.
We have validated the framework on the physics use case represented by the partial particle identification system for the low-level trigger of the NA62 experiment, working on data from its Ring Imaging Cherenkov detector to identify electrons and the number of charged particles.
In high energy physics experiments, the calorimeter is a key detector measuring the energy of particles. These particles interact with the material of the calorimeter, creating cascades of secondary particles, the so-called showers. Describing the development of these particle cascades relies on precise simulation methods, which are inherently slow and constitute a challenge for HEP experiments. Furthermore, with the upcoming high-luminosity upgrade of the LHC, which brings more complex events and a much higher trigger rate, the number of required simulated events will increase. Machine Learning (ML) techniques such as generative models are currently widely explored as faster simulation alternatives. The pipeline of an ML fast simulation solution consists of multiple components, from data generation and preprocessing to model training, optimization, validation and deployment within a C++ framework. In this contribution, we will present our latest developments: building a portable and scalable pipeline with Kubeflow, automating hyperparameter search with Optuna and NAS, and optimizing the inference memory footprint in C++ by leveraging quantization and graph optimization strategies for different hardware accelerators.
The prospect of a possibly exponential speed-up of quantum computing compared to classical computing marks it as a promising method when searching for alternative future High Energy Physics (HEP) simulation approaches. HEP simulations, such as those for the LHC at CERN, are extraordinarily complex and therefore require immense amounts of computing hardware resources and computing time. For some HEP simulations, classical machine learning models have already been tested successfully, leading to speed-ups of orders of magnitude. In this research we proceed to the next step and test whether quantum computing can further improve HEP machine learning simulations.
With a small prototype model we showcase a full quantum Generative Adversarial Network (GAN) model that successfully generates real calorimeter shower images with high precision. The advantage compared to previous quantum models is that, by employing angle encoding, the pixel-to-qubit ratio scales linearly and the model generates real images with pixel energy values instead of simple probability distributions. The model is constructed and evaluated for images with eight pixels and requires only eight qubits for the generator and discriminator quantum circuits. The quantum circuits make use of the properties of entanglement and superposition to learn and reproduce the correlations in the images.
To complete the picture, the results of the full quantum GAN model are compared to other quantum and hybrid quantum-classical models.
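To illustrate why angle encoding gives a linear pixel-to-qubit ratio, the sketch below maps each pixel energy to one single-qubit rotation angle and recovers the energy from the probability of measuring the qubit in state |1>. The abstract does not give the exact encoding, so the scaling to [0, π], the use of an RY rotation and the function names are assumptions made for illustration; no entangling layers or training are shown.

```python
import numpy as np


def encode_pixels(pixels, max_energy):
    """Map each pixel energy to a rotation angle in [0, pi], one qubit per pixel."""
    pixels = np.asarray(pixels, dtype=float)
    return np.pi * np.clip(pixels / max_energy, 0.0, 1.0)


def qubit_state(theta):
    """Amplitudes of RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2.0), np.sin(theta / 2.0)])


def decode_pixels(angles, max_energy):
    """Recover pixel energies from the probability of measuring |1> on each qubit."""
    prob_one = np.array([qubit_state(theta)[1] ** 2 for theta in angles])
    # Invert P(1) = sin^2(theta/2) for theta in [0, pi], then undo the scaling.
    return max_energy * 2.0 * np.arcsin(np.sqrt(prob_one)) / np.pi


# Eight pixels need eight angles, hence eight qubits (assumed toy energies).
image = np.array([0.0, 0.5, 1.2, 3.4, 2.0, 0.7, 0.1, 0.0])
angles = encode_pixels(image, max_energy=4.0)
```

With this convention an image of N pixels needs N qubits, and the decoded values are energies rather than a normalised probability distribution over pixels.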
The Beijing Spectrometer III (BESIII) [1] is a particle physics experiment at the Beijing Electron–Positron Collider II (BEPC II) [2] which aims to study physics in the tau-charm region precisely. BESIII has now collected an unprecedented amount of data, and the statistical uncertainty is reduced significantly. Therefore, systematic uncertainty is key to obtaining more precise results. At BESIII, the measurement of energy deposition per unit length (so-called dE/dx) from the drift chamber is used for charged particle identification (PID), which is important for most analyses [3]. Because Geant4 cannot simulate the energy loss of charged particles in thin gas precisely, a sampling method using experimental data is adopted for dE/dx simulation, and it works smoothly [3]. In order to reduce the systematic uncertainty from dE/dx PID, advanced machine learning techniques can be tried for accurate dE/dx simulation.
This contribution will present the dE/dx simulation model based on normalizing flows [4], which are stable in training and converge easily. A large number of dE/dx measurements from the experiment are used for training. The metrics for judging the quality of the simulation include the comparison of the dE/dx distribution and of the dE/dx PID performance between data and simulation. Performance studies show that the simulation has very high fidelity and that the dE/dx PID systematic uncertainty can be reduced to within 1%.
Besides, due to the lack of understanding of dE/dx measurements in the very low beta*gamma region, the expected dE/dx value and resolution cannot be fitted well with the traditional method, which decreases the dE/dx PID efficiency, especially for protons (anti-protons). To overcome this barrier, fully-connected neural networks are trained to predict the expected dE/dx value and resolution accurately. With this method, the efficiency of dE/dx PID in the very low beta*gamma region can be restored to ~100%.
References:
[1] BESIII Collaboration, "Design and Construction of the BESIII Detector", Nucl. Instrum. Meth. A 614, 345-399 (2010).
[2] For BEPC II Team, "BEPC II: construction and commissioning", Chinese Phys. C 33, 60 (2009).
[3] Cao Xue-Xiang et al., "Studies of dE/dx measurements with the BESIII", Chinese Phys. C 34, 1852 (2010).
[4] I. Kobyzev, S. J. D. Prince and M. A. Brubaker, "Normalizing Flows: An Introduction and Review of Current Methods", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 11, pp. 3964-3979 (2021), doi: 10.1109/TPAMI.2020.2992934.
In contemporary high energy physics (HEP) experiments the analysis of vast amounts of data represents a major challenge. In order to overcome this challenge various machine learning (ML) methods are employed. However, in addition to the choice of the ML algorithm, a multitude of algorithm-specific parameters, referred to as hyperparameters, need to be specified in practical applications of ML methods. The optimization of these hyperparameters, which is often performed manually, has a significant impact on the performance of the ML algorithm. In this talk we explore several evolutionary algorithms that determine optimal hyperparameters for a given ML task in a fully automated way. Additionally, we study the capability of the two most promising hyperparameter optimisation algorithms, particle swarm optimization and Bayesian optimization, to utilise the highly parallel computing architecture that is typical for the field of HEP.
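As an illustration of one of the algorithms named above, here is a minimal particle swarm optimization sketch. It is a generic textbook variant, not the implementation studied in the talk; the inertia, cognitive and social coefficients and the function name are assumed values chosen for the example, and `objective` stands for whatever validation loss the hyperparameters are tuned against.

```python
import numpy as np


def particle_swarm_minimize(objective, bounds, n_particles=20, n_iterations=50,
                            inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimise objective(params) inside box bounds [(low, high), ...]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(bounds, dtype=float).T
    positions = rng.uniform(lower, upper, size=(n_particles, lower.size))
    velocities = np.zeros_like(positions)

    # Evaluations within one iteration are independent of each other,
    # so this loop is the part that could be spread over parallel workers.
    scores = np.array([objective(p) for p in positions])
    best_positions, best_scores = positions.copy(), scores.copy()
    leader = int(np.argmin(best_scores))
    global_best, global_score = best_positions[leader].copy(), best_scores[leader]

    for _ in range(n_iterations):
        r1 = rng.random(positions.shape)
        r2 = rng.random(positions.shape)
        # Each particle is pulled towards its own best and the swarm's best point.
        velocities = (inertia * velocities
                      + cognitive * r1 * (best_positions - positions)
                      + social * r2 * (global_best - positions))
        positions = np.clip(positions + velocities, lower, upper)
        scores = np.array([objective(p) for p in positions])

        improved = scores < best_scores
        best_positions[improved] = positions[improved]
        best_scores[improved] = scores[improved]
        leader = int(np.argmin(best_scores))
        if best_scores[leader] < global_score:
            global_best = best_positions[leader].copy()
            global_score = best_scores[leader]

    return global_best, float(global_score)
```

In a hyperparameter search each call to `objective` would train and validate a model, which is why the per-iteration batch of independent evaluations maps naturally onto a batch system.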
Deep Learning algorithms are widely used among the experimental high energy physics communities and have proved to be extremely useful in addressing a variety of tasks. One field of application for which Deep Neural Networks can give a significant improvement is event selection at trigger level in collider experiments. In particular, trigger systems benefit from the implementation of Deep Learning models on FPGAs. However, this task poses specific challenges to Deep Learning algorithm design, due to the microsecond latency requirements and limited resources of FPGA-based trigger systems. Before being implemented on an FPGA, Neural Networks may need to be appropriately compressed in order to reduce the number of neurons and synapses. A widespread technique to reduce the size of Deep Neural Networks is pruning. Numerous approaches have been developed to create a pruned model from an untrained one. Nearly all of them use a similar procedure, according to which the network is first trained to convergence, then single weights are removed on the basis of a particular ranking. To recover from accuracy loss, pruned networks are finally retrained. The pruning and retraining process is repeated iteratively, shrinking the network’s size. This procedure however can be quite long and resource demanding. Moreover, the relative importance of parameters changes along iterations and this may lead to converging to sub-optimal configurations.
Here we propose a different pruning strategy, which proved to be a mathematically rigorous and faster method for optimizing Neural Networks under size constraints. Our approach works by overlaying a shadow network on the one that has to be optimized. The shadow network is very simple to incorporate into already developed Deep Neural Networks and can be used to prune the whole network or just a portion of it. Through the training process, the combined optimization of the shadow and standard networks takes place. As a result, the pruning procedure occurs along with the training, and not in two different phases. The proposed method prunes nodes rather than single connections, allowing for the determination of an ideal network layout, with the total number of nodes chosen by the user so as to match the available FPGA resources. After finding the optimal network layout, the reduced network can be retrained as a new independent model. Preliminary results will be presented, along with new developments and applications.
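For contrast with the proposed method, the conventional iterative procedure described earlier (train, remove the lowest-ranked weights, retrain, repeat) can be sketched as follows. This is only the baseline, not the shadow-network approach, whose details the abstract does not give. Ranking by absolute weight value is one common choice and is assumed here; `retrain` is a hypothetical user-supplied callback.

```python
import numpy as np


def magnitude_mask(weights, mask, fraction):
    """Switch off the given fraction of still-active weights with the smallest |w|."""
    active = np.flatnonzero(mask)
    n_remove = int(fraction * active.size)
    new_mask = mask.copy().ravel()
    if n_remove > 0:
        order = np.argsort(np.abs(weights.ravel()[active]))
        new_mask[active[order[:n_remove]]] = False
    return new_mask.reshape(mask.shape)


def iterative_pruning(weights, fraction=0.2, rounds=3, retrain=None):
    """Alternate pruning and (optional) retraining for a fixed number of rounds."""
    mask = np.ones(weights.shape, dtype=bool)
    for _ in range(rounds):
        mask = magnitude_mask(weights, mask, fraction)
        weights = weights * mask
        if retrain is not None:
            # Hypothetical callback: returns updated weights; pruned entries stay zero.
            weights = retrain(weights, mask) * mask
    return weights, mask
```

Every round requires a full retraining pass, which is the cost that motivates performing pruning and training together.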
A decade of data-taking from the LHC has seen great progress in ruling out archetypal new-physics models up to high direct-production energy scales, but few persistent deviations from the SM have been seen. So as we head into the new data-taking era, it is of paramount importance to look beyond such archetypes and consider general BSM models that exhibit multiple phenomenological signatures. But typically each such signature will appear at lower strength than the archetypical simplified models: to significantly constrain them requires a move away from single, "silver-bullet" analyses, to a holistic approach in which many analyses are combined into composite likelihoods. Such combinations require understanding analysis overlaps, and identifying optimal analysis combinations for each point in model space. In this contribution, we present the TACO method, which uses computational statistics in combination with LHC data-reinterpretation tools to estimate analysis correlations, and hence find their optimal combinations. Across several BSM-model scenarios, we show that the TACO approach can significantly increase both exclusion and observation power.
We introduce a restricted version of the Riemann-Theta Boltzmann machine, a generalization of the Boltzmann machine with continuous visible and discrete integer-valued hidden states. Though the normalizing higher-dimensional Riemann-Theta function does not factorize, the restricted version can be trained efficiently with the method of score matching, which is based on the Fisher divergence. Using several common two-dimensional datasets, we show that the quality of the fits obtained is comparable to that of state-of-the-art density estimation techniques such as normalizing flows or kernel density estimation. We also discuss how some of these methods can converge to an overfitted solution and we try to quantify this overfitting behavior.
Furthermore, we show that our model is less likely to converge to such non-ideal solutions.
We also prove that the recursive calculation of the one-dimensional Riemann-Theta function can be extended to the calculation of the first- and second-order gradients.
We also hint at the possibility of using the density estimated by this model to perform multi-dimensional integration using Monte Carlo methods, with a particular focus on High Energy Physics applications.
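For reference, the standard definitions behind score matching can be written as follows. The notation ($p$ for the data density, $q_\theta$ for the model density) is chosen here for illustration and is not taken from the abstract.

```latex
% Fisher divergence between the data density p and the model density q_theta
D_F(p \,\|\, q_\theta)
  = \frac{1}{2}\, \mathbb{E}_{x \sim p}
    \left[ \left\lVert \nabla_x \log p(x) - \nabla_x \log q_\theta(x) \right\rVert^2 \right]

% Under mild regularity conditions this equals, up to a theta-independent constant,
J(\theta)
  = \mathbb{E}_{x \sim p}
    \left[ \frac{1}{2} \left\lVert \nabla_x \log q_\theta(x) \right\rVert^2
           + \Delta_x \log q_\theta(x) \right]
```

Only derivatives of $\log q_\theta$ with respect to $x$ appear, so any $x$-independent normalization constant of the model drops out of the objective. This is why the method suits models whose normalization is expensive to evaluate.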
Continuously comparing theory predictions to experimental data is a common task in particle physics analyses such as fitting parton distribution functions (PDFs). However, both the computation of scattering amplitudes and the evolution of candidate PDFs from the fitting scale to the process scale are typically non-trivial, computing-intensive tasks. We develop a new stack of software tools that aim to facilitate theory predictions by computing FastKernel (FK) tables, which reduce the theory computation to a linear algebra operation. Specifically, I present PineAPPL, our workhorse for grid operations, EKO, a new DGLAP solver, and yadism, a new DIS library. Alongside, I review several projects that become available with the new tools.
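The statement that an FK table reduces the theory computation to a linear algebra operation can be sketched as a tensor contraction. The array layouts and function names below are assumptions made for illustration and do not reproduce the PineAPPL, EKO or yadism interfaces: the table is taken to be precomputed on a fixed grid in momentum fraction x, and the candidate PDF is sampled on the same grid at the fitting scale.

```python
import numpy as np


def predict_dis(fk_table, pdf_grid):
    """Single-hadron (DIS-like) case.

    fk_table: shape (n_data, n_flavours, n_x), precomputed once.
    pdf_grid: shape (n_flavours, n_x), PDF values at the fitting scale.
    Returns one theory prediction per data point.
    """
    return np.einsum("dfx,fx->d", fk_table, pdf_grid)


def predict_hadronic(fk_table, pdf_grid):
    """Two-hadron case: the same PDF enters once per incoming hadron.

    fk_table: shape (n_data, n_flavours, n_x, n_flavours, n_x).
    """
    return np.einsum("dfxgy,fx,gy->d", fk_table, pdf_grid, pdf_grid)
```

During a fit only `pdf_grid` changes between iterations, so each new prediction costs one contraction instead of a fresh amplitude calculation and evolution.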
In this presentation I will show how one can perform parametric integrations using a neural network. This could be applied for example to perform the integration over the auxiliary parameters in the integrals that result from the sector decomposition of multi-loop integrals.
The Belle II experiment has been taking data at the SuperKEKB collider since 2018. Particle identification is a key component of the reconstruction, and several detector upgrades from Belle to Belle II were designed to maintain performance with the higher background rates.
We present a method for a data-driven calibration that improves the overall particle identification performance and is resilient against imperfections in the calibration of individual detectors. Our framework also defines a “blame” metric that identifies the detectors with largest contributions to correctly and incorrectly assigned particle hypotheses.
The size, complexity, and duration of telescope surveys are growing beyond the capacity of traditional methods for scheduling observations. Scheduling algorithms must have the capacity to balance multiple (often competing) observational and scientific goals, address both short-term and long-term considerations, and adapt to rapidly changing stochastic elements (e.g., weather). Reinforcement learning (RL) methods have the potential to significantly automate the scheduling and operation of telescope campaigns and greatly reduce the amount of human effort needed to vet schedules produced via costly simulation work.
In this work, we present the application of an RL-based scheduler, which uses a Markov decision process framework to construct scheduling policies in a way that is scalable, recoverable in the case of interruptions during observation, and computationally efficient for surveys that can include over a hundred observations.
We simulate surveys of objects in the Galactic equator, assuming the location and optics of Stone Edge Observatory. We present schedules generated by our RL technique. While initial results are not comparable to human-tuned schedules, we are encouraged by the technique’s scalable, automated approach. We examine how well an RL agent’s produced schedules compare to human-designed schedules by comparing different formulations of cumulative reward for these schedules. We also investigate the success of our model as we vary the complexity of the telescope environment and as we vary the reward function. We present this work as a motivation to explore more complex situations and surveys.
In this work we present the adaptation of the popular clustering algorithm DBSCAN to reconstruct the primary vertex (PV) at the hardware trigger level in collisions at the High-Luminosity LHC. Nominally, PV reconstruction is performed by a simple histogram-based algorithm. The main challenge in PV reconstruction is that the particle tracks need to be processed in a low-latency environment, $\mathcal{O}$(1 μs). To achieve this, an accelerated version of the DBSCAN algorithm was developed to run on a Field Programmable Gate Array (FPGA). A CPU-optimized version of DBSCAN was implemented in C++ to serve as a benchmark for comparison. The CPU version of DBSCAN resulted in an average PV reconstruction latency of 93 μs, while the FPGA firmware had a latency of only 0.73 μs, a 127x speedup. The speedup is a result of processing all the input tracks in parallel, which in turn results in high resource consumption, up to 48.6% of the available logic. Most of the logic is attributed to the sorting networks that allow the input tracks to be processed in parallel. To tune the firmware for specific latency and resource-usage constraints, the firmware has been parametrized by the number of input tracks to consider at a time. The accelerated DBSCAN method yielded a higher PV reconstruction efficiency than the simpler histogram-based method. As clustering applications are prominent in High Energy Physics, we also modified the accelerated DBSCAN algorithm for higher-dimensional datasets.
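A software sketch of DBSCAN applied to track positions along the beam axis may help to show why sorting plays a central role. This is a plain NumPy illustration, not the firmware or the C++ benchmark from the abstract. The parameter values, the choice of the cluster with the largest summed transverse momentum and the pT-weighted vertex position are all assumptions.

```python
import numpy as np


def dbscan_1d(z, eps, min_pts):
    """DBSCAN along one axis; returns a label per input point, -1 for noise."""
    z = np.asarray(z, dtype=float)
    order = np.argsort(z)
    zs = z[order]
    # Count neighbours within eps (the point itself included) on the sorted array.
    left = np.searchsorted(zs, zs - eps, side="left")
    right = np.searchsorted(zs, zs + eps, side="right")
    is_core = (right - left) >= min_pts

    labels_sorted = np.full(zs.size, -1)
    cluster, last_core = -1, None
    for i in np.flatnonzero(is_core):
        # Consecutive core points further apart than eps start a new cluster.
        if last_core is None or zs[i] - zs[last_core] > eps:
            cluster += 1
        labels_sorted[i] = cluster
        last_core = i

    # Border points join the nearest core point if it lies within eps.
    core_idx = np.flatnonzero(is_core)
    if core_idx.size > 0:
        for i in np.flatnonzero(~is_core):
            j = core_idx[np.argmin(np.abs(zs[core_idx] - zs[i]))]
            if abs(zs[j] - zs[i]) <= eps:
                labels_sorted[i] = labels_sorted[j]

    labels = np.empty_like(labels_sorted)
    labels[order] = labels_sorted
    return labels


def primary_vertex(z0, pt, eps=0.08, min_pts=2):
    """Return the pT-weighted z of the cluster with the largest summed pT, or None."""
    z0, pt = np.asarray(z0, dtype=float), np.asarray(pt, dtype=float)
    labels = dbscan_1d(z0, eps, min_pts)
    clusters = np.unique(labels[labels >= 0])
    if clusters.size == 0:
        return None
    sum_pt = [pt[labels == c].sum() for c in clusters]
    selected = labels == clusters[int(np.argmax(sum_pt))]
    return float(np.average(z0[selected], weights=pt[selected]))
```

Once the tracks are sorted, neighbour counting and cluster chaining only involve adjacent entries, which is consistent with the abstract's remark that sorting networks account for most of the logic.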
Binned template fitting is one of the most important tools in the High-Energy physics (HEP) statistics toolbox. Statistical models based on combinations of histograms are often the last step in a HEP physics analysis. Both model and data can be represented in a standardized format: HistFactory (C++/XML) and, more recently, pyHF (Python/JSON) have taken advantage of that fact to make template fits both easy and reproducible.
We present a port of pyHF to the Julia programming language much like the way pyHF started out as a port of the C++ HistFactory. The new package, LiteHF.jl, provides an independent, fully compatible implementation of the pyHF JSON specification. Since Julia compiles to native code via LLVM and has a lower function-call overhead than Python, LiteHF.jl can outperform the original pyHF. We utilize Julia's meta-programming capabilities to keep the implementation simple and flexible, and the likelihood gradient is obtained for free via automatic differentiation. LiteHF.jl also makes it easy for the user to add custom template modifiers.
Models generated by LiteHF.jl can be used directly in BAT.jl (Bayesian Analysis Toolkit) in Julia and other Julia inference packages. This enables full Bayesian inference with a few simple commands. BAT.jl provides a full suite of analysis tools including MCMC, nested sampling, automatic re-parametrization, Bayesian evidence calculation, and plotting. A user-friendly likelihoodist inference path for LiteHF.jl is available as well.
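The kind of model behind a binned template fit can be sketched in a few lines. The example uses one signal template scaled by a free signal strength `mu`, plus a fixed background template, with a Poisson likelihood per bin. It is written in Python for illustration and does not use the LiteHF.jl, pyHF or BAT.jl interfaces; the function names and the grid scan used for the fit are assumptions.

```python
import numpy as np


def expected_counts(mu, signal, background):
    """Expected bin contents: signal template scaled by mu plus background."""
    return mu * np.asarray(signal, dtype=float) + np.asarray(background, dtype=float)


def negative_log_likelihood(mu, observed, signal, background):
    """Poisson NLL summed over bins, dropping the mu-independent log(n!) term."""
    lam = expected_counts(mu, signal, background)
    observed = np.asarray(observed, dtype=float)
    return float(np.sum(lam - observed * np.log(lam)))


def fit_signal_strength(observed, signal, background, mu_grid=None):
    """Maximum-likelihood estimate of mu from a simple grid scan."""
    if mu_grid is None:
        mu_grid = np.linspace(0.0, 5.0, 5001)
    nll = [negative_log_likelihood(mu, observed, signal, background) for mu in mu_grid]
    return float(mu_grid[int(np.argmin(nll))])
```

A real HistFactory-style model adds constrained nuisance parameters and further modifiers on top of this structure, and a gradient-based minimiser or sampler would replace the grid scan.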
The usage of Deep Neural Networks (DNNs) as multi-classifiers is widespread in modern HEP analyses. In standard categorisation methods, the high-dimensional output of the DNN is often reduced to a one-dimensional distribution by exclusively passing the information about the highest class score to the statistical inference method. Correlations to other classes are hereby omitted.
Moreover, in common statistical inference tools, the classification values need to be binned, which relies on the researcher's expertise and is often non-trivial. To overcome the challenge of binning multiple dimensions and preserving the correlations of the event-related classification information, we perform K-means clustering on the high-dimensional DNN output to create bins without marginalising any axes.
We evaluate our method in the context of a simulated cross section measurement at the CMS experiment, showing an increased expected sensitivity over the standard binning approach.
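The idea of using clusters of the multi-class output as histogram bins can be sketched as follows. This is a plain k-means implementation in NumPy, not the analysis code; the number of clusters, the initialisation and the function names are assumptions made for the example.

```python
import numpy as np


def kmeans(points, n_clusters, n_iterations=50, seed=0):
    """Plain k-means; returns centroids of shape (n_clusters, n_dims)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centroids = points[start].copy()
    for _ in range(n_iterations):
        labels = assign_bins(points, centroids)
        for k in range(n_clusters):
            members = points[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return centroids


def assign_bins(points, centroids):
    """Index of the nearest centroid for every point; this index is the bin number."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(distances, axis=1)


def histogram_from_clusters(scores, centroids, weights=None):
    """Fill one bin per cluster using the full score vector of each event."""
    bins = assign_bins(scores, centroids)
    return np.bincount(bins, weights=weights, minlength=len(centroids))
```

The centroids would be determined once (for instance on simulation) and then applied unchanged to every sample, so that data and templates share the same binning, which keeps the information from all class scores instead of only the highest one.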
To support the needs of novel collider analyses such as long-lived particle searches, considerable computing resources are spent forward-copying data products from low-level data tiers like CMS AOD and MiniAOD to reduced data formats for end-user analysis tasks. In the HL-LHC era, it will be increasingly difficult to ensure online access to low-level data formats. In this talk, we present a novel online data storage mechanism that obviates the need for data tiers by storing individual data products in column objects using RadosGW, a Ceph object store technology. Benchmarks of the performance of storage and retrieval of the event data through the S3 protocol for a prototype of typical analysis workflows will be presented, and compared with traditional xrootd ROOT file access protocols.
Ionization energy loss by charged particles in gaseous detectors is subject to large statistical fluctuations, which implies that many measurements are needed along the particle track to obtain a precise mean. This represents a limit to the particle separation capabilities that should be overcome in the design of future colliders. The cluster counting technique (dN/dx) represents a valid alternative which takes advantage of the Poisson nature of the primary ionization process and offers a more statistically robust method to infer mass information. Simulation studies using Garfield++ and Geant4 show that cluster counting can reach a resolution two times better than the traditional dE/dx method over a wide momentum range in the use case of a helium-based drift chamber. The technique consists in singling out, in every recorded detector signal, the electron peak structures related to the arrival on the anode wire of the electrons belonging to a single primary ionization act (cluster). However, the search for hundreds of electron peaks and the cluster recognition in waveform signals from real data is extremely challenging because the peaks are superimposed in time. The state-of-the-art open-source algorithms fail to find the expected number even in low-noise conditions. In this talk, we present cutting-edge algorithms to search for electron peaks and identify ionization clusters in experimental data using the latest available computing tools and physics knowledge. To validate the algorithms and show the advantages of the cluster counting technique, two beam tests have been performed at the CERN/H8 facility, collecting data with different helium-based gas mixtures at different gas gains and angles between the wire direction and the ionizing tracks, using a muon beam ranging from 40 GeV/c to 180 GeV/c on a setup made of drift tubes of different sizes, equipped with sense wires of different diameters.
We show the data analysis results concerning the ascertainment of the Poisson nature of the cluster counting technique, the establishment of the most efficient cluster counting and electrons clustering algorithms among the various ones proposed, and the definition of the limiting effects for a fully efficient cluster counting, like the cluster dimensions, the space charge density around the sense wire and the dependence of the counting efficiency versus the beam particle impact parameter.
Due to the massive nature of HEP data, performance has always been a factor in its analysis and processing. Languages like C++ are fast enough but are often challenging for beginners to grasp, and make it difficult to iterate quickly in an interactive environment. On the other hand, the ease of writing code and the extensive library ecosystem make Python an enticing choice for data analysis. Increasing interoperability between Python and C++, as well as the introduction of libraries such as Numba, has been accelerating Python's traction in the HEP community.
Vector is a Python library for 2D, 3D, and Lorentz vectors, especially arrays of vectors, designed to solve common physics problems in a NumPy-like way. Vector currently supports pure Python Object, NumPy, Awkward, and Numba-based (Numba-Object, Numba-Awkward) backends.
We are introducing the library, with a focus on the Numba-based Awkward Lorentz vectors to perform operations on HEP data without compromising on the speed and the ease of writing code. Awkward is one of the core libraries of the Scikit-HEP ecosystem that allows data analysis with jagged arrays. Numba, on the other hand, allows Python codebases to harness the power of Just-In-Time compilation, enabling the Python code to be compiled before executing.
The library seamlessly integrates with the existing Scikit-HEP libraries, especially with Awkward. Our talk will start with an introduction to this library, with the main agenda of compiling Awkward Lorentz vectors with Numba. Furthermore, Vector is still under active development and preparing for a 1.0 release; hence, we will also take in user feedback while discussing the overall development roadmap.
Cryogenic phonon detectors are used by direct detection dark matter experiments to achieve sensitivity to light dark matter particle interactions. Such detectors consist of a target crystal equipped with a superconducting thermometer. The temperature of the thermometer and the bias current in its readout circuit need careful optimization to achieve optimal sensitivity of the detector. This task is not trivial and has to be done manually by an expert. In our work, we created a simulation of the detector response as an OpenAI Gym reinforcement learning environment. In the simulation, we test the capability of a soft actor critic agent to perform the task. We accomplish the optimization of a standard detector in the equivalent of 30 minutes of real measurement time, which is faster than most human experts. Our method can improve the scalability of multi-detector setups.
In the past years the CMS software framework (CMSSW) has been extended to offload part of the physics reconstruction to NVIDIA GPUs. This can achieve a higher computational efficiency, but it adds extra complexity to the design of dedicated data centres and the use of opportunistic resources, like HPC centres. A possible solution to increase the flexibility of heterogeneous clusters is to offload part of the computations to GPUs installed in external, dedicated nodes.
Our studies on this topic have achieved high-throughput, low-latency data transfers to and from a remote NVIDIA GPU across Mellanox NICs, using Remote Direct Memory Access (RDMA) technology to access the GPU memory without involving either node's operating system.
In this work we present our approach based on the Open MPI framework, and compare the performance of data transfers of local and remote GPUs from different generations, using different communication libraries and network protocols.
HEPD-02 is a new, upgraded version of the High Energy Particle Detector as part of a suite of instruments for the second mission of the China Seismo-Electromagnetic Satellite (CSES-02) to be launched in 2023. Designed and realized by the Italian Collaboration LIMADOU of the CSES program, it is optimized to identify fluxes of charged particles (mostly electrons and protons) and determine their energy and incoming direction, providing new measurements of cosmic rays at low energies (up to 200 MeV for protons and up to 100 MeV for electrons). As already experienced in the previous version of the detector, i.e. HEPD-01 on board CSES-01, the reconstruction of the collected events will be performed using a strategy based entirely on deep learning (DL). This choice is motivated by the fact that deep learning models are very effective when working with particle detectors, in which a variety of electrical signals are produced and may be treated as low-level features. The new HEPD-02 DL-based event reconstruction will be trained on dedicated Monte Carlo simulation and tested on both simulated and test-beam data. Moreover, the collaboration is working on new deep-learning approaches to increase the robustness of the performance assessments, especially when passing from simulated samples to real data, and the interpretability of these algorithms to be used in future analysis.
In this contribution, the entire event reconstruction of the HEPD-02 detector will be described and the performance will be reported.
In real-time computing facilities, system, network, and security monitoring are core components of efficient and effective operation. As there are many diverse functions that can go awry, such as load, network, processes, and power issues, having a well-functioning monitoring system is imperative. In many facilities you will see the standard set of tools such as Ganglia, Grafana, Nagios, etc. While these are noteworthy, the diversity of tools used clearly points to an adequacy gap (none is self-sufficient) and, furthermore, their alerting and anomaly detection capabilities do not go beyond binary events.
The ELK stack (Elasticsearch, Logstash, & Kibana) is the combination of three open-source projects to ingest, search, and visualize logs and data. The basic free license of ELK enables these features but overall is limited for use in a real-time facility. Leveraging the full capabilities of ELK instead yields significant additional features. The premium ELK offerings provide many enhancements, from single sign-on and authorization controls for security to alerting on unusual events, Machine Learning capabilities, and many other tools that are useful for advanced data analytics.
With the advanced set of Machine Learning techniques, the ELK toolbox adds features such as clustering, time series decomposition, and correlation analysis. For example, these Machine Learning techniques can be applied to alerts, providing the details of events such as an unusual uptick in resource usage, rare or high process activity, or unusual port activity. A standard monitoring tool would typically not have such capability.
In this report, we will discuss the details and features of how a facility could benefit from the open source and premium versions of the ELK stack. We will provide procedures and details for configuring these tools, and show how they benefit the monitoring posture of compute facilities within a scientific environment.
We hold these truths to be self-evident: that all physics problems are created unequal, that they are endowed with their unique data structures and symmetries, that among these are tensor transformation laws, Lorentz symmetry, and permutation equivariance. A lot of attention has been paid to the applications of common machine learning methods in physics experiments and theory. However, much less attention is paid to the methods themselves and their viability as physics modeling tools. One of the most fundamental aspects of modeling physical phenomena is the identification of the symmetries that govern them. Incorporating symmetries into a model can reduce the risk of over-parameterization, and consequently improve a model's robustness and predictive power. As usage of neural networks continues to grow in the field of particle physics, more effort will need to be invested in narrowing the gap between the black-box models of ML and the analytic models of physics.
Building on previous work, we demonstrate how careful choices in the details of network design (creating a model both simpler and more grounded in physics than the traditional approaches) can yield state-of-the-art performance within the context of problems including jet tagging and particle four-momentum reconstruction. We present the Permutation-Equivariant and Lorentz-Invariant or Covariant Aggregator Network (PELICAN), which is based on three key ideas: symmetry under permutations of particles, Lorentz symmetry, and the ambiguity of the aggregation process in Graph Neural Networks. For the first, we use the most general permutation-equivariant layer acting on rank 2 tensors, which can be viewed as a maximal generalization of Message Passing. For the second, we use classical theorems of Invariant Theory to reduce the 4-vector inputs to a tensor of Lorentz-invariant latent quantities. Finally, the flexibility of the aggregation process commonly used in Graph Networks can be leveraged for improved accuracy, in particular to allow variable scaling with the size of the input.
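The reduction of 4-vector inputs to Lorentz-invariant quantities can be illustrated with the matrix of pairwise Minkowski dot products. This matrix is unchanged by a Lorentz boost, and reordering the particles simply reorders its rows and columns. The sketch below is an assumption about what such an input tensor could look like, not the PELICAN code; the function names and the (E, px, py, pz) layout are chosen for the example.

```python
import numpy as np

# Minkowski metric with signature (+, -, -, -)
METRIC = np.diag([1.0, -1.0, -1.0, -1.0])


def pairwise_invariants(p4):
    """p4: array of shape (n_particles, 4) holding (E, px, py, pz).

    Returns the (n_particles, n_particles) matrix of dot products p_i . p_j.
    """
    return p4 @ METRIC @ p4.T


def boost_z(p4, beta):
    """Apply a Lorentz boost with velocity beta along the z axis to every 4-vector."""
    gamma = 1.0 / np.sqrt(1.0 - beta ** 2)
    boost = np.array([
        [gamma, 0.0, 0.0, -gamma * beta],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [-gamma * beta, 0.0, 0.0, gamma],
    ])
    return p4 @ boost.T
```

A network that only ever sees this rank 2 tensor is Lorentz-invariant by construction, and a permutation-equivariant layer acting on it respects the arbitrary ordering of the particles.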
The ever-growing computing power needed for the storage and data analysis of the high-energy physics experiments at CERN requires performance optimization of the existing and planned IT resources.
One of the main computing capacity consumers in the HEP software workflow is the data analysis. To optimize the resource usage, the concept of Analysis Facility (AF) for Run 3 has been introduced. The AFs are special computing centres with a combination of CPU and fast interconnected disk storage resources, allowing for rapid turnaround of analysis tasks on a subset of data. This in turn allows for optimization of the analysis process and the codes before the analysis is performed on the large data samples on the WLCG Grid.
In this paper, the structure and the first benchmark tests of the Wigner AF are presented.
Particle physics experiments spend large amounts of computational effort on Monte Carlo simulations. Due to the computational expense of simulations, they are often executed and stored in large distributed computing clusters. To lessen the computational cost, physicists have introduced alternatives to speed up the simulation. Generative Adversarial Networks (GANs) are an excellent Deep-Learning-based alternative due to their ability to imitate probability distributions. Concretely, one of the most frequently tackled problems is calorimeter simulation, since it consumes a large portion of the computing power. GANs simulate calorimeter particle showers with good accuracy and reduced computational resources. Previous works have already explored the generation of calorimeter simulation data with GANs, but in most cases from a centralized perspective (i.e., where the dataset is present on the training node).
This separation creates a disparity between training data generation (i.e., in distributed clusters) and training (i.e., centralized), limiting the amount of data the centralized node can use to train. Federated Learning has arisen as a successful decentralized training solution for cases where data is not necessarily balanced, independent, and identically distributed (IID). Federated Learning is a training method where a group of collaborators trains a model by sharing training updates with an aggregator. The sparsity and distributed nature of the simulated data pair favorably with the features of Federated Learning. In this work, we introduce new Federated Learning-based approaches for GAN training and test them on the 2DGAN model. This work covers different training schemes for GANs with FL (e.g., centralized discriminator or centralized generator). Our work provides insights into the various architectures by performing model training and extracting performance metrics. The results permit the evaluation of the effectiveness of the different strategies.
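The collaborator and aggregator roles described above can be sketched with plain federated averaging on a toy problem. This is not one of the GAN training schemes of the contribution: the one-parameter model y = w * x, the learning rate, the number of local steps, and the names `local_update` and `federated_round` are all assumptions made for illustration.

```python
def local_update(w, data, lr=0.05, steps=5):
    """Collaborator: a few gradient steps of a least-squares fit y = w * x
    on data that never leaves the collaborator."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, collaborators):
    """Aggregator: average the locally trained parameters, weighting each
    collaborator by the size of its local dataset."""
    local_ws = [local_update(w_global, data) for data in collaborators]
    sizes = [len(data) for data in collaborators]
    return sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)

# Two collaborators with unbalanced local datasets drawn from y = 2 * x.
collaborators = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (0.5, 1.0), (1.5, 3.0)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, collaborators)
```

Only the parameter `w` travels between collaborators and aggregator; the datasets stay where they were produced, which is the property that makes the approach attractive for simulated data spread over distributed clusters.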
The unprecedented volume of data and Monte Carlo simulations at the HL-LHC will pose increasing challenges for data analysis, in terms of both computing resource requirements and "time to insight". Precision measurements with present LHC data already face many of these challenges today. We will discuss performance scaling and optimization of RDataFrame for complex physics analyses, including interoperability with Eigen, Boost Histograms, and the Python ecosystem to enable this.
Neutrino experiments that use liquid argon time projection chamber (LArTPC) detectors are growing bigger and expect to see more neutrinos with next generation beams, and therefore will require more computing resources to reach their physics goals of measuring CP violation in the neutrino sector and exploring anomalies. These resources can be used to their full capacity by incorporating parallelism through multi-threading and vectorization within algorithms, and by running these algorithms on High Performance Computers (HPCs). An HPC workflow is being developed for LArTPC experiments to take advantage of all levels of parallelism, within and across nodes. It will be used to enhance the statistics available for use in physics analysis and will also make it possible to efficiently incorporate AI algorithms. Additional opportunities to incorporate parallelism within LArTPC algorithms are also being explored.
Ultra-low mass and high granularity Drift Chambers fulfill the requirements for tracking systems of modern High Energy Physics experiments at the future high luminosity facilities (FCC-ee or CEPC).
We present how, in helium-based gas mixtures, measuring the arrival times of each individual ionization cluster and using proper statistical tools makes it possible to perform a bias-free estimate of the impact parameter and a precise PID. Typically, in a helium-based drift chamber, consecutive ionization clusters are separated in time by a few ns at small impact parameters, and by up to a few tens of ns at large impact parameters. For an efficient application of the cluster timing technique, which consists in isolating pulses due to different ionization clusters, it is therefore necessary to have read-out interfaces capable of processing high-speed signals. We present a full front-end chain able to treat the low-amplitude sense wire signals ($\sim$ a few mV), converted from analog to digital with FADCs, with a high bandwidth ($\sim$1 GHz). The requirement of a high sampling frequency, together with long drift times, usually of the order of several hundred ns, and a large number of readout channels, typically of the order of tens of thousands, imposes a sizable data reduction that nevertheless preserves all relevant information. Measuring both the amplitude and the arrival time of each peak in the signal associated with each ionization cluster is the minimum requirement on the data transfer for storage to prevent any significant data loss. An electronic board including a Fast ADC and an FPGA for real-time processing of the drift chamber signals is presented. Various peak finding algorithms, implemented and tested in real time with VHDL code, are also compared.
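The data reduction described above, keeping only the amplitude and arrival time of each peak, can be sketched in software. This is not one of the VHDL algorithms compared in the contribution: it is a minimal threshold-plus-local-maximum peak finder, and the function name `find_peaks`, the threshold, the sampling period of 1 ns, and the toy waveform are assumptions chosen for illustration.

```python
def find_peaks(samples, threshold, sample_period_ns=1.0):
    """Return (arrival_time_ns, amplitude) for each local maximum above threshold.

    A sample counts as a peak if it exceeds the threshold, is strictly larger
    than the previous sample and at least as large as the next one, so a flat
    top is reported once.
    """
    peaks = []
    for i in range(1, len(samples) - 1):
        s = samples[i]
        if s > threshold and s > samples[i - 1] and s >= samples[i + 1]:
            peaks.append((i * sample_period_ns, s))
    return peaks

# Toy digitized waveform in mV (hypothetical values), one sample per ns.
waveform = [0.1, 0.2, 3.0, 1.0, 0.3, 0.2, 2.5, 2.4, 0.4, 0.1, 0.9, 0.2]
peaks = find_peaks(waveform, threshold=1.5)
```

Twelve samples are reduced here to two (time, amplitude) pairs; the bump of 0.9 mV is rejected by the threshold, which illustrates the trade-off between data reduction and the risk of losing small clusters.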
In particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual, complex workflows manually, frequently involving job submission in several stages and interaction with distributed storage systems by hand. This process is not only time-consuming and error-prone, but also leads to undocumented relations between particular workloads, rendering the steering of an analysis a serious challenge.
This contribution presents the Luigi Analysis Workflow (law) Python package which is based on the open-source pipelining tool luigi, originally developed by Spotify. It establishes a generic design pattern for analyses of arbitrary scale and complexity, and shifts the focus from executing to defining the analysis logic. Law provides the building blocks to seamlessly integrate with interchangeable remote resources without, however, limiting itself to a specific choice of infrastructure.
In particular, it introduces the concept of complete separation between analysis algorithms on the one hand, and run locations, storage locations, and software environments on the other hand. To cope with the sophisticated demands of end-to-end HEP analyses, law supports job execution on WLCG infrastructure (ARC, gLite) as well as on local computing clusters (HTCondor, Slurm, LSF), remote file access via various protocols using the Grid File Access Library (GFAL2), and an environment sandboxing mechanism with support for sub-shells and virtual environments, as well as Docker and Singularity containers. Moreover, the novel approach ultimately aims for analysis preservation out-of-the-box.
Law is developed open-source and independent of any experiment or the language of executed code. Over the past years, its user-base increased steadily with applications now ranging from (pre-)processing workflows in CMS physics objects groups, to pipelines performing the statistical inference in most CMS di-Higgs searches, and it serves as the underlying core software for large scale physics analyses across various research groups.
RooFit is a toolkit for statistical modeling and fitting, and together with RooStats it is used for measurements and statistical tests by most experiments in particle physics, particularly the LHC experiments. As the LHC program progresses, physics analyses become more computationally demanding. Therefore, recent RooFit developments were focused on performance optimization, in particular to speed up the minimization of the negative log likelihood when fitting a model to a dataset.
Two such improvements will be discussed in this session: gradient-based CPU parallelization and batched computations. The former strategy parallelizes the calculation of the gradient in the line search approach (MIGRAD) used for maximum likelihood estimation in RooFit. Here, the parallelization approach and computational tools used will be discussed. The second strategy comprises a restructuring of the computational graph associated with a model and dataset in order to allow for batched computations. With batched computations RooFit can evaluate batches of events simultaneously per computational graph node, rather than event by event. This simultaneous computation can be supported by either vectorization or GPU parallelization.
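The difference between event-by-event and batched evaluation can be sketched for a Gaussian negative log likelihood. This is not RooFit code: the two functions and their names are hypothetical, and plain Python lists stand in for the arrays that a vectorized or GPU back-end would process in one call per graph node.

```python
import math

def nll_event_by_event(data, mu, sigma):
    """Walk the whole computational graph once per event."""
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        log_pdf = -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
        total -= log_pdf
    return total

def nll_batched(data, mu, sigma):
    """Evaluate each graph node once for the whole batch of events."""
    z = [(x - mu) / sigma for x in data]                 # node 1: standardize
    log_shape = [-0.5 * zi * zi for zi in z]              # node 2: unnormalized log pdf
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))  # event-independent node, evaluated once
    return -sum(log_shape) + len(data) * log_norm         # node 3: reduction

# Toy dataset (hypothetical values).
data = [0.3, -1.2, 0.8, 2.1, -0.4]
```

In the batched form each node receives a contiguous block of events, which is what makes SIMD vectorization or offloading to a GPU straightforward, and quantities that do not depend on the event, such as the normalization, are computed only once.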
Throughout this session, there will be an emphasis on detailed benchmarking and how it was used to optimize various parts of the developed performance improvements, including load balancing and the reduction of communication overhead. Benchmarks are primarily shown for cutting-edge Higgs combination fits, where the developed improvements were intended to achieve order-of-magnitude improvements in execution wall time.
There are established classical methods to reconstruct particle tracks from recorded hits in particle detectors. Current algorithms do this either by cutting on some features, such as the recorded time of the hits, or through the fitting process. This is potentially error-prone and resource-consuming. For high-noise events, these issues are more critical and the method might even fail. We have been developing artificial neural networks that can learn to separate noise from signal in simulated data. The data sample we use for this purpose consists of Monte Carlo simulated Bhabha events generated by the BESIII offline software system. We study different types of deep neural networks and their effectiveness in removing the noise that arises in the main drift chamber of BESIII from various origins.
The fully connected networks that we try first find sophisticated cuts in the hit features of each cell of the detector. These features include the raw time of a hit and the recorded charge associated with it. This leads to about 85 percent efficiency and purity of the signal separation. This sets a lower limit for us, since such a network judges every hit only by its own features. Next, we develop a CNN and show that with information from only four neighboring cells, the noise removal reaches 99 percent purity and efficiency at the same time. We discuss the effectiveness of the network for events with different noise levels.
The main drift chamber consists of 6796 sense wires arranged in 43 layers. The structure of the wire system is known, and we therefore also examine the idea of treating the main drift chamber structure as a graph. We build a model based on graph convolutional layers and choose a node classification approach. We include a message passing process in three of the hidden layers and obtain 95 percent efficiency and purity for the noise removal. We then describe the results of our network for other events such as $J/\psi \rightarrow p \bar{p} \pi^+ \pi^-$. In the end, we compare all of this with the classical methods.
The alpaka library is a header-only C++17 abstraction library for development across hardware accelerators (CPUs, GPUs, FPGAs). Its aim is to provide performance portability across accelerators through the abstraction (not hiding!) of the underlying levels of parallelism. In this talk we will show the concepts behind alpaka, how it is mapped to the various underlying hardware models, and the features introduced over the last year. In addition, we will also (briefly) present the software ecosystem surrounding alpaka.
In recent years, new technologies and new approaches have been developed in academia and industry to face the necessity to both handle and easily visualize huge amounts of data, the so-called “big data”. The increasing volume and complexity of HEP data challenge the HEP community to develop simpler and yet powerful interfaces based on parallel computing on heterogeneous platforms. Good examples are 1) the pandas framework, which is an open source set of data analysis tools allowing the configuration and fast manipulation of data structures, and 2) the Jupyter Notebook, which is a web application that allows users to create and share documents that contain live executable code. Similarly to the python-based pandas, ROOT::RDataFrame offers another parallel data analysis tool also providing a C++ interface as well as Python bindings (thus compatible with the Jupyter Notebook).
In this contribution we aim to document our experience and performance studies in deploying an HEP analysis workflow, in a realtime analysis fashion, being developed within a Jupyter environment (from the selection criteria to extract the physical signal to the fitting tasks). For this purpose we exploit CMS Run1 Open Data to extract the signal associated with the decay of a beauty meson particle.
We will discuss how the combination of HEP specific tools and technologies coming from the much wider data analysis world may result in a powerful and easy-to-use tool for a HEP data analyst. Among these tools we will test the advantage of offloading some of the most compute intensive tasks on heterogeneous architectures through GooFit, a tool that exploits the computational capabilities of GPUs to perform maximum likelihood fits.
Monte Carlo simulation is a vital tool for all physics programmes of particle physics experiments. Its accuracy and reliability in reproducing the detector response are of the utmost importance. For the LHCb experiment, which is embarking on a new data-taking era with an upgraded detector, a full suite of verifications has been put in place for its simulation software to ensure the quality of the samples produced. The chain of tests exploits the LHCb infrastructure for software quality control.
In this contribution we will describe the procedure and the tests that have been put in place. First-level verifications are performed as soon as new software is submitted for integration in the LHCb GitLab repository. They range from Continuous Integration (CI) tests to so-called 'nightlies': short jobs run overnight to verify the integrity of the software. More in-depth performance and regression tests are carried out with dedicated infrastructure (LHCbPR), which compares samples of O(1000) events. Simulation data quality shifters look for anomalies and alert the authors in the case of unexpected changes. Work is also in progress to enable the automatic verification of important variable distributions from a small number of simulated events before the whole production is launched.
We developed supervised and unsupervised quantum machine learning models for anomaly detection tasks at the Large Hadron Collider at CERN. Current Noisy Intermediate Scale Quantum (NISQ) devices have a limited number of qubits and qubit coherence. We designed dimensionality reduction models based on Autoencoders to accommodate the constraints dictated by the quantum hardware. Different designs were investigated, such as convolutional and Sinkhorn Autoencoder architectures, that can compress HEP data while preserving the class structure of the original dataset. The quantum algorithms are trained to identify anomalies in the latent spaces generated by the Autoencoders. A collection of results for a quantum classifier and a set of quantum anomaly detection algorithms is presented. Our study is supported by a performance comparison to the corresponding classical models.
Compared to LHC Run 1 and Run 2, future HEP experiments, e.g. at the HL-LHC, will increase the volume of generated data by an order of magnitude. In order to sustain the expected analysis throughput, ROOT's RNTuple I/O subsystem has been engineered to overcome the bottlenecks of the TTree I/O subsystem, focusing also on a compact data format, asynchronous and parallel requests, and a layered architecture that allows supporting distributed filesystem-less storage systems, e.g. HPC-oriented object stores.
In a previous publication, we introduced and evaluated RNTuple's native backend for Intel DAOS. Since its first prototype, we have carried out a number of improvements to both RNTuple and its DAOS backend aiming to saturate the physical link, such as support for vector writes and an improved RNTuple-to-DAOS mapping, to name only a few. In parallel, the latest developments allow for better integration between RNTuple and RDataFrame, ROOT's storage-agnostic, declarative interface for writing HEP analyses.
In this work, we contribute with the following: (i) a redesign and evaluation of the RNTuple DAOS backend, including a mechanism for efficient population of the object store based on existing data; and (ii) an experimental evaluation of single-node and distributed analyses using RDataFrame as a proxy between the user and RNTuple, showing a significant increase in the analysis throughput for typical HEP workflows.
Through its TMVA package, ROOT provides and connects to machine learning tools for data analysis at HEP experiments and beyond. In addition, through its powerful I/O system and RDataFrame analysis tools, ROOT provides the capability to efficiently select and query input data from large data sets as typically used in HEP analysis. At the same time, several Machine Learning tools exist in a diversified landscape outside of ROOT.
In this talk, we present new developments in ROOT that bridge the gap between external tools and ROOT, by providing better interoperability in a common software ecosystem for Machine Learning in data analysis.
We present recently included features in TMVA that generate batches of events from ROOT I/O and RDataFrame to efficiently train machine learning models using Python tools such as TensorFlow and PyTorch. This will facilitate direct access to the ROOT input data when training using external tools. Another focus is put on fast machine learning inference, which enables analysts to deploy their machine learning models rapidly on large scale datasets. A new tool, SOFIE, has recently been developed in ROOT to generate C++ code for the evaluation of deep learning models that are trained with external tools. This provides the capability to better integrate Machine Learning model evaluation in HEP data analysis.
The new developments are paired with newly designed C++ and Python interfaces for TMVA supporting modern C++ paradigms and providing full interoperability in the Python ecosystem.
MadMiner is a Python module that implements a powerful family of multivariate inference techniques that leverage both matrix element information and machine learning.
This multivariate approach requires neither the reduction of high-dimensional data to summary statistics nor any simplifications to the underlying physics or detector response.
In this paper, we address some of the challenges arising from deploying MadMiner in a real scale HEP analysis with the goal of offering a new tool in HEP that is easily accessible.
The proposed approach streamlines a typical MadMiner pipeline into a parametrized yadage workflow in yaml files. The general workflow is split into two yadage subworkflows, one dealing with the physics dependencies and the other with the ML ones. After that, the workflow is deployed using REANA, a reproducible research data analysis platform that takes care of flexibility, scalability, reusability and reproducibility features.
To test the performance of our method, we performed scaling experiments for a MadMiner workflow on the National Energy Research Scientific Computing Center (NERSC) cluster with an HTCondor backend.
All the stages of the physics subworkflow showed a linear dependence of resources and walltime on the number of events generated. This trend allowed us to run a typical MadMiner workflow consisting of 1M events, in which the generation step used only 2930 MB of memory and a walltime of 2919 s.
The feature complexity of data recorded by particle detectors combined with the availability of large simulated datasets presents a unique environment for applying state-of-the-art machine learning (ML) architectures to physics problems. We present the Simplified Cylindrical Detector (SCD): a fully configurable GEANT4 calorimeter simulation which mimics the granularity and response characteristics of general purpose detectors at the LHC. The SCD will be released as public software to accelerate development of ML-based reconstruction and calorimeter models. Two use-cases based on data from the SCD are presented: first, an ML-based global particle reconstruction which shows potential to outperform traditional approaches; second, a fast simulation model transforming a set of truth particles into a set of reconstructed particles.
To sustain the harsher conditions of the high-luminosity LHC, the CMS Collaboration is designing a novel endcap calorimeter system. The new calorimeter will predominantly use silicon sensors to achieve sufficient radiation tolerance and will maintain highly granular information in the readout to help mitigate the effects of the pile up. In regions characterized by lower radiation levels, small scintillator tiles with individual SiPM on-tile readout are employed. A unique reconstruction framework (TICL: The Iterative CLustering) is being developed within the CMS Software CMSSW to fully exploit the granularity and other significant detector features, such as particle identification and precision timing, with a view to mitigating pile up in the very dense environment of HL-LHC. The TICL framework has been thought of with heterogeneous computing in mind: the algorithms and their data structures are designed to be executed on GPUs. In addition, geometry agnostic data structures have been designed to provide fast navigation and searching capabilities. Seeding capabilities (also exploiting information coming from other detectors), dynamic cluster masking, energy calibration, and particle identification are the main components of the framework. To allow for maximal flexibility, TICL allows the composition of different combinations of modules that can be chained together in an iterative fashion. The presenter will describe the design of TICL pattern recognition algorithms and advanced neural networks under development, as well as future plans.
Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers, long short-term memory and gated recurrent unit, within the hls4ml [1] framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.
[1] J. Duarte et al., “Fast inference of deep neural networks in FPGAs for particle physics”, JINST 13 (2018) P07027, arXiv:1804.06913
The classification of HEP events, or separating signal events from the background, is one of the most important analysis tasks in High Energy Physics (HEP), and a foundational task in the search for new phenomena. Complex deep learning-based models have been fundamental for achieving accurate and outstanding performance in this classification task. However, the quantification of the uncertainty has traditionally been neglected when deep learning-based methods are used, despite its critical importance in scientific applications [1], [2].
In this work, we propose a Bayesian deep learning-based method for measuring uncertainty when classification of HEP events is performed using a deep neural network classifier. The work is focused on the use of the Monte Carlo Dropout (MC-Dropout) method, a variational inference technique proposed in [3] that is based on Dropout [4], the well-known regularization technique used to overcome overfitting. The Monte Carlo Dropout method allows production of the posterior distribution of the network weights by training a dropout network that approximates Bayesian inference. Thus, a Bayesian deep neural network considers a distribution over network parameters instead of a single point. The traditional dropout method randomly toggles off some neurons, with probability $D_{rate}$ during the training stage. However, the MC-Dropout method toggles off neurons both during the training stage and also during the inference stage.
In this work, we use the publicly available Higgs dataset described in [5]. This is simulated data, and the problem is to distinguish the signal from the background, where the signal corresponds to a Higgs boson decaying to a pair of bottom quarks according to the process: $gg \rightarrow H^0 \rightarrow W^{\mp} H^{\pm} \rightarrow W^{\mp} W^{\pm} h^0 \rightarrow W^{\mp} W^{\pm} b \bar{b}$. Furthermore, we plan to apply the proposed method using simulated data of the $\omega$ meson production off nuclear targets. Here, the problem is that the $\omega$ meson decays into four final-state particles: $\pi^+$ $\pi^-$ $\gamma$ $\gamma$, and the pions can also decay into muons and neutrinos, especially at low momentum [6].
The methodology of this work includes (i) training Bayesian deep learning-based classifiers for the identification of signal and background (binary classification) using the Monte Carlo Dropout method; (ii) evaluating different $D_{rate}$ values; (iii) evaluating the classification performance; and (iv) computing three uncertainty measures: variance, mutual information, and predictive entropy. Preliminary results show on average 0.66 accuracy, 0.68 precision, 0.72 recall, and 0.70 F1 score when a Monte Carlo Dropout-based model with three hidden layers of 300 neurons each and $D_{rate}=0.5$ is used. We expect to increase the classification performance through hyper-parameter optimization, evaluation of different network architectures, and variation of the $D_{rate}$ parameter.
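The three uncertainty measures can be sketched for a binary classifier as follows. This is not the model of the contribution: the single hidden layer with three neurons, the fixed weights, the input, and all function names are hypothetical, and the formulas are the commonly used ones (variance of the predicted probability across stochastic passes, entropy of the mean prediction, and mutual information as that entropy minus the mean per-pass entropy).

```python
import math
import random

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with probability p."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def forward_with_dropout(x, w_hidden, w_out, d_rate, rng):
    """One stochastic forward pass: dropout stays active at inference time."""
    hidden = []
    for w_row in w_hidden:
        activation = max(0.0, sum(wi * xi for wi, xi in zip(w_row, x)))  # ReLU
        keep = rng.random() >= d_rate
        # Inverted dropout: rescale kept units so the expected activation is unchanged.
        hidden.append(activation / (1.0 - d_rate) if keep else 0.0)
    logit = sum(wo * h for wo, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))

def mc_dropout_uncertainty(x, w_hidden, w_out, d_rate=0.5, n_passes=200, seed=0):
    """Summarize n_passes stochastic predictions of the signal probability."""
    rng = random.Random(seed)
    probs = [forward_with_dropout(x, w_hidden, w_out, d_rate, rng) for _ in range(n_passes)]
    mean = sum(probs) / n_passes
    variance = sum((p - mean) ** 2 for p in probs) / n_passes
    predictive_entropy = binary_entropy(mean)
    expected_entropy = sum(binary_entropy(p) for p in probs) / n_passes
    return {
        "mean": mean,
        "variance": variance,
        "predictive_entropy": predictive_entropy,
        "mutual_information": predictive_entropy - expected_entropy,
    }

# Toy network and event (hypothetical values).
w_hidden = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
w_out = [1.0, -1.5, 0.7]
result = mc_dropout_uncertainty([1.0, 2.0], w_hidden, w_out)
```

Predictive entropy captures the total uncertainty of the averaged prediction, while the mutual information isolates the part caused by disagreement between dropout passes, which is the component expected to shrink as the model sees more training data.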
[1] Aishik Ghosh, Benjamin Nachman, and Daniel Whiteson. Uncertainty-aware machine learning for high energy physics. Phys. Rev. D, 104:056026, Sep 2021.
[2] Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul Fieguth, Xiaochun Cao, Abbas Khosravi, U. Rajendra Acharya, Vladimir Makarenkov, and Saeid Nahavandi. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion, 76:243–297, 2021.
[3] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning, pages 1050–1059. PMLR, 2016.
[4] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958, 2014.
[5] Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nature communications, 5(1):1–9, 2014.
[6] Andrés Bórquez. The $\omega$ hadronization studies in the nuclear medium with the CLAS spectrometer. Master’s thesis, UTFSM, Valparaíso, Chile, 2021.
There are undeniable benefits of binding Python and C++ to take advantage of the best features of both languages. This is especially relevant to the HEP and other scientific communities that have invested heavily in the C++ frameworks and are rapidly moving their data analyses to Python.
Version 2 of Awkward Array, a Scikit-HEP Python library, introduces a set of header-only C++ libraries that do not depend on any application binary interface. Users can directly include these libraries in their compilation, rather than linking against platform-specific libraries. This new development makes the integration of Awkward Arrays into other projects easier and more portable, as the implementation is easily separable from the rest of the Awkward Array codebase.
The code is minimal: it does not include all of the code needed to use Awkward Arrays in Python, nor does it include references to Python or pybind11. C++ users can use it to make arrays and then copy them to Python without any specialised data types, using only raw buffers, strings, and integers. This C++ code also simplifies the process of JIT-compilation in ROOT. This implementation approach avoids some drawbacks, such as the difficulty of packaging projects with native dependencies.
In this talk, we will demonstrate the techniques of exposing C++ classes and their methods to Python and vice versa. We will also describe the implementation of a new LayoutBuilder and a GrowableBuffer that are more performant in building the Awkward Arrays as compared to the previous approach. Furthermore, examples of wrapping the C++ data into Awkward Arrays and exposing Awkward Arrays to C++ without copying them will be discussed.
Particle transport simulations are a cornerstone of high-energy physics (HEP), constituting almost half of the entire computing workload performed in HEP. To boost the simulation throughput and energy efficiency, GPUs as accelerators have been explored in recent years, further driven by the increasing use of GPUs on HPCs. The Accelerated demonstrator of electromagnetic Particle Transport (AdePT) is an advanced prototype for offloading the simulation of electromagnetic showers in Geant4 to GPUs, and still undergoes continuous development and optimization. Improving memory layout and data access is vital to use modern, massively parallel GPU hardware efficiently, contributing to the challenge of migrating traditional CPU based data structures to GPUs in AdePT. The low-level abstraction of memory access (LLAMA) is a C++ library that provides a zero-runtime-overhead data structure abstraction layer, focusing on multidimensional arrays of nested, structured data. It provides a framework for defining and switching custom memory mappings at compile time to define data layouts and instrument data access, making LLAMA an ideal tool to tackle the memory-related optimization challenges in AdePT. Our contribution shares insights gained with LLAMA when instrumenting data access inside AdePT, complementing traditional GPU profiler outputs. We demonstrate traces of read/write counts to data structure elements as well as memory heatmaps. The acquired knowledge allowed for subsequent data layout optimizations.
To achieve better computational efficiency and exploit a wider range of computing resources, the CMS software framework (CMSSW) has been extended to offload part of the physics reconstruction to NVIDIA GPUs, while support for AMD and Intel GPUs is under development. To avoid the need to write, validate and maintain a separate implementation of the reconstruction algorithms for each back-end, CMS decided to adopt a performance portability framework. After evaluating different alternatives, it was decided to adopt Alpaka as the solution for Run-3.
Alpaka (Abstraction Library for Parallel Kernel Acceleration) is a header-only C++ library that provides performance portability across different back-ends, abstracting the underlying levels of parallelism. It supports serial and parallel execution on CPUs, and extremely parallel execution on GPUs.
This contribution will show how Alpaka is used inside CMSSW to write a single code base; to use different toolchains to build the code for each supported back-end, and link them into a single application; and to select the best back-end at runtime. It will highlight how the Alpaka-based implementation achieves near-native performance, and will conclude by discussing the plans to support additional back-ends.
Utilizing the computational power of GPUs is one of the key ingredients to meet the computing challenges presented to the next generation of High-Energy Physics (HEP) experiments. Unlike CPUs, developing software for GPUs often involves using architecture-specific programming languages promoted by the GPU vendors and hence limits the platform that the code can run on. Various portability solutions have been developed to achieve portable, performant software across different GPU vendors. Given the rapid evolution of these portability solutions, an early adoption of them in simple HEP testbed applications will help us understand the strengths and weaknesses of respective approaches.
We apply several portability solutions, such as Kokkos, SYCL, std::execution::par and Alpaka, on kernels for track propagation extracted from the mkFit project. We report on the development experience of the same application with different portability solutions, as well as their performance on GPUs, measured as the throughput of the kernels, from different manufacturers such as NVIDIA, AMD and Intel.
The CMS ECAL has achieved an impressive performance during the LHC Run1 and Run2. In both runs, the ultimate performance was reached after a lengthy calibration procedure required to correct ageing-induced changes in the response of the channels. The CMS ECAL will continue its operation far beyond the ongoing LHC Run3: its barrel section will be upgraded for the LHC Phase-2 and it will be operated for the entire duration of the High-Luminosity LHC program. With the increase of instantaneous luminosity, the ageing effects will increase, and so will the required frequency of calibrations: it is therefore crucial for the CMS ECAL community to reduce the time and resources needed for this task, in order to ensure smooth operation and excellent performance in the long term with limited personpower. A new system has been developed during the second LHC long shutdown to automatically execute the calibration workflows on a daily basis during data taking. The new system is based on industry-standard tools (Openshift, Jenkins, Influxdb, and Grafana) and provides a general interface to orchestrate standalone workflows written in different programming languages. It also provides interfaces to other existing CMS systems to steer the processing of selected data streams and to upload newly computed calibrations into the database used for the data processing for physics analyses. The new system is designed with the ambitious goal of cutting the time needed to provide the best possible performance for physics analyses by one order of magnitude. The system offers an extensive suite of diagnostic tools that provide constant monitoring of its status, as well as the option to send alerts in case of problems. In this talk, the general structure of the system will be presented, along with the results from the first year of operation. The details of the monitoring and alert system will also be discussed.
We have developed and implemented a machine learning based system to calibrate and control the GlueX Central Drift Chamber at Jefferson Lab, VA, in near real-time. The system monitors environmental and experimental conditions during data taking and uses those as inputs to a Gaussian process (GP) with learned prior. The GP predicts calibration constants in order to recommend a high voltage (HV) setting for the detector that maintains consistent detector performance (gain and resolution) throughout data taking. This approach is in stark contrast to traditional detector operations, in which the detector operates at fixed HV and its calibration parameters vary considerably with time. Additionally, the ML based system utilizes uncertainty quantification to correct the recommended control parameters when appropriate. We will present results from the ML system operating autonomously during the Charged Pion Polarizability (CPP) experiment conducted in Hall D at Jefferson Lab.
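As an illustration only, and not the GlueX implementation, the idea of predicting a calibration quantity with a Gaussian process and falling back to a default when the prediction is too uncertain can be sketched in plain Python. The kernel (a fixed squared-exponential rather than a learned prior), the data values, and the names `gp_predict` and `recommend` are all assumptions made for this sketch.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=1e-4, length=1.0):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_new, length) for x in xs]
    alpha = solve(K, ys)      # K^{-1} y
    v = solve(K, k_star)      # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new, length) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

def recommend(mean, var, fallback=0.0, max_std=0.3):
    """Hypothetical rule: trust the GP only when its uncertainty is small."""
    return mean if math.sqrt(var) <= max_std else fallback

# Made-up data: x is a scaled environmental variable, y a gain deviation.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.00, 0.08, 0.09, 0.01]
mean_in, var_in = gp_predict(xs, ys, 1.5)     # inside the sampled range
mean_out, var_out = gp_predict(xs, ys, 10.0)  # far outside the sampled range
```

Inside the sampled range the predictive variance is small and `recommend` returns the GP mean; far outside it the variance approaches the prior value and the fallback is used instead.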
The online Data Quality Monitoring (DQM) system of the CMS electromagnetic calorimeter (ECAL) is a vital operations tool that allows ECAL experts to quickly identify, localize, and diagnose a broad range of detector issues that would otherwise hinder physics-quality data taking. Although the existing ECAL DQM system has been continuously updated to respond to new problems, it remains one step behind new and never-before-seen issues. As the ECAL electronics continue to age, previously rare and obscure failure modes have become more common, emphasizing the need for a more robust anomaly detection system. Using unsupervised deep learning, a real-time autoencoder-based anomaly detection system has been developed that is able to detect ECAL anomalies unseen in past data. After accounting for spatiotemporal variations in the response of the ECAL, the new system is able to efficiently detect anomalies while maintaining an estimated false discovery rate between 10^{-2} and 10^{-4}, besting existing benchmarks by several orders of magnitude. The real-world performance of the system is validated using anomalies found in 2018 data taking and with early data taken from 2022 collisions.
Cryogenic phonon detectors are used by direct detection dark matter experiments to achieve sensitivity to light dark matter particle interactions. Such detectors consist of a target crystal equipped with a superconducting thermometer. The temperature of the thermometer and the bias current in its readout circuit need careful tuning to achieve optimal sensitivity of the detector. This task is not trivial and has to be done manually by an expert. In our work, we created a simulation of the detector response as an OpenAI Gym reinforcement learning environment. In the simulation, we test the capability of a soft actor-critic agent to perform the task. We accomplish the optimization of a standard detector in the equivalent of 30 minutes of real measurement time, which is faster than most human experts. Our method can improve the scalability of multi-detector setups.
Quantum annealing provides an optimization framework with the potential to outperform classical algorithms in finding the global minimum of non-convex functions. The availability of quantum annealers with thousands of qubits makes it possible today to tackle real-world problems using this technology. In this talk, I will review the quantum annealing paradigm and its use in the minimization of general functions. I will then discuss some of the applications of this method in high-energy physics, including training neural networks for classification, and fitting effective field theories to experimental data.
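Annealers are typically given a problem in QUBO (quadratic unconstrained binary optimization) form. The sketch below, which is purely classical and does not use any quantum hardware or vendor API, shows what such a formulation looks like for a made-up three-variable problem. It finds the minimum by exhaustive search and by simulated annealing, used here as a classical stand-in for the annealing step. The matrix `Q` and all function names are assumptions for illustration.

```python
import itertools
import math
import random

# Hypothetical 3-variable QUBO: minimise sum over (i, j) of Q[i, j] * x_i * x_j.
Q = {(0, 0): -1.0, (1, 1): -1.0, (2, 2): -1.0, (0, 1): 2.0, (1, 2): 2.0}

def energy(bits, Q):
    """QUBO objective for a tuple or list of 0/1 values."""
    return sum(w * bits[i] * bits[j] for (i, j), w in Q.items())

def brute_force(Q, n):
    """Exact minimiser by enumeration (only feasible for small n)."""
    return min(itertools.product((0, 1), repeat=n), key=lambda b: energy(b, Q))

def simulated_annealing(Q, n, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Classical single-bit-flip annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n)]
    e = energy(bits, Q)
    best, best_e = tuple(bits), e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        bits[i] ^= 1                      # propose flipping one bit
        new_e = energy(bits, Q)
        if new_e <= e or rng.random() < math.exp(-(new_e - e) / t):
            e = new_e                     # accept the move
            if e < best_e:
                best, best_e = tuple(bits), e
        else:
            bits[i] ^= 1                  # reject: undo the flip
    return best, best_e

exact_bits = brute_force(Q, 3)
sa_bits, sa_energy = simulated_annealing(Q, 3)
```

For this toy problem the state `(0, 1, 0)` is a local minimum under single-bit flips, which is the kind of trap that annealing, whether thermal or quantum, is meant to escape.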
Over the last 20 years, thanks to the development of quantum technologies, it has become possible to deploy on real quantum hardware quantum algorithms and applications that were previously accessible only through simulation. The devices currently available are often referred to as noisy intermediate-scale quantum (NISQ) computers, and they require calibration routines in order to obtain consistent results.
In this context, we present the latest developments of Qibo, an open-source framework for quantum computing.
Qibo was initially conceived as a tool for simulating quantum circuits.
Its modular layout for backend abstraction makes it possible to switch effortlessly between different backends, including a high-performance simulator based on just-in-time compilation that is able to simulate circuits with a large number of qubits (greater than 35).
The latest addition is the possibility of using the language developed for Qibo to execute quantum circuits on real quantum hardware.
Given the need to apply calibration routines to characterize the experimental setup, we have also developed a plugin for Qibo that implements both basic and more advanced calibration routines, including randomized benchmarking and gate set tomography.
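To make the randomized benchmarking step concrete, the following sketch fits the standard decay model, survival = A * alpha^m + B, to synthetic data and converts the decay parameter into an average error per Clifford, r = (1 - alpha)(d - 1)/d with d = 2^n. It does not use Qibo or its calibration plugin. The function names, the fixed offset B = 1/2 (a single-qubit assumption), and the noiseless data are all assumptions made for illustration.

```python
import math

def fit_rb_decay(lengths, survival, offset=0.5):
    """Fit survival = A * alpha**m + offset by least squares on the logarithm.

    Assumes the offset is known; every survival value must exceed it.
    """
    xs = list(lengths)
    ys = [math.log(p - offset) for p in survival]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), math.exp(slope)   # (A, alpha)

def error_per_clifford(alpha, n_qubits=1):
    """Average error per Clifford from the RB decay parameter."""
    d = 2 ** n_qubits
    return (1.0 - alpha) * (d - 1) / d

# Synthetic, noiseless survival probabilities generated with alpha = 0.98.
lengths = [1, 5, 10, 20, 50, 100]
survival = [0.5 + 0.5 * 0.98 ** m for m in lengths]
amplitude, alpha = fit_rb_decay(lengths, survival)
```

With real data the offset would normally be a free fit parameter and the survival probabilities would carry statistical noise, so a non-linear fit would be more appropriate than this log-linear shortcut.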
The variational quantum eigensolver (VQE) is an algorithm to compute ground and excited state energies of quantum many-body systems. A key component of the algorithm, and an active research area, is the construction of a parametrized trial wavefunction, the so-called variational ansatz. The wavefunction parametrization should be expressive enough, i.e. able to represent the true eigenstate of a quantum system for some choice of parameter values. On the other hand, it should be trainable, i.e. the number of parameters should not grow exponentially with the size of the system. Here, we apply VQE to the problem of finding ground and excited state energies of the odd-odd nucleus 6Li. We study the effects of ordering fermionic excitation operators in the unitary coupled cluster ansatz on the VQE algorithm convergence by using only operators preserving the Jz quantum number. The accuracy is improved by two orders of magnitude in the case of descending order. We first compute optimal ansatz parameter values using a classical state-vector simulator with arbitrary measurement accuracy and then use those values to evaluate energy eigenstates of 6Li on a superconducting quantum chip from IBM. We post-process the results by using error mitigation techniques and are able to reproduce the exact energy with an error of 3.8% and 0.1% for the ground state and for the first excited state of 6Li, respectively.
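The variational loop itself can be illustrated with a deliberately tiny example that has nothing to do with 6Li or the unitary coupled cluster ansatz. It uses a single qubit, a made-up Hamiltonian H = Z + 0.5 X, a one-parameter ansatz Ry(theta)|0>, and gradient descent with the parameter-shift rule on an exact state-vector evaluation. All names and values below are assumptions for this sketch.

```python
import math

# Hypothetical single-qubit Hamiltonian H = Z + 0.5 X (real symmetric 2x2).
H = [[1.0, 0.5],
     [0.5, -1.0]]

def ansatz(theta):
    """State Ry(theta)|0> = (cos(theta/2), sin(theta/2))."""
    return [math.cos(theta / 2), math.sin(theta / 2)]

def energy(theta):
    """Expectation value <psi(theta)| H |psi(theta)>."""
    psi = ansatz(theta)
    return sum(psi[i] * H[i][j] * psi[j] for i in range(2) for j in range(2))

def gradient(theta):
    """Parameter-shift rule, exact for a single Ry rotation."""
    return 0.5 * (energy(theta + math.pi / 2) - energy(theta - math.pi / 2))

def vqe(theta=0.1, lr=0.4, steps=200):
    """Plain gradient descent on the variational energy."""
    for _ in range(steps):
        theta -= lr * gradient(theta)
    return theta, energy(theta)

theta_opt, e_min = vqe()
exact = -math.sqrt(1.0 + 0.5 ** 2)   # lowest eigenvalue of H
```

Here the ansatz can reach the true ground state, so the optimized energy matches the exact eigenvalue. On hardware, `energy` would instead be estimated from repeated measurements and would carry shot noise and device errors.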
The present work is based on research carried out within the cooperation between Intel Labs and Deggendorf Institute of Technology, following the recent release of the Intel® Quantum SDK (Software Development Kit). Transport phenomena, e.g. heat transfer and mass transfer, are nowadays among the most challenging unsolved problems in computational physics due to the inherent complexity of fluids. As a revolutionary technology, quantum computing opens a brand-new perspective for numerical simulation, including computational fluid dynamics (CFD). Current CFD algorithms, which are based on different scales (e.g. macroscopic or microscopic), need to be translated to quantum systems. In the current work, quantum algorithms for fluid dynamics have been preliminarily implemented using the Intel Quantum SDK, applying a mesoscopic approach, i.e. solving the lattice Boltzmann equation. Taking the simplest transport phenomena as a starting point, the preliminary quantum simulation results have been validated against the analytical solution and the classical numerical simulation. The potential of quantum computing for simulating fluids will be discussed.
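For orientation, a classical lattice Boltzmann reference for the simplest transport problem, pure diffusion in one dimension, fits in a few lines. The sketch below is a generic D1Q3 BGK scheme in lattice units. It is not the quantum algorithm and does not use the Intel Quantum SDK. The grid size, step count and function names are assumptions.

```python
def lbm_diffusion(rho0, steps, tau=1.0):
    """D1Q3 lattice Boltzmann (BGK) for 1D diffusion, periodic boundaries."""
    weights = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)  # rest, right-moving, left-moving
    n = len(rho0)
    f = [[w * r for r in rho0] for w in weights]  # start at equilibrium
    for _ in range(steps):
        rho = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
        # Collision: relax each population towards its equilibrium w_i * rho.
        for i, w in enumerate(weights):
            f[i] = [fi - (fi - w * r) / tau for fi, r in zip(f[i], rho)]
        # Streaming: shift the moving populations by one cell.
        f[1] = [f[1][(x - 1) % n] for x in range(n)]
        f[2] = [f[2][(x + 1) % n] for x in range(n)]
    return [f[0][x] + f[1][x] + f[2][x] for x in range(n)]

def diffusivity(tau):
    """Lattice-unit diffusion coefficient D = c_s^2 (tau - 1/2), c_s^2 = 1/3."""
    return (tau - 0.5) / 3.0

# A unit point source in the middle of a periodic domain.
n_cells, centre, n_steps = 101, 50, 20
rho_init = [1.0 if x == centre else 0.0 for x in range(n_cells)]
rho_final = lbm_diffusion(rho_init, n_steps)
variance = sum(r * (x - centre) ** 2 for x, r in enumerate(rho_final))
```

The analytical check is that the variance of the density profile grows as 2 D t. For tau = 1 the scheme reduces to a simple random walk, so the relation holds at every step as long as the profile has not wrapped around the periodic domain.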
See https://indico.cern.ch/event/1106990/contributions/4998162/
See https://indico.cern.ch/event/1106990/contributions/5097014/
See https://indico.cern.ch/event/1106990/contributions/4991353/
The Japanese flagship supercomputer Fugaku started its operation in early 2021.
After one and a half years of production runs, it is producing some initial results in Lattice QCD applications, such as thermodynamics, heavy and light quark flavor physics, and hadron structures and interactions.
In this talk, we first touch on the basics of Fugaku and its software status.
We then discuss the ongoing projects and highlight some initial results, focusing mainly on those using domain wall fermions, a practical chiral fermion formulation on the lattice.
Over the last decade the C++ programming language has evolved significantly into a safer, easier-to-learn general-purpose programming language that is better supported by tools and capable of extracting the last bit of performance from bare metal. The emergence of technologies such as LLVM and Clang has advanced tooling support for C++, and its ecosystem has grown qualitatively. C++ has an important role in the field of scientific computing, as the language design principles promote efficiency, reliability and backward compatibility - a vital tripod for any long-lived codebase. Other ecosystems such as Python have prioritized better usability and safety while making some tradeoffs on efficiency and backward compatibility. That has led developers to believe that there is a binary choice between performance and usability.
In this talk we would like to present the advancements in the C++ ecosystem, its relevance for scientific computing and beyond, and the foreseen challenges. The talk introduces three major components for data science: interpreted C++, automatic language bindings, and differentiable programming. We outline how these components help the Python and C++ ecosystems interoperate with little compromise on either performance or usability. We elaborate on a future hybrid Python/C++ differentiable programming analysis framework which might accelerate scientific discovery in HEP by turning data analyses into end-to-end differentiable pipelines, amplifying their power and physics sensitivity.
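As a generic illustration of what differentiable programming means, and not of the tools presented in the talk, forward-mode automatic differentiation can be sketched with dual numbers: every value carries its derivative along, so ordinary code yields exact derivatives without symbolic manipulation or finite differences. The class and function names below are assumptions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value together with its derivative with respect to one input."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def _lift(x):
    """Promote plain numbers to constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))

def sin(x):
    """Sine with the chain rule applied to the derivative part."""
    x = _lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(x, 1.0)).der

slope_poly = derivative(lambda x: x * x * x + 2 * x, 3.0)
slope_trig = derivative(lambda x: x * sin(x), 1.0)
```

An end-to-end differentiable analysis pipeline applies the same idea at scale: if every step propagates derivatives, the final result can be optimized with respect to any upstream parameter by gradient-based methods.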