The scientific computing community suffers from a lack of good development tools that can handle the unique problems of coding for high-performance computing. It is particularly difficult for domain experts to parallelize inherited serial codes written in FORTRAN, which are very common in the CSE research field. An automatic parallel programming IDE has been developed for rapid development of...
Triple-GEM detectors are a well-known technology in high energy physics. In order to gain a complete understanding of their behavior, in parallel with on-beam testing, a Monte Carlo code has to be developed to simulate their response to the passage of particles. The software must take into account all the physical processes involved, from the primary ionization up to the signal formation,...
The ATLAS experiment at the LHC has a complex heterogeneous distributed computing infrastructure, which is used to process and analyse exabytes of data. Metadata are collected and stored at all stages of physics analysis and data processing. All metadata can be divided into operational metadata, used for quasi-online monitoring, and archival metadata, used to study the systems’ behaviour over a...
A Tier-3g facility within the computing resources of Istanbul Aydin University has been planned and installed in collaboration with the TR-ULAKBIM national Tier-2 center. The facility is intended to provide an upgraded data analysis infrastructure to CERN researchers, considering the recent nation-wide projects of the ATLAS and CMS experiments. The fundamental design of the Tier-3g has been detailed in...
MATHUSLA has been proposed as a second detector that sits over 100 m from an LHC interaction point, on the surface, to look for ultra-long-lived particles. A test stand was constructed with 2 layers of scintillator paddles and 3 layers of RPCs, on loan from the DZERO and ARGO-YBJ experiments. Downward-going tracks from cosmic rays and upward-going tracks from muons produced at the interaction point have been reconstructed. To...
The ATLAS experiment implemented an ensemble of neural networks (the NeuralRinger algorithm) dedicated to improving the performance of filtering events containing electrons in the high-input-rate online environment of the Large Hadron Collider at CERN, Geneva. This algorithm has been used online to select electrons with transverse energies above 15 GeV since 2017 and is being extended to electrons...
In recent years we have carried out a renewal of the Building Management System (BMS) software of our data center with the aim of improving its data collection capability. Considering the complex physical distribution of the technical plants and the limits of the building currently hosting our center, a system that simply monitors and collects all the necessary information and provides...
We will present our experiences and preliminary studies on LHC high energy physics data analysis with quantum simulators and IBM quantum computer hardware using IBM Qiskit. The performance is compared with the results of a classical machine learning method applied to a physics process, the Higgs coupling to two top quarks, as an example. This work is a collaboration between the University of...
CernVM-FS is a solution for scalable, reliable and low-maintenance software distribution that is widely used by various High Energy Physics collaborations. The information that can be distributed by CernVM-FS is not limited to software; arbitrary other data can be distributed as well. By default, the whole CernVM-FS repository, containing all subdirectories and files, is available to all users in read-only mode after...
Athena is the software framework used in the ATLAS experiment throughout the data processing path, from the software trigger system through offline event reconstruction to physics analysis. The shift from high-power single-core CPUs to multi-core systems in the computing market means that the throughput capabilities of the framework have become limited by the available memory per process. For...
The Telescope Array experiment, located in Utah, USA, is aimed at the study of ultra-high-energy cosmic rays through the detection of extensive air showers (EAS). The surface detector of the Telescope Array provides multivariate data reconstructed from the signal waveforms of the detectors that took part in a particular event. Moreover, a number of variables are composition-sensitive and...
Data-intensive end-user analyses in High Energy Physics require high data throughput to reach short turnaround cycles.
This leads to enormous challenges for storage and network infrastructure, especially when facing the tremendous increase in the amount of data to be processed during the High-Luminosity LHC runs.
Including opportunistic resources with volatile storage systems into the traditional...
The INFN CNAF Tier-1 Long Term Data Preservation (LTDP) project was established at the end of 2012, in close collaboration with Fermi National Accelerator Laboratory (FNAL), with the purpose of saving, distributing and maintaining over time the CDF Tevatron analysis framework and all the relevant scientific data produced by the experiment. During recent years, a complete copy of all CDF...
Linux containers have gained widespread use in High Energy Physics, be it for services using container engines such as containerd with Kubernetes, for production jobs using container engines such as Singularity or Shifter, or for development workflows using Docker as a local container engine. Thus the efficient distribution of the container images, whose size usually ranges from a few hundred...
The ATLAS experiment at the Large Hadron Collider (LHC) operated successfully from 2008 to 2018, a period that included Run 1 (2008-2013), a shutdown and Run 2 (2016-2018). In the course of Run 2, ATLAS achieved an overall data-taking efficiency of 97%, largely constrained by the irreducible dead-time introduced to accommodate the limitations of the detector read-out...
Nowadays, any physicist performing an analysis of LHC data needs to be well-versed in programming, at the level of both a system programmer and a software developer, to handle the vast amounts of collision and simulation events. Even the simplest programming mistake in any of these areas can severely confound the analysis results. Moreover, a multitude of different analysis...
The ALICE experiment at the CERN LHC focuses on studying the quark-gluon plasma produced by heavy-ion collisions. After the Long Shutdown 2 in 2019-2020, the ALICE Experiment will see its data input throughput increase a hundredfold, up to 3.4 TB/s. In order to cope with such a large amount of data, a new online-offline computing system, called O2, will be deployed. By reconstructing the data...
The detector description is an essential component in the simulation, reconstruction and analysis of data resulting from particle collisions in high energy physics experiments. The main motivation behind DD4hep is to provide an integrated solution for all these stages and to address detector description in a broad sense, including the geometry and the materials used in the device, and additional...
The extensive physics program of the ATLAS experiment at the Large Hadron Collider (LHC) relies on large scale and high fidelity simulation of the detector response to particle interactions. Current full simulation techniques using Geant4 provide accurate modeling of the underlying physics processes, but are inherently resource intensive. In light of the high-luminosity upgrade of the LHC and...
The drift chamber is the main tracking detector in high energy physics experiments like BESIII. Due to the high luminosity and high beam intensity, the drift chamber suffers from beam-induced and electronics background, which represents a computing challenge for the reconstruction software. Deep learning developments in the last few years have shown tremendous improvements in the analysis of data...
NEWSdm (Nuclear Emulsions for WIMP Search directional measurement) is an underground direct-detection Dark Matter (DM) search experiment. The use of recent developments in nuclear emulsions allows probing new regions of the WIMP parameter space. The prominent feature of this experiment is its potential for recording the signal direction, which gives a chance of overcoming the "neutrino...
The inner drift chamber of the BESIII experiment has been encountering an aging problem after several years of running. A Cylindrical Gas Electron Multiplier Inner Tracker (CGEM-IT) is an important candidate for the upgrade of the inner drift chamber. In order to understand the specific detection behavior of the CGEM-IT and to build a digitization model for it, a detailed simulation study with the...
Scientific user communities make wide use of computing and storage resources provided by large grid infrastructures. More and more capacity is provided by these infrastructures in the form of cloud resources. Cloud resources are much more flexible to use but expose completely different access interfaces. Furthermore, grid infrastructure users often get access to extra computing...
All grid middleware require external packages to interact with computing elements, storage sites and so on. In the case of the DIRAC middleware these were historically divided into two bundles: one called externals, containing Python and standard binary libraries, and the other called the LCGBundle, containing libraries from the grid world (gfal, arc, etc.). The externals were provided for several platforms...
We introduce two new loss functions designed to directly optimise the statistical significance of the expected number of signal events when training neural networks to classify events as signal or background in the scenario of a search for new physics at a particle collider. The loss functions are designed to directly maximise commonly used estimates of the statistical significance, s/√(s+b),...
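As a concrete illustration of the first quoted estimate, a batch-level loss can be written as the negative of s/√(s+b), with s and b obtained as weighted, normalised sums of the network output over the true signal and background events. The sketch below is only an illustration of this idea in plain NumPy, not the authors' implementation; in practice such a loss would be expressed in a differentiable framework, and the names expected_signal and expected_background are illustrative assumptions.

```python
import numpy as np

def significance_loss(y_true, y_pred, weights, expected_signal, expected_background):
    """Toy batch-level loss: the negative of the s/sqrt(s+b) estimate.

    y_true   : array of 0/1 labels (1 = signal)
    y_pred   : network outputs in [0, 1], interpreted as soft selections
    weights  : per-event weights
    expected_signal, expected_background : yields the batch is normalised to (assumed inputs)
    """
    w_sig = weights * y_true
    w_bkg = weights * (1.0 - y_true)

    # Soft (differentiable) estimates of the selected signal and background yields.
    s = expected_signal * np.sum(w_sig * y_pred) / max(np.sum(w_sig), 1e-9)
    b = expected_background * np.sum(w_bkg * y_pred) / max(np.sum(w_bkg), 1e-9)

    # Maximising s/sqrt(s+b) is equivalent to minimising its negative.
    return -s / np.sqrt(s + b + 1e-9)
```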
In 2019 Belle II will start the planned physics runs with the entire detector installed. Compared to current collider experiments at the LHC, where all critical services are provided by CERN as the host lab and only storage and CPU resources are provided externally, Belle II and KEK chose a different, more distributed strategy. In particular, this strategy provides easier access to existing expertise and...
Information concerning the operation, configuration and behaviour of the ATLAS experiment needs to be reported, gathered and shared reliably with the whole ATLAS community, which comprises over three thousand scientists geographically distributed all over the world. To provide such functionality, a logbook facility, the Electronic Logbook for the information storage of ATLAS (ELisA), has been...
The Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment at the Large Hadron Collider (LHC) at CERN is currently composed of a large number of distributed hardware and software components (about 3000 machines and more than 25000 applications) which, in a coordinated manner, provide the data-taking functionality of the overall system.
During data taking runs, a huge flow of...
ROOT is a large code base with a complex set of build-time dependencies; there is a significant difference in compilation time between the “core” of ROOT and the full-fledged deployment. We present results on a “delayed build” for internal ROOT packages and external packages. This gives the ability to offer a “lightweight” core of ROOT, later augmented by building additional modules to extend...
Maintaining the huge computing grid facilities for the LHC experiments and replacing their hardware every few years has been very expensive. The California State University (CSU) ATLAS group recently received a $250,000 AWS cloud credit from the CSU Chancellor’s Office to build the first virtual US ATLAS Tier 3 and explore cloud solutions for ATLAS. We will use this award to set up full ATLAS...
Deep learning has shown a promising future in physics data analysis and is anticipated to revolutionize LHC discoveries.
Designing an optimal algorithm is among the most challenging tasks in machine learning, especially in HEP, due to the high dimensionality and extreme complexity of the data.
Physical knowledge can be employed in designing and modifying the algorithm’s...
The German CMS community (DCMS) as a whole can benefit from the various compute resources available to its different institutes. While Grid-enabled and National Analysis Facility resources are usually shared within the community, local and recently enabled opportunistic resources like HPC centers and cloud resources are not. Furthermore, there is no shared submission infrastructure...
We present an open source GPU-accelerated cross-platform FITS 2D image viewer FIPS. Unlike other FITS viewers, FIPS uses GPU hardware via OpenGL to provide functionality such as zooming, panning and level adjustments. FIPS is the first end-to-end GPU FITS image viewer: FITS image data is fully offloaded to GPU memory as is, and then processed by OpenGL shaders.
The executables and the source...
The Geant4 toolkit is used extensively in high energy physics to simulate the passage of particles through matter and to estimate effects such as detector responses, efficiencies and smearing. Geant4 uses many underlying models to predict particle interaction kinematics, and uncertainty in these models leads to uncertainty in the interpretation of experimental measurements. The Geant4...
In High Energy Physics, tests of homogeneity are used primarily in two cases: for verifying that a data sample does not differ significantly from a numerically produced Monte Carlo sample, and for verifying the separation of signal from background. Since Monte Carlo samples are usually weighted, classical homogeneity tests must be modified before they can be applied to weighted samples. In...
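As an illustration of the kind of modification involved (a sketch only, not necessarily the tests presented in this contribution), a chi-squared-style comparison of an unweighted data histogram with a weighted Monte Carlo histogram can take the per-bin variance of the weighted sample from the sum of squared weights; a rigorous treatment of normalisation constraints and degrees of freedom (e.g. Gagunashvili's modified chi-squared test) is more involved.

```python
import numpy as np

def weighted_chi2(data_counts, mc_weights_per_bin):
    """Illustrative chi-squared-style comparison of an unweighted data histogram
    with a weighted Monte Carlo histogram (both assumed to use the same binning).

    data_counts        : array of bin counts n_i
    mc_weights_per_bin : list of arrays, the event weights falling into each bin
    """
    n = np.asarray(data_counts, dtype=float)
    w = np.array([ws.sum() for ws in mc_weights_per_bin])          # sum of weights per bin
    w2 = np.array([(ws ** 2).sum() for ws in mc_weights_per_bin])  # sum of squared weights

    # Normalise both histograms to unit area so only the shapes are compared.
    p_data, p_mc = n / n.sum(), w / w.sum()

    # Approximate per-bin variances: Poisson for data, sum of squared weights for the MC.
    var = n / n.sum() ** 2 + w2 / w.sum() ** 2
    mask = var > 0
    chi2 = np.sum((p_data[mask] - p_mc[mask]) ** 2 / var[mask])
    ndf = mask.sum() - 1
    return chi2, ndf
```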
MPGDs are the new frontier in gas trackers. Among this kind of device, GEM chambers are widely used. The experimental signals acquired with the detector must then be reconstructed and analysed. In this contribution, a new offline software package to perform reconstruction, alignment and analysis on the data collected with APV-25 and TIGER ASICs will be presented. GRAAL (Gem Reconstruction And...
In recent years the usage of machine learning techniques within data-intensive sciences in general and high-energy physics in particular has rapidly increased, in part due to the availability of large datasets on which such algorithms can be trained, as well as suitable hardware such as graphics or tensor processing units, which greatly accelerate their training and execution....
The ever-growing amount of HEP data to be analyzed in the future already requires the allocation of additional, potentially only temporarily available, non-HEP-dedicated resources. These so-called opportunistic resources are also well suited to cover the typically unpredictable peak demands for computing resources in end-user analyses. However, their temporary availability requires a dynamic...
Hadronic decays of vector bosons and top quarks are increasingly important to the ATLAS physics program, both in measurements of the standard model and in searches for new physics. At high energies, these decays are collimated into a single overlapping region of energy deposits in the detector, referred to as a jet. However, vector bosons and top quarks are hidden under an enormous background of...
The traditional partial wave analysis (PWA) algorithm is designed to process data serially, which requires an amount of memory for runtime data that may exceed the capacity of a single node. It is therefore necessary to parallelize this algorithm in a distributed data computing framework to improve its performance. Within an existing production-level Hadoop cluster, we implement...
The Belle II experiment at the SuperKEKB e+e- collider completed its first-collisions run in 2018. The experiment is currently preparing for physics data taking in 2019. With many scientists now preparing their analyses, the user friendliness of the Belle II software framework is of great importance.
Jupyter Notebooks allow mixing code, documentation, and output such as plots in an easy-to...
The poster focuses on our experience in using and extending JupyterLab in combination with EOS and CVMFS for HEP analysis within a local university group. We started with a copy of the CERN SWAN environment; after that, our project evolved independently. A major difference is that we switched from the classic Jupyter Notebook to JupyterLab, because our users are more interested in text editor...
One of the problems of scientific software development is the lack of proper language tools to do it conveniently. Among modern languages, only a few have the flexibility and, most importantly, the libraries to handle scientific tasks: C++, Python and Java. In some cases niche languages like C# or Julia can also be used.
The major problem of C++ is the complexity of the language and...
We present a new approach to the identification of boosted neutral particles using the electromagnetic calorimeters of the LHCb detector. The identification of photons and neutral pions is currently based on the expected properties of the objects reconstructed in the calorimeter. This allows single photons in the electromagnetic calorimeter to be distinguished from overlapping photons produced from high...
Particle identification is a key ingredient of most LHCb results. Muon identification in particular is used at every stage of the LHCb triggers. The objective of muon identification is to distinguish muons from the rest of the particles using only information from the Muon subdetector under strict timing constraints. We use a state-of-the-art gradient boosting algorithm and real data with...
RooFit and RooStats, the toolkits for statistical modelling in ROOT, are used in most searches and measurements at the Large Hadron Collider. The data to be collected in Run 3 will enable measurements with higher precision and models with larger complexity, but also require faster data processing.
In this talk, first results on vectorising and multi-threading likelihood fits in RooFit will be...
The Compact Muon Solenoid (CMS) is one of the general-purpose detectors at the CERN Large Hadron Collider (LHC), collecting enormous amounts of physics data. Before the final physics analysis can proceed, the data have to be checked for quality (certified) by passing a number of automatic (such as physics-object reconstruction and histogram preparation) and manual (checking, comparison and decision...
In this work we present the results of comparing two versions of GEANT4 using experimental data from the HARP experiment. The comparison is performed with the help of a new method for the statistical comparison of data sets. The method provides more information for data analysis than methods based on the chi-squared distribution.
Real-time monitoring of the Compact Muon Solenoid (CMS) trigger system is a vital task to ensure the quality of all physics results published by the collaboration. Today, the trigger monitoring software reports on potential problems given the time evolution of the reported rates. Anomalous rates are identified by their deviation from a prediction calculated using a regression model...
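As a minimal sketch of this scheme (illustrative only; the features, regression model and threshold used by the CMS monitoring software are not specified here), one can fit the rate against run conditions and flag points whose residual exceeds a few standard deviations:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def flag_anomalous_rates(features, observed_rates, n_sigma=5.0):
    """Illustrative anomaly flagging: fit a regression of the trigger rate against
    run conditions (e.g. instantaneous luminosity, pileup) and flag points whose
    residual exceeds n_sigma times the residual spread.

    features       : (n_samples, n_features) array of run conditions
    observed_rates : (n_samples,) array of measured trigger rates
    """
    model = LinearRegression().fit(features, observed_rates)
    predicted = model.predict(features)
    residuals = observed_rates - predicted
    sigma = residuals.std()
    return np.abs(residuals) > n_sigma * sigma   # boolean mask of anomalous points
```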
ATLAS is one of the general-purpose experiments observing hadron collisions at the LHC at CERN. Its trigger and data acquisition system (TDAQ) is responsible for selecting and transporting interesting physics events from the detector to permanent storage, where the data are used for physics analysis. The transient storage of ATLAS TDAQ is the last component of the online data-flow system. It...
HEPSPEC-06 is a decade-old suite used to benchmark CPU resources for the WLCG.
Its adoption spans hardware vendors, site managers, funding agencies and software experts.
It is stable, reproducible and accurate; however, it is reaching the end of its life.
Initial hints of a lack of correlation with HEP applications have been collected.
Looking for suitable alternatives, the HEPiX...
Data Quality Monitoring (DQM) is a very significant component of all high-energy physics (HEP) experiments. Data recorded by Data Acquisition (DAQ) sensors and devices are sampled to perform live monitoring of the status of each detector during data collection. This gives the system and the scientists the ability to identify problems with extremely low latency, minimizing the amount of data...
HEP computing is a typical example of data-intensive computing. The performance of the distributed storage system largely defines the efficiency of HEP data processing and analysis. A large number of parameters can be adjusted in a distributed storage system, and their settings have a great influence on performance. At present, these parameters are either set to static values...
Over the next few years, the LHC will prepare for the upcoming High-Luminosity upgrade
in which it is expected to deliver ten times more p-p collisions. This will create a harsher
radiation environment and higher detector occupancy. In this context, the ATLAS
experiment, one of the general-purpose experiments at the LHC, plans substantial upgrades
to the detectors and to the trigger system in...
Beginning in 2021, the upgraded LHCb experiment will use a triggerless readout system collecting data at an event rate of 30 MHz. A software-only High Level Trigger will enable unprecedented flexibility for trigger selections. During the first stage (HLT1), a subset of the full offline track reconstruction for charged particles is run to select particles of interest based on single or...
Mitigation of the effect of the multiple parasitic proton collisions produced during bunch crossings at the LHC is a major endeavor towards the realization of the physics program at the collider. The pileup affects many physics observables derived during online and offline reconstruction. We propose a graph neural network machine learning model, based on the PUPPI approach, for identifying...
The pixel vertex detector is an essential part of the Belle II experiment, allowing us to determine the location of particle trajectories and decay vertices. The combined data from the innermost Pixel Vertex Detector (PXD), followed by the Silicon Vertex Detector (SVD), and the outermost Central Drift Chamber (CDC) are crucial in the event reconstruction phase to determine particle types,...
The Cherenkov Telescope Array (CTA) will be the largest ground-based gamma-ray observatory. CTA will detect the signatures of gamma rays and of cosmic-ray hadrons and electrons interacting with the Earth's atmosphere. Making the best possible use of this facility requires the ability to separate events generated by gamma rays from the particle-induced background. Deep neural networks produced...
Mass Monte Carlo data production is the most CPU-intensive process in high energy physics data analysis. The use of large-scale computational resources at HPC centers in China is expected to substantially increase the cost-efficiency of the processing. TianheII, the second-fastest HPC system in China, used to rank first in the TOP500. We report on the technical challenges and...
Event reconstruction for the NOvA experiment is a critical step preceding further data analysis. We describe the complex NOvA reconstruction pipeline (containing several unsupervised learning techniques) with a focus on the specific step of so-called "prong matching". In this step, we combine 2D prongs (projections of particle trajectories) into 3D prong objects. In order to find the best...
A large class of statistical models in high energy physics can be expressed as a simultaneous measurement of binned observables. A popular framework for such binned analyses is HistFactory. So far the only implementation of the model has been within the ROOT ecosystem, limiting adoption and extensibility. We present a complete and extensible implementation of the HistFactory class of models in...
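Schematically, the HistFactory probability model is a product of Poisson terms over channels and bins, multiplied by constraint terms for the auxiliary measurements of the constrained nuisance parameters; the precise construction of the expected rates from nominal histograms and interpolated systematic variations is defined by the HistFactory specification.

```latex
% Schematic form of the HistFactory probability model: free parameters \eta,
% constrained nuisance parameters \chi with auxiliary measurements a_\chi.
p(n, a \mid \eta, \chi) \;=\;
  \prod_{c \,\in\, \mathrm{channels}} \;\prod_{b \,\in\, \mathrm{bins}_c}
    \mathrm{Pois}\!\left(n_{cb} \,\middle|\, \nu_{cb}(\eta, \chi)\right)
  \;\times\;
  \prod_{\chi_j \,\in\, \chi} c_{\chi_j}\!\left(a_{\chi_j} \mid \chi_j\right)
```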
The LHC’s Run 3 will push the envelope on data-intensive workflows and, at the lowest level, this data is managed using the ROOT software framework. At the beginning of Run 1, all data were compressed with the ZLIB algorithm; ROOT has since added support for multiple new algorithms (such as LZMA and LZ4), each with unique strengths. Work is continuing as industry introduces new techniques -...
Measurements in Liquid Argon TPC (LArTPC) neutrino detectors, such as the MicroBooNE detector at Fermilab, feature large, high fidelity event images. Deep learning techniques have been extremely successful in classification tasks of photographs, but their application to LArTPC event images is challenging, due to the large size of the events. Events in these detectors are typically two orders...
Increasing data rates open up new opportunities for astroparticle physics by improving the precision of data analysis and by enabling advanced analysis techniques that demand relatively large data volumes, e.g. deep learning. One of the ways to increase statistics is to combine data from different experimental setups for joint analysis. Moreover, such data integration provides us with an...
Distinct HEP workflows have distinct I/O needs; while ROOT I/O excels at serializing the complex C++ objects common to reconstruction, analysis workflows typically have simpler objects and can sustain higher event rates. To serve these workflows, we have developed a “bulk I/O” interface, allowing multiple events’ data to be returned per library call. This reduces ROOT-related overheads and...
The BESIII experiment studies physics in the tau-charm energy region. Since 2009, BESIII has collected large data samples, and many important physics results have been achieved based on them. Gaudi is used as the underlying framework of the BESIII offline software, for both data production and data analysis. As the data set has accumulated year by year, the efficiency of data analysis has become more and more...
We show how Interaction Networks can be used for jet tagging at the Large Hadron Collider.
We take as an example the problem of identifying high-pT H->bb decays exploiting both jet substructure and secondary vertices from b quarks. We consider all tracks produced in the hadronization of the two b’s and represent the jet in terms of both track-to-track and track-to-vertex interactions. The...
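The following is a schematic NumPy sketch of the interaction-network computation described above (relational networks over track-track and track-vertex pairs, an object network per track, and a graph-level classifier). The parameter names and shapes are illustrative assumptions, not taken from the paper, and a real implementation would use a differentiable framework.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Tiny two-layer perceptron with ReLU, a stand-in for the learned networks."""
    h = np.maximum(0.0, x @ w1 + b1)
    return h @ w2 + b2

def interaction_network(tracks, vertices, params):
    """Schematic interaction-network pass over a jet.

    tracks   : (n_trk, f_trk) array of track features
    vertices : (n_vtx, f_vtx) array of secondary-vertex features
    params   : dict of weight matrices/biases for the relational, object and
               classifier MLPs (shapes must be mutually consistent; illustrative)
    """
    n_trk, n_vtx = len(tracks), len(vertices)

    # Track-to-track interactions: one message per ordered pair of distinct tracks.
    tt_msgs = np.zeros((n_trk, params["w_tt2"].shape[1]))
    for i in range(n_trk):
        for j in range(n_trk):
            if i == j:
                continue
            pair = np.concatenate([tracks[i], tracks[j]])
            tt_msgs[i] += mlp(pair, params["w_tt1"], params["b_tt1"],
                              params["w_tt2"], params["b_tt2"])

    # Track-to-vertex interactions: one message per (track, vertex) pair.
    tv_msgs = np.zeros((n_trk, params["w_tv2"].shape[1]))
    for i in range(n_trk):
        for k in range(n_vtx):
            pair = np.concatenate([tracks[i], vertices[k]])
            tv_msgs[i] += mlp(pair, params["w_tv1"], params["b_tv1"],
                              params["w_tv2"], params["b_tv2"])

    # Object network: update each track from its features and aggregated messages.
    per_track = np.array([
        mlp(np.concatenate([tracks[i], tt_msgs[i], tv_msgs[i]]),
            params["w_o1"], params["b_o1"], params["w_o2"], params["b_o2"])
        for i in range(n_trk)
    ])

    # Sum over tracks and classify the whole jet (e.g. H->bb vs background).
    score = mlp(per_track.sum(axis=0),
                params["w_c1"], params["b_c1"], params["w_c2"], params["b_c2"])
    return 1.0 / (1.0 + np.exp(-score))   # sigmoid output
```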
The SuperKEKB collider and the Belle II experiment finished the second phase of their runs in 2018, which was an essential step in studying the e+ e- beam collisions and preparing for the third phase. The third phase starts at the beginning of 2019, and it is planned to collect a data sample of 50 ab⁻¹ during the following decade.
The simulation library of the Belle II experiment...
The design and performance of the ATLAS Inner Detector (ID) trigger algorithms running online on the high level trigger (HLT) processor farm for 13 TeV LHC collision data with high pile-up are discussed.
The HLT ID tracking is a vital component in all physics signatures in the ATLAS trigger for the precise selection of the rare or interesting events necessary for physics analysis...
The SND detector has been operating at the VEPP-2000 collider (BINP, Russia) for several years, delivering a wealth of physics results. Being a scientific facility, it undergoes constant improvement. One of the improvements worth mentioning is the DQM system for the SND detector.
First, information is collected automatically by DQM scripts and can then be corrected or confirmed by the detector operators...
I describe the charged-track extrapolation and muon-identification modules in the Belle II data-analysis code framework (basf2). These modules use GEANT4E to extrapolate reconstructed charged tracks outward from the Belle II Central Drift Chamber into the outer particle-identification detectors, the electromagnetic calorimeter, and the K-long and muon detector (KLM). These modules propagate...
Track finding and fitting are amongst the most complex parts of event reconstruction in high-energy physics and usually dominate the computing time in high-luminosity environments. A central part of track reconstruction is the transport of a given track parameterisation (i.e. the parameter estimates and associated covariances) through the detector, respecting the magnetic field setup and the...
The LHCb experiment is dedicated to the study of c- and b-hadron decays, including long-lived particles such as Ks and strange baryons (Lambda, Xi, etc.). These kinds of particles are difficult to reconstruct with the LHCb tracking systems, since they escape detection in the first tracker. A new method to evaluate the performance, in terms of efficiency and throughput, of the different...
In software development, Continuous Integration (CI), the practice of bringing together multiple developers’ code modifications into a single repository, and Continuous Delivery (CD), the practice of automatically creating and testing releases, are well known. CI/CD pipelines are available in many automation tools (such as GitLab) and serve to enhance and speed up software development.
Continuous...
Probability distribution functions (PDFs) are widely used in modeling random processes and in physics simulations. It can be demonstrated that PDFs are linked to one another through functional parameters. Improving the performance of the generation of the many random numbers used as input by the PDFs is often a very challenging task, as it involves algorithms with acceptance-rejection...
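For reference, the classic acceptance-rejection method mentioned above can be sketched as follows (a minimal, generic illustration, not the optimised generation strategy discussed in the contribution):

```python
import numpy as np

def accept_reject(pdf, x_min, x_max, f_max, n_samples, rng=None):
    """Classic acceptance-rejection sampling of a 1D PDF on [x_min, x_max].

    pdf   : callable returning (possibly unnormalised) density values
    f_max : an upper bound on pdf over [x_min, x_max]
    """
    if rng is None:
        rng = np.random.default_rng()
    samples = []
    while len(samples) < n_samples:
        # Propose uniformly in the bounding box and keep points under the curve.
        x = rng.uniform(x_min, x_max)
        u = rng.uniform(0.0, f_max)
        if u < pdf(x):
            samples.append(x)
    return np.array(samples)

# Example: sample a truncated Gaussian shape (illustrative only).
gauss = lambda x: np.exp(-0.5 * x * x)
values = accept_reject(gauss, -5.0, 5.0, 1.0, 10_000)
```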