Conveners
Track 5: Software Development: 5.1 - Chair: Steffen Luitz
- Concetta Cartaro (SLAC)
- Florian Uhlig (GSI - Helmholtzzentrum für Schwerionenforschung GmbH (DE))
- Alberto Aimar (CERN)
Track 5: Software Development: 5.2 - Chair: Andreas Petzold
- Alberto Aimar (CERN)
- Florian Uhlig (GSI - Helmholtzzentrum für Schwerionenforschung GmbH (DE))
- Concetta Cartaro (SLAC)
Track 5: Software Development: 5.3 - Chair: Jakob Blomer
- Alberto Aimar (CERN)
- Florian Uhlig (GSI - Helmholtzzentrum für Schwerionenforschung GmbH (DE))
- Concetta Cartaro (SLAC)
Track 5: Software Development: 5.4 - Chair: Malachi Schram
- Alberto Aimar (CERN)
- Concetta Cartaro (SLAC)
- Florian Uhlig (GSI - Helmholtzzentrum für Schwerionenforschung GmbH (DE))
Track 5: Software Development: 5.5 - Chair: Alberto Aimar
- Concetta Cartaro (SLAC)
- Alberto Aimar (CERN)
- Florian Uhlig (GSI - Helmholtzzentrum für Schwerionenforschung GmbH (DE))
Track 5: Software Development: 5.6 - Chair: Andrea Dotti
- Alberto Aimar (CERN)
- Concetta Cartaro (SLAC)
- Florian Uhlig (GSI - Helmholtzzentrum für Schwerionenforschung GmbH (DE))
Track 5: Software Development: 5.7 - Chair: Marcus Ebert
- Alberto Aimar (CERN)
- Concetta Cartaro (SLAC)
- Florian Uhlig (GSI - Helmholtzzentrum für Schwerionenforschung GmbH (DE))
Based on GooFit, a GPU-friendly framework for maximum-likelihood fits, we have developed a tool for extracting model-independent S-wave amplitudes from three-body decays such as D+ --> h(')- h+ h+. A full amplitude analysis is performed in which the magnitudes and phases of the S-wave amplitudes (or, alternatively, their real and imaginary components) are anchored at a finite number of...
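As a minimal illustration of the model-independent parametrization (a standalone sketch, not GooFit code), the following C++ program anchors a complex S-wave amplitude at a few hypothetical points in m² and interpolates magnitude and phase between them; in a real fit the anchor values would be free parameters of the likelihood.

```cpp
// Sketch of a model-independent S-wave: complex amplitudes anchored at a few
// points in m^2(h-h+), with linear interpolation of magnitude and phase in
// between. Anchor values here are placeholders, not fit results.
#include <complex>
#include <cmath>
#include <cstdio>
#include <vector>

struct Anchor { double m2; double mag; double phase; };

// Interpolate magnitude and phase separately between the two neighbouring anchors.
std::complex<double> swave(double m2, const std::vector<Anchor>& a) {
    if (m2 <= a.front().m2) return std::polar(a.front().mag, a.front().phase);
    if (m2 >= a.back().m2)  return std::polar(a.back().mag,  a.back().phase);
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (m2 < a[i].m2) {
            const double t = (m2 - a[i-1].m2) / (a[i].m2 - a[i-1].m2);
            const double mag   = a[i-1].mag   + t * (a[i].mag   - a[i-1].mag);
            const double phase = a[i-1].phase + t * (a[i].phase - a[i-1].phase);
            return std::polar(mag, phase);
        }
    }
    return {0.0, 0.0};
}

int main() {
    // Hypothetical anchor points across the Dalitz-plot range (GeV^2).
    std::vector<Anchor> anchors = {{0.4, 1.0, 0.0}, {1.0, 1.8, 0.9},
                                   {1.8, 0.7, 2.1}, {2.8, 0.2, 2.6}};
    for (double m2 = 0.5; m2 < 2.8; m2 += 0.5) {
        auto A = swave(m2, anchors);
        std::printf("m2=%.2f  |A|=%.3f  arg(A)=%.3f\n", m2, std::abs(A), std::arg(A));
    }
    return 0;
}
```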
PODIO is a C++ library that supports the automatic creation and efficient handling of HEP event data, developed as a new EDM toolkit for future particle physics experiments in the context of the AIDA2020 EU programme. Event data models (EDMs) are at the core of every HEP experiment's software framework, essential for providing a communication channel between different algorithms in the data...
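As a rough illustration of the kind of layout such an EDM toolkit targets (hand-written here, not PODIO's actual generated code; all names are hypothetical), a plain-old-data payload can be kept in contiguous buffers with thin handle classes on top:

```cpp
// Illustrative only: POD payload plus a lightweight handle, sketching the
// separation between storage layout and the interface seen by algorithms.
#include <cstdio>
#include <vector>

// POD payload: trivially copyable, contiguous in memory, easy to (de)serialize.
struct HitData {
    unsigned long long cellID;
    float energy;
    float time;
};

// Thin handle giving algorithms a stable interface over the POD buffer.
class Hit {
public:
    Hit(std::vector<HitData>* buf, std::size_t idx) : buf_(buf), idx_(idx) {}
    float energy() const { return (*buf_)[idx_].energy; }
    void setEnergy(float e) { (*buf_)[idx_].energy = e; }
private:
    std::vector<HitData>* buf_;
    std::size_t idx_;
};

int main() {
    std::vector<HitData> hits = {{1, 0.3f, 4.1f}, {2, 1.2f, 4.3f}};
    Hit h(&hits, 1);
    h.setEnergy(1.5f);
    std::printf("hit 1 energy = %.2f\n", hits[1].energy);
    return 0;
}
```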
The instantaneous luminosity of the LHC is expected to increase at the HL-LHC such that the amount of pile-up can reach a level of 200 interactions per bunch crossing, almost a factor of 10 with respect to the luminosity reached at the end of Run 1. In addition, the experiments plan a 10-fold increase of the readout rate. This will be a challenge for the ATLAS and CMS experiments, in particular for the...
Radiotherapy is planned with the aim of delivering a lethal dose of radiation to a tumour, while keeping doses to nearby healthy organs at an acceptable level. Organ movements and shape changes, over a course of treatment typically lasting four to eight weeks, can result in the doses actually delivered differing from those planned. The UK-based VoxTox project aims to compute actual doses, at the level of...
The use of up-to-date machine learning methods, including deep neural networks, running directly on raw data has significant potential in High Energy Physics for revealing patterns in detector signals and as a result improving reconstruction and the sensitivity of the final physics analyses. In this work, we describe a machine-learning analysis pipeline developed and operating at the National...
The observation of neutrino oscillation provides evidence of physics beyond the standard model, and the precise measurement of those oscillations remains an important goal for the field of particle physics. Using two finely segmented liquid scintillator detectors located 14 mrad off-axis from the NuMI muon-neutrino beam, NOvA is in a prime position to contribute to precision measurements of...
With ROOT 6 in production in most experiments, ROOT has changed gear during the past year: the development focus on the interpreter has been redirected into other areas.
This presentation will summarize the developments that have happened in all areas of ROOT, for instance concurrency mechanisms, the serialization of C++11 types, new graphics palettes, new "glue" packages for multivariate...
ROOT is one of the core software tools for physicists. For more than a decade it has held a central position in physicists' analysis code and in the experiments' frameworks, thanks in part to its stability and simplicity of use. This has allowed software development for analyses and frameworks to use ROOT as a "common language" for HEP, across virtually all experiments.
Software development in...
ROOT version 6 comes with a C++-compliant interpreter, cling. Cling needs to know everything about the code in libraries in order to interact with them. This translates into increased memory usage with respect to previous versions of ROOT. During the runtime automatic library loading process, ROOT 6 re-parses a set of header files that describe the library, and enters "recursive"...
The need to process the ever-increasing amount of data generated by the LHC experiments more efficiently has motivated ROOT to further develop its support for parallelism. This support is being developed for both shared-memory and distributed-memory environments.
The incarnations of this parallelism are multi-threading, multi-processing and cluster-wide execution. In...
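A minimal sketch of the shared-memory case, using the multi-threaded TTree-processing interfaces available in recent ROOT 6 releases; the file, tree and branch names below are placeholders:

```cpp
// Assumes a file "events.root" with a TTree "Events" holding a float branch "pt".
#include <ROOT/TThreadedObject.hxx>
#include <ROOT/TTreeProcessorMT.hxx>
#include <TH1F.h>
#include <TROOT.h>
#include <TTreeReader.h>
#include <TTreeReaderValue.h>

int main() {
    ROOT::EnableImplicitMT(4);  // let ROOT use up to 4 threads internally

    ROOT::TThreadedObject<TH1F> hist("pt", "p_{T};GeV;events", 100, 0., 100.);
    ROOT::TTreeProcessorMT processor("events.root", "Events");

    // Each thread reads its own range of tree clusters and fills its own
    // thread-local histogram copy; the copies are merged at the end.
    processor.Process([&](TTreeReader& reader) {
        TTreeReaderValue<float> pt(reader, "pt");
        auto h = hist.Get();
        while (reader.Next()) h->Fill(*pt);
    });

    auto merged = hist.Merge();
    merged->Print();
    return 0;
}
```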
Notebooks represent an exciting new approach that will considerably facilitate collaborative physics analysis.
They are a modern and widely-adopted tool to express computational narratives comprising, among other elements, rich text, code and data visualisations. Several notebook flavours exist, although one of them has been particularly successful: the Jupyter open source project.
In this...
ROOT provides advanced statistical methods needed by the LHC experiments to analyze their data. These include machine learning tools for classification, regression and clustering. TMVA, a toolkit for multi-variate analysis in ROOT, provides these machine learning methods.
We will present new developments in TMVA, including parallelisation, deep-learning neural networks, new features and...
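A schematic example of the Factory/DataLoader training interface of TMVA in ROOT 6; the input file, tree names, variables and option strings below are placeholders and would be adapted to a real data set:

```cpp
// Assumes "input.root" provides signal and background TTrees "sig" and "bkg"
// with branches var1 and var2.
#include <TFile.h>
#include <TMVA/DataLoader.h>
#include <TMVA/Factory.h>
#include <TMVA/Types.h>
#include <TTree.h>

int main() {
    TFile input("input.root");
    TTree* sig = (TTree*)input.Get("sig");
    TTree* bkg = (TTree*)input.Get("bkg");

    TFile output("tmva_out.root", "RECREATE");
    TMVA::Factory factory("classification", &output, "!V:AnalysisType=Classification");

    TMVA::DataLoader loader("dataset");
    loader.AddVariable("var1", 'F');
    loader.AddVariable("var2", 'F');
    loader.AddSignalTree(sig, 1.0);
    loader.AddBackgroundTree(bkg, 1.0);
    loader.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

    // Book, for example, a boosted decision tree and a shallow neural network.
    factory.BookMethod(&loader, TMVA::Types::kBDT, "BDT", "NTrees=400:MaxDepth=3");
    factory.BookMethod(&loader, TMVA::Types::kMLP, "MLP", "HiddenLayers=N+2");

    factory.TrainAllMethods();
    factory.TestAllMethods();
    factory.EvaluateAllMethods();
    return 0;
}
```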
ROOT provides an extremely flexible format used throughout the HEP community. The number of use cases – from an archival data format to end-stage analysis – has required a number of tradeoffs to be exposed to the user. For example, a high “compression level” in the traditional DEFLATE algorithm will result in a smaller file (saving disk space) at the cost of slower decompression (costing CPU...
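For illustration, the compression algorithm and level can be chosen when a file is created, or adjusted afterwards; the file name and settings below are examples only, with higher levels typically trading CPU time for smaller files:

```cpp
#include <TFile.h>
#include <Compression.h>

int main() {
    // LZMA at a high level: small files, slow (de)compression.
    TFile f("out.root", "RECREATE", "",
            ROOT::CompressionSettings(ROOT::kLZMA, 8));

    // Alternatively, adjust an already opened file:
    // f.SetCompressionAlgorithm(ROOT::kZLIB);
    // f.SetCompressionLevel(1);   // faster, larger files

    f.Write();
    f.Close();
    return 0;
}
```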
We present rootJS, an interface making it possible to seamlessly integrate ROOT 6 into applications written for Node.js, the JavaScript runtime platform increasingly used to create high-performance web applications. ROOT features can be called both directly from Node.js code and by JIT-compiling C++ macros. All rootJS methods are invoked asynchronously and support callback functions,...
HEP applications perform a very large number of allocations and deallocations within short time intervals, which results in memory churn, poor locality and performance degradation. These issues have been known for a decade, but due to the complexity of software frameworks and the sheer number of allocations (of the order of billions for a single job), until recently no efficient...
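A generic sketch of one mitigation strategy (not taken from any specific framework): placement-constructing short-lived, trivially destructible objects in an arena that is reset, rather than freed, after each event, so the same memory is reused and per-object new/delete calls disappear.

```cpp
#include <cstddef>
#include <cstdio>
#include <new>
#include <vector>

class EventArena {
public:
    explicit EventArena(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    // Placement-construct a trivially destructible object in the arena.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        std::size_t aligned = (offset_ + alignof(T) - 1) & ~(alignof(T) - 1);
        if (aligned + sizeof(T) > buffer_.size()) throw std::bad_alloc();
        offset_ = aligned + sizeof(T);
        return new (buffer_.data() + aligned) T(static_cast<Args&&>(args)...);
    }

    void reset() { offset_ = 0; }   // reuse the same memory for the next event

private:
    std::vector<char> buffer_;
    std::size_t offset_;
};

struct Cluster { float e; int n; };   // trivially destructible

int main() {
    EventArena arena(1 << 20);               // 1 MiB, reused across events
    for (int event = 0; event < 3; ++event) {
        std::vector<Cluster*> clusters;
        for (int i = 0; i < 1000; ++i)
            clusters.push_back(arena.create<Cluster>(Cluster{0.5f * i, i}));
        std::printf("event %d: %zu clusters\n", event, clusters.size());
        arena.reset();                        // no deallocations, no churn
    }
    return 0;
}
```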
The recent progress in parallel hardware architectures, with deeper vector pipelines and many-core technologies, brings opportunities for HEP experiments to take advantage of SIMD and SIMT computing models. Launched in 2013, the GeantV project studies performance gains in propagating multiple particles in parallel, improving instruction throughput and data locality in HEP event simulation...
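The following standalone sketch (not GeantV code) illustrates the underlying idea: tracks grouped into a structure-of-arrays basket, so that one contiguous, branch-free loop advances many tracks at once and can be auto-vectorized by the compiler.

```cpp
#include <cstdio>
#include <vector>

struct TrackBasket {                       // structure of arrays, not array of structs
    std::vector<float> x, y, z;
    std::vector<float> dx, dy, dz;         // unit direction
};

// Propagate every track in the basket by the same step length.
void propagate(TrackBasket& b, float step) {
    const std::size_t n = b.x.size();
    for (std::size_t i = 0; i < n; ++i) {  // contiguous, branch-free: SIMD-friendly
        b.x[i] += step * b.dx[i];
        b.y[i] += step * b.dy[i];
        b.z[i] += step * b.dz[i];
    }
}

int main() {
    TrackBasket basket;
    for (int i = 0; i < 256; ++i) {        // fill a basket of 256 tracks
        basket.x.push_back(0); basket.y.push_back(0); basket.z.push_back(0);
        basket.dx.push_back(1); basket.dy.push_back(0); basket.dz.push_back(0);
    }
    propagate(basket, 0.1f);
    std::printf("track 0 now at x = %.2f\n", basket.x[0]);
    return 0;
}
```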
As the ATLAS Experiment prepares to move to a multi-threaded framework (AthenaMT) for Run 3, we are faced with the problem of how to migrate 4 million lines of C++ source code. This code has been written over the past 15 years and has often been adapted, rewritten or extended to meet the changing requirements and circumstances of LHC data taking. The code was developed by different authors, many of...
Some data analysis methods typically used in econometric studies and in ecology have been evaluated and applied in physics software environments. They concern the evolution of observables through objective identification of change points and trends, and measurements of inequality, diversity and evenness across a data set. Within each one of these analysis areas, several statistical tests and...
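As a small self-contained example of two such measures, the sketch below computes the Gini coefficient (inequality) and a Shannon-based evenness (diversity) over a toy set of per-category counts; the input values are invented for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Gini coefficient of non-negative values: 0 = perfect equality, 1 = maximal inequality.
double gini(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    double total = 0, weighted = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
        weighted += (i + 1) * v[i];
    }
    const double n = static_cast<double>(v.size());
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n;
}

// Shannon evenness: entropy of the category fractions divided by its maximum.
double shannonEvenness(const std::vector<double>& counts) {
    double total = 0;
    for (double c : counts) total += c;
    double H = 0;
    for (double c : counts)
        if (c > 0) { const double p = c / total; H -= p * std::log(p); }
    return H / std::log(static_cast<double>(counts.size()));
}

int main() {
    std::vector<double> counts = {120, 80, 40, 10, 5};
    std::printf("Gini = %.3f, evenness = %.3f\n", gini(counts), shannonEvenness(counts));
    return 0;
}
```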
The IT Analysis Working Group (AWG) has been formed at CERN across individual computing units and the experiments to attempt a cross-cutting analysis of computing infrastructure and application metrics. In this presentation we will describe the first results obtained using medium/long-term data (1 month to 1 year), correlating box-level metrics, job-level metrics from LSF and HTCondor, I/O...
The goal of this comparison is to summarize the state-of-the-art techniques of deep learning, whose rise has been boosted by modern GPUs. Deep learning, also known as deep structured learning or hierarchical learning, is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers composed of multiple...
In the midst of the multi- and many-core era, the computing models employed by HEP experiments are evolving to embrace the trends of new hardware technologies. As the computing needs of present and future HEP experiments, particularly those at the Large Hadron Collider, grow, adoption of many-core architectures and highly parallel programming models is essential to prevent degradation...
Around the year 2000, the convergence on Linux and commodity x86_64 processors provided a homogeneous scientific computing platform which enabled the construction of the Worldwide LHC Computing Grid (WLCG) for LHC data processing. In the last decade the size and density of computing infrastructure has grown significantly. Consequently, power availability and dissipation have become important...
Exascale computing resources are roughly a decade away and will be capable of 100 times more computing than current supercomputers. In the last year, Energy Frontier experiments crossed a milestone of 100 million core-hours used at the Argonne Leadership Computing Facility, Oak Ridge Leadership Computing Facility, and NERSC. The Fortran-based leading-order parton generator called Alpgen was...
ALICE (A Large Ion Collider Experiment) is a heavy-ion detector studying the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). After the second long shut-down of the LHC, the ALICE detector will be upgraded to cope with an interaction rate of 50 kHz in Pb-Pb collisions, producing in the online computing system (O2) a sustained throughput...
CERN openlab is a unique public-private partnership between CERN and leading IT companies and research institutes. Several of the CERN openlab projects investigate technologies that have the potential to become game changers in HEP software development (like Intel Xeon-FPGA, Intel 3DXpoint memory, Micron Automata Processor, etc.). In this presentation I will highlight a number of these...
Over the last seven years the software stack of the next-generation B factory experiment Belle II has grown to over 400,000 lines of C++ and Python code, counting only the part included in offline software releases. There are several thousand commits to the central repository by about 100 individual developers per year. To keep a coherent software stack of high quality such that it can be...
In particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually which is time-consuming and often leads to undocumented relations between particular workloads.
We present a generic analysis design pattern...
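A minimal sketch of the make-like pattern underlying such tools (all names hypothetical; real workflow frameworks add parameters, remote targets and scheduling on top): each task declares its dependencies and an output, and is run only if its output does not yet exist.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

struct Task {
    std::string output;                    // file that marks the task as done
    std::vector<Task*> deps;               // upstream tasks
    void (*run)(const Task&);              // the actual workload

    bool complete() const { return std::ifstream(output).good(); }
    void make() {
        for (Task* dep : deps) dep->make(); // resolve dependencies first
        if (complete()) return;             // output already produced: skip
        run(*this);
    }
};

void produce(const Task& t) {
    std::printf("running task -> %s\n", t.output.c_str());
    std::ofstream(t.output) << "done\n";
}

int main() {
    Task skim  {"skim.txt",  {},       produce};
    Task fit   {"fit.txt",   {&skim},  produce};
    Task plots {"plots.txt", {&fit},   produce};
    plots.make();   // triggers skim and fit only if their outputs are missing
    return 0;
}
```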
The VecGeom geometry library is a relatively recent effort aiming to provide a modern and high-performance geometry service for particle-detector simulation in the hierarchical detector geometries common to HEP experiments. One of its principal targets is the effective use of vector SIMD hardware instructions to accelerate geometry calculations for single-track as well as multiple-track...
The Toolkit for Multivariate Analysis (TMVA) is a component of the ROOT data analysis framework and is widely used for classification problems. For example, TMVA might be used for the binary classification problem of distinguishing signal from background events.
The classification methods included in TMVA are standard, well-known machine learning techniques which can be implemented in other...
We investigate the combination of Monte Carlo Tree Search, hierarchical space decomposition, Hough Transform techniques and parallel computing applied to the problem of line detection and shape recognition in general. Paul Hough introduced a method for detecting lines in binary images in 1962. Extended in the 1970s to the detection of space forms, what came to be known as the Hough Transform...
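A minimal standalone Hough transform for straight lines, using a toy binary image: each set pixel votes in a (theta, rho) accumulator, and peaks in the accumulator correspond to detected lines.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int W = 64, H = 64;
    std::vector<std::vector<int>> img(H, std::vector<int>(W, 0));
    for (int x = 0; x < W; ++x) img[x][x] = 1;      // toy input: the diagonal y = x

    const double PI = std::acos(-1.0);
    const int nTheta = 180;
    const double rhoMax = std::hypot(W, H);
    const int nRho = 2 * static_cast<int>(rhoMax) + 1;
    std::vector<std::vector<int>> acc(nTheta, std::vector<int>(nRho, 0));

    // Voting stage: rho = x*cos(theta) + y*sin(theta)
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            if (img[y][x])
                for (int t = 0; t < nTheta; ++t) {
                    const double theta = t * PI / nTheta;
                    const double rho = x * std::cos(theta) + y * std::sin(theta);
                    acc[t][static_cast<int>(rho + rhoMax)]++;
                }

    // Peak finding: report the strongest (theta, rho) cell.
    int bestT = 0, bestR = 0;
    for (int t = 0; t < nTheta; ++t)
        for (int r = 0; r < nRho; ++r)
            if (acc[t][r] > acc[bestT][bestR]) { bestT = t; bestR = r; }

    std::printf("best line: theta = %.1f deg, rho = %.1f, votes = %d\n",
                180.0 * bestT / nTheta, bestR - rhoMax, acc[bestT][bestR]);
    return 0;
}
```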
Events visualisation in ALICE - current status and strategy for Run 3
Jeremi Niedziela for the ALICE Collaboration
A Large Ion Collider Experiment (ALICE) is one of the four big experiments running at the Large Hadron Collider (LHC), which focuses on the study of the Quark-Gluon Plasma (QGP) being produced in heavy-ion collisions.
The ALICE Event Visualisation Environment (AliEVE) is...
Today’s analyses for high energy physics experiments involve processing a large amount of data with highly specialized algorithms. The contemporary workflow from recorded data to final results is based on the execution of small scripts - often written in Python or ROOT macros which call complex compiled algorithms in the background - to perform fitting procedures and generate plots. During...
At the beginning, HEP experiments made use of photographic images both to record and store experimental data and to illustrate their findings. The experiments then evolved and needed to find ways to visualize their data. With the availability of computer graphics, software packages to display event data and the detector geometry started to be developed. Here a brief history of event displays...
Modern web browsers are powerful and sophisticated applications that support an ever-wider range of uses. One such use is rendering high-quality, GPU-accelerated, interactive 2D and 3D graphics in an HTML canvas. This can be done via WebGL, a JavaScript API based on OpenGL ES. Applications delivered via the browser have several distinct benefits for the developer and user. For example, they...
ParaView [1] is a high-performance visualization application not widely used in HEP. It is a long-standing open source project led by Kitware [2], involves several DOE and DOD laboratories, and has been adopted by many DOE supercomputing centers and other sites. ParaView stands out in speed and efficiency by using state-of-the-art techniques developed by the academic visualization community...
Reproducibility is a fundamental piece of the scientific method and increasingly complex problems demand ever wider collaboration between scientists. To make research fully reproducible and accessible to collaborators a researcher has to take care of several aspects: research protocol description, data access, preservation of the execution environment, workflow pipeline, and analysis script...
Experimental Particle Physics has been at the forefront of analyzing the world’s largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called “Big Data” technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles...
Following the European Strategy for Particle Physics update of 2013, the study explores different designs of circular colliders for the post-LHC era. Reaching unprecedented energies and luminosities requires understanding system reliability behaviour from the concept phase onwards and designing for availability and sustainable operation. The study explores industrial approaches to model and simulate the...
The CMS experiment has implemented a computing model in which distributed monitoring infrastructures collect a wide range of data and metadata about the performance of the computing operations. These data can be probed further by harnessing Big Data analytics approaches and discovering patterns and correlations that can improve the throughput and the efficiency of the computing model.
CMS...
The statistical analysis of infrastructure metrics comes with several specific challenges, including the fairly large volume of unstructured metrics from a large set of independent data sources. Hadoop and Spark provide an ideal environment in particular for the first steps of skimming rapidly through hundreds of TB of low relevance data to find and extract the much smaller data volume that is...
Big Data technologies have proven to be very useful for the storage, processing and visualization of derived metrics associated with ATLAS distributed computing (ADC) services. Log file data, database records, and metadata from a diversity of systems have been aggregated and indexed to create an analytics platform for ATLAS ADC operations analysis. Dashboards, wide area data access cost...
This contribution shares our recent experience of building Hadoop-based applications. The Hadoop ecosystem now offers a myriad of tools, which can overwhelm new users, yet there are successful ways these tools can be leveraged to solve problems. We look at factors to consider when using Hadoop to model and store data, best practices for moving data in and out of the system, and common...