The region of South-East Europe has a long history of successful collaboration in sharing resources and managing distributed electronic infrastructures for the needs of research communities. HPC resources such as supercomputers and large clusters with low-latency interconnects are especially valuable and scarce in the region. Building upon the successfully tested operational and...
As a result of joint R&D work with 10 of Europe’s leading public research organisations, led by CERN and funded by the EU, T-Systems provides a hybrid cloud solution, enabling science users to seamlessly extend their existing e-Infrastructures with one of the leading European public cloud services based on OpenStack – the Open Telekom Cloud.
With this new approach, large-scale data-intensive...
Ten of Europe’s leading public research organisations led by CERN launched the Helix Nebula Science Cloud (HNSciCloud) Pre-Commercial Procurement to establish a European hybrid cloud platform that will support the high-performance, data-intensive scientific use-cases of this “Buyers Group” and of the research sector at large. It calls for the design and implementation of innovative...
Faster alternatives to a full, GEANT4-based simulation are being pursued within the LHCb experiment. In this context the integration of the Delphes toolkit in the LHCb simulation framework is intended to provide a fully parameterized option.
Delphes is a modular software designed for general-purpose experiments such as ATLAS and CMS to quickly propagate stable particles using a parametric...
IceCube Neutrino Observatory is a neutrino detector located at the South Pole. Here we present experiences acquired when using HTCondor to run IceCube’s GPU simulation worksets on the Titan supercomputer. Titan is a large supercomputer geared for High Performance Computing (HPC). Several factors make it challenging to use Titan for IceCube’s High Throughput Computing (HTC) workloads: (1) Titan...
IceCube is a cubic kilometer neutrino detector located at the South Pole. Every year, 29 TB of data are transmitted via satellite, and 365 TB of data are shipped on archival media, to the data warehouse in Madison, WI, USA. The JADE Long Term Archive (JADE-LTA) software indexes and bundles IceCube files and transfers the archive bundles for long term storage and preservation into tape silos...
The WLCG unites resources from over 169 sites spread across the world and the number is expected to grow in the coming years. However, setting up and configuring new sites to support WLCG workloads is still not a straightforward task and often requires significant assistance from WLCG experts. A survey presented at CHEP 2016 revealed a strong wish among site admins for a reduction of overheads...
The High Luminosity LHC (HL-LHC) represents an unprecedented computing challenge. For the program to succeed, the LHC experiments currently estimate that roughly 50 times more processing and storage capacity will be required than is deployed today. Although some of the increased capacity will be provided by technology improvements over time, the computing budget is expected to...
Data Acquisition (DAQ) systems are a vital component of every experiment. The purpose of the underlying software of these systems is to coordinate all the hardware components and detector states, providing the means of data readout, triggering, online processing, persistence, user control and the routing of data. These tasks are made more challenging when also considering fault tolerance,...
The ATLAS experiment at the LHC relies on a complex and distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data. The High Level Trigger (HLT) component of the TDAQ system is responsible for executing advanced selection algorithms, reducing the data rate to a level suitable for recording to permanent storage. The HLT functionality is provided by a...
ATLAS relies on very large samples of simulated events for delivering high-quality
and competitive physics results, but producing these samples takes much time and
is very CPU intensive when using the full GEANT4 detector simulation.
Fast simulation tools are a useful way of reducing CPU requirements when detailed
detector simulations are not needed. During the LHC Runs 1 and 2, a...
In this talk, we will describe the latest additions to the Toolkit for Multivariate Analysis (TMVA), the machine learning package integrated into the ROOT framework. In particular, we will focus on the new deep learning module that contains robust fully-connected, convolutional and recurrent deep neural networks implemented on CPU and GPU architectures. We will present performance of these new...
Recently, the stability of the Data Acquisition System (DAQ) has become a vital precondition for successful data taking in high energy physics experiments. The intelligent, FPGA-based Data Acquisition System (iFDAQ) of the COMPASS experiment at CERN is designed to read out data at the maximum rate of the experiment and to run in a mode without any stops. DAQ systems fulfilling such...
Many HEP experiments are moving beyond experimental studies to making large-scale production use of HPC resources at NERSC, including the Knights Landing architecture on the Cori supercomputer. These include ATLAS, ALICE, Belle II, CMS, LSST-DESC, and STAR, among others. Achieving this has involved several different approaches and has required innovations on both NERSC’s and the experiments’ sides....
In HEP experiments CPU resources required by MC simulations are constantly growing and becoming a very large fraction of the total computing power (greater than 75%). At the same time the pace of performance improvements given by technology is slowing down, so the only solution is a more efficient use of resources. Efforts are ongoing in the LHC experiment collaborations to provide multiple...
Weak gravitational lensing is an extremely powerful probe for gaining insight into the nature of two of the greatest mysteries of the universe -- dark energy and dark matter. To help prepare for the massive amounts of data coming from next generation surveys like LSST that hope to advance our understanding of these mysteries, we have developed an automated and seamless weak lensing cosmic...
The cloud computing paradigm allows scientists to elastically grow or shrink computing resources as requirements demand, so that resources only need to be paid for when necessary. The challenge of integrating cloud computing into the distributed computing frameworks used by HEP experiments has led to many different solutions in past years; however, none of these solutions offers a complete,...
The Scikit-HEP project is a community-driven and community-oriented effort with the aim of providing Particle Physics at large with a Python scientific toolset containing core and common tools. The project builds on five pillars that embrace the major topics involved in a physicist’s analysis work: datasets, data aggregations, modelling, simulation and visualisation. The vision is to build a...
The Titan supercomputer at Oak Ridge National Laboratory prioritizes the scheduling of large leadership class jobs, but even when the supercomputer is fully loaded and large jobs are standing in the queue to run, 10 percent of the machine remains available for a mix of smaller jobs, essentially ‘filling in the cracks’ between the very large jobs. Such utilisation of the computer resources is...
IceCube is a cubic kilometer neutrino detector located at the South Pole. CVMFS is a key component of IceCube’s Distributed High Throughput Computing analytics workflow for sharing 500 GB of software across datacenters worldwide. Building the IceCube software suite across multiple platforms and deploying it into CVMFS has until recently been a manual, time-consuming task that doesn’t fit well...
The Trigger and DAQ (TDAQ) system of the ATLAS experiment is a complex
distributed computing system, composed of O(30000) applications
running on a farm of computers. The system is operated by a crew of
operators on shift. An important aspect of operations is to minimize
the downtime of the system caused by runtime failures, such as human
errors, unawareness, miscommunication, etc.
The...
The goal to obtain more precise physics results in current collider experiments drives the plans to significantly increase the instantaneous luminosity collected by the experiments. The increasing complexity of the events due to the resulting increased pileup requires new approaches to triggering, reconstruction, analysis,
and event simulation. The last task leads to a critical problem:...
HIPSTER (Heavily Ionising Particle Standard Toolkit for Event Recognition) is an open source Python package designed to facilitate the use of TensorFlow in a high energy physics analysis context. The core functionality of the software is presented, with images from the MoEDAL experiment Nuclear Track Detectors (NTDs) serving as an example dataset. Convolutional neural networks are selected as...
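As a hedged illustration of the kind of convolutional classifier described above (this is plain Keras, not the HIPSTER API; the input shape, layer sizes and two-class setup are assumptions only):

```python
# Minimal sketch of a small CNN for detector-image classification.
# Shapes and hyperparameters are illustrative assumptions, not HIPSTER defaults.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),          # assumed single-channel NTD-style image
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # signal vs. background score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(images, labels, epochs=10, validation_split=0.2)  # hypothetical arrays of shape (N, 64, 64, 1)
```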
The software for detector simulation, reconstruction and analysis of physics data is an essential part of each high-energy physics experiment. A new generation of experiments for relativistic nuclear physics is expected to start up in the coming years at the Nuclotron-based Ion Collider facility (NICA), under construction at the Joint Institute for Nuclear Research in Dubna:...
Machine Learning techniques have been used in different applications by the HEP community: in this talk, we discuss the case of detector simulation. The amount of simulated events, expected in the future for LHC experiments and their High Luminosity upgrades, is increasing dramatically and requires new fast simulation solutions. We will describe an R&D activity, aimed at providing a...
Technical details of the directly manipulated systems and the impact on non-obviously connected systems are required knowledge when preparing an intervention in a complex experiment like ATLAS. In order to improve the understanding of the parties involved in an intervention a rule-based expert system has been developed. On the one hand this helps to recognize dependencies that are not always...
Reducing time and cost through increased setup and operational efficiency is key nowadays when exploiting private or commercial clouds. In turn, this means that reducing the learning curve, as well as the operational cost of managing community-specific services running on distributed environments, has become a key to success and sustainability, even more so for communities seeking to exploit...
In the traditional HEP analysis paradigm, code, documentation, and results are separate entities that require significant effort to keep synchronized, which hinders reproducibility. Jupyter notebooks allow these elements to be combined into a single, repeatable narrative. HEP analyses, however, commonly rely on complex software stacks and the use of distributed computing resources,...
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this talk we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system...
In the context of the common online-offline computing infrastructure for Run3 (ALICE-O2), ALICE is reorganizing its detector simulation software to be based on FairRoot, offering a common toolkit to implement simulation based on the Virtual-Monte-Carlo (VMC) scheme. Recently, FairRoot has been augmented by ALFA, a software framework developed in collaboration between ALICE and FAIR, offering...
In the framework of the H2020 INDIGO-DataCloud project we have implemented an advanced solution for the automatic deployment of digital data repositories based on Invenio, the digital library framework developed by CERN. Exploiting cutting-edge technologies, like Docker and Apache Mesos, and standard interfaces like TOSCA, we are able to provide a service that simplifies the process of creating...
In 2019, the ATLAS experiment at CERN is planning an upgrade
in order to cope with the higher luminosity requirements. In this
upgrade, the installation of the new muon chambers for the end-cap
muon system will be carried out. Muon track reconstruction performance
can be improved, and fake triggers can be reduced. It is also
necessary to develop a readout system for the trigger data for the...
AMI (ATLAS Metadata Interface) is a generic ecosystem for metadata aggregation, transformation and cataloguing. Benefitting from more than 15 years of feedback in the LHC context, the second major version was recently released. We describe the design choices and their benefits for providing high-level metadata-dedicated features. In particular, we focus on the implementation of the Metadata...
In spite of the fact that HEP computing has evolved considerably over the years, the understanding of the evolution process seems to be still incomplete. There is no clear procedure to replace an established product with a new one, and most of the successful major transitions (e.g. PAW to Root or Geant3 to Geant4) have involved a large dose of serendipity and have caused splits in the...
The FabrIc for Frontier Experiments (FIFE) project within the Fermilab Scientific Computing Division is charged with integrating offline computing components into a common computing stack for the non-LHC Fermilab experiments, supporting experiment offline computing, and consulting on new, novel workflows. We will discuss the general FIFE onboarding strategy, the upgrades and enhancements in...
Detector simulation has become fundamental to the success of modern high-energy physics (HEP) experiments. For example, the Geant4-based simulation applications developed by the ATLAS and CMS experiments played a major role for them to produce physics measurements of unprecedented quality and precision with faster turnaround, from data taking to journal submission, than any previous hadron...
LHCb is one of the 4 experiments at the LHC accelerator at CERN, specialized in b-physics. During the next long shutdown period, the LHCb experiment will be upgraded to a trigger-less readout system with a full software trigger in order to be able to record data with a much higher instantaneous luminosity. To achieve this goal, the upgraded systems for trigger, timing and fast control (TFC)...
Neural networks, and recently, specifically deep neural networks, are attractive candidates for machine learning problems in high energy physics because they can act as universal approximators. With a properly defined objective function and sufficient training data, neural networks are capable of approximating functions for which physicists lack sufficient insight to derive an analytic,...
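As a minimal illustration of the universal-approximation idea mentioned above (not code from the contribution), the following NumPy sketch trains a single-hidden-layer network by plain gradient descent to approximate sin(x); the width, learning rate and target function are arbitrary choices.

```python
import numpy as np

# Toy universal-approximation demo: one hidden tanh layer fitted to sin(x).
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 256).reshape(-1, 1)
y = np.sin(x)

H = 32                                     # hidden width (illustrative choice)
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1, (H, 1)); b2 = np.zeros(1)
lr = 1e-2

for step in range(20000):
    h = np.tanh(x @ W1 + b1)               # hidden activations
    pred = h @ W2 + b2
    err = pred - y                         # residuals used for the MSE gradient below
    gW2 = h.T @ err / len(x); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h**2)         # back-propagate through tanh
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

pred = np.tanh(x @ W1 + b1) @ W2 + b2
print("final MSE:", float(np.mean((pred - y) ** 2)))
```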
The XCache (XRootD Proxy Cache) provides a disk-based caching proxy for data access via the XRootD protocol. This can be deployed at WLCG Tier-2 computing sites to provide a transparent cache service for the optimisation of data access, placement and replication.
We will describe the steps to enable full read/write operations to storage endpoints consistent with the distributed data...
Data acquisition and control play an important role in science applications, especially in modern high energy physics (HEP) experiments. A comprehensive and efficient monitoring system is a vital part of any HEP experiment. In this paper we describe the web-based software framework which is currently used by the CMD-3 Collaboration during data taking with the CMD-3 Detector at the VEPP-2000...
High Energy Physics experiments often rely on Monte-Carlo event generators. Such generators often contain a large number of parameters and need fine-tuning to closely match experimentally observed data. This task traditionally requires expert knowledge of the generator and the experimental setup as well as vast computing power. Generative Adversarial Networks (GANs) are a powerful method to match...
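To make the adversarial setup concrete, here is a minimal, hedged 1-D sketch of GAN training in TensorFlow: a generator is pushed to produce samples the discriminator cannot distinguish from "observed" data. The stand-in Gaussian data, network sizes and optimizer settings are assumptions and do not reflect the tuning procedure of the contribution.

```python
import numpy as np
import tensorflow as tf

# Toy 1-D "observed" data: a stand-in for distributions the generator should match.
real = np.random.normal(2.0, 0.5, size=(4096, 1)).astype("float32")

gen = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                           tf.keras.layers.Dense(1)])
disc = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                            tf.keras.layers.Dense(1)])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)


def train_step(batch):
    noise = tf.random.normal((batch.shape[0], 4))
    with tf.GradientTape() as gt, tf.GradientTape() as dt:
        fake = gen(noise, training=True)
        d_real = disc(batch, training=True)
        d_fake = disc(fake, training=True)
        # Discriminator: label observed samples 1, generated samples 0.
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        # Generator: fool the discriminator into labelling generated samples as 1.
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(dt.gradient(d_loss, disc.trainable_variables),
                              disc.trainable_variables))
    g_opt.apply_gradients(zip(gt.gradient(g_loss, gen.trainable_variables),
                              gen.trainable_variables))


for step in range(2000):
    idx = np.random.randint(0, len(real), size=256)
    train_step(real[idx])
```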
The CMS full simulation using Geant4 has delivered billions of simulated events for analysis during Runs 1 and 2 of the LHC. However, the HL-LHC dataset will be an order of magnitude larger, with a similar increase in occupancy per event. In addition, the upgraded CMS detector will be considerably more complex, with an extended silicon tracker and a high granularity calorimeter in the endcap...
HEPCloud is rapidly becoming the primary system for provisioning compute resources for all Fermilab-affiliated experiments. In order to reliably meet peak demands of the next generation of High Energy Physics experiments, Fermilab must either plan to locally provision enough resources to cover the forecasted need, or find ways to elastically expand its computational capabilities. Commercial...
The fraction of general internet traffic carried over IPv6 continues to grow rapidly. The transition of WLCG central and storage services to dual-stack IPv4/IPv6 is progressing well, thus enabling the use of IPv6-only CPU resources as agreed by the WLCG Management Board and presented by us at CHEP2016. By April 2018, all WLCG Tier 1 data centres will provide access to their services over IPv6....
The need for good software training is essential in the HEP community. Unfortunately, current training is non-homogeneous and the definition of a common baseline is unclear, making it difficult for newcomers to proficiently join large collaborations such as ALICE or LHCb.
In the last years, both collaborations have started separate efforts to tackle this issue through training workshops, via...
High throughput and short turnaround cycles are core requirements for the efficient processing of I/O-intense end-user analyses. Together with the tremendously increasing amount of data to be processed, this leads to enormous challenges for HEP storage systems, networks and the data distribution to end-users. This situation is further compounded by taking into account opportunistic resources...
The certification of the CMS data as usable for physics analysis is a crucial task to ensure the quality of all physics results published by the collaboration. Currently, the certification conducted by human experts is labor intensive and can only be segmented on a run by run basis. This contribution focuses on the design and prototype of an automated certification system assessing data...
In order to meet the challenges of the Run-3 data rates and volumes, the ALICE collaboration is merging the online and offline infrastructures into a common framework: ALICE-O2.
O2 is based on FairRoot and FairMQ, a message-based, multi-threaded and multi-process control framework.
In FairMQ, processes (possibly on different machines) exchange data via message queues either through 0MQ or...
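For readers unfamiliar with the message-queue pattern, the sketch below uses pyzmq directly (0MQ is one of the transports named above); it is an illustration of PUSH/PULL data exchange between processes, not FairMQ itself, and the endpoint is an arbitrary choice.

```python
# Minimal pyzmq sketch: two processes exchange "event" messages over a 0MQ queue.
import zmq

ENDPOINT = "tcp://127.0.0.1:5555"   # arbitrary illustrative endpoint


def producer(n_messages=5):
    ctx = zmq.Context.instance()
    push = ctx.socket(zmq.PUSH)
    push.bind(ENDPOINT)
    for i in range(n_messages):
        push.send_json({"event_id": i, "payload": [1.0, 2.0, 3.0]})
    push.close()


def consumer(n_messages=5):
    ctx = zmq.Context.instance()
    pull = ctx.socket(zmq.PULL)
    pull.connect(ENDPOINT)
    for _ in range(n_messages):
        msg = pull.recv_json()
        print("received event", msg["event_id"])
    pull.close()


if __name__ == "__main__":
    import multiprocessing as mp
    c = mp.Process(target=consumer)
    c.start()
    producer()
    c.join()
```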
The amount of data to be processed by experiments in high energy physics is tremendously increasing in the coming years. For the first time in history the expected technology advance itself will not be sufficient to cover the arising gap between required and available resources based on the assumption of maintaining the current flat budget hardware procurement strategy. This leads to...
The high-luminosity data produced by the LHC leads to many proton-proton interactions per beam
crossing in ATLAS, known as pile-up. In order to understand the ATLAS data and extract the physics
results it is important to model these effects accurately in the simulation. As the pile-up rate continues
to grow towards an eventual rate of 200 for the HL-LHC, this puts increasing demands on...
This paper presents the Detector Control System (DCS) that is being designed and implemented for the NP04 experiment at CERN. NP04, also known as protoDUNE Single Phase (SP), aims at validating the engineering processes and detector performance of a large LAr Time Projection Chamber in view of the DUNE experiment. The detector is under construction and will be operated on a tertiary beam of...
Data-intensive science collaborations still face challenges when transferring large data sets between globally distributed endpoints. Many issues need to be addressed to orchestrate the network resources in order to better explore the available infrastructure. In multi-domain scenarios, the complexity increases because network operators rarely export the network topology to researchers and...
The interest in using Big Data solutions based on the Hadoop ecosystem is constantly growing in the HEP community. This drives the need for increased reliability and availability of the central Hadoop service and underlying infrastructure provided to the community by the CERN IT department.
This contribution will report on the overall status of the Hadoop platform and the recent enhancements and...
To address the challenges of the major upgrade of the experiment, the ALICE simulations must be able to make efficient use of computing and opportunistic supercomputing resources available on the GRID. The Geant4 transport package, the performance of which has been demonstrated in a hybrid multithreading (MT) and multiprocessing (MPI) environment with up to ¼ million threads, is therefore of a...
Collaboration in research is essential, as it saves time and money, and the field of high-energy physics (HEP) is no different: the higher the level of collaboration, the stronger the community. The HEP field encourages organizing events of various formats and sizes, such as meetings, workshops and conferences. Making it easier to attend a HEP event fosters cooperation and dialogue, and this is what makes...
Recent years have seen the mass adoption of streaming in mobile computing, an increase in size and frequency of bulk long-haul data transfers
in science in general, and the usage of big data sets in job processing
demanding real-time long-haul accesses that can be greatly affected by
variations in latency. It has been shown in the Physics and climate research communities that the need to...
Online Data Quality Monitoring (DQM) in High Energy Physics experiments is a key task which, nowadays, is extremely expensive in terms of human resources and required expertise.
We investigate machine learning as a solution for automatised DQM. The contribution focuses on the peculiar challenges posed by the requirement of setting up and evaluating the AI algorithms in the online environment;...
During the Run-2 of the Large Hadron Collider (LHC) the instantaneous luminosity exceeds the nominal value of 10^{34} cm^{−2} s^{−1} with a 25 ns bunch crossing period and the number of overlapping proton-proton interactions per bunch crossing increases up to about 80. These conditions pose a challenge to the trigger system of the experiments that has to control rates while keeping a good...
LZ is a Dark Matter experiment based at the Sanford Underground Research Facility. It is currently under construction and aims to start data taking in 2020. Its computing model is based on two data centres, one in the USA (USDC) and one in the UK (UKDC), both holding a complete copy of its data. During stable periods of running both data centres plan to concentrate on different aspects of...
In 2015 ATLAS Distributed Computing started to migrate its monitoring systems away from Oracle DB and decided to adopt new big data platforms that are open source, horizontally scalable, and offer the flexibility of NoSQL systems. Three years later, the full software stack is in place, the system is considered in production and operating at near maximum capacity (in terms of storage capacity...
The LHCb experiment, one of the four operating at the LHC, will undergo a major upgrade of its electronics during the third long shutdown period of the particle accelerator. One of the main objectives of the upgrade effort is to implement a 40 MHz readout of collision data. For this purpose, the Front-End electronics will make extensive use of a radiation resistant chipset, the Gigabit...
Networking is foundational to the ATLAS distributed infrastructure and there are many ongoing activities related to networking both within and outside of ATLAS. We will report on the progress in a number of areas exploring ATLAS's use of networking and our ability to monitor the network, analyze metrics from the network, and tune and optimize application and end-host parameters to make the...
The Jiangmen Underground Neutrino Observatory (JUNO) is a multi-purpose neutrino experiment. It consists of a central detector, a water pool and a top tracker. The central detector, which is used for neutrino detection, consists of 20 kt of liquid scintillator (LS) and about 18,000 20-inch photomultiplier tubes (PMTs) to collect light from the LS.
Simulation software is one of the important parts...
Computing in the field of high energy physics requires usage of heterogeneous computing resources and IT, such as grid, high performance computing, cloud computing and big data analytics for data processing and analysis. The core of the distributed computing environment at the Joint Institute for Nuclear Research is the Multifunctional Information and Computing Complex (MICC). It includes...
The NOvA experiment is a two-detector, long-baseline neutrino experiment operating since 2014 in the NuMI muon neutrino beam (FNAL, USA). NOvA has already collected about 25% of its expected statistics in both neutrino and antineutrino modes for electron-neutrino appearance and muon-neutrino disappearance analyses. Careful simulation of neutrino events and backgrounds is required for precise...
The First-level Event Selector (FLES) is the main event selection
system of the upcoming CBM experiment at the future FAIR facility in
Germany. As the central element, a high-performance compute
cluster analyses free-streaming, time-stamped data delivered from the
detector systems at rates exceeding 1 TByte/s and selects data
for permanent storage.
While the detector systems are located in a...
LHC@home has provided computing capacity for simulations under BOINC since 2005. Following the introduction of virtualisation with BOINC to run HEP Linux software in a virtual machine on volunteer desktops, initially started on the test BOINC projects, like Test4Theory and ATLAS@home, all CERN applications distributed to volunteers have been consolidated under a single LHC@home BOINC project....
The ROOT software framework is foundational for the HEP ecosystem, providing capabilities such as IO, a C++ interpreter, GUI, and math libraries. It uses object-oriented concepts and build-time modules to layer between components. We believe additional layering formalisms will benefit ROOT and its users.
We present the modularization strategy for ROOT which aims to formalize the description...
The revalidation, reinterpretation and reuse of research data analyses requires having access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps which were used by the researcher to produce the original scientific results in the first place.
REANA (=Reusable Analyses) is a nascent platform enabling researchers to...
The Electromagnetic Calorimeter (ECAL) is one of the sub-detectors of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN. For more than 10 years, the ECAL Detector Control System (DCS) and the ECAL Safety System (ESS) have supported the experiment operation, contributing to its high availability and safety. The evolution of both systems to fulfill new...
Processing ATLAS event data requires a wide variety of auxiliary information from geometry, trigger, and conditions database systems. This information is used to dictate the course of processing and refine the measurement of particle trajectories and energies to construct a complete and accurate picture of the remnants of particle collisions. Such processing occurs on a worldwide computing...
The GooFit highly parallel fitting package for GPUs and CPUs has been substantially upgraded in the past year. Python bindings have been added to allow simple access to the fitting configuration, setup, and execution. A Python tool to write custom GooFit code given a (compact and elegant) MINT3/AmpGen amplitude description allows the corresponding C++ code to be written quickly and correctly. ...
We present recent work within the ATLAS collaboration to centrally provide tools to facilitate analysis management and highly automated container-based analysis execution in order to both enable non-experts to benefit from these best practices as well as the collaboration to track and re-execute analyses independently, e.g. during their review phase.
Through integration with the ATLAS GLANCE...
The Edinburgh (UK) Tier-2 computing site has provided CPU and storage resources to the Worldwide LHC Computing Grid (WLCG) for close to 10 years. Unlike other sites, resources are shared amongst members of the hosting institute rather than being exclusively provisioned for Grid computing. Although this unconventional approach has posed challenges for troubleshooting and service delivery there...
Network performance is key to the correct operation of any modern datacentre infrastructure or data acquisition (DAQ) system. Hence, it is crucial to ensure the devices employed in the network are carefully selected to meet the required needs.
The established benchmarking methodology [1,2] consists of various tests that create perfectly reproducible traffic patterns. This has the advantage of...
The volunteer computing project ATLAS@Home has been providing a stable computing resource for the ATLAS experiment since 2013. It has recently undergone some significant developments and as a result has become one of the largest resources contributing to ATLAS computing, by expanding its scope beyond traditional volunteers and into exploitation of idle computing power in ATLAS data centres....
The distributed data management system Rucio manages all data of the ATLAS collaboration across the grid. Automation such as replication and rebalancing is an important part of ensuring minimal workflow execution times. In this paper, a new rebalancing algorithm based on machine learning is proposed. First, it can run independently of the existing rebalancing mechanism and can be...
The BESIII detector is a magnetic spectrometer operating at BEPCII, a
double-ring e+e- collider with center-of-mass energies between 2.0 and
4.6 GeV and a peak luminosity of $10^{33}$ cm$^{-2}$ s$^{-1}$. The event rate
is about 4 kHz after the online event filter (L3 trigger) at the J/$\psi$
peak.
The BESIII online data quality monitoring (DQM) system is used to
monitor the data and the detector in...
Machine Learning (also known in HEP as Multivariate Analysis) has been used to some extent in HEP since the nineties. While Boosted Decision Trees are now commonplace, there is an explosion of novel algorithms following the "deep learning revolution" in industry, applicable to data taking, triggering and handling, reconstruction, simulation and analysis. This talk will review some of these algorithms and...
Initial studies have suggested generative adversarial networks (GANs) have promise as fast simulations within HEP. These studies, while promising, have been insufficiently precise and also, like GANs in general, suffer from stability issues. We apply GANs to the generation of full particle physics events (not individual physics objects), and to large weak lensing cosmology convergence maps. We...
Machine learning methods are becoming ubiquitous across particle physics. However, the exploration of such techniques in low-latency environments like L1 trigger systems has only just begun. We present here a new software, based on High Level Synthesis (HLS), to generically port several kinds of network models (BDTs, DNNs, CNNs) into FPGA firmware. As a benchmark physics use case, we consider...
The design of readout electronics for the LAr calorimeters of the ATLAS detector to be operated at the future High-Luminosity LHC (HL-LHC) requires a detailed simulation of the full readout chain in order to find optimal solutions for the analog and digital processing of the detector signals. Due to the long duration of the LAr calorimeter pulses relative to the LHC bunch crossing time,...
In the last stages of data analysis, only order-of-magnitude computing speedups translate into increased human productivity, and only if they're not difficult to set up. Producing a plot in a second instead of an hour is life-changing, but not if it takes two hours to write the analysis code. Fortunately, HPC-inspired techniques can result in such large speedups, but unfortunately, they can be...
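As a hedged example of the kind of easily obtained speedup referred to above, the snippet compares a plain Python loop with the equivalent vectorized NumPy expression for a toy per-event quantity; the computation itself is an arbitrary illustration.

```python
# Same quantity computed event-by-event in Python and as one vectorized NumPy call.
import time
import numpy as np

n = 1_000_000
px = np.random.normal(size=n)
py = np.random.normal(size=n)

t0 = time.perf_counter()
pt_loop = [(px[i] ** 2 + py[i] ** 2) ** 0.5 for i in range(n)]   # slow per-event loop
t1 = time.perf_counter()
pt_vec = np.sqrt(px ** 2 + py ** 2)                              # vectorized equivalent
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.2f} s, vectorized: {t2 - t1:.3f} s")
```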
The ATLAS experiment is approaching mid-life: the long shutdown period (LS2) between LHC Run 2 (ending in 2018) and the future collision data-taking of Runs 3 and 4 (starting in 2021). In advance of LS2, we have been assessing the future viability of existing computing infrastructure systems. This will permit changes to be implemented in time for Run 3. In systems with broad impact...
Scalable multithreading poses challenges to I/O, and the performance of a thread-safe I/O strategy
may depend upon many factors, including I/O latencies, whether tasks are CPU- or I/O-intensive, and thread count.
In a multithreaded framework, an I/O infrastructure must efficiently supply event data to and collect it from many threads processing multiple events in flight.
In particular,...
The CERN OpenStack Cloud provides over 200,000 CPU cores to run data processing analyses for the Large Hadron Collider (LHC) experiments. Delivering these services with high performance and reliable service levels, while at the same time ensuring continuously high resource utilization, has been one of the major challenges for the CERN Cloud engineering team.
Several optimizations like...
The CERN OpenStack cloud has been delivering a wide variety of services to its 3000 customers since it entered production in 2013. Initially, standard resources such as Virtual Machines and Block Storage were offered. Today, the cloud offering includes advanced features such as Container Orchestration (for Kubernetes, Docker Swarm mode, Mesos/DCOS clusters), File Shares and Bare Metal, and...
In spring 2018 the SuperKEKB electron-positron collider at High Energy Accelerator Research Organization (KEK, Tsukuba, Japan) will deliver its first collisions to the Belle II experiment. The aim of Belle II is to collect a data sample 50 times larger than the previous generation of B-Factories taking advantage of the unprecedented SuperKEKB design luminosity of 8x10^35 cm^-2 s^-1. The Belle...
The LHCb experiment is a fully instrumented forward spectrometer designed for
precision studies in the flavour sector of the standard model with proton-proton
collisions at the LHC. As part of its expanding physics programme, LHCb collected data also during the LHC proton-nucleus collisions in 2013 and 2016 and
during nucleus-nucleus collisions in 2015. All the collected datasets are...
The OSG has long maintained a central accounting system called Gratia. It uses small probes on each computing and storage resource in order to collect usage. The probes report to a central collector which stores the usage in a database. The database is then queried to generate reports. As the OSG aged, the size of the database grew very large. It became too large for the database technology to...
Hydra is a templatized header-only, C++11-compliant library for data analysis on massively parallel platforms targeting, but not limited to, the field of High Energy Physics research.
Hydra supports the description of particle decays via the generation of phase-space Monte Carlo, generic function evaluation, data fitting, multidimensional adaptive numerical integration and histogramming.
Hydra is...
The Belle II experiment at KEK is preparing for first collisions in early 2018. Processing the large amounts of data that will be produced requires conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer. This was accomplished by relying on industry-standard tools and methods: the conditions database...
The HEP community is approaching an era where the excellent performance of the particle accelerators in delivering collisions at high rates will force the experiments to record a large amount of information. The growing size of the datasets could potentially become a limiting factor in the capability to produce scientific results timely and efficiently. Recently, new technologies and new...
HammerCloud is a testing service and framework to commission, run continuous tests or on-demand large-scale stress tests, and benchmark computing resources and components of various distributed systems with realistic full-chain experiment workflows.
HammerCloud, used by the ATLAS and CMS experiments in production, has been a useful service to commission both compute resources and various...
Opticks is an open source project that integrates the NVIDIA OptiX
GPU ray tracing engine with Geant4 toolkit based simulations.
Massive parallelism brings drastic performance improvements with optical photon simulation speedup expected to exceed 1000 times Geant4 with workstation GPUs.
Optical physics processes of scattering, absorption, reemission and
boundary processes are implemented...
Many analyses on CMS are based on the histogram, used throughout the workflow from data validation studies to fits for physics results. Binned data frames are a generalisation of multidimensional histograms, in a tabular representation where histogram bins are denoted by category labels. Pandas is an industry-standard tool, providing a data frame implementation that allows easy access to "big...
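A minimal sketch of the binned-data-frame idea, assuming toy column names and binning (this is plain pandas, not the CMS tooling): a 2-D histogram is represented as a tidy table whose rows are labelled by bin categories.

```python
# A 2-D histogram held as a table indexed by category (bin) labels.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "pt":  rng.exponential(30.0, size=100_000),   # toy transverse momentum
    "eta": rng.normal(0.0, 1.2, size=100_000),    # toy pseudorapidity
})

df["pt_bin"] = pd.cut(df["pt"], bins=[0, 20, 40, 60, 100, np.inf])
df["eta_bin"] = pd.cut(df["eta"], bins=np.linspace(-2.5, 2.5, 6))

# The binned data frame: one row per (pt_bin, eta_bin) pair with the event count.
binned = (df.groupby(["pt_bin", "eta_bin"], observed=False)
            .size()
            .rename("count")
            .reset_index())
print(binned.head())
```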
The Simulation at Point1 (Sim@P1) project was built in 2013 to take advantage of the ATLAS Trigger and Data Acquisition High Level Trigger (HLT) farm. The HLT farm provides more than 2,000 compute nodes, which are critical to ATLAS during data taking. When ATLAS is not recording data, this large compute resource is used to generate and process simulation data for the experiment. The Sim@P1...
The calibration of the detector in almost real time is a key to the exploitation of the large data volumes at the LHC experiments. For this purpose the CMS collaboration deployed a complex machinery involving several components of the processing infrastructure and of the condition DB system. Accurate reconstruction of data starts only once all the calibrations become available for consumption...
LHC experiments make extensive use of Web proxy caches, especially for software distribution via the CernVM File System and for conditions data via the Frontier Distributed Database Caching system. Since many jobs read the same data, cache hit rates are high and hence most of the traffic flows efficiently over Local Area Networks. However, it is not always possible to have local Web caches,...
An efficient and fast access to the detector description of the ATLAS experiment is needed for many tasks, at different steps of the data chain: from detector development to reconstruction, from simulation to data visualization. Until now, the detector description was only accessible through dedicated services integrated into the experiment's software framework, or by the usage of external...
The primary goal of the online cluster of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is to build event data from the detector and to select interesting collisions in the High Level Trigger (HLT) farm for offline storage. With more than 1100 nodes and a capacity of about 600 kHEPSpec06, the HLT machines represent up to 40% of the combined Tier0/Tier-1...
Since the beginning of the LHC Run 2 in 2016 the CMS data processing framework, CMSSW, has been running with multiple threads during production of data and simulation via the use of Intel's Thread Building Blocks (TBB) library. The TBB library utilizes tasks as concurrent units of work. CMS used these tasks to allow both concurrent processing of events as well as concurrent running of modules...
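As a loose Python analogue of the task-based concurrency described above (concurrent.futures here, not TBB or CMSSW), the sketch below keeps several events "in flight" at once, each processed as an independent task.

```python
# Illustration only: several events processed concurrently as independent tasks.
import concurrent.futures as cf


def process_event(event_id):
    # Stand-in for running a chain of reconstruction modules on one event.
    payload = sum(i * i for i in range(10_000))
    return event_id, payload


if __name__ == "__main__":
    with cf.ThreadPoolExecutor(max_workers=4) as pool:   # 4 events in flight
        for event_id, _ in pool.map(process_event, range(16)):
            print("finished event", event_id)
```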
A key ingredient of the data taking strategy used by the LHCb experiment in Run-II is the novel real-time detector alignment and calibration. Data collected at the start of the fill are processed within minutes and used to update the alignment, while the calibration constants are evaluated hourly. This is one of the key elements which allow the reconstruction quality of the software trigger in...
We report developments for the Geant4 electromagnetic (EM) physics sub-packages for Geant4 release 10.4 and beyond. Modifications are introduced to the models of photo-electric effect, bremsstrahlung, gamma conversion, and multiple scattering. Important developments for calorimetry applications were carried out for the modeling of single and multiple scattering of charged particles....
The Physics programmes of LHC Run III and HL-LHC challenge the HEP community. The volume of data to be handled is unprecedented at every step of the data processing chain: analysis is no exception.
Physicists need to be provided with first-class analysis tools that are easy to use, exploit bleeding-edge hardware technologies and allow parallelism to be expressed seamlessly.
This contribution...
A new event data format has been designed and prototyped by the CMS collaboration to satisfy the needs of a large fraction of physics analyses (at least 50%) with a per-event size of order 1 kB. This new format is more than a factor of 20 smaller than the MINIAOD format and contains only top-level information typically used in the last steps of an analysis. The talk will review the current...
The ALICE experiment at the Large Hadron Collider (LHC) at CERN is planned to be operated in a continuous data-taking mode in Run 3. This will allow the inspection of data from all collisions at a rate of 50 kHz for Pb-Pb, giving access to rare physics signals embedded in a large background.
Based on experience with real-time reconstruction of particle trajectories and event properties in the ALICE...
With the development of cloud computing, more and more clouds are being widely applied in the high-energy physics field. OpenStack is generally considered the future of cloud computing. However, in OpenStack the resource allocation model assigns a fixed number of resources to each group. This is not very suitable for scientific computing such as high energy physics applications, whose demands for...
The development of the GeantV Electromagnetic (EM) physics package has evolved following two necessary paths towards code modernization. A first phase required the revision of the main electromagnetic physics models and their implementation. The main objectives were to improve their accuracy, extend them to the new high-energy frontiers posed by the Future Circular Collider (FCC) programme and...
In high energy physics experiments, silicon detectors are often subjected to a harsh radiation environment, especially at hadron colliders. Understanding the impact of radiation damage on the detector performance is an indispensable prerequisite for a successful operation throughout the lifetime of the planned experiment.
A dedicated irradiation programme followed by detailed studies with...
The WLCG Information System (IS) is an important component of the huge heterogeneous distributed infrastructure. Considering the evolution of LHC computing towards high luminosity era and analyzing experience accumulated by the computing operations teams and limitations of the current information system, the WLCG IS evolution task force came up with the proposal to develop Computing Resource...
To improve hardware utilization and save manpower in system management, over the last few years we have migrated most of the web services in our institute (Institute of High Energy Physics, IHEP) to a private cloud built upon OpenStack. However, cyber security attacks have progressively become a serious threat to the cloud. Therefore, a detection and monitoring system for cyber security threats is...
The HEP community is preparing for the LHC’s Runs 3 and 4. One of the big challenges for physics analysis will be developing tools that can efficiently express an analysis and efficiently process the tenfold increase in data expected. Recently, interest has focused on declarative analysis languages: a way of specifying a physicist’s intent and leaving everything else to the underlying system. The...
The Beijing Spectrometer (BESIII) experiment has produced hundreds of billions of events. It has collected the world's largest data samples of J/ψ, ψ(3686), ψ(3770) and ψ(4040) decays. The typical branching fractions for interesting physics channels are of the order of O(10^-3). The traditional event-wise accessing of BOSS (BES Offline Software System) is not effective for the selective accessing...
The CMS experiment dedicates a significant effort to supervising the quality of its data, online and offline. Real-time data quality (DQ) monitoring is in place to spot and diagnose problems as promptly as possible to avoid data loss. The a posteriori evaluation of processed data is designed to categorize the data in terms of their usability for physics analysis. These activities produce DQ...
Within the field of dark matter direct detection, there has been very little penetration of machine learning. This is primarily due to the difficulty of modeling such low-energy detectors for training sets (the keV energies are a factor of $10^{-10}$ smaller than at the LHC). Xenon detectors have been leading the field of dark matter direct detection for the last decade. The current front runner is XENON1T,...
During 2017 LHCb developed the ability to interrupt Monte Carlo
simulation jobs and cause them to finish cleanly with the events
simulated so far correctly uploaded to grid storage. We explain
how this functionality is supported in the Gaudi framework and handled
by the LHCb simulation framework Gauss. By extending DIRAC, we have been
able to trigger these interruptions when running...
The majority of currently planned or considered hadron colliders are expected to deliver data in collisions with hundreds of simultaneous interactions per beam bunch crossing on average, including the high luminosity LHC upgrade currently in preparation and the possible high energy LHC upgrade or a future circular collider FCC-hh. Running charged particle track reconstruction for the general...
The design and performance of the ATLAS Inner Detector (ID) trigger
algorithms running online on the High Level Trigger (HLT) processor
farm for 13 TeV LHC collision data with high pileup are discussed.
The HLT ID tracking is a vital component in all physics signatures
in the ATLAS trigger for the precise selection of the rare or
interesting events necessary for physics analysis...
The multi-purpose R$^{3}$B (Reactions with Relativistic Radioactive Beams) detector at the future FAIR facility in Darmstadt will be used for various experiments with exotic beams in inverse kinematics. The two-fold setup will serve for particle identification and momentum measurement up- and downstream the secondary reaction target. In order to perform a high-precision charge identification...
The CERN ATLAS experiment grid workflow system routinely manages 250 to
500 thousand concurrently running production and analysis jobs
to process simulation and detector data. In total more than 300 PB
of data is distributed over more than 150 sites in the WLCG.
At this scale small improvements in the software and computing
performance and workflows can lead to significant resource usage...
The life cycle of the scientific data is well defined: data is collected, then processed,
archived and finally deleted. Data is never modified. The original data is used or new,
derived data is produced: Write Once Read Many times (WORM). With this model in
mind, dCache was designed to handle immutable files as efficiently as possible. Currently,
data replication, HSM connectivity and...
The Cherenkov Telescope Array (CTA) is the next generation of ground-based gamma-ray telescopes for gamma-ray astronomy. Two arrays will be deployed composed of 19 telescopes in the Northern hemisphere and 99 telescopes in the Southern hemisphere. Observatory operations are planned to start in 2021 but first data from prototypes should be available already in 2019. Due to its very high...
Track reconstruction at the CMS experiment uses the Combinatorial Kalman Filter. The algorithm computation time scales exponentially with pile-up, which will pose a problem for the High Level Trigger at the High Luminosity LHC. FPGAs, which are already used extensively in hardware triggers, are becoming more widely used for compute acceleration. With a combination of high performance, energy...
In order to profit from the largely increased instantaneous luminosity provided by the accelerator in Run III (2021-2023), the upgraded LHCb detector will make use of a fully software-based trigger, with real-time event reconstruction and selection performed at the bunch crossing rate of the LHC (~30 MHz). This assumption implies much tighter timing constraints for the event reconstruction...
Hundreds of physicists analyse data collected by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) using the CMS Remote Analysis Builder (CRAB) and the CMS GlideinWMS global pool to exploit the resources of the Worldwide LHC Computing Grid. Efficient use of such an extensive and expensive resource is crucial. At the same time, the CMS collaboration is committed to...
With the accumulation of large datasets at energy of 13 TeV, the LHC experiments can search for rare processes, where the extraction of the signal from the copious and varying Standard Model backgrounds poses increasing challenges. Techniques based on machine learning promise to achieve optimal search sensitivity and signal-to-background ratios for such searches. Taking the search for the...
The High-Luminosity Large Hadron Collider (HL-LHC) at CERN will be characterized by higher event rate, greater pileup of events, and higher occupancy. Event reconstruction will therefore become far more computationally demanding, and given recent technology trends, the extra processing capacity will need to come from expanding the parallel capabilities in the tracking software. Existing...
In preparation for Run 3 of the LHC, scheduled to start in 2021, the ATLAS
experiment is revising its offline software so as to better take advantage
of machines with many cores. A major part of this effort is migrating the
software to run as a fully multithreaded application, as this has been
shown to significantly improve the memory scaling behavior. This talk will
outline changes made to...
CERN has been using ITIL Service Management methodologies and ServiceNow since early 2011. Initially a joint project between just the Information Technology and the General Services Departments, now most of CERN is using this common methodology and tool, and all departments are represented totally or partially in the CERN Service Catalogue.
We will present a summary of the current situation...
Boosted Decision Trees are used extensively in offline analysis and reconstruction in high energy physics. The computation time of ensemble inference has previously prohibited their use in online reconstruction, whether at the software or hardware level. An implementation of BDT inference for FPGAs, targeting low latency by leveraging the platform’s enormous parallelism, is presented. Full...
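To show where the parallelism exploited on the FPGA comes from, the hedged sketch below evaluates a scikit-learn boosted decision tree ensemble tree by tree: each tree is a short, independent chain of threshold comparisons, so all trees can in principle be evaluated concurrently. This is a plain-Python illustration, not the FPGA implementation of the contribution.

```python
# Each tree of a BDT is an independent, fixed-depth chain of comparisons.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
bdt = GradientBoostingClassifier(n_estimators=20, max_depth=3).fit(X, y)


def eval_tree(tree, x):
    """Walk one regression tree: a short series of threshold comparisons."""
    node = 0
    while tree.children_left[node] != -1:          # -1 marks a leaf
        if x[tree.feature[node]] <= tree.threshold[node]:
            node = tree.children_left[node]
        else:
            node = tree.children_right[node]
    return tree.value[node][0][0]


x = X[0]
# All trees are independent, so their scores could be computed in parallel.
scores = [eval_tree(est.tree_, x) for est in bdt.estimators_[:, 0]]
raw = bdt.learning_rate * sum(scores)  # plus the initial (prior) score gives the decision value
```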
ATLAS Distributed Computing (ADC) uses the pilot model to submit jobs to Grid computing resources. This model isolates the resource from the workload management system (WMS) and helps to avoid running jobs on faulty resources. A minor side-effect of this isolation is that the faulty resources are neglected and not brought back into production because the problems are not visible to the WMS. In...
Data Quality Assurance (QA) is an important aspect of every High-Energy Physics experiment, especially in the case of the ALICE Experiment at the Large Hadron Collider (LHC) whose detectors are extremely sophisticated and complex devices. To avoid processing low quality or redundant data, human experts are currently involved in assessing the detectors’ health during the collisions’ recording....
In early 2018, e+e- collisions of the SuperKEKB B-Factory will be recorded by the Belle II detector in Tsukuba (Japan) for the first time. The new accelerator and detector represent a major upgrade from the previous Belle experiment and will achieve a 40-times higher instantaneous luminosity. Special considerations and challenges arise for track reconstruction at Belle II due to multiple...
PANDA is one of the main experiments of the future FAIR accelerator facility at Darmstadt. It utilizes an anti-proton beam with a momentum up to 15 GeV/c on a fixed proton or nuclear target to investigate the features of strong QCD.
The reconstruction of charged particle tracks is one of the most challenging aspects in the online and offline reconstruction of the data taken by PANDA. Several...
In this paper, we present a micro-services framework to develop data processing applications.
We discuss functional decomposition strategies that help transition existing data processing applications into a micro-services environment. We will also demonstrate advantages and disadvantages of this framework in terms of operational elasticity, vertical and horizontal scalability,...
Data from B-physics experiments at the KEKB collider have a substantial background from $e^{+}e^{-}\to q \bar{q}$ events. To suppress this we employ deep neural network algorithms. These provide improved signal from background discrimination. However, the neural network develops a substantial correlation with the $\Delta E$ kinematic variable used to distinguish signal from background in the...
The ATLAS Fast TracKer (FTK) is a hardware based track finder for the ATLAS trigger infrastructure currently under installation and commissioning. FTK sits between the two layers of the current ATLAS trigger system, the hardware-based Level 1 Trigger and the CPU-based High-Level Trigger (HLT). It will provide full-event tracking to the HLT with a design latency of 100 µs at a 100 kHz event...
CERNBox is the CERN cloud storage hub. It allows synchronising and sharing files on all major desktop and mobile platforms (Linux, Windows, MacOSX, Android, iOS) aiming to provide universal access and offline availability to any data stored in the CERN EOS infrastructure.
With more than 12000 users registered in the system, CERNBox has responded to the high demand in our diverse community to...
Prometheus is a leading open source monitoring and alerting tool. It utilizes a pull model, in the sense that it pulls metrics from monitored entities rather than receiving them as a push. Sometimes, however, this can be a major headache, even before security is considered, when network gymnastics are required to reach the monitored entities. Not only that, but sometimes system metrics might be...
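For concreteness, a minimal example of the pull model using the standard prometheus_client library (not any site-specific setup): the application exposes an HTTP /metrics endpoint that a Prometheus server scrapes; the port and metric name are arbitrary choices.

```python
# The application exposes metrics over HTTP; Prometheus pulls ("scrapes") them.
import random
import time

from prometheus_client import Gauge, start_http_server

queue_depth = Gauge("demo_queue_depth", "Number of items waiting in the demo queue")

if __name__ == "__main__":
    start_http_server(8000)                       # scraped at http://host:8000/metrics
    while True:
        queue_depth.set(random.randint(0, 100))   # stand-in for a real measurement
        time.sleep(5)
```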
The modern security landscape for distributed computing in High Energy Physics (HEP) includes a wide range of threats employing different attack vectors. The nature of these threats is such that the most effective method for dealing with them is to work collaboratively, both within the HEP community and with partners further afield - these can, and should, include institutional and campus...
Experimental science often has to cope with systematic errors that coherently bias data. We analyze this issue in the analysis of data produced by experiments at the Large Hadron Collider at CERN as a case of supervised domain adaptation. The dataset used is a representative Higgs to tau tau analysis from ATLAS and released as part of the Kaggle Higgs ML challenge. Perturbations have been...
The Belle II experiment is ready to take data in 2018, studying e+e- collisions at the KEK facility in Tsukuba (Japan), in a center of mass energy range of the Bottomonium states. The tracking system includes a combination of hit measurements coming from the vertex detector, made of pixel detectors and double-sided silicon strip detectors, and a central drift chamber, inside a solenoid of 1.5...
This presentation discusses some of the metrics used in HEP and other scientific domains for evaluating the relative quality of binary classifiers that are built using modern machine learning techniques. The use of the area under the ROC curve, which is common practice in the evaluation of diagnostic accuracy in the medical field and has now become widespread in many HEP applications, is...
In the last few years we have seen constant interest in technologies providing effective cloud storage for scientific use, matching the requirements of price, privacy and scientific usability. This interest is not limited to HEP and extends to other scientific fields due to the fast increase in data volumes: for example, "big data" is a characteristic of modern genomics, energy and financial...
With the explosion of the number of distributed applications, a new dynamic server environment has emerged that groups servers into clusters whose utilization depends on the current demand for the application.
To provide reliable and smooth services it is crucial to detect and fix possible erratic behavior of individual servers in these clusters. Use of standard techniques for this purpose delivers...
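One standard technique for such detection (not necessarily the one used in this work) is an unsupervised outlier detector over per-server metrics; the sketch below uses scikit-learn's IsolationForest on invented CPU, memory and latency figures.

# Illustrative sketch (not the authors' method): flag servers whose metrics
# deviate from the rest of the cluster using an Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# toy per-server features: [cpu load, memory %, request latency in ms]
normal_servers = rng.normal([0.6, 70.0, 120.0], [0.1, 5.0, 15.0], size=(50, 3))
erratic_server = np.array([[0.95, 98.0, 900.0]])        # one misbehaving node
metrics = np.vstack([normal_servers, erratic_server])

model = IsolationForest(contamination=0.05, random_state=0).fit(metrics)
flags = model.predict(metrics)                           # -1 marks outliers
print("servers flagged as erratic:", np.where(flags == -1)[0])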
CMS offline event reconstruction algorithms cover simulated and acquired data processing starting from the detector raw data on input and providing high level reconstructed objects suitable for analysis. The landscape of supported data types and detector configuration scenarios has been expanding and covers the past and expected future configurations including proton-proton collisions and...
The Cherenkov Telescope Array (CTA), currently under construction, is the next-generation instrument in the field of very high energy gamma-ray astronomy. The first data are expected by the end of 2018, while the scientific operations will start in 2022 for a duration of about 30 years. In order to characterise the instrument response to the Cherenkov light emitted by atmospheric cosmic ray...
The application of deep learning techniques using convolutional neural networks to the classification of particle collisions in High Energy Physics is explored. An intuitive approach to transform physical variables, like momenta of particles and jets, into a single image that captures the relevant information, is proposed. The idea is tested using a well known deep learning framework on a...
The European Open Science Cloud (EOSC) aims to enable trusted access to services and the re-use of shared scientific data across disciplinary, social and geographical borders. The EOSC-hub will realise the EOSC infrastructure as an ecosystem of research e-Infrastructures leveraging existing national and European investments in digital research infrastructures. EGI Check-in and EUDAT B2ACCESS...
Numerical stability is not only critical to the correctness of scientific computations, but also has a direct impact on their software efficiency as it affects the convergence of iterative methods and the available choices of floating-point precision.
Verrou is a Valgrind-based tool which challenges the stability of floating-point code by injecting random rounding errors in computations (a...
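To illustrate the underlying idea only (Verrou itself instruments the binary under Valgrind rather than working at the Python level), the toy sketch below repeats a floating-point summation while nudging each intermediate result by one ULP in a random direction and reports the spread of the outcomes.

# Toy illustration of the idea behind random-rounding tools such as Verrou:
# repeat a computation while randomly perturbing each floating-point result
# by one unit in the last place (ULP) and look at the spread of the outputs.
# This is a conceptual sketch, not how Verrou itself is implemented.
import math
import random
import statistics

def randomly_rounded_sum(values, rng):
    total = 0.0
    for v in values:
        total += v
        # nudge the running sum up or down by one ULP with equal probability
        direction = math.inf if rng.random() < 0.5 else -math.inf
        total = math.nextafter(total, direction)
    return total

values = [0.1] * 10_000
rng = random.Random(1234)
results = [randomly_rounded_sum(values, rng) for _ in range(20)]
print("spread of perturbed sums:", max(results) - min(results))
print("standard deviation:      ", statistics.stdev(results))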
Development of the JANA multi-threaded event processing framework began in 2005. Its primary application has been GlueX, a major Nuclear Physics experiment at Jefferson Lab. Production data taking began in 2016 and JANA has been highly successful in analyzing that data on the JLab computing farm. Work has now begun on JANA2, a near complete rewrite emphasizing features targeted for large...
The Alpha Magnetic Spectrometer (AMS) is a high energy physics experiment installed and operating on board the International Space Station (ISS) since May 2011 and expected to last through 2024 and beyond. The Science Operation Centre is in charge of the offline computing for the AMS experiment, including flight data production, Monte-Carlo simulation, data management, data backup, etc....
We have entered the Noisy Intermediate-Scale Quantum Era. A plethora of quantum processor prototypes allow evaluation of the potential of the Quantum Computing paradigm in applications to pressing computational problems of the future. Growing data input rates and detector resolution foreseen in High-Energy LHC (2030s) experiments expose the often high time and/or space complexity of classical...
In position-sensitive detectors with segmented readout (pixels or strips), charged particles generally activate several adjacent read-out channels. The first step in the reconstruction of the hit position is thus to identify clusters of active channels associated with one particle crossing the detector. In conventionally triggered systems, where the association of raw data to events is given by...
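A minimal sketch of this first clustering step, with invented strip indices and charges, could look as follows: adjacent fired strips are grouped into clusters and the hit position is estimated as a charge-weighted centroid.

# Minimal sketch of the clustering step described above: group adjacent fired
# strips into clusters and estimate the hit position as a charge-weighted mean.
# Strip numbers and charges are made up for illustration.
def find_clusters(hits):
    """hits: list of (strip_index, charge), not necessarily sorted."""
    clusters = []
    current = []
    for strip, charge in sorted(hits):
        if current and strip > current[-1][0] + 1:   # gap -> close the cluster
            clusters.append(current)
            current = []
        current.append((strip, charge))
    if current:
        clusters.append(current)
    return clusters

def centroid(cluster):
    total = sum(q for _, q in cluster)
    return sum(s * q for s, q in cluster) / total

hits = [(12, 5.0), (13, 20.0), (14, 7.0), (40, 9.0), (41, 11.0)]
for c in find_clusters(hits):
    print("cluster strips:", [s for s, _ in c], "position:", round(centroid(c), 2))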
The Historic Data Quality Monitor (HDQM) of the CMS experiment is a framework developed by the Tracker group of the CMS collaboration that permits web-based monitoring of the time evolution of measurements (S/N ratio, cluster size, etc.) in the Tracker silicon micro-strip and pixel detectors. In addition, it provides a flexible way for the implementation of HDQM for the other detector systems...
The offline software framework of the ATLAS experiment (Athena) consists of many small components of various types like Algorithm, Tool or Service. To assemble these components into an executable application for event processing, a dedicated configuration step is necessary. The configuration of a particular job depends on the workflow (simulation, reconstruction, high-level trigger, overlay,...
Muon reconstruction is currently all done offline for ALICE. In Run 3 this will move online, with ALICE running in continuous readout with a minimum bias Pb-Pb interaction rate of 50 kHz.
There are numerous obstacles to getting the muon software to achieve the required performance, with the muon cluster finder being replaced and moved to run on a GPU inside the new O2 computing...
We introduce SWiF - Simplified Workload-intuitive Framework - a workload-centric, application programming framework designed to simplify the large-scale deployment of FPGAs in end-to-end applications. SWiF intelligently mediates access to shared resources by orchestrating the distribution and scheduling of tasks across a heterogeneous mix of FPGA and CPU resources in order to improve...
The LHC experiments produce petabytes of data each year, which must be stored, processed and analyzed. This requires a significant amount of storage and computing resources. Moreover, the requirements on these resources are increasing with each LHC running period.
In order to predict the resource usage requirements of the ALICE Experiment for a particular LHC Run...
The Message Queue architecture is an asynchronous communication scheme that provides an attractive solution for certain scenarios in the distributed computing model. The introduction of an intermediate component (the queue) between the interacting processes allows the end-points to be decoupled, making the system more flexible and providing high scalability and redundancy. The message queue...
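The decoupling can be illustrated with a minimal in-process sketch using Python's standard queue module; a real deployment would of course use an external broker rather than queue.Queue, and the message content below is invented.

# Minimal in-process illustration of the decoupling a message queue provides:
# the producer never talks to the consumer directly, only to the queue.
import queue
import threading

messages = queue.Queue(maxsize=100)

def producer():
    for i in range(5):
        messages.put({"event_id": i})     # producer only knows about the queue
    messages.put(None)                    # sentinel to signal "no more work"

def consumer():
    while True:
        msg = messages.get()
        if msg is None:
            break
        print("processed", msg)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()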
The GridKa Tier 1 data and computing center hosts a significant share of WLCG processing resources. Providing these resources to all major LHC and other VOs requires an efficient, scalable and reliable cluster management. To satisfy this, GridKa has recently migrated its batch resources from CREAM-CE and PBS to ARC-CE and HTCondor. This contribution discusses the key highlights of the adoption...
Modern workload management systems that are responsible for central data production and processing in High Energy and Nuclear Physics experiments have highly complicated architectures and require a specialized control service to balance resources and processing components. Such a service represents a comprehensive set of analytical tools, management utilities and monitoring views aimed at...
ALICE (A Large Ion Collider Experiment) is one of the four big experiments at the Large Hadron Collider (LHC). For ALICE Run 3 there will be a major upgrade for several detectors as well as the compute infrastructure, with a combined Online-Offline computing system (O2) to support continuous readout at much higher data rates than before (3 TB/s). The ALICE Time Projection Chamber...
AlphaTwirl is a Python library that loops over event data and summarizes them into multi-dimensional categorical (binned) data as data frames. Event data, the input to AlphaTwirl, are data with one entry (or row) per event: for example, data in a ROOT TTree with one entry per collision event of an LHC experiment. Event data are often large -- too large to be loaded in memory -- because they have...
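The general idea (sketched below with plain NumPy rather than AlphaTwirl's actual API, and with invented branches and binning) is to stream over the event data in chunks that fit in memory and to accumulate only the multi-dimensional binned counts.

# Conceptual sketch of the summarization idea (not AlphaTwirl's actual API):
# stream over event data in chunks that fit in memory and accumulate
# multi-dimensional binned counts, ending up with a small categorical table.
import numpy as np

jet_pt_bins = np.array([0, 50, 100, 200, 400, np.inf])
njet_bins = np.array([0, 1, 2, 3, 10])
counts = np.zeros((len(jet_pt_bins) - 1, len(njet_bins) - 1))

rng = np.random.default_rng(7)

def event_chunks(n_events, chunk_size=100_000):
    """Stand-in for reading a large TTree chunk by chunk."""
    for start in range(0, n_events, chunk_size):
        size = min(chunk_size, n_events - start)
        yield rng.exponential(80.0, size), rng.poisson(2.0, size)

for jet_pt, njets in event_chunks(1_000_000):
    chunk_counts, _, _ = np.histogram2d(jet_pt, njets, bins=[jet_pt_bins, njet_bins])
    counts += chunk_counts        # only the summary grows, never the event data

print(counts)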
The ATLAS experiment records data from the proton-proton collisions produced by the Large Hadron Collider (LHC). The Tile Calorimeter is the hadronic sampling calorimeter of ATLAS in the region |eta| < 1.7. It uses iron absorbers and scintillators as active material. Jointly with the other calorimeters it is designed for reconstruction of hadrons, jets, tau-particles and missing transverse...
Many areas of academic research are increasingly catching up with the LHC experiments when it comes to data volumes, and just as in particle physics they require large data sets to be moved between analysis locations. The LHC experiments have built a global e-Infrastructure in order to handle hundreds of petabytes of data and massive compute requirements. Yet, there is nothing particle physics...
We investigate novel approaches using Deep Learning (DL) for efficient execution of workflows on distributed resources. Specifically, we studied the use of DL for job performance prediction, performance classification, and anomaly detection to improve the utilization of the computing resources.
- Performance prediction (a minimal sketch follows this list):
- capture performance of workflows on multiple resources
-...
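A hedged sketch of the performance-prediction item above: a regressor is trained to predict job wall time from simple job and resource features. The features, toy data and choice of a gradient-boosting model are illustrative stand-ins, not the deep learning models studied in this work.

# Hedged sketch of performance prediction: learn job wall time from simple,
# invented job/resource features.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n_jobs = 5000
# toy features: [input size (GB), requested cores, site CPU benchmark score]
X = np.column_stack([
    rng.uniform(1, 100, n_jobs),
    rng.integers(1, 17, n_jobs),
    rng.uniform(10, 30, n_jobs),
])
# toy target: wall time grows with input size, shrinks with cores and CPU speed
y = 60 * X[:, 0] / (X[:, 1] * X[:, 2]) + rng.normal(0, 2, n_jobs)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("R^2 on held-out jobs:", round(model.score(X_test, y_test), 3))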
The ATLAS Distributed Computing (ADC) Project is responsible for the off-line processing of data produced by the ATLAS experiment at the Large Hadron Collider (LHC) at CERN. It facilitates data and workload management for ATLAS computing on the Worldwide LHC Computing Grid (WLCG).
ADC Central Services operations (CSops) is a vital part of ADC, responsible for the deployment and configuration...
PanDA (Production and Distributed Analysis) is the workload management system for ATLAS across the Worldwide LHC Computing Grid. While analysis tasks are submitted to PanDA by over a thousand users following personal schedules (e.g. PhD or conference deadlines), production campaigns are scheduled by a central Physics Coordination group based on the organization’s calendar. The Physics...
PowerPC and high performance computers (HPC) are important resources for computing in the ATLAS experiment. The future LHC data processing will require more resources than Grid computing, currently using approximately 100,000 cores at well over 100 sites, can provide. Supercomputers are extremely powerful as they use the resources of hundreds of thousands of CPUs joined together. However their...
The Czech national HPC center IT4Innovations located in Ostrava provides two HPC systems, Anselm and Salomon. The Salomon HPC has been amongst the hundred most powerful supercomputers on Earth since its commissioning in 2015. Both clusters were tested for usage by the ATLAS experiment for running simulation jobs. Several thousand core hours were allocated to the project for tests, but the main aim...
In 2018 the Belle II detector will begin collecting data from $e^+e^-$ collisions at the SuperKEKB electron-positron collider at the High Energy Accelerator Research Organization (KEK, Tsukuba, Japan). Belle II aims to collect a data sample 50 times larger than the previous generation of B-Factories, taking advantage of the SuperKEKB design luminosity of $8\times10^{35}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$.
It is...
One of the main challenges the CMS collaboration must overcome during the phase-2 upgrade is the radiation damage to the detectors from the high integrated luminosity of the LHC and the very high pileup. The LHC will produce collisions at a rate of about 5x10^9/s. The particles emerging from these collisions and the radioactivity they induce will cause significant damage to the detectors and...
CERN's current Backup and Archive Service hosts 11 PB of data in more than 2.1 billion files. We have over 500 clients which back up or restore an average of 80 TB of data each day. At the current growth rate, we expect to have about 13 PB by the end of 2018.
In this contribution we present CERN's Backup and Archive Service based on IBM Spectrum Protect (previously known as Tivoli Storage...
The SHiP experiment is a new general-purpose fixed-target experiment designed to complement collider experiments in the search for new physics. A 400 GeV/c proton beam from the CERN SPS will be dumped on a dense target to accumulate $2\times10^{20}$ protons on target in five years.
A crucial part of the experiment is the active muon shield, which allows the detector to operate at a very high...
Over the last seven years the software stack of the next generation B factory experiment Belle II has grown to over one million lines of C++ and Python code, counting only the part included in offline software releases. This software is used by many physicists for their analysis, many of whom will be students with no prior experience in HEP software. A beginner-friendly and up-to-date...
Several data samples from the Belle II experiment will be made available to the general public as part of the experiment's outreach activities. Belle2Lab is designed as an interactive graphical user interface to reconstructed particles, offering users basic particle selection tools. The tool is based on a Blockly JavaScript graphical code generator and can be run in a HTML5-capable browser. It allows...
I describe a novel interactive virtual reality visualization of subatomic particle physics, designed as an educational tool for learning about and exploring the subatomic particle collision events of the Belle II experiment. The visualization is designed for untethered, locomotive virtual reality, allowing multiple simultaneous users to walk naturally through a virtual model of the Belle II...
Tape is an excellent choice for archival storage because of its capacity, cost per GB and long retention intervals, but its main drawback is the slow access time due to the sequential nature of the medium. Modern enterprise tape drives now support Recommended Access Ordering (RAO), which is designed to improve recall/retrieval times.
BNL's mass storage system currently holds more than 100 PB of...
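The intent of ordered recalls can be illustrated with a much simplified sketch: group pending requests by cartridge and sort them by an assumed longitudinal position before recalling. Real RAO is computed by the drive from detailed media geometry; the file names, tape labels and positions below are invented.

# Simplified illustration of the idea behind ordered recalls: group requests by
# cartridge and sort them by position. Values are purely illustrative.
from collections import defaultdict

requests = [
    {"file": "runA.dat", "tape": "T001", "position": 83_000},
    {"file": "runB.dat", "tape": "T001", "position": 12_500},
    {"file": "runC.dat", "tape": "T002", "position": 4_100},
    {"file": "runD.dat", "tape": "T001", "position": 55_000},
]

by_tape = defaultdict(list)
for req in requests:
    by_tape[req["tape"]].append(req)

for tape, reqs in sorted(by_tape.items()):
    ordered = sorted(reqs, key=lambda r: r["position"])
    print(tape, "->", [r["file"] for r in ordered])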
AMI (ATLAS Metadata Interface) is a generic ecosystem for metadata aggregation, transformation and cataloguing. Often, it is interesting to share up-to-date metadata with other content services such as wikis. Here, we describe the cross-domain solution implemented in the AMI Web Framework: a system of embeddable controls, communicating with the central AMI service and based on the AJAX and...
The analysis and understanding of resources utilization in shared infrastructures, such as cloud environments, is crucial in order to provide better performance, administration and capacity planning.
The management of resource usage of the OpenStack-based cloud infrastructures hosted at INFN-Padova, the Cloud Area Padovana and the INFN-PADOVA-STACK instance of the EGI Federated Cloud, started...
CERN Document Server (CDS, cds.cern.ch) is the CERN Institutional Repository based on the Invenio open source digital repository framework. It is a heterogeneous repository, containing more than 2 million records, including research publications, audiovisual material, images, and the CERN archives. Its mission is to store and preserve all the content produced at CERN as well as to make it...
The observation of neutrino oscillations provides evidence of physics beyond the Standard Model, and the precise measurement of those oscillations remains an essential goal for the field of particle physics. The NOvA experiment is a long-baseline neutrino experiment composed of two finely-segmented liquid-scintillator detectors located off-axis from the NuMI muon-neutrino beam having as its...
LHC Run 2 began in April 2015 with the restart of collisions in the CERN Large Hadron Collider. From the perspective of offline event reconstruction, the most relevant detector updates appeared in 2017: the restructuring of the pixel detector, with an additional layer closer to the beams, and the improved photodetectors and readout chips for the hadron calorimeter, which will...
The central production system of CMS is utilizing the LHC grid and effectively about 200 thousand cores, over about a hundred computing centers worldwide. Such a wide and unique distributed computing system is bound to sustain a certain rate of failures of various types. These are appropriately addressed with site administrators a posteriori. With up to 50 different campaigns ongoing...
The LHC delivers an unprecedented number of proton-proton collisions to its experiments. In kinematic regimes first studied by earlier generations of collider experiments, the limiting factor to more deeply probing for new physics can be the online and offline computing, and offline storage, requirements for the recording and analysis of this data. In this contribution, we describe a...
In preparation for Run 3 of the LHC, the ATLAS experiment is migrating its offline software to use a multithreaded framework, which will allow multiple events to be processed simultaneously. This implies that the handling of non-event, time-dependent (conditions) data, such as calibrations and geometry, must also be extended to allow for multiple versions of such data to exist...
Containerization is a lightweight form of virtualization that allows reproducibility and isolation responding to a number of long standing use cases in running the ATLAS software on the grid. The development of Singularity in particular with the capability to run as a standalone executable allows for containers to be integrated in the ATLAS (and other experiments) submission framework....
Foundational software libraries such as ROOT are under intense pressure to avoid software regression, including performance regressions. Continuous performance benchmarking, as a part of continuous integration and other code quality testing, is an industry best-practice to understand how the performance of a software product evolves over time. We present a framework, built from industry best...
The LHC has planned a series of upgrades culminating in the High Luminosity LHC (HL-LHC) which will have an average luminosity 5-7 times larger than the design LHC value. The Tile Calorimeter (TileCal) is the hadronic sampling calorimeter installed in the central region of the ATLAS detector. It uses iron absorbers and scintillators as active material. TileCal will undergo a substantial...
The DUNE Collaboration is pursuing an experimental program (named protoDUNE) which involves a beam test of two large-scale prototypes of the DUNE Far Detector at CERN in 2018. The volume of data to be collected by protoDUNE-SP (the single-phase detector) will amount to a few petabytes and the sustained rate of data sent to mass storage will be in the range of a few hundred MB per second....
Since the current data infrastructure of the HEP experiments is based on GridFTP, most computing centres have adapted and based their own access to the data on X.509 certificates. This is an issue for smaller experiments that do not have the resources to train their researchers in the complexities of X.509 certificates and would clearly prefer an approach based on username/password.
On the...
Various sites providing storage for experiments in high energy particle physics and photon science deploy dCache as flexible and modern large scale storage system. As such, dCache is a complex and elaborated software framework, which needs a test driven development in order to ensure a smooth and bug-free release cycle. So far, tests for dCache are performed on dedicated hosts emulating the...
The Dynamic Deployment System (DDS) is a tool-set that automates and significantly simplifies a deployment of user-defined processes and their dependencies on any resource management system (RMS) using a given topology. DDS is a part of the ALFA framework.
A number of basic concepts are taken into account in DDS. DDS implements a single responsibility principle command line tool-set and API....
The Belle II detector will begin its data taking phase in 2018. Featuring a state of the art vertex detector with innovative pixel sensors, it will record collisions of e+e- beams from the SuperKEKB accelerator which is slated to provide luminosities 40x higher than KEKB.
This large amount of data will come at the price of an increased beam background, as well as an operating point providing...
CMS Tier 3 centers, frequently located at universities, play an important role in the physics analysis of CMS data. Although different computing resources are often available at universities, meeting all requirements to deploy a valid Tier 3 able to run CMS workflows can be challenging in certain scenarios. For instance, providing the right operating system (OS) with access to the CERNVM File...
We investigate the automatic deployment and scaling of grid infrastructure components as virtual machines in OpenStack. To optimize the CVMFS usage per hypervisor, we study different approaches to share CVMFS caches and cache VMs between multiple client VMs.
For monitoring, we study container solutions and extend these to monitor non-containerized applications within cgroups resource...
Besides their increasing complexity and variety of provided resources and services, large data centers nowadays often belong to a distributed network and need non-conventional monitoring tools. This contribution describes the implementation of a monitoring system able to provide active support for problem solving to the system administrators.
The key components are information collection and...
There is a growing need to incorporate sustainable software practices into High Energy Physics. Widely supported tools offering source code management, continuous integration, unit testing and software quality assurance can greatly help improve standards. However, for resource-limited projects there is an understandable inertia in diverting effort to cover systems maintenance and application...
The Standard Model of particle physics is well established. However, searches for new physics beyond the Standard Model, such as dark matter, require a thousand to a million times more simulated events than Standard Model studies. This calls for the development of software, especially simulation toolkits. In addition, computing is evolving. It requires the development of the...
The experience gained in several years of storage system administration has shown that the WLCG distributed grid infrastructure performs very well for the needs of the LHC experiments. However, an excessive number of storage sites leads to inefficiencies in system administration because of the need for experienced manpower at each site and the increased burden on the central...
Replicability and efficiency of data processing on the same data samples are a major challenge for the analysis of data produced by HEP experiments. High-level data analyzed by end-users are typically produced as a subset of the whole experiment data sample to study interesting selections of data (streams). For standard applications, streams may eventually be copied from servers and analyzed on...
This work is devoted to the creation of the first module of the data processing center at the Joint Institute for Nuclear Research for modeling and processing experiments. The issues related to handling the enormous data flow from the LHC experimental installations and the challenges of distributed storage are considered. The article presents a hierarchical diagram of the network farm and a...
WLCG, a Grid computing technology used by CERN researchers, is based on two kinds of middleware. One of them, UMD middleware, is widely used in many European research groups to build a grid computing environment. The most widely used system in the UMD middleware environment was the combination of CREAM-CE and the batch job manager "torque". In recent years, however, there have been many...
Transfer Time To Complete (T³C) is a new extension for the data management system Rucio that makes predictions about the duration of a file transfer. The extension has a modular architecture which allows predictions to be based on anything from simple to more sophisticated models, depending on available data and computation power. The ability to predict file transfer times with reasonable...
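As an example of the simplest kind of model such a modular design allows (not the actual T³C code), one could fit a per-link linear relation between file size and transfer duration; the sizes, durations and the assumed linear form below are illustrative.

# A deliberately simple baseline: predict transfer duration from file size
# with a linear fit per link. Data and the linear relation are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(11)
file_size_gb = rng.uniform(0.1, 50, 500).reshape(-1, 1)
# toy ground truth: ~20 s overhead plus ~8 s per GB on this link, with noise
duration_s = 20 + 8 * file_size_gb[:, 0] + rng.normal(0, 5, 500)

model = LinearRegression().fit(file_size_gb, duration_s)
print("estimated per-GB rate:", round(model.coef_[0], 2), "s/GB")
print("predicted time for a 10 GB file:", round(model.predict([[10.0]])[0], 1), "s")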
CNAF is the national center of INFN for IT services. The Tier-1 data center operated at CNAF provides computing and storage resources mainly to scientific communities such as those working on the four LHC experiments and 30 more experiments in which INFN is involved.
In past years, every CNAF department used to choose its preferred tools for monitoring, accounting and alerting. In...
VISPA (Visual Physics Analysis) is a web-platform that enables users to work on any SSH reachable resource using just their web-browser. It is used successfully in research and education for HEP data analysis.
The emerging JupyterLab is an ideal choice for a comprehensive, browser-based, and extensible work environment and we seek to unify it with the efforts of the VISPA project. The primary...
The CERN IT Communication Systems group is in charge of providing various wired and wireless based communication services across the laboratory. Among them, the group designs, installs and manages a large complex of networks: external connectivity, the data-centre network (serving central services and the WLCG), the campus network (providing connectivity to users on site), and last but not least...
We describe how the Blackett facility at the University of Manchester High Energy Physics group has been extended to provide Docker container and cloud platforms as part of the UKT0 initiative. We show how these new technologies can be managed using the facility's existing fabric management based on Puppet and Foreman. We explain how use of the facility has evolved beyond its origins as a WLCG...
A key aspect of pilot-based grid operations is the pilot (glidein) factories. Proper and efficient use of any central building block of the grid infrastructure is essential for operations, and glideinWMS factories are no exception. The monitoring package for glideinWMS factory monitoring was originally developed when the factories were serving a couple of VOs and tens of sites. Nowadays with...
A small Cloud infrastructure for scientific computing likely operates in a saturated regime, which imposes constraints on the free auto-scaling of applications. Tenants typically pay a priori for a fraction of the overall resources. Within this business model, an advanced scheduling strategy is needed in order to optimize the data centre occupancy.
FaSS, a Fair Share Scheduler service for...
IceCube is a cubic kilometer neutrino detector located at the south pole. IceCube’s simulation and production processing requirements far exceed the number of available CPUs and GPUs in house. Collaboration members commit resources in the form of cluster time at institutions around the world. IceCube also signs up for allocations from large clusters in the United States like XSEDE. All of...
During the next major shutdown from 2019-2021, the ATLAS experiment at the LHC at CERN will adopt the Front-End Link eXchange (FELIX) system as the interface between the data acquisition, detector control and TTC (Timing, Trigger and Control) systems and new or updated trigger and detector front-end electronics. FELIX will function as a router between custom serial links from front end ASICs...
Fermilab is developing the Frontier Experiments RegistRY (FERRY) service that provides a centralized repository for the access control and job management attributes such as batch and storage access policies, quotas, batch priorities and NIS attributes for cluster configuration. This paper describes FERRY architecture, deployment and integration with services that consume the stored...
IceCube is a cubic kilometer neutrino detector located at the south pole. Data are processed and filtered in a data center at the south pole. After transfer to a data warehouse in the north, data are further refined through multiple levels of selection and reconstruction to reach analysis samples. So far, the production and curation of these analysis samples has been handled in an ad-hoc way...
g4tools is a collection of pure header classes intended to be a technical low level layer of the analysis category introduced in Geant4 release 9.5 to help Geant4 users to manage their histograms and ntuples in various file formats. In g4tools bundled with the latest Geant4 release (10.4, December 2017), we introduced a new HDF5 IO driver for histograms and column wise paged ntuples as well as...
Efficient handling of large data volumes has become a necessity in today's world. It is driven by the desire to get more insight from the data and to gain a better understanding of user trends, which can be transformed into economic incentives (profits, cost reduction and various optimizations of data workflows and pipelines). In this talk we discuss how modern technologies are transforming a well...
One of the key factors for the successful development of a physics Monte-Carlo is the ability to properly organize regression testing and validation. Geant4, a world-standard toolkit for HEP detector simulation, is one such example that requires thorough validation. The CERN/SFT group, which contributes to the development, testing, deployment and support of the toolkit, is also responsible for...
Central Exclusive Production (CEP) is a class of diffractive processes studied at the Large Hadron Collider that offers a very clean experimental environment for probing the low energy regime of Quantum Chromodynamics.
As with any other analysis in High Energy Physics, it requires a large amount of simulated Monte Carlo data, which is usually created by means of so-called MC event generators....
A user once put it: with PAW I had the impression of doing physics, with ROOT I have the impression of typing C++. Then why not return to doing physics?! We will present how gopaw is done, especially putting the accent on its portability, its way of handling multiple file formats (including ROOT/IO and HDF5), its unified graphics based on the inlib/sg scene graph manager (see CHEP 2013 for softinex) and its...
ATLAS has developed and previously presented a new computing architecture, the Event Service, that allows real-time delivery of fine-grained workloads which process dispatched events (or event ranges) and immediately stream outputs. The principal aim was to profit from opportunistic resources such as commercial cloud, supercomputing, and volunteer computing, and otherwise unused cycles on...
The long standing problem of reconciling the cosmological evidence of the existence of dark matter with the lack of any clear experimental observation of it, has recently revived the idea that the new particles are not directly connected with the Standard Model gauge fields, but only through mediator fields or "portals", connecting our world with new "secluded" or "hidden" sectors. One of the...
The LAN and WAN development of DE-KIT will be shown from the very beginning to the current status. DE-KIT is the German Tier-1 center collaborating with the Large Hadron Collider (LHC) at CERN. This includes the local area network capacity ramp-up from 10 Gbps over 40 Gbps to 100 Gbps as well as the wide area connections. It will be demonstrated how the deployed setup serves the current...
The higher energy and luminosity from the LHC in Run 2 has put increased pressure on CMS computing resources. Extrapolating to even higher luminosities (and thus higher event complexities and trigger rates) beyond Run 3, it becomes clear that simply scaling up the current model of CMS computing alone will become economically unfeasible. High Performance Computing (HPC) facilities, widely...
Software is an essential component of the experiments in High Energy Physics. Because it is upgraded on relatively short timescales, software provides flexibility, but at the same time is susceptible to issues introduced during the development process, which necessitates systematic testing. We present recent improvements to LHCbPR, the framework implemented at LHCb to measure physics and...
HammerCloud is a framework to commission, test, and benchmark ATLAS computing resources and components of various distributed systems with realistic full-chain experiment workflows. HammerCloud contributes to ATLAS Distributed Computing (ADC) Operations and automation efforts, providing automated resource exclusion and recovery tools that help re-focus operational manpower to areas which...
Scheduling multi-core workflows in a global HTCondor pool is a multi-dimensional problem whose solution depends on the requirements of the job payloads, the characteristics of available resources, and the boundary conditions such as fair share and prioritization imposed on the job matching to resources. Within the context of a dedicated task force, CMS has increased significantly the...
Over 8000 Windows PCs are actively used on the CERN site for tasks ranging from controlling the accelerator facilities to processing invoices. PCs are managed through CERN's Computer Management Framework and Group Policies, with configurations deployed based on machine sets and a lot of autonomy left to the end-users. While the generic central configuration works well for the majority of the...
The online farm of the ATLAS experiment at the LHC, consisting of nearly 4000 PCs with various characteristics, provides configuration and control of the detector and performs the collection, processing, selection, and conveyance of event data from the front-end electronics to mass storage. Different aspects of the farm management are already accessible via several tools. The status and...
Input data for applications that run in cloud computing centres can be stored at remote repositories, typically with multiple copies of the most popular data stored at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. In this approach, the closest copy of the data is used based on geographical or other...
High-Performance Computing (HPC) and other research cluster computing resources provided by universities can be useful supplements to the collaboration’s own WLCG computing resources for data analysis and production of simulated event samples. The shared HPC cluster "NEMO" at the University of Freiburg has been made available to local ATLAS users through the provisioning of virtual machines...
The Information Technology department at CERN has been using ITIL Service Management methodologies and ServiceNow since early 2011. In recent years, several developments have been accomplished regarding the data centre and service monitoring, as well as status management.
ServiceNow has been integrated with the data centre monitoring infrastructure, via GNI (General Notification...
IceCube is a cubic kilometer neutrino detector located at the south pole. Data handling has been managed by three separate applications: JADE, JADE North, and JADE Long Term Archive (JADE-LTA). JADE3 is the new version of JADE that merges these diverse data handling applications into a configurable data handling pipeline (“LEGO® Block JADE”). The reconfigurability of JADE3 has enabled...
The new version of JSROOT provides a full implementation of the ROOT binary I/O, now including TTree. The powerful JSROOT.TreeDraw functionality provides a simple way to inspect complex data in web browsers directly, without the need to involve ROOT-based code.
JSROOT is now fully integrated into the Node.js environment. Without binding to any C++ code, one gets direct access to all kinds of ROOT data....
Prometheus is a leading open source monitoring and alerting tool. Prometheus's local storage is limited in its scalability and durability, but it integrates very well with other solutions which provide robust long term storage. This talk will cover two solutions which interface excellently and do not require us to deal with HBase - KairosDB and Chronix. The intended audience is people who...
The ATLAS EventIndex has been in operation since the beginning of LHC Run 2 in 2015. Like all software projects, its components have been constantly evolving and improving in performance. The main data store in Hadoop, based on MapFiles and HBase, can work for the rest of Run 2 but new solutions are explored for the future. Kudu offers an interesting environment, with a mixture of BigData and...
In the past, several scaling tests have been performed on the HTCondor batch system regarding its job scheduling capabilities. In this talk we report on a first set of scalability measurements of the file transfer capabilities of the HTCondor batch system. Motivated by the needs of the GlueX experiment, we evaluate the limits and possible use of HTCondor as a solution to transport the output of jobs...
The design of the CMS detector is specially optimized for muon measurements and includes gas-ionization detector technologies to make up the muon system. Cathode strip chambers (CSC) with both tracking and triggering capabilities are installed in the forward region. The first stage of muon reconstruction deals with information from within individual muon chambers and is thus called local...
In the last few years the European Union has launched several initiatives aiming to support the development of an European-based HPC industrial/academic eco-system made of scientific and data analysis application experts, software developers and computer technology providers. In this framework the ExaNeSt and EuroExa projects respectively funded in H2020 research framework programs call...
One of the most important aspects of data processing at LHC experiments is the particle identification (PID) algorithm. In LHCb, several different sub-detector systems provide PID information: the Ring Imaging Cherenkov detectors, the hadronic and electromagnetic calorimeters, and the muon chambers. Charged-particle PID based on the sub-detector responses is treated as a machine learning problem...
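A toy sketch of PID phrased as a multi-class machine learning problem is shown below; the per-sub-detector features, particle species labels and the small neural network are invented stand-ins rather than the LHCb implementation.

# Illustrative sketch of PID as a multi-class machine learning problem:
# combine per-sub-detector response features into one classifier that predicts
# the particle species. Features, labels and model are toy stand-ins.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 6000
species = rng.integers(0, 3, n)              # 0: pion, 1: kaon, 2: muon (toy labels)
# toy features: [RICH log-likelihood difference, calorimeter E/p, muon-station hit count]
X = np.column_stack([
    rng.normal(species * 2.0, 1.0),
    rng.normal(0.3 + 0.2 * (species == 2), 0.1),
    rng.poisson(1 + 3 * (species == 2)),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, species, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000).fit(X_tr, y_tr)
print("toy PID accuracy:", round(clf.score(X_te, y_te), 3))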
Current computing paradigms often involve concepts like microservices, containerisation and, of course, Cloud Computing.
Scientific computing facilities, however, are usually conservatively managed through plain batch systems and as such can cater to a limited range of use cases. On the other hand, scientific computing needs are in general orthogonal to each other in several dimensions.
We...
In the latest years, CNAF has worked on a project of Long Term Data Preservation (LTDP) for the CDF experiment, which ran at Fermilab after 1985. Part of this project has the goal of archiving data produced during Run I onto recent and reliable storage devices, in order to preserve their availability for further access through non-obsolete technologies. In this paper, we report and explain the...
The CMS muon system presently consists of three detector technologies equipping different regions of the spectrometer. Drift Tube chambers (DT) are installed in the muon system barrel, while Cathode Strip Chambers (CSC) cover the end-caps; both serve as tracking and triggering detectors. Moreover, Resistive Plate Chambers (RPC) complement DT and CSC in barrel and end-caps respectively and are...
The Cloud Area Padovana (CAP) is, since 2014, a scientific IaaS cloud, spread across two different sites: the INFN Padova Unit and the INFN Legnaro National Labs. It provides about 1100 logical cores and 50 TB of storage. The entire computing facility, owned by INFN, satisfies the computational and storage demands of more than 100 users afferent to about 30 research projects, mainly related to...
Most supercomputers provide computing resources that are shared between users and projects, with utilization determined by predefined policies, load and quotas. The efficiency of the utilization of resources in terms of user/project depends on factors such as particular supercomputer policy and dynamic workload of supercomputer based on users' activities. The load on a resource is...
The CMS muon system presently consists of three detector technologies equipping different regions of the spectrometer. Drift Tube chambers (DT) are installed in the muon system barrel, while Cathode Strip Chambers (CSC) cover the end-caps; both serve as tracking and triggering detectors. Moreover, Resistive Plate Chambers (RPC) complement DT and CSC in barrel and end-caps respectively and are...
At the start of 2017, GridPP deployed VacMon, a new monitoring system suitable for recording and visualising the usage of virtual machines and containers at multiple sites. The system uses short JSON messages transmitted by logical machine lifecycle managers such as Vac and Vcycle. These are directed to a VacMon logging service which records the messages in an ElasticSearch database. The...
We propose here a smooth migration plan for ROOT in order to have, by 2040 at least and at last, an acceptable histogram class (a goal clearly not stated in the HSF common white paper for HL-LHC for 2020), but also to have by then a rock-solid foundation for a good part of this toolkit (IO, plotting, graphics, UI, math, etc...). The proposal is going to be technical because it is centred on a...
Starting with Upgrade 1 in 2021, LHCb will move to a purely software-based trigger system. The new trigger strategy is therefore to process events at the full rate of 30 MHz. Given that the increase of CPU performance has slowed down in recent years, the predicted performance of the software trigger currently falls short of the necessary 30 MHz throughput. To cope with this shortfall, LHCb's...
Muons with high momentum -- above 500 GeV/c -- are an important constituent of new physics signatures in many models. Run-2 of the LHC is greatly increasing ATLAS's sensitivity to such signatures thanks to an ever-larger dataset of such particles. The ATLAS Muon Spectrometer chamber alignment contributes significantly to the uncertainty of the reconstruction of these high-momentum objects. The...
Gas Electron Multiplier (GEM) based detectors have been used in many applications since their introduction in 1997. Large areas of GEM are foreseen in several experiments such as the future upgrade of the CMS muon detection system, where triple GEM based detectors will be installed and operated. During the assembly and operation, GEM foils are stretched in order to keep the vertical distance...
Various workflows used by ATLAS Distributed Computing (ADC) are now using object stores as a convenient storage resource via boto S3 libraries. The load and performance requirement varies widely across the different workflows and for heavier cases it has been useful to understand the limits of the underlying object store implementation. This work describes the performance of various object...
The CBM experiment is a future fixed-target experiment at FAIR/GSI (Darmstadt, Germany). It is being designed to study heavy-ion collisions at extremely high interaction rates of up to 10 MHz. Therefore, the experiment will use a very novel concept of data processing based on free streaming triggerless front-end electronics. In CBM time-stamped data will be collected into a readout buffer in a...
In view of the LHC Run3 starting in 2021, the ALICE experiment is preparing a major upgrade including the construction of an entirely new inner silicon tracker (the Inner Tracking System) and a complete renewal of its Online and Offline systems (O²).
In this context, one of the requirements for a prompt calibration of external detectors and a fast offline data processing is to run online the...
The upcoming LHC Run 3 brings new challenges for the ALICE online reconstruction which will be used also for the offline data processing in the O2 (combined Online-Offline) framework. To improve the accuracy of the existing online algorithms they need to be enhanced with all the necessary offline features, while still satisfying speed requirements of the synchronous data processing.
Here we...
We describe the central operation of the ATLAS distributed computing system. The majority of compute intensive activities within ATLAS are carried out on some 350,000 CPU cores on the Grid, augmented by opportunistic usage of significant HPC and volunteer resources. The increasing scale, and challenging new payloads, demand fine-tuning of operational procedures together with timely...
The University of Adelaide has invested several million dollars in the Phoenix HPC facility. Phoenix features a large number of GPUs, which were critical to its entry in the June 2016 Top500 supercomputing list. The status of high performance computing in Australia relative to other nations poses a unique challenge to researchers, in particular those involved in computationally intensive...
Since the start of 2017, the RAL Tier-1’s Echo object store has been providing disk storage to the LHC experiments. Echo provides access via both the GridFTP and XRootD protocols. GridFTP is primarily used for WAN transfers between sites while XRootD is used for data analysis.
Object stores and those using erasure coding in particular are designed to efficiently serve entire objects which...
The processing of ATLAS event data requires access to conditions data which is stored in database systems. This data includes, for example, alignment, calibration, and configuration information which may be characterized by large volumes, diverse content, and/or information which evolves over time as refinements are made in those conditions. Additional layers of complexity are added by the...
The LLVM community advances its C++ Modules technology providing an I/O-efficient, on-disk code representation capable of reducing build times and peak memory usage. A significant amount of effort was invested in teaching ROOT and its toolchain to operate with clang's implementation of C++ Modules. Currently, C++ Modules files are used by: cling to avoid header re-parsing; rootcling to...
Lattice QCD (LQCD) is a well-established non-perturbative approach to solving the quantum chromodynamics (QCD) theory of quarks and gluons. It is understood that future LQCD calculations will require exascale computing capacities and workload management system (WMS) in order to manage them efficiently.
In this talk we will discuss the use of the PanDA WMS for LQCD simulations. The PanDA WMS...
With the planned addition of the tracking information in the Level 1 trigger in CMS for the HL-LHC, the algorithms for Level 1 trigger can be completely reconceptualized. Following the example for offline reconstruction in CMS to use complementary subsystem information and mitigate pileup, we explore the feasibility of using Particle Flow-like and pileup per particle identification techniques...
A core component of particle tracking algorithms in LHC experiments is the Kalman Filter. Its capability to iteratively model dynamics (linear or non-linear) in noisy data makes it powerful for state estimation and extrapolation in a combinatorial track builder (the CKF). In practice, the CKF computational cost scales quadratically with the detector occupancy and will become a heavy burden on...
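For orientation, a minimal one-dimensional Kalman filter showing the predict/update cycle that a combinatorial track builder repeats for every hit candidate is sketched below; the process and measurement noise values are toy numbers, not detector constants.

# Minimal 1-D Kalman filter showing the predict/update cycle at the heart of
# a combinatorial track builder. Noise values are toy numbers.
import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.04):
    x, p = measurements[0], 1.0          # initial state and covariance
    states = []
    for z in measurements:
        # predict: constant model, covariance grows by process noise q
        p = p + q
        # update: blend prediction with measurement z of variance r
        k = p / (p + r)                  # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
        states.append(x)
    return np.array(states)

true_position = 2.5
rng = np.random.default_rng(8)
hits = true_position + rng.normal(0, 0.2, 25)     # noisy "hit" measurements
print("last filtered estimate:", round(kalman_1d(hits)[-1], 3))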
The Alpha Magnetic Spectrometer (AMS) is a high energy physics experiment installed and operating on board the International Space Station (ISS) since May 2011 and expected to last through 2024 and beyond. More than 50 million CPU hours have been delivered for AMS Monte Carlo simulations using NERSC and ALCF facilities in 2017. The details of porting of the AMS software to the 2nd...
Systems of linear algebraic equations (SLEs) with heptadiagonal (HD), pentadiagonal (PD) and tridiagonal (TD) coefficient matrices arise in many scientific problems. Three symbolic algorithms for solving SLEs with HD, PD and TD coefficient matrices are considered. The only assumption on the coefficient matrix is nonsingularity. These algorithms are implemented using the GiNaC library of C++...
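For orientation only, the standard numeric Thomas algorithm for the tridiagonal case is sketched below; the algorithms of this contribution are symbolic (GiNaC/C++) and also cover the pentadiagonal and heptadiagonal cases.

# Standard numeric Thomas algorithm for a tridiagonal system, shown only to
# fix notation; not the symbolic algorithms discussed in the contribution.
import numpy as np

def thomas(a, b, c, d):
    """Solve Ax = d with A tridiagonal: a = sub-, b = main-, c = super-diagonal."""
    n = len(b)
    cp, dp = np.zeros(n), np.zeros(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# small check against a dense solve
a = np.array([0.0, -1.0, -1.0, -1.0])   # a[0] unused
b = np.array([4.0, 4.0, 4.0, 4.0])
c = np.array([-1.0, -1.0, -1.0, 0.0])   # c[-1] unused
d = np.array([5.0, 5.0, 5.0, 5.0])
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print(np.allclose(thomas(a, b, c, d), np.linalg.solve(A, d)))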
The accurate calculation of the power usage effectiveness (PUE) is the most important factor when trying to analyse the overall efficiency of power consumption in a big data center. In the INFN CNAF Tier-1, a new monitoring infrastructure serving as a Building Management System (BMS) was implemented over the last few years using the Schneider StruxureWare Building Operation (SBO) software. During this...
Despite their frequent use, the hadronic models implemented in Geant4 have shown severe limitations in reproducing the measured yield of secondaries in ion interactions below 100 MeV/A, in terms of production rates, angular and energy distributions [1,2,3]. We will present a benchmark of the Geant4 models with double-differential cross sections and angular distributions of the secondary...
The CMS experiment has an HTCondor Global Pool, composed of more than 200K CPU cores available for Monte Carlo production and the analysis of data. The submission of user jobs to this pool is handled by either CRAB3, the standard workflow management tool used by CMS users to submit analysis jobs requiring event processing of large amounts of data, or by CMS Connect, a service focused on final...
ALFA is a modern software framework for simulation, reconstruction and analysis of particle physics experiments. ALFA provides building blocks for highly parallelized processing pipelines required by the next generation of experiments, e.g. the upgraded ALICE detector or the FAIR experiments. The FairMQ library in ALFA provides the means to easily create actors (so-called devices) that...
GlideinWMS is a workload management system that allows different scientific communities, or Virtual Organizations (VOs), to share computing resources distributed over independent sites. A dynamically sized pool of resources is created by different VO-independent glideinWMS pilot factories, based on the requests made by the several VO-dependent glideinWMS frontends. For example, the CMS VO...
This work describes the technique of remote data access from computational jobs on the ATLAS data grid. In comparison to traditional data movement and stage-in approaches it is well suited for data transfers which are asynchronous with respect to the job execution. Hence, it can be used for optimization of data access patterns based on various policies. In this study, remote data access is...
At IHEP, computing resources are contributed by different experiments including BES, JUNO, DYW, HXMT, etc. The resources were divided into different partitions to satisfy the dedicated data processing requirements of each experiment. IHEP had a local Torque/Maui cluster with 50 queues serving more than 10 experiments. The separated resource partitions led to an imbalanced resource load. Sometimes, BES...
CERN IT department is providing production services to run container technologies. Given that, the IT-DB team, responsible to run the Java based platforms, has started a new project to move the WebLogic deployments from virtual or bare metal servers to containers: Docker together with Kubernetes allow us to improve the overall productivity of the team, reducing operations time and speeding up...
Tier-1 for CMS was created at JINR in 2015. It is important to keep an eye on the Tier-1 center at all times in order to maintain its performance. One monitoring system is based on Nagios: it monitors the center at several levels: engineering infrastructure, network and hardware. It collects many metrics, creates plots and determines statuses such as HDD state, temperatures, loads...
The Italian Tier1 center is mainly focused on LHC and physics experiments in general. Recently we have tried to widen our area of activity and established a collaboration with the University of Bologna to set up an area inside our computing center for hosting experiments with high security and privacy requirements on stored data. The first experiment we are going to host is Harmony, a...
The high data rates expected for the next generation of particle physics experiments (e.g.: new experiments at FAIR/GSI and the upgrade of CERN experiments) call for dedicated attention with respect to design of the needed computing infrastructure. The common ALICE-FAIR framework ALFA is a modern software layer, that serves as a platform for simulation, reconstruction and analysis of particle...
SHiP is a new proposed fixed-target experiment at the CERN SPS accelerator. The goal of the experiment is to search for hidden particles predicted by models of Hidden Sectors. The purpose of the SHiP Spectrometer Tracker is to reconstruct tracks of charged particles from the decay of neutral New Physics objects with high efficiency. Efficiency of the track reconstruction depends on the...
Full MC simulation is a powerful tool for designing new detectors and guiding the construction of new prototypes.
Improved micro-structure technology has led to the rise of Micro-Pattern Gas Detectors (MPGDs), whose main features are flexible geometry, high rate capability, excellent spatial resolution, and reduced radiation length. A new detector layout, the Fast Timing MPGD (FTM), could combine...
Software is an essential and rapidly evolving component of modern high energy physics research. The ability to be agile and take advantage of new and updated packages from the wider data science community is allowing physicists to efficiently utilise the data available to them. However, these packages often introduce complex dependency chains and evolve rapidly introducing specific, and...
Since the beginning of the WLCG Project, the Spanish ATLAS computing centres have contributed reliable and stable resources as well as personnel to the ATLAS Collaboration.
Our contribution to the ATLAS Tier2s and Tier1s computing resources (disk and CPUs) in the last 10 years has been around 5%, even though the Spanish contribution to the ATLAS detector construction as well as the number...
The track finding procedure is one of the key steps of event reconstruction in high energy physics experiments. Track finding algorithms combine hits into tracks and reconstruct the trajectories of particles flying through the detector. The tracking procedure is considered an extremely time-consuming task because of the large combinatorics. Thus, calculation speed is crucial in heavy ion experiments,...
SPT-3G, the third generation camera on the South Pole Telescope (SPT), was deployed in the 2016-2017 Austral summer season. The SPT is a 10-meter telescope located at the geographic South Pole and designed for observations in the millimeter-wave and submillimeter-wave regions of the electromagnetic spectrum. The SPT is primarily used to study the Cosmic Microwave Background (CMB). The upgraded...
The ATLAS experiment is operated daily by many users and experts working concurrently on several aspects of the detector.
The safe and optimal access to the various software and hardware resources of the experiment is guaranteed by a role-based access control system (RBAC) provided by the ATLAS Trigger and Data Acquisition (TDAQ) system. The roles are defined by an inheritance hierarchy....
Events containing muons in the final state are an important signature for many analyses being carried out at the Large Hadron Collider (LHC), including both standard model measurements and searches for new physics. To be able to study such events, it is required to have an efficient and well-understood muon trigger. The ATLAS muon trigger consists of a hardware based system (Level 1), as well...
The Online Luminosity software of the ATLAS experiment has been upgraded in the last two years to improve scalability, robustness, and redundancy and to increase automation keeping Run-3 requirements in mind.
The software package is responsible for computing the instantaneous and integrated luminosity for particle collisions at the ATLAS interaction point at the Large Hadron Collider (LHC)....
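For orientation, the bunch-by-bunch relation commonly used in LHC luminosity determinations (quoted from standard luminosity methodology rather than from this contribution) is

\mathcal{L}_b = \frac{\mu_{\mathrm{vis}} \, f_r}{\sigma_{\mathrm{vis}}}, \qquad \mathcal{L} = \sum_{b=1}^{n_b} \mathcal{L}_b ,

where $\mu_{\mathrm{vis}}$ is the visible interaction rate per bunch crossing, $f_r$ the LHC revolution frequency, $\sigma_{\mathrm{vis}}$ the visible cross-section calibrated in van der Meer scans, and $n_b$ the number of colliding bunch pairs.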
The ATLAS experiment records about 1 kHz of physics collisions, starting from an LHC design bunch crossing rate of 40 MHz. To reduce the large background rate while maintaining a high selection efficiency for rare and Beyond-the-Standard-Model physics events, a two-level trigger system is used.
Events are selected based on physics signatures, such as the presence of energetic leptons,...
Physics analyses at the LHC require accurate simulations of the detector response and the event selection processes. The accurate simulation of the trigger response is crucial for determining the overall selection efficiencies and signal sensitivities. For the generation and reconstruction of simulated event data, the most recent software releases are used to ensure the best agreement between...
In HEP experiments at the LHC, database applications often become complex, reflecting the ever more demanding requirements of the researchers. The ATLAS experiment has several Oracle DB clusters with over 216 database schemas, each with its own set of database objects. To monitor them effectively, we designed a modern and portable application with exceptionally good characteristics. Some of them...
ITMO University (ifmo.ru) is developing a cloud of geographically distributed data centers. Here "geographically distributed" means data centers (DCs) located in different places, hundreds or thousands of kilometers apart. Using geographically distributed data centers promises a number of advantages for end users, such as the opportunity to add an additional DC and service...
In January 2017, a consortium of European companies, research labs, universities, and education networks started the “Up to University” project (Up2U). Up2U is a 3-year EU-funded project that aims at creating a bridge between high schools and higher education. Up2U addresses both the technological and methodological gaps between secondary school and higher education by (a.) provisioning the...
A tape system usually comprises many tape drives, several thousand or even tens of thousands of cartridges, robots, software applications and the machines which run these applications. All involved components are able to log failures and statistical data. However, correlation is a laborious and ambiguous process and a wrong interpretation can easily result in a wrong decision. A single...
The GridKa center is serving the ALICE, ATLAS, CMS, LHCb and Belle-II experiments as one of the biggest WLCG Tier-1 centers worldwide, providing compute and storage resources. It is operated by the Steinbuch Centre for Computing at Karlsruhe Institute of Technology in Germany. In this presentation, we will describe the current status of the compute, online and offline storage resources and we will...
Online selection is an essential step to collect the most interesting collisions among a very large number of events delivered by the ATLAS detector at the Large Hadron Collider (LHC). The Fast TracKer (FTK) is a hardware based track finder, for the ATLAS trigger system, that rapidly identifies important physics processes through their track-based signatures, in the Inner Detector pixel and...
The INFN scientific computing infrastructure is composed of more than 30 sites, ranging from CNAF (Tier-1 for LHC and main data center for nearly 30 other experiments) and 9 LHC Tier-2s to ~20 smaller sites, including LHC Tier-3s and non-LHC experiment farms.
A comprehensive review of the installed resources, together with plans for the near future, has been collected during the second half of...
The Production and Distributed Analysis system (PanDA) is a pilot-based workload management system that was originally designed for the ATLAS Experiment at the LHC to operate on grid sites. Since the coming LHC data taking runs will require more resources than grid computing alone can provide, the various LHC experiments are engaged in an ambitious program to extend the computing model to...
Modern experiments demand a powerful and efficient Data Acquisition System (DAQ). The intelligent, FPGA-based Data Acquisition System (iFDAQ) of the COMPASS experiment at CERN is composed of many processes communicating between each other. The DIALOG library covers a communication mechanism between processes and establishes a communication layer to each of them. It has been introduced to the...
JAliEn (Java-AliEn) is ALICE's next-generation Grid framework, which will be used for top-level distributed computing resource management during LHC Run 3 and onward. While preserving an interface familiar to the ALICE users, its performance and scalability are an order of magnitude better than those of the currently used system.
To enhance the JAliEn security, we have developed the...
Cloud computing became a routine tool for scientists in many domains. The JINR cloud infrastructure provides JINR users computational resources for performing various scientific calculations. In order to speed up achievements of scientific results the JINR cloud service for parallel applications was developed. It consists of several components and implements a flexible and modular architecture...
The ProtoDUNE-SP is a single-phase liquid argon time projection chamber (LArTPC) prototype for the Deep Underground Neutrino Experiment (DUNE). Signals from 15,360 electronic channels are received by 60 Reconfigurable Cluster Elements (RCEs), which are processing elements designed at SLAC for a wide range of applications and are based upon the "system-on-chip" Xilinx Zynq family of FPGAs....
The HEP community has voted strongly with its feet to adopt ROOT as the current de facto analysis toolkit. It is used to write out and store our RAW data, our reconstructed data, and to drive our analysis. Almost all modern data models in particle physics are written in ROOT. New tools from industry are, however, making an appearance in particle physics analysis, driven by the massive interest...
Up until September 2017, LHCb Online was running on a non-redundant Puppet 3.5 Master/Server architecture. As a result, we had problems with outages, both planned and unplanned, as well as with scalability issues (how do you run 3000 nodes at the same time? How do you even run 100 without bringing down the Puppet Master?). On top of that, Puppet 5.0 was released, so we were now running 2 versions...
I describe the charged-track extrapolation and muon-identification modules in the Belle II data-analysis code framework (basf2). These modules use GEANT4E to extrapolate reconstructed charged tracks outward from the Belle II Central Drift Chamber into the outer particle-identification detectors, the electromagnetic calorimeter, and the K-long and muon detector (KLM). These modules propagate...
The Baryonic Matter at Nuclotron (BM@N) experiment represents the 1st phase of Nuclotron-based Ion Collider fAcility (NICA) Mega science project at the Joint Institute for Nuclear Research. It is a fixed target experiment built for studying nuclear matter in conditions of extreme density and temperature.
The tracking system of the BM@N experiment consists of three main detector systems:...
We describe the development of a tool (Trident) that uses a three-pronged approach to analysing node utilisation while aiming to be user friendly. The three areas of focus are data IO, CPU core and memory.
Compute applications running in a batch system node will stress different parts of the node over time. It is usual to look at metrics such as CPU load average and memory consumed. However,...
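Trident's actual implementation is not shown here; purely as an illustration of sampling those three focus areas (data IO, CPU and memory) on a batch node, a minimal Python sketch using the psutil library could look as follows (the sampling interval and output format are arbitrary choices):

    import time
    import psutil  # third-party library exposing system metrics

    def sample_node(interval=5):
        """Periodically print CPU, memory and disk-IO counters for this node."""
        prev_io = psutil.disk_io_counters()
        while True:
            time.sleep(interval)
            cpu = psutil.cpu_percent(percpu=True)      # per-core utilisation in %
            mem = psutil.virtual_memory()              # total/used memory
            io = psutil.disk_io_counters()
            read_mb = (io.read_bytes - prev_io.read_bytes) / 1e6
            write_mb = (io.write_bytes - prev_io.write_bytes) / 1e6
            prev_io = io
            print(f"cpu per-core %: {cpu}")
            print(f"memory used: {mem.used / 1e9:.1f} GB of {mem.total / 1e9:.1f} GB")
            print(f"disk IO: read {read_mb:.1f} MB, write {write_mb:.1f} MB in last {interval}s")

    if __name__ == "__main__":
        sample_node()

A real tool would of course aggregate and store these samples rather than print them, and would add per-core and per-process breakdowns.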
One of the major challenges for the Compact Muon Solenoid (CMS) experiment is the task of reducing the event rate from roughly 40 MHz down to a more manageable 1 kHz while keeping as many interesting physics events as possible. This is accomplished through the use of a Level-1 (L1) hardware based trigger as well as a software based High-Level-Trigger (HLT). Monitoring and understanding the output...
Hadronic signatures are critical to the ATLAS physics program, and are used extensively for both Standard Model measurements and searches for new physics. These signatures include generic quark and gluon jets, as well as jets originating from b-quarks or the decay of massive particles (such as electroweak bosons or top quarks). Additionally, missing transverse momentum from non-interacting...
The ATLAS Distributed Computing system uses the Frontier system to access the Conditions, Trigger, and Geometry database data stored in the Oracle Offline Database at CERN by means of the http protocol. All ATLAS computing sites use squid web proxies to cache the data, greatly reducing the load on the Frontier servers and the databases. One feature of the Frontier client is that in the event...
We report on the status of the CMS full simulation for Run 2. Initially, Geant4 10.0p02 was used in sequential mode, and about 16 billion events were produced for the analysis of 2015-2016 data. In 2017, the CMS detector was updated: a new pixel tracking detector was installed, the hadronic calorimeter electronics were modified, and extra muon detectors were added. Corresponding modifications were introduced in the...
CVMFS helps ATLAS in distributing software to the Grid, and isolating software lookup to batch nodes’ local filesystems. But CVMFS is rarely available in HPC environments. ATLAS computing has experimented with "fat" containers, and later developed an environment to produce such containers for both Shifter and Singularity. The fat containers include most of the recent ATLAS software releases,...
The Queen Mary University of London Grid site has investigated the use of its Lustre file system to support Hadoop work flows using the newly open sourced Hadoop adaptor for Lustre. Lustre is an open source, POSIX compatible, clustered file system often used in high performance computing clusters and is often paired with the SLURM batch system, as it is at Queen Mary. Hadoop is an open-source...
The use of machine learning techniques for classification is well established. They are applied widely to improve the signal-to-noise ratio and the sensitivity of searches for new physics at colliders. In this study I explore the use of machine learning for optimizing the output of high precision experiments by selecting the most sensitive variables to the quantity being measured. The precise...
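The concrete analysis is not reproduced here; as a minimal sketch of one way such a selection can be approached, the candidate variables can be ranked by their estimated mutual information with the quantity being measured, e.g. with scikit-learn (the toy dataset and variable roles below are invented purely for illustration):

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    # Toy dataset: rows are events, columns are candidate observables (placeholders).
    rng = np.random.default_rng(42)
    n_events = 10000
    truth = rng.normal(0.0, 1.0, n_events)              # quantity to be measured
    observables = np.column_stack([
        truth + rng.normal(0.0, 0.2, n_events),          # strongly sensitive variable
        0.3 * truth + rng.normal(0.0, 1.0, n_events),    # weakly sensitive variable
        rng.normal(0.0, 1.0, n_events),                  # insensitive variable
    ])

    # Rank variables by estimated mutual information with the target quantity.
    scores = mutual_info_regression(observables, truth)
    for i, s in enumerate(scores):
        print(f"variable {i}: mutual information = {s:.3f}")

In a realistic study the ranking would be combined with checks of systematic robustness before variables are passed to the final estimator.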
Containers are becoming ubiquitous within the WLCG with CMS announcing a requirement for Singularity at supporting sites in 2018. The ubiquity of containers means it is now possible to reify configuration along with applications as a single easy to deploy unit rather than via a myriad of configuration management tools such as Puppet, Ansible or Salt. This allows more use of industry devops...
ZFS is a powerful storage management technology combining filesystem, volume management and software raid technology into a single solution. The WLCG Tier2 computing at Edinburgh was an early adopter of ZFS on Linux, with this technology being used to manage all of our storage systems including servers with aging components. Our experiences of ZFS deployment have been shared with the Grid...
Built upon the Xrootd Proxy Cache (Xcache), we developed additional features to adapt the ATLAS distributed computing and data environment, especially its data management system Rucio, to help improve the cache hit rate, as well as features that make the Xcache easy to use, similar to the way the Squid cache is used by the HTTP protocol. We packaged the software in CVMFS and in singularity...
XRootD is a distributed, low-latency file access system with its own communication protocol and a scalable, plugin-based architecture. It is the primary data access framework for the high-energy physics community, and the backbone of the EOS service at CERN.
In order to bring the potential of Erasure Coding (EC) to the XRootD / EOS ecosystem, an effort has been undertaken to implement a native EC...
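The native EC implementation itself is not reproduced here; as a deliberately simplified illustration of the principle behind erasure coding (redundancy from parity blocks rather than full replicas), the sketch below uses a single XOR parity block, which allows any one lost data block to be rebuilt. Production systems rely on more general codes (e.g. Reed-Solomon) that tolerate multiple losses.

    from functools import reduce

    def xor_blocks(blocks):
        """Byte-wise XOR of equally sized data blocks."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    data_blocks = [b"AAAA", b"BBBB", b"CCCC"]      # k data blocks (toy payload)
    parity = xor_blocks(data_blocks)               # 1 parity block (m = 1)

    # Simulate the loss of one data block and rebuild it from the survivors + parity.
    lost_index = 1
    survivors = [blk for i, blk in enumerate(data_blocks) if i != lost_index]
    recovered = xor_blocks(survivors + [parity])
    assert recovered == data_blocks[lost_index]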
XRootD has been established as a standard for WAN data access in HEP and HENP. Site specific features, like those existing at GSI, have historically been hard to implement with native methods. XRootD allows a custom replacement of basic functionality for native XRootD functions through the use of plug-ins. XRootD clients allow this since version 4.0. In this contribution, our XRootD based...
The ATLAS and CMS experiments at CERN are planning a second phase of upgrades to prepare for the "High Luminosity LHC", with collisions due to start in 2026. In order to deliver an order of magnitude more data than previous runs, protons at 14 TeV center-of-mass energy will collide with an instantaneous luminosity of 7.5 x 10^34 cm^-2 s^-1, resulting in much higher pileup and data rates than...
Most HEP experiments coming in the next decade will have computing requirements that cannot be met by simply adding more hardware (HL-LHC, FAIR, DUNE...). A major software re-engineering effort is needed, as is more collaboration between experiments on software development. This was the reason for setting up the HEP Software Foundation (HSF) in 2015. In 2017, the HSF published "A Roadmap for ...
DUNE will be the world's largest neutrino experiment, due to take data in 2025. Here we describe the data acquisition (DAQ) systems for both of its prototypes, ProtoDUNE single-phase (SP) and ProtoDUNE dual-phase (DP), due to take data later this year. The ProtoDUNE detectors also break records as the largest beam test experiments yet constructed, and they are fundamental elements of CERN's Neutrino...
For two decades, ROOT brought its own window system abstraction (for X11, GL, Cocoa, and Windows) together with its own GUI library. X11 is nearing the end of its lifetime; new windowing systems shine with performance and features. To make best use of them, the ROOT team has decided to re-implement its graphics and GUI subsystem using web technology.
This presentation introduces the model,...
The LHC Computing Grid was a pioneering integration effort that managed to unite computing and storage resources all over the world, making them available to the experiments at the Large Hadron Collider. During a decade of LHC computing, Grid software has learned to effectively utilise different types of computing resources, such as classic computing clusters, clouds and high-performance computers. While the...
The upcoming PANDA at FAIR experiment in Darmstadt, Germany will belong to a new generation of accelerator-based experiments relying exclusively on software filters for data selection. Due to the likeness of signal and background as well as the multitude of investigated physics channels, this paradigm shift is driven by the need for having full and precise information from all detectors in...
An essential part of new physics searches at the Large Hadron Collider (LHC) at CERN involves event classification, or distinguishing signal events from the background. Current machine learning techniques accomplish this using traditional hand-engineered features like particle 4-momenta, motivated by our understanding of particle decay phenomenology. While such techniques have proven useful...
The HEP group at the University of Victoria operates a distributed cloud computing system for the ATLAS and Belle II experiments. The system uses private and commercial clouds in North America and Europe that run OpenStack, Open Nebula or commercial cloud software. It is critical that we record accounting information to give credit to cloud owners and to verify our use of commercial resources....
The Belle II detector is currently being commissioned for operation in early 2018. It is designed to record collision events at an instantaneous luminosity of up to 8 x 10^35 cm^-2 s^-1, which is delivered by the SuperKEKB collider in Tsukuba, Japan. Such a large luminosity is required to significantly improve the precision on measurements of B and D meson and tau lepton decays to probe for signs of...
The ATLAS experiment is gradually transitioning from the traditional file-based processing model to dynamic workflow management at the event level with the ATLAS Event Service (AES). The AES assigns fine-grained processing jobs to workers and streams out the data in quasi-real time, ensuring fully efficient utilization of all resources, including the most volatile. The next major step in this...
Virtualization is a commonly used solution for utilizing opportunistic computing resources in the HEP field, as it provides the unified software and OS layer that HEP computing tasks require on top of heterogeneous opportunistic computing resources. However, there is always a performance penalty with virtualization, especially for short jobs, which are the norm in volunteer computing...
Alignment and calibration workflows in CMS require a significant operational effort, due to the complexity of the systems involved. To serve the variety of condition data management needs of the experiment, the alignment and calibration team has developed and deployed a set of web-based applications. The Condition DB Browser is the main portal to search, navigate and prepare a consistent set...
The divergence of windowing systems among modern Linux distributions and OSX is making the current mode of operations difficult to maintain. In order to continue supporting the CMS experiment event display, also known as Fireworks, we need to explore other options beyond the current distribution model of centrally built tarballs.
We think that a C++ server / web client event display is a promising direction...
Jet flavour identification is a fundamental component for the physics program of the LHC-based experiments. The presence of multiple flavours to be identified leads to a multiclass classification problem. We present results from a realistic simulation of the CMS detector, one of two multi-purpose detectors at the LHC, and the respective performance measured on data. Our tagger, named DeepJet,...
CMS has worked aggressively to make use of multi-core architectures, routinely running 4 to 8 core production jobs in 2017. The primary impediment to efficiently scaling beyond 8 cores has been our ROOT-based output module, which has been necessarily single threaded. In this presentation we explore the changes made to the CMS framework and our ROOT output module to overcome the previous...
The CERN ATLAS experiment successfully uses a worldwide
computing infrastructure to support the physics program during LHC
Run 2. The grid workflow system PanDA routinely manages 250 to
500 thousand concurrently running production and analysis jobs
to process simulation and detector data. In total more than 300 PB
of data is distributed over more than 150 sites in the WLCG and
handled by the...
The ATLAS Trigger system has been operating successfully during 2017; its excellent performance has been vital for the ATLAS physics program.
The trigger selection capabilities of the ATLAS detector have been significantly enhanced for Run-2 compared to Run-1, in order to cope with the higher event rates and with the large number of simultaneous interactions (pile-up). The improvements at...
This talk shares our recent experiences in providing a data analytics platform based on Apache Spark for High Energy Physics, the CERN accelerator logging system and infrastructure monitoring. The Hadoop Service has started to expand its user base to researchers who want to perform analysis with big data technologies. Among many frameworks, Apache Spark is currently getting the most...
The Compressed Baryonic Matter (CBM) experiment at the future FAIR facility requires fast and efficient event reconstruction algorithms. CBM will be one of the first HEP experiments to work in a triggerless mode: data received in the DAQ from the detectors will no longer be associated with events by a hardware trigger. All raw data within a given period of time will be collected...
The Cherenkov Telescope Array (CTA) is the next generation of ground-based gamma-ray telescopes for gamma-ray astronomy. Two arrays will be deployed composed of 19 telescopes in the Northern hemisphere and 99 telescopes in the Southern hemisphere. Observatory operations are planned to start in 2021 but first data from prototypes should be available already in 2019. Due to its very high...
IceCube is a cubic kilometer neutrino detector located at the south pole. IceProd is IceCube’s internal dataset management system, keeping track of where, when, and how jobs run. It schedules jobs from submitted datasets to HTCondor, keeping track of them at every stage of the lifecycle. Many updates have happened in the last years to improve stability and scalability, as well as increase...
For high-throughput computing, the efficient use of distributed computing resources relies on an evenly distributed workload, which in turn requires wide availability of the input data used in physics analysis. In ATLAS, the dynamic data placement agent C3PO was implemented in the ATLAS distributed data management system Rucio; it identifies popular data and creates additional, transient...
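The actual C3PO decision logic is more elaborate; the following deliberately simplified sketch only illustrates the general idea of creating transient replicas for popular datasets, with all names, thresholds and the site-selection rule being hypothetical:

    # Hypothetical sketch; thresholds, names and the site-selection rule are illustrative only.
    def plan_transient_replicas(access_counts, current_replicas, free_space_by_site,
                                popularity_threshold=100, max_replicas=5):
        """Return a list of (dataset, site) pairs for new transient replicas."""
        plan = []
        for dataset, accesses in access_counts.items():
            if accesses < popularity_threshold:
                continue                                  # not popular enough
            if len(current_replicas[dataset]) >= max_replicas:
                continue                                  # already well replicated
            # Pick the site with the most free space that does not yet hold the data.
            candidates = {site: space for site, space in free_space_by_site.items()
                          if site not in current_replicas[dataset]}
            if candidates:
                plan.append((dataset, max(candidates, key=candidates.get)))
        return plan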
The process of building software for High Energy Physics is a problem that all experiments must face. It is also an aspect of the technical management of HEP software that is highly suited to sharing knowledge and tools. For this reason the HEP Software Foundation established a working group in 2015 to look at packaging and deployment solutions in the HEP community. The group has examined in...
Measurements of time-dependent CP violation and of $B$-meson mixing at B-factories require a determination of the flavor of one of the two exclusively produced $B^0$ mesons. The predecessors of Belle II, the Belle and BaBar experiments, developed so-called flavor tagging algorithms for this task. However, due to the novel high-luminosity conditions and the increased beam-backgrounds at Belle...
Reconstruction and identification in the calorimeters of modern High Energy Physics experiments is a complicated task. Solutions are usually driven by a priori knowledge about the expected properties of the reconstructed objects. Such an approach is also used to distinguish single photons in the electromagnetic calorimeter of the LHCb detector at the LHC from overlapping photons produced from high momentum...
In 2017, NA62 recorded over a petabyte of raw data, collecting around a billion events per day of running. Data are collected in bursts of 3-5 seconds, producing output files of a few gigabytes. A typical run, a sequence of bursts with the same detector configuration and similar experimental conditions, contains 1500 bursts and constitutes the basic unit for offline data processing. A...
The ATLAS EventIndex currently runs in production in order to build a
complete catalogue of events for experiments with large amounts of data.
The current approach is to index all final produced data files at CERN Tier0,
and at hundreds of grid sites, with a distributed data collection architecture
using Object Stores to temporarily maintain the conveyed information, with
references to them...
SWAN (Service for Web-based ANalysis) is a CERN service that allows users to perform interactive data analysis in the cloud, in a "software as a service" model. It is built upon the widely-used Jupyter notebooks, allowing users to write - and run - their data analysis using only a web browser. By connecting to SWAN, users have immediate access to storage, software and computing resources that...
The High-Luminosity LHC will open an unprecedented window on the weak-scale nature of the universe, providing high-precision measurements of the standard model as well as searches for new physics beyond the standard model. Such precision measurements and searches require information-rich datasets with a statistical power that matches the high-luminosity provided by the Phase-2 upgrade of the...
Since its inception in 2010, the art event-based analysis framework and associated software have been delivered to client experiments using a Fermilab-originated system called UPS. Salient features valued by the community include installation without administration privileges, trivially-relocatable binary packages and the ability to use coherent sets of packages together (such as those...
The BESIII detector is a general purpose spectrometer located at BEPCII. BEPCII is a double ring $e^+e^-$ collider running at center of mass energies between 2.0 and 4.6 GeV and reached a peak luminosity of $1\times 10^{33}\,\mathrm{cm^{-2}s^{-1}}$ at $\sqrt{s} = 3770$ MeV.
As an experiment in the high precision frontier of hadron physics, since 2009, BESIII has collected the world's largest data samples...
GridFTP transfers and the corresponding Grid Security Infrastructure (GSI)-based authentication and authorization system have been data transfer pillars of the Worldwide LHC Computing Grid (WLCG) for more than a decade. However, in 2017, the end of support for the Globus Toolkit - the reference platform for these technologies - was announced. This has reinvigorated and expanded efforts to...
In recent years the LHC delivered a record-breaking luminosity to the CMS experiment, making it a challenge to successfully handle all the demands for efficient data and Monte Carlo processing. In the presentation we will review the major issues in managing such requests and how we were able to address them. Our main strategy relies on increased automation and dynamic workload and data...
IceCube is a cubic kilometer neutrino detector located at the south pole. CVMFS is a key component to IceCube’s Distributed High Throughput Computing analytics workflow for sharing 500GB of software across datacenters worldwide. Building the IceCube software suite on CVMFS has historically been accomplished first by a long bash script, then by a more complex set of python scripts. We...
We present an implementation of the ATLAS High Level Trigger (HLT)
that provides parallel execution of trigger algorithms within the
ATLAS multi-threaded software framework, AthenaMT. This development
will enable the HLT to meet future challenges from the evolution of
computing hardware and upgrades of the Large Hadron Collider (LHC) and
ATLAS Detector. During the LHC data-taking period...
In recent years, public clouds have undergone a large transformation. Nowadays, cloud providers compete in delivering specialized, scalable and fault-tolerant services where resource management is completely on their side. Such a computing model, called serverless computing, is very attractive for users who do not want to worry about OS-level management, security patches and scaling resources.
Our...
We present a range of conceptual improvements and extensions to the popular
tuning tool "Professor".
Its core functionality remains the construction of multivariate analytic
approximations to an otherwise computationally expensive function. A typical
example would be histograms obtained from Monte-Carlo (MC) event generators for
standard model and new physics processes.
The fast Professor...
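As a rough sketch of the core idea (a per-bin analytic approximation, fitted once to a set of sampled anchor points and then evaluated cheaply during tuning), an ordinary least-squares polynomial fit can be used; the second-order, two-parameter form below is only an example and not Professor's actual parameterisation:

    import numpy as np

    def design_matrix(params):
        """Second-order polynomial basis in two parameters p1, p2."""
        p1, p2 = params[:, 0], params[:, 1]
        return np.column_stack([np.ones_like(p1), p1, p2, p1 * p2, p1**2, p2**2])

    # Sampled anchor points: parameter settings and the (expensive) bin value at each.
    rng = np.random.default_rng(0)
    sampled_params = rng.uniform(-1.0, 1.0, size=(50, 2))
    expensive_bin_value = (1.0 + 0.5 * sampled_params[:, 0]
                           - 0.2 * sampled_params[:, 1] ** 2
                           + rng.normal(0.0, 0.01, 50))   # stands in for an MC histogram bin

    coeffs, *_ = np.linalg.lstsq(design_matrix(sampled_params), expensive_bin_value, rcond=None)

    # The surrogate can now be evaluated cheaply at any new parameter point.
    new_point = np.array([[0.3, -0.7]])
    prediction = design_matrix(new_point) @ coeffs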
Outside the HEP computing ecosystem, it is vanishingly rare to encounter user X509 certificate authentication (and proxy certificates are even more rare). The web never widely adopted the user certificate model, but increasingly sees the need for federated identity services and distributed authorization. For example, Dropbox, Google and Box instead use bearer tokens issued via the OAuth2...
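To make the contrast with certificate-based authentication concrete, a client presenting an OAuth2 bearer token typically just sends it in an HTTP Authorization header, with no client certificate involved; a minimal sketch with the Python requests library (the URL and token are placeholders):

    import requests

    token = "eyJ..."                                   # placeholder access token from an OAuth2 flow
    response = requests.get(
        "https://storage.example.org/path/to/file",    # placeholder resource URL
        headers={"Authorization": f"Bearer {token}"},  # bearer token instead of an X509 proxy
    )
    response.raise_for_status()
    data = response.content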
For over a decade, dCache.org has provided software which is used at more than 80 sites around the world, providing reliable services for WLCG experiments and others. This can be achieved only with a well established process starting from the whiteboard, where ideas are created, all the way through to packages, installed on the production systems. Since early 2013 we have moved to git as our...
We describe the CMS computing model for MC event generation, and technical integration and workflows for generator tools in CMS. We discuss the most commonly used generators, standard configurations, their event tunes, and the technical performance of these configurations for Run II as well as the needs for Run III.
We show how a novel network architecture based on Lorentz Invariance (and not much else) can be used to identify hadronically decaying top quarks. We compare its performance to alternative approaches, including convolutional neural networks, and find it to be very competitive.
We also demonstrate how this architecture can be extended to include tracking information and show its application to...
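The network architecture itself is not reproduced here; its basic ingredient, Minkowski inner products of the constituent four-momenta (which are invariant under Lorentz transformations), can be computed as in this short sketch with invented toy kinematics:

    import numpy as np

    METRIC = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric, signature (+, -, -, -)

    def minkowski_products(four_momenta):
        """Pairwise Lorentz-invariant inner products p_i . p_j for rows of (E, px, py, pz)."""
        return four_momenta @ METRIC @ four_momenta.T

    # Toy jet with three constituents (E, px, py, pz) in GeV.
    constituents = np.array([
        [50.0, 20.0, 10.0, 44.7],
        [30.0, -5.0, 15.0, 25.5],
        [20.0,  2.0, -8.0, 18.2],
    ])
    invariants = minkowski_products(constituents)
    # The diagonal gives squared masses; off-diagonal entries encode pairwise kinematics.
    # Both are unchanged under boosts and rotations applied to the whole jet.
    print(invariants)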
The HL-LHC will present enormous storage and computational demands, creating a total dataset of up to 200 Exabytes and requiring commensurate computing power to record, reconstruct, calibrate, and analyze these data. Addressing these needs for the HL-LHC will require innovative approaches to deliver the necessary processing and storage resources. The "blockchain" is a recent technology for...
Building, testing and deploying coherent large software stacks is very challenging, in particular when they consist of the diverse set of packages required by the LHC experiments, the CERN Beams department and data analysis services such as SWAN. These software stacks include several packages (Grid middleware, Monte Carlo generators, Machine Learning tools, Python modules) all required for...
In the recent years, several studies have demonstrated the benefit of using deep learning to solve typical tasks related to high energy physics data taking and analysis. Building on these proofs of principle, many HEP experiments are now working on integrating Deep Learning into their workflows. The computation need for inference of a model once trained is rather modest and does not usually...
The Xenon Dark Matter experiment is looking for non-baryonic particle Dark Matter in the universe. The demonstrator is a dual-phase time projection chamber (TPC), filled with a target mass of ~2000 kg of ultra-pure liquid xenon. The experimental setup is operated at the Laboratori Nazionali del Gran Sasso (LNGS).
We present here a full overview of the computing scheme for data distribution...
VecGeom is a multi-purpose geometry library targeting the optimisation of the 3D-solid algorithms used extensively in particle transport and tracking applications. As a particular feature, the implementations of these algorithms are templated on the input data type and are explicitly vectorised using the VecCore library in the case of SIMD vector inputs. This provides additional performance for...
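VecGeom's C++ templates and VecCore backends are not reproduced here; as a loose Python/NumPy analogy, the benefit of writing a geometry algorithm once so that it accepts either a single point or a whole batch of points (the batch case is where SIMD vectorisation pays off) can be illustrated as follows:

    import numpy as np

    def distance_to_sphere(points, radius=1.0):
        """Signed distance from point(s) to a sphere centred at the origin.
        Works unchanged for a single point of shape (3,) or a batch of shape (N, 3)."""
        return np.linalg.norm(points, axis=-1) - radius

    single_point = np.array([0.5, 0.0, 0.0])
    batch = np.random.default_rng(1).uniform(-2.0, 2.0, size=(1024, 3))

    print(distance_to_sphere(single_point))   # scalar result
    print(distance_to_sphere(batch).shape)    # one distance per point, computed in bulk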
Experience to date indicates that the demand for computing resources in high energy physics shows a highly dynamic behaviour, while the resources provided by the WLCG remain static over the year. It has become evident that opportunistic resources such as High Performance Computing (HPC) centers and commercial clouds are very well suited to cover peak loads. However, the utilization of this...
The Production and Distributed Analysis (PanDA) system has been successfully used in the ATLAS experiment as a data-driven workload management system. The PanDA system has proven to be capable of operating at the Large Hadron Collider data processing scale over the last decade including the Run 1 and Run 2 data taking periods. PanDA was originally designed to be weakly coupled with the WLCG...
In the field of High Energy Physics, the simulation of the interaction of particles in the material of calorimeters is a computationally intensive task, even more so with complex and fine-grained detectors. The complete and most accurate simulation of particle/matter interactions is essential when calibrating and understanding the detector at a very low level, but is seldom required at physics...
This paper is dedicated to the current state of the Geometry Database (Geometry DB) for the CBM experiment. The Geometry DB is an information system that supports the CBM geometry. Its main aims are to provide storage of the CBM geometry, convenient tools for managing the geometry modules, and the assembly of various versions of the CBM setup as a combination of geometry modules and...
Measurements in LArTPC neutrino detectors feature high fidelity and result in large event images. Deep learning techniques have been extremely successful in classification tasks of photographs, but their application to LArTPC event images is challenging, due to the large size of the events; two orders of magnitude larger than images found in classical challenges like MNIST or ImageNet. This...
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider (LHC) assembles events of 2 MB at a rate of 100 kHz. The event builder collects event fragments from about 740 sources and assembles them into complete events which are then handed to the high-level trigger (HLT) processes running on O(1000) computers. The aging event-building hardware will be replaced...
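For orientation, the figures quoted above imply an aggregate event-building throughput of roughly $2\,\mathrm{MB} \times 100\,\mathrm{kHz} = 200\,\mathrm{GB/s}$, i.e. on average about $270\,\mathrm{MB/s}$ from each of the ~740 sources; this back-of-the-envelope estimate sets the scale for the replacement hardware.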
A goal of LSST (Large Synoptic Survey Telescope) project is to conduct a 10-year survey of the sky that is expected to deliver 200 petabytes of data after it begins full science operations in 2022. The project will address some of the most pressing questions about the structure and evolution of the universe and the objects in it. It will require a large amount of simulations to understand the...
The ROOT Mathematical and Statistical libraries have been recently improved to facilitate the modelling of parametric functions that can be used for performing maximum likelihood fits to data sets to estimate parameters and their uncertainties.
We report here on the new functionality of the ROOT TFormula and TF1 classes to build these models in a convenient way for the users. We show how...
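The new functionality described here is not shown; as a minimal, generic sketch of building a parametric model with the TFormula shorthand and using it in a binned likelihood fit from Python, with a toy histogram and an arbitrary signal-plus-background model:

    import ROOT

    # Toy data: a Gaussian-distributed observable.
    hist = ROOT.TH1D("h", "toy data", 100, -5.0, 5.0)
    hist.FillRandom("gaus", 10000)

    # Parametric model from the TFormula shorthand: a Gaussian plus a linear term.
    model = ROOT.TF1("model", "gaus(0) + pol1(3)", -5.0, 5.0)
    model.SetParameters(100.0, 0.0, 1.0, 1.0, 0.0)   # starting values for the 5 parameters

    hist.Fit(model, "L")                              # "L" requests a binned likelihood fit
    print("mean =", model.GetParameter(1), "+/-", model.GetParError(1))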
The LHCb physics software has to support the analysis of data taken up to now and at the same time is under active development in preparation for the detector upgrade coming into operation in 2021. A continuous integration system is therefore crucial to maintain the quality of the ~6 million lines of C++ and Python, to ensure consistent builds of the software as well as to run the unit and...
Data acquisition (DAQ) systems for high energy physics experiments readout data from a large number of electronic components, typically over thousands of point to point links. They are thus inherently distributed systems. Traditionally, an important stage in the data acquisition chain has always been the so called event building: data fragments coming from different sensors are identified as...
ATLAS is embarking on a project to multithread its reconstruction software in time for use in Run 3 of the LHC. One component that must be migrated is the histogramming infrastructure used for data quality monitoring of the reconstructed data. This poses unique challenges due to its large memory footprint which forms a bottleneck for parallelization and the need to accommodate relatively...
The ALICE computing model for Run 3 foresees a few big centres, called Analysis Facilities, optimised for the fast processing of large local sets of Analysis Object Data (AODs). Contrary to the current running of analysis trains on the Grid, this will allow for more efficient execution of inherently I/O-bound jobs. GSI will host one of these centres and has therefore finalised a first Analysis...
ALICE (A Large Ion Collider Experiment), one of the large LHC experiments, is undergoing a major upgrade during the next long shutdown. The increase in data rates planned for LHC Run 3 (3 TiB/s for Pb-Pb collisions), together with triggerless continuous readout operation, requires a paradigm shift in the computing and networking infrastructure.
The new ALICE O2 (online-offline) computing facility consists of two...
Analyses of multi-million event datasets are natural candidates to exploit the massive parallelisation available on GPUs. This contribution presents two such approaches to measure CP violation and the corresponding user experience.
The first is the energy test, which is used to search for CP violation in the phase-space distribution of multi-body hadron decays. The method relies on a...
The LHCb Performance Regression (LHCbPR) framework allows for periodic software testing to be performed in a reproducible manner.
LHCbPR provides a JavaScript based web front-end service, built atop industry standard tools such as AngularJS, Bootstrap and Django (https://lblhcbpr.cern.ch).
This framework records the evolution of tests over time allowing for this data to be extracted for...
The Jiangmen Underground Neutrino Observatory (JUNO) is a multipurpose neutrino experiment which will start in 2020. To speed up JUNO data processing on multicore hardware, the JUNO software framework is introducing parallelization based on TBB. To support JUNO multicore simulation and reconstruction jobs in the near future, a new workload scheduling model has to be explored and implemented in...
The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, will start taking physics data in early 2018 and aims to accumulate 50/ab, or approximately 50 times more data than the Belle experiment. The collaboration expects it will manage and process approximately 200 PB of data.
Computing at this scale requires efficient and coordinated use of the compute grids in North America,...
The Data Quality Monitoring Software is a central tool in the CMS experiment. It is used in the following key environments: 1) Online, for real-time detector monitoring; 2) Offline, for the prompt-offline-feedback and final fine-grained data quality analysis and certification; 3) Validation of all the reconstruction software production releases; 4) Validation in Monte Carlo productions. Though...
Even as grid middleware and analysis software has matured over the course of the LHC's lifetime it is still challenging for non-specialized computing centers to contribute resources. Many U.S. CMS collaborators would like to set up Tier-3 sites to contribute campus resources for the use of their local CMS group as well as the collaboration at large, but find the administrative burden of...
This paper describes the current architecture of Continuous Integration (CI) service developed at Fermilab, encountered successes and difficulties, as well as future development plans. Current experiment code has hundreds of contributors that provide new features, bug fixes, and other improvements. Version control systems help developers to collaborate in contributing software for their...
In proton-proton collisions at the LHC, the associated production of the Higgs boson with two top quarks has not yet been observed. This ttH channel allows directly probing the coupling of the Higgs boson to the top quark. The observation of this process could be a highlight of the ongoing Run 2 data taking.
Unlike supervised methods (neural networks, decision trees, support vector...
The CMS Submission Infrastructure Global Pool, built on GlideinWMS and HTCondor, is a worldwide distributed dynamic pool responsible for the allocation of resources for all CMS computing workloads. Matching the continuously increasing demand for computing resources by CMS requires the anticipated assessment of its scalability limitations. Extrapolating historical usage trends, by LHC Run III...
IceCube is a cubic kilometer neutrino detector located at the south pole. Metadata for files in IceCube has traditionally been handled on an application-by-application basis, with no user-facing access. There has been no unified view of data files, and users often just ls the filesystem to locate files. Recently, effort has been put into creating such a unified view. Going for a simple...
Monte-Carlo simulation is a fundamental tool for high-energy physics experiments, from the design phase to data analysis. In recent years its relevance has increased due to the ever-growing precision of measurements. Accuracy and reliability are essential features in simulation and particularly important in the current phase of the LHCb experiment, where physics analysis and preparation for data...
The NA62 experiment at CERN SPS is aimed at measuring the branching ratio of the ultra-rare K+→π+νν decay.
This imposes very tight requirements on the particle identification capabilities of the apparatus in order to reject the considerable background.
To this purpose a centralized level 0 hardware trigger system (L0TP) processes in real-time the streams of data primitives coming from the...
ALICE Overwatch is a project started in late 2015 to provide augmented online monitoring and data quality assurance utilizing time-stamped QA histograms produced by the ALICE High Level Trigger (HLT). The system receives the data via ZeroMQ, storing it for later review, enriching it with detector specific functionality, and visualizing it via a web application. These provided capabilities are...
Good quality track visualization is an important aspect of every High-Energy Physics experiment, where it can be used for quick assessment of recorded collisions. The event display, operated in the Control Room, is also important for visitors and increases public recognition of the experiment. Especially in the case of the ALICE detector at the Large Hadron Collider (LHC), which reconstructs...
CVMFS has proved an extremely effective mechanism for providing scalable, POSIX-like access to experiment software across the Grid. The normal method of file access is http downloads via squid caches from a small number of Stratum 1 servers. In the last couple of years this mechanism has been extended to allow access to files from any storage offering http access. This has been named...
The HL-LHC program has seen numerous extrapolations of its needed computing resources that each indicate the need for substantial changes if the desired HL-LHC physics program is to be supported within the current level of computing resource budgets. Drivers include large increases in event complexity (leading to increased processing time and analysis data size) and trigger rates needed (5-10...
This contribution reports on the experience acquired from using the Oracle Cloud
Infrastructure (OCI) as an Infrastructure as a Service (IaaS) within the distributed computing environments of the LHC experiments. The bare metal resources provided in the cloud were integrated using existing deployment and computer management tools. The model used in earlier cloud exercises was adapted to the...
MERLIN is a C++ particle tracking software package, originally developed at DESY for use in International Linear Collider (ILC) simulations. MERLIN has more recently been adapted for High-Luminosity Large Hadron Collider (HL-LHC) collimation studies, utilising more advanced scattering physics. However, as is all too common in existing high-energy physics software, recent developments have not...
Looking ahead to the High Luminosity Large Hadron Collider phase (HL-LHC), each proton bunch crossing will bring up to 200 simultaneous proton collisions. Performing charged particle trajectory reconstruction in such a dense environment will be computationally challenging because of the nature of the traditional algorithms used. The common combinatorial Kalman Filter state-of-the-art...
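The combinatorial aspect, where many hit-to-track assignments are tried, is what drives the cost; the per-candidate machinery is the standard Kalman filter predict/update cycle, sketched below for a simple straight-line track model in one projection (matrices and noise values are purely illustrative):

    import numpy as np

    # State: (position, slope); measurements: position on successive detector planes.
    F = np.array([[1.0, 1.0],     # propagate state to the next plane (unit plane spacing)
                  [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])    # we only measure the position
    Q = 1e-4 * np.eye(2)          # process noise (e.g. multiple scattering, illustrative)
    R = np.array([[0.01]])        # measurement noise (detector resolution, illustrative)

    def kalman_step(x, P, measurement):
        """One predict + update cycle; the filter visits detector planes sequentially."""
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q
        residual = measurement - H @ x_pred
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_new = x_pred + K @ residual
        P_new = (np.eye(2) - K @ H) @ P_pred
        return x_new, P_new

    x, P = np.array([0.0, 0.0]), np.eye(2)
    for hit in [0.11, 0.19, 0.32, 0.41]:             # toy hit positions on four planes
        x, P = kalman_step(x, P, np.array([hit]))
    print("fitted position and slope:", x)

In the combinatorial variant this cycle is repeated for every candidate hit on every plane, which is why the cost grows so quickly with detector occupancy.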
Field-programmable gate arrays (FPGAs) have largely been used in communication and high-performance computing, and given the recent advances in big data and emerging trends in cloud computing (e.g., serverless [18]), FPGAs are increasingly being introduced into these domains (e.g., Microsoft’s datacenters [6] and Amazon Web Services [10]). To address these domains’ processing needs, recent...
Until recently, the direct visualization of the complete ATLAS experiment geometry and final analysis data was confined within the software framework of the experiment.
To provide a detailed interactive data visualization capability to users, as well as easy access to geometry data, and to ensure platform independence and portability, great effort has been recently put into the modernization...
The unprecedented size and complexity of the ATLAS experiment required the
adoption of a new approach for online monitoring system development as
many requirements for this system were not known in advance due to the
innovative nature of the project.
The ATLAS online monitoring facility has been designed as a modular
system consisting of a number of independent components, which can
interact with one...
Charged particle tracks registered in high energy and nuclear physics (HENP) experiments are reconstructed in a crucial stage of the physics analysis known as tracking. It consists of joining into clusters a great number of so-called hits produced on sequential coordinate planes of tracking detectors. Each of these clusters joins all hits belonging to the same track, one of many...
The Belle II experiment, based in Japan, is designed for the precise measurement of B and charm meson as well as $\tau$ decays and is intended to play an important role in the search for physics beyond the Standard Model. To visualize the collected data, amongst other things, virtual reality (VR) applications are used within the collaboration. In addition to the already existing VR application...
CERN's batch and grid services are mainly focused on High Throughput Computing (HTC) for LHC data processing. However, part of the user community requires High Performance Computing (HPC) for massively parallel applications across many cores on MPI-enabled infrastructure. This contribution addresses the implementation of HPC infrastructure at CERN for Lattice QCD application development, as...
SHiP is a new proposed fixed-target experiment at the CERN SPS accelerator. The goal of the experiment is to search for hidden particles predicted by models of Hidden Sectors. Track pattern recognition is an early step of data processing at SHiP. It is used to reconstruct tracks of charged particles from the decay of neutral New Physics objects. Several artificial neural networks and boosting...
LHCb is undergoing major changes in its data selection and processing chain for the upcoming LHC Run 3 starting in 2021. With this in view several initiatives have been launched to optimise the software stack. This contribution discusses porting the LHCb Stack from x86 architecture to aarch64 architecture with the goal to evaluate the performance and the cost of the computing infrastructure...
The Compact Muon Solenoid (CMS) is one of the experiments at the CERN Large Hadron Collider (LHC). The CMS Online Monitoring system (OMS) is an upgrade and successor to the CMS Web-Based Monitoring (WBM) system, which is an essential tool for shift crew members, detector subsystem experts, operations coordinators, and those performing physics analyses. CMS OMS is divided into aggregation and...
The Production and Distributed Analysis system (PanDA) for the ATLAS experiment at the Large Hadron Collider has seen big changes over the past couple of years to accommodate new types of distributed computing resources: clouds, HPCs, volunteer computers and other external resources. While PanDA was originally designed for fairly homogeneous resources available through the Worldwide LHC...
The computing center GridKa is serving the ALICE, ATLAS, CMS and LHCb experiments as one of the biggest WLCG Tier-1 centers worldwide, providing compute and storage resources. It is operated by the Steinbuch Centre for Computing at Karlsruhe Institute of Technology in Germany. In April 2017 a new online storage system was put into operation. In its current stage of expansion it offers the HEP...
The development of data management services capable of coping with very large data resources is a key challenge in allowing the future e-infrastructures to address the needs of the next generation of extreme-scale scientific experiments.
To face this challenge, in November 2017 the H2020 “eXtreme DataCloud - XDC” project has been launched. Lasting for 27 months and combining the expertise of 8 large...
During 2017 support for Docker and Singularity containers was added to
the Vac system, in addition to its long standing support for virtual
machines. All three types of "logical machine" can now be run in
parallel on the same pool of hypervisors, using container or virtual
machine definitions published by experiments. We explain how CernVM-FS
is provided to containers by the hypervisors, to...
Analysis of neutrino oscillation data involves a combination of complex fitting procedures and statistical corrections techniques that are used to determine the full three-flavor PMNS parameters and constraint contours. These techniques rely on computationally intensive “multi-universe” stochastic modeling. The process of calculating these contours and corrections can dominate final stages...
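The analysis-specific machinery is not shown here; schematically, the "multi-universe" approach amounts to generating many pseudo-experiments with fluctuated inputs and taking critical values of the test statistic from their empirical distribution rather than from asymptotic formulae, as in this toy sketch:

    import numpy as np

    rng = np.random.default_rng(7)

    def test_statistic(pseudo_data, prediction):
        """Toy chi2-like test statistic between a pseudo-experiment and a model prediction."""
        return np.sum((pseudo_data - prediction) ** 2 / prediction)

    prediction = np.array([120.0, 80.0, 45.0, 20.0])   # toy binned prediction

    # Generate many "universes": Poisson-fluctuated pseudo-experiments around the prediction.
    n_universes = 100000
    stats = np.array([test_statistic(rng.poisson(prediction), prediction)
                      for _ in range(n_universes)])

    # The empirical 90% critical value replaces the naive chi2 quantile when setting contours.
    critical_90 = np.quantile(stats, 0.90)
    print(f"corrected 90% critical value: {critical_90:.2f}")

The computational burden in a real analysis comes from repeating this over a grid of oscillation-parameter points, which is why these final stages can dominate the fit time.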
The ALICE experiment will undergo an extensive detector and readout upgrade for LHC Run 3 and will collect a 10 times larger data volume than today. This will translate into an increase of the required CPU resources worldwide, as well as higher data access and transfer rates. JAliEn (Java ALICE Environment) is the new Grid middleware designed to scale out horizontally and satisfy the ALICE...
Control and monitoring of experimental facilities as well as laboratory equipment requires handling a blend of different tasks. Often in industrial or scientific fields there are standards or form factors to comply with and electronic interfaces or custom busses to adopt. With such tight boundary conditions, the integration of an off-the-shelf Single Board Computer (SBC) is not always a...
Since Run II, successive development projects for the Large Hadron Collider have been steadily increasing the nominal luminosity, with the ultimate goal of reaching a peak luminosity of $5 \times 10^{34}\,\mathrm{cm^{-2}s^{-1}}$ for the ATLAS and CMS experiments, planned for the High Luminosity LHC (HL-LHC) upgrade. This rise in luminosity will directly result in an increased number of simultaneous proton collisions...
Interactive 3D data visualization plays a key role in HEP experiments, as it is used in many tasks at different levels of the data chain. Outside HEP, for interactive 3D graphics, the game industry makes heavy use of so-called “game engines”, modern software frameworks offering an extensive set of powerful graphics tools and cross-platform deployment. Recently, a very strong support for...
One of the big challenges in High Energy Physics development is the fact that many potential -and very valuable- students and young researchers live in countries where internet access and computational infrastructure are poor compared to institutions already participating.
In order to accelerate the process, the ATLAS Open Data project releases useful and meaningful data and tools using...
In the current scientific environment, experimentalists and system administrators allocate large amounts of time to data access, parsing and gathering, as well as to instrument management. This is a growing challenge as large collaborations operate a significant amount of instrument resources, remote instrumentation sites and continuously improved and upgraded scientific...
Virtualization and containers have become the go-to solutions for simplified deployment, elasticity and workflow isolation. These benefits are especially advantageous in containers, which dispense with the resource overhead associated with VMs, applicable in all cases where virtualization of the full hardware stack is not considered necessary. Containers are also simpler to set up and maintain...
The High-Luminosity LHC will see pileup levels reaching 200, which will greatly increase the complexity of the tracking component of the event reconstruction.
To reach out to Computer Science specialists, a Tracking Machine Learning challenge (trackML) is being set up on Kaggle for the first semester of 2018 by a team of ATLAS, CMS and LHCb tracking experts and computer scientists,...
The dynamic data federation software (Dynafed), developed by CERN IT, provides a federated storage cluster on demand using the HTTP protocol with WebDAV extensions. Traditional storage sites which support an experiment can be added to Dynafed without requiring any changes to the site. Dynafed also supports direct access to cloud storage such as S3 and Azure. We report on the usage of Dynafed...
In many HEP experiments a typical data analysis workflow requires each user
to read the experiment data in order to extract meaningful information and produce relevant plots for the considered analysis. Multiple users accessing the same data results in redundant access to the data itself, which could be factorised, effectively improving the CPU efficiency of the analysis jobs and relieving...
The part of the CMS data acquisition (DAQ) system responsible for data readout and event building is a complex network of interdependent distributed programs. To ensure successful data taking, these programs have to be constantly monitored in order to facilitate the timeliness of necessary corrections in case of any deviation from specified behaviour. A large number of diverse monitoring data...
The LHCb experiment will undergo a major upgrade for LHC Run-III, scheduled to
start taking data in 2021. The upgrade of the LHCb detector introduces a
radically new data-taking strategy: the current multi-level event filter will
be replaced by a trigger-less readout system, feeding data into a software
event filter at a rate of 40 MHz.
In particular, a new Vertex Locator (VELO) will be...
With the increase in power and reduction in cost of GPU-accelerated processors, a corresponding interest in their use in the scientific domain has grown. OSG users are no different, and they have shown an interest in accessing GPU resources via their usual workload infrastructures. Grid sites that have these kinds of resources also want to make them grid available. In this talk, we discuss...
During 2017, LHCb created Docker and Singularity container definitions which allow sites to run all LHCb DIRAC workloads in containers as "black boxes". This parallels LHCb's previous work to encapsulate the execution of DIRAC payload jobs in virtual machines, and we explain how these three types of "logical machine" are related in LHCb's case and how they differ, in terms of architecture,...
High energy physics is no longer the main user or developer of data analysis tools. Open source tools developed primarily for data science, business intelligence, and finance are available for use in HEP, and adopting them would reduce the in-house maintenance burden and provide users with a wider set of training examples and career options. However, physicists have been analyzing data with...
The EOS project started as a specialized disk-only storage software solution for physics analysis use-cases at CERN in 2010.
Over the years EOS has evolved into an open storage platform, leveraging several open source building blocks from the community. The service at CERN manages around 250 PB, distributed across two data centers and provides user- and project-spaces to all CERN experiments....