Access to and exploitation of large-scale computing resources, such as those offered by general
purpose HPC centres, is one important measure for ATLAS and the other Large Hadron Collider experiments
to meet the challenge posed by the full exploitation of the future data within the constraints of flat budgets.
We report on the effort of moving the Swiss WLCG T2 computing,
serving ATLAS, CMS...
Based on GooFit, a GPU-friendly framework for doing maximum-likelihood fits, we have developed a tool for extracting model-independent S-wave amplitudes from three-body decays such as D+ --> h(')-,h+,h+. A full amplitude analysis is done where the magnitudes and phases of the S-wave amplitudes (or alternatively, the real and imaginary components) are anchored at a finite number of...
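As a rough illustration of the model-independent parameterization described above (not the GooFit API itself; the knot positions and complex values below are hypothetical), the S-wave can be represented by its value at a few anchor points in the invariant-mass squared and interpolated in between:

    import numpy as np

    # Hypothetical anchor points in m^2(h-h+) [GeV^2] and complex S-wave values there
    knots = np.array([0.4, 0.8, 1.2, 1.6, 2.0])
    amps  = np.array([1.0+0.2j, 0.8+0.5j, 0.3+0.9j, -0.2+0.6j, -0.4+0.2j])

    def swave_amplitude(m2):
        """Linearly interpolate magnitude and phase between anchor points."""
        mag = np.interp(m2, knots, np.abs(amps))
        phase = np.interp(m2, knots, np.unwrap(np.angle(amps)))
        return mag * np.exp(1j * phase)

    print(swave_amplitude(1.0))   # interpolated complex S-wave amplitude at m^2 = 1.0 GeV^2

In the actual fit the values at the anchor points are free parameters determined from data; the snippet only shows how a continuous amplitude follows from a finite set of anchors.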
The SND detector takes data at the e+e- collider VEPP-2000 in Novosibirsk. We present here
recent upgrades of the SND DAQ system, which are mainly aimed at handling the increased event
rate after the collider modernization. To maintain acceptable event-selection quality, the electronics
throughput and computational power must be increased. These goals are achieved with the new fast...
The installation of Virtual Visit services by the LHC collaborations began shortly after the first high energy collisions were provided by the CERN accelerator in 2010. The ATLAS, CMS, LHCb, and ALICE experiments have all joined in this popular and effective method to bring the excitement of scientific exploration and discovery into classrooms and other public venues around the world. Their...
CERN has been developing and operating EOS as a disk storage solution successfully for 5 years. The CERN deployment provides 135 PB and stores 1.2 billion replicas distributed over two computer centres. Deployment includes four LHC instances, a shared instance for smaller experiments and since last year an instance for individual user data as well. The user instance represents the backbone of...
Fifteen Chinese High Performance Computing sites, many of them on the TOP500 list of most powerful supercomputers, are integrated into a common infrastructure providing coherent access to users through a RESTful interface called SCEAPI. These resources have been integrated into the ATLAS Grid production system using a bridge between ATLAS and SCEAPI which translates the...
Since the launch of HiggsHunters.org in November 2014, citizen science volunteers
have classified more than a million points of interest in images from the ATLAS experiment
at the LHC. Volunteers have been looking for displaced vertices and unusual features in images
recorded during LHC Run-1. We discuss the design of the project, its impact on the public,
and the surprising results of how...
Obtaining CPU cycles on an HPC cluster is nowadays relatively simple and sometimes even cheap for academic institutions. However, in most cases providers of HPC services do not allow changes to the configuration, the implementation of special features, or lower-level control of the computing infrastructure and networks, for example for testing new computing patterns or conducting...
The LHC will collide protons in the ATLAS detector with increasing luminosity through 2016, placing stringent operational and physical requirements on the ATLAS trigger system in order to reduce the 40 MHz collision rate to a manageable event storage rate of about 1 kHz, while not rejecting interesting physics events. The Level-1 trigger is the first rate-reducing step in the ATLAS trigger...
The instantaneous luminosity of the LHC is expected to increase at the HL-LHC so that the amount of pile-up can reach a level of 200 interactions per bunch crossing, almost a factor of 10 higher than the luminosity reached at the end of Run 1. In addition, the experiments plan a 10-fold increase of the readout rate. This will be a challenge for the ATLAS and CMS experiments, in particular for the...
All four of the LHC experiments depend on web proxies (that is, squids) at each grid site in order to support software distribution by the CernVM FileSystem (CVMFS). CMS and ATLAS also use web proxies for conditions data distributed through the Frontier Distributed Database caching system. ATLAS and CMS each have their own methods for their grid jobs to find out which web proxy to use for...
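For illustration only, a minimal sketch of how a job might interpret a CVMFS-style proxy specification, assuming the usual convention that '|' separates load-balanced proxies within a group and ';' separates failover groups (the proxy host names are hypothetical):

    import random

    def parse_proxy_spec(spec):
        """Split a CVMFS_HTTP_PROXY-style string into failover groups of proxies."""
        return [group.split("|") for group in spec.split(";")]

    def choose_proxies(spec):
        """Pick one proxy per group; later entries are failover candidates."""
        groups = parse_proxy_spec(spec)
        return [random.choice(group) for group in groups]  # try these in order until one works

    print(choose_proxies("http://squid1.example.org:3128|http://squid2.example.org:3128;DIRECT"))

This is a sketch of the configuration semantics only, not of the discovery services that ATLAS and CMS actually deploy.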
XRootD is a distributed, scalable system for low-latency file access. It is the primary data access framework for the high-energy physics community. One of the latest developments in the project has been to incorporate metalink and segmented file transfer technologies.
We report on the implementation of the metalink metadata format support within the XRootD client. This includes both the CLI and...
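As an illustrative sketch independent of the XRootD implementation, a Metalink 4 (RFC 5854) document can be parsed to recover the list of mirror URLs for a file; the file name and URLs below are hypothetical:

    import xml.etree.ElementTree as ET

    ML = "{urn:ietf:params:xml:ns:metalink}"  # Metalink 4 namespace

    metalink_doc = """<?xml version="1.0" encoding="UTF-8"?>
    <metalink xmlns="urn:ietf:params:xml:ns:metalink">
      <file name="data.root">
        <url priority="1">root://site-a.example.org//store/data.root</url>
        <url priority="2">root://site-b.example.org//store/data.root</url>
      </file>
    </metalink>"""

    root = ET.fromstring(metalink_doc)
    for f in root.findall(ML + "file"):
        urls = sorted(f.findall(ML + "url"), key=lambda u: int(u.get("priority", "999")))
        print(f.get("name"), [u.text for u in urls])  # mirrors ordered by priority

A client with such a list can fall back to the next source on failure, or fetch different segments of the file from several sources in parallel.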
ALICE HLT Run2 performance overview
M. Krzewicki, for the ALICE collaboration
The ALICE High Level Trigger (HLT) is an online reconstruction and data compression system used in the ALICE experiment at CERN. Unique among the LHC experiments, it extensively uses modern coprocessor technologies like general purpose graphic processing units (GPGPU) and field programmable gate arrays (FPGA) in the...
The ATLAS computing model was originally designed as static clouds (usually national or geographical groupings of sites) around the
Tier 1 centers, which confined tasks and most of the data traffic. Since those early days, the sites' network bandwidth has
increased by a factor of O(1000) and the difference in functionality between Tier 1s and Tier 2s has diminished. After years of manual,
intermediate...
Over the past decade, high-performance, high-capacity Open Source storage systems have been designed and implemented, accommodating the demanding needs of the LHC experiments. However, with the general move away from the concept of local computer centers, supporting their associated communities, towards large infrastructures, providing Cloud-like solutions to a large variety of different...
The Open Science Grid (OSG) is a large, robust computing grid that started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero, but has evolved in recent years to a much larger user and resource platform. In addition to meeting the US LHC community’s computational needs, the OSG continues to be one of the largest providers of distributed...
Radiotherapy is planned with the aim of delivering a lethal dose of radiation to a tumour, while keeping doses to nearby healthy organs at an acceptable level. Organ movements and shape changes, over a course of treatment typically lasting four to eight weeks, can result in actual doses being different from planned. The UK-based VoxTox project aims to compute actual doses, at the level of...
ALICE HLT Cluster operation during ALICE Run 2
Johannes Lehrbach, for the ALICE collaboration
ALICE (A Large Ion Collider Experiment) is one of the four major detectors located at the LHC at CERN, focusing on the study of heavy-ion collisions. The ALICE High Level Trigger (HLT) is a compute cluster which reconstructs the events and compresses the data in real-time. The data compression...
The use of up-to-date machine learning methods, including deep neural networks, running directly on raw data has significant potential in High Energy Physics for revealing patterns in detector signals and as a result improving reconstruction and the sensitivity of the final physics analyses. In this work, we describe a machine-learning analysis pipeline developed and operating at the National...
ZFS is a combination of file system, logical volume manager, and software RAID system developed by Sun Microsystems for the Solaris OS. ZFS simplifies the administration of disk storage and on Solaris it has been well regarded for its high performance, reliability, and stability for many years. It is used successfully for enterprise storage administration around the globe, but so far on such...
The ALICE HLT uses a data transport framework based on the publisher-subscriber messaging principle, which transparently handles the communication between processing components over the network and between processing components on the same node via shared memory with a zero-copy approach.
We present an analysis of the performance in terms of maximum achievable data rates and event rates as well...
LArSoft is an integrated, experiment-agnostic set of software tools for liquid argon (LAr) neutrino experiments
to perform simulation, reconstruction and analysis within the Fermilab art framework.
Along with common algorithms, the toolkit provides generic interfaces and extensibility
that accommodate the needs of detectors of very different size and configuration.
To date, LArSoft has been...
The observation of neutrino oscillation provides evidence of physics beyond the standard model, and the precise measurement of those oscillations remains an important goal for the field of particle physics. Using two finely segmented liquid scintillator detectors located 14 mrad off-axis from the NuMI muon-neutrino beam, NOvA is in a prime position to contribute to precision measurements of...
A framework for performing a simplified particle physics data analysis has been created. The project analyses a pre-selected sample from the full 2011 LHCb data. The analysis aims to measure matter-antimatter asymmetries. It broadly follows the steps in a significant LHCb publication where large CP violation effects are observed in charged B meson three-body decays to charged pions and kaons....
The LHCb software trigger underwent a paradigm shift before the start of Run-II. From being a system to select events for later offline reconstruction, it can now perform the event analysis in real-time, and subsequently decide which part of the event information is stored for later analysis.
The new strategy is only possible due to a major upgrade during the LHC long shutdown I (2012-2015)....
For a few years now, the artdaq data acquisition software toolkit has
provided numerous experiments with ready-to-use components which allow
for rapid development and deployment of DAQ systems. Developed within
the Fermilab Scientific Computing Division, artdaq provides data
transfer, event building, run control, and event analysis
functionality. This latter feature includes built-in...
In preparation for the XENON1T Dark Matter data acquisition, we have
prototyped and implemented a new computing model. The XENON signal and data processing
software is developed fully in Python 3, and makes extensive use of generic scientific data
analysis libraries, such as the SciPy stack. A certain tension between modern “Big Data”
solutions and existing HEP frameworks is typically...
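As a toy example of the SciPy-stack style of processing described above (not the actual XENON1T code; all names and numbers are invented), a digitized waveform can be reduced to candidate hits with a simple threshold scan:

    import numpy as np

    def find_hits(waveform, threshold):
        """Return (first, last) sample indices of contiguous regions above threshold."""
        above = np.flatnonzero(waveform > threshold)
        if above.size == 0:
            return []
        breaks = np.flatnonzero(np.diff(above) > 1)   # split where indices are not contiguous
        starts = np.r_[above[0], above[breaks + 1]]
        stops = np.r_[above[breaks], above[-1]]
        return list(zip(starts, stops))

    rng = np.random.default_rng(0)
    wf = rng.normal(0, 1, 1000)
    wf[400:420] += 15            # injected toy signal pulse
    print(find_hits(wf, 5.0))    # [(400, 419)] for the injected pulse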
We previously described Lobster, a workflow management tool for exploiting volatile opportunistic computing resources for computation in HEP. We will discuss the various challenges that have been encountered while scaling up the simultaneous CPU core utilization and the software improvements required to overcome these challenges.
Categories: Workflows can now be divided into categories...
Brookhaven National Laboratory (BNL) anticipates significant growth in scientific programs with large computing and data storage needs in the near future and has recently re-organized support for scientific computing to meet these needs.
A key component is the enhanced role of the RHIC-ATLAS Computing Facility
(RACF) in support of high-throughput and high-performance computing (HTC and HPC) ...
We present the Web-Based Monitoring project of the CMS experiment at the LHC at CERN. With the growth in size and complexity of High Energy Physics experiments and the accompanying increase in the number of collaborators spread across the globe, the importance of broadly accessible monitoring has grown. The same can be said about the increasing relevance of operation and reporting web tools...
In the near future, many new experiments (JUNO, LHAASO, CEPC, etc.) with challenging data volumes are coming into operation or are planned at IHEP, China. The Jiangmen Underground Neutrino Observatory (JUNO) is a multipurpose neutrino experiment to be operational in 2019. The Large High Altitude Air Shower Observatory (LHAASO) is oriented to the study and observation of cosmic rays, which is...
The CMS experiment has collected an enormous volume of metadata about its computing operations in its monitoring systems, describing its experience in operating all of the CMS workflows on all of the Worldwide LHC Computing Grid Tiers. Data mining of all this information has rarely been done, but it is of crucial importance for a better understanding of how CMS did successful...
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of 100 GByte/s to the high-level trigger (HLT) farm. The HLT farm selects and classifies interesting events for storage and offline analysis at a rate of around 1 kHz.
The DAQ system has been redesigned during the...
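A back-of-the-envelope check of the quoted figures: at 100 kHz and 100 GByte/s the average event size is about 1 MByte, so the roughly 1 kHz selected by the HLT corresponds to an output stream of order 1 GByte/s:

    l1_rate = 100e3          # events/s accepted by the Level-1 trigger
    daq_throughput = 100e9   # bytes/s into the event builder
    hlt_rate = 1e3           # events/s written out by the HLT

    event_size = daq_throughput / l1_rate   # ~1e6 bytes (1 MByte) per event
    hlt_output = hlt_rate * event_size      # ~1e9 bytes/s (~1 GByte/s) to storage
    print(event_size, hlt_output)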
ROOT is one of the core software tools for physicists. For more than a decade it has held a central position in physicists' analysis code and the experiments' frameworks, thanks in part to its stability and simplicity of use. This has allowed software development for analyses and frameworks to use ROOT as a "common language" for HEP, across virtually all experiments.
Software development in...
In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run-2 events requires parallelization of the code in order to reduce the memory-per-core footprint constraining serial-execution programs, thus optimizing the exploitation of present multi-core...
The Belle II experiment at KEK is preparing for first collisions in 2017. Processing the large amounts of data that will be produced will require conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer.
The Belle II conditions database was designed with a straightforward goal: make it as easily...
This paper introduces the evolution of the monitoring system of the Alpha Magnetic Spectrometer (AMS) Science Operation Center (SOC) at CERN.
The AMS SOC monitoring system includes several independent tools: Network Monitor to poll the health metrics of AMS local computing farm, Production Monitor to show the production status, Frame Monitor to record the flight data arriving status, and...
The INFN CNAF Tier-1 computing center is composed of two main rooms containing IT resources and four additional locations hosting the technological infrastructure that provides electrical power and refrigeration to the facility. The power supply and continuity are ensured by a dedicated room with three 15,000 V to 400 V transformers in a separate part of the main building...
ROOT version 6 comes with a C++-compliant interpreter, cling. Cling needs to know everything about the code in libraries to be able to interact with them.
This translates into increased memory usage with respect to previous versions of
ROOT.
During the runtime automatic library loading process, ROOT 6 re-parses a
set of header files describing the library and enters "recursive"...
Support for Online Calibration in the ALICE HLT Framework
Mikolaj Krzewicki, for the ALICE collaboration
ALICE (A Large Ion Collider Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN. The High Level Trigger (HLT) is an online compute farm, which reconstructs events measured by the ALICE detector in real-time. The HLT uses a custom online...
Since 2014, the RAL Tier 1 has been working on deploying a Ceph backed object store. The aim is to replace Castor for disk storage. This new service must be scalable to meet the data demands of the LHC to 2020 and beyond. As well as offering access protocols the LHC experiments currently use, it must also provide industry standard access protocols. In order to keep costs down the service...
Dependability, resilience, adaptability, and efficiency: growing requirements call for tailored storage services and novel solutions. Unprecedented volumes of data coming from the detectors need to be quickly available in a highly scalable way for large-scale processing and data distribution while in parallel they are routed to tape for long-term archival. These activities are critical for the...
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments.
The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In...
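As a minimal PyROOT sketch of the shared-memory (multi-threading) case, assuming a ROOT 6 build with implicit multi-threading available (the file and tree names are hypothetical):

    import ROOT

    # Turn on ROOT's implicit multi-threading; thread-aware operations such as
    # TTree reading can then use up to 4 worker threads transparently.
    ROOT.ROOT.EnableImplicitMT(4)

    f = ROOT.TFile.Open("events.root")   # hypothetical input file
    tree = f.Get("events")               # hypothetical TTree name
    print(tree.GetEntries())

The multi-process and cluster-wide modes mentioned above follow the same idea of keeping the user-facing analysis code unchanged while the execution backend is parallelised.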
Since 2014 the ATLAS and CMS experiments have shared a common vision for the Condition Database infrastructure required to handle the non-event data for the forthcoming LHC runs. The large commonality in the use cases has allowed the two experiments to agree on a common overall design solution meeting the requirements of both. A first prototype implementing this solution was completed in 2015 and was...
The GridPP project in the UK has a long-standing policy of supporting non-LHC VOs with 10% of the provided resources. Up until recently this had only been taken up by a very limited set of VOs, mainly due to a combination of the (perceived) large overhead of getting started, the limited computing support within non-LHC VOs and the ability to fulfill their computing requirements on local batch...
LHCb has introduced a novel real-time detector alignment and calibration strategy for LHC Run 2. Data collected at the start of the fill are processed in a few minutes and used to update the alignment parameters, while the calibration constants are evaluated for each run. This procedure improves the quality of the online reconstruction. For example, the vertex locator is retracted and...
The CERN Control and Monitoring Platform (C2MON) is a modular, clusterable framework designed to meet a wide range of monitoring, control, acquisition, scalability and availability requirements. It is based on modern Java technologies and has support for several industry-standard communication protocols. C2MON has been reliably utilised for several years as the basis of multiple monitoring...
This work will present the status of Ceph-related operations and development within the CERN IT Storage Group: we summarise significant production experience at the petabyte scale as well as strategic developments to integrate with our core storage services. As our primary back-end for OpenStack Cinder and Glance, Ceph has provided reliable storage to thousands of VMs for more than 3 years;...
Conditions data (for example: alignment, calibration, data quality) are used extensively in the processing of real and simulated data in ATLAS. The volume and variety of the conditions data needed by different types of processing are quite diverse, so optimizing its access requires a careful understanding of conditions usage patterns. These patterns can be quantified by mining representative...
The exploitation of the full physics potential of the LHC experiments requires fast and efficient processing of the largest possible dataset with the most refined understanding of the detector conditions. To face this challenge, the CMS collaboration has setup an infrastructure for the continuous unattended computation of the alignment and calibration constants, allowing for a refined...
The Tier-1 at CNAF is the main INFN computing facility offering computing and storage resources to more than 30 different scientific collaborations including the 4 experiments at the LHC. A huge increase in computing needs is also foreseen in the coming years, mainly driven by the experiments at the LHC (especially starting with Run 3 in 2021) but also by other upcoming experiments...
Notebooks represent an exciting new approach that will considerably facilitate collaborative physics analysis.
They are a modern and widely-adopted tool to express computational narratives comprising, among other elements, rich text, code and data visualisations. Several notebook flavours exist, although one of them has been particularly successful: the Jupyter open source project.
In this...
CRAB3 is a workload management tool used by more than 500 CMS physicists every month to analyze data acquired by the Compact Muon Solenoid (CMS) detector at the CERN Large Hadron Collider (LHC). CRAB3 allows users to analyze a large collection of input files (datasets), splitting the input into multiple Grid jobs depending on parameters provided by users.
The process of manually specifying...
The ATLAS EventIndex System has amassed a set of key quantities for a large number of ATLAS events into a Hadoop based infrastructure for the purpose of providing the experiment with a number of event-wise services. Collecting this data in one place provides the opportunity to investigate various storage formats and technologies and assess which best serve the various use cases as well as...
CEPH is a cutting edge, open source, self-healing distributed data storage technology which is exciting both the enterprise and academic worlds. CEPH delivers an object storage layer (RADOS), block storage layer, and file system storage in a single unified system. CEPH object and block storage implementations are widely used in a broad spectrum of enterprise contexts, from dynamic provision of...
The WLCG Tier-1 center GridKa is developed and operated by the Steinbuch Centre for Computing (SCC)
at the Karlsruhe Institute of Technology (KIT). It was the origin of further Big Data research activities and
infrastructures at SCC, e.g. the Large Scale Data Facility (LSDF), providing petabyte scale data storage
for various non-HEP research communities.
Several ideas and plans...
ROOT provides advanced statistical methods needed by the LHC experiments to analyze their data. These include machine learning tools for classification, regression and clustering. TMVA, a toolkit for multi-variate analysis in ROOT, provides these machine learning methods.
We will present new developments in TMVA, including parallelisation, deep-learning neural networks, new features and...
AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB3) that manages users’ transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user's output data to a single remote site was part of the job execution, resulting in inefficient...
ROOT provides an extremely flexible format used throughout the HEP community. The number of use cases – from an archival data format to end-stage analysis – has required a number of tradeoffs to be exposed to the user. For example, a high “compression level” in the traditional DEFLATE algorithm will result in a smaller file (saving disk space) at the cost of slower decompression (costing CPU...
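For illustration, the compression tradeoff can be steered when a file is created; a minimal PyROOT sketch (the file name and tree content are invented):

    import ROOT

    # Higher compression level: smaller file on disk, more CPU to decompress on read.
    f = ROOT.TFile.Open("output.root", "RECREATE")
    f.SetCompressionLevel(9)     # 1 = fastest, 9 = smallest

    t = ROOT.TTree("events", "toy tree")
    x = ROOT.std.vector("double")()
    t.Branch("x", x)
    for i in range(1000):
        x.clear()
        x.push_back(float(i))
        t.Fill()
    t.Write()
    f.Close()

An archival copy might favour the highest level, while a file read many times during end-stage analysis might favour a faster setting.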
The ATLAS High Level Trigger Farm consists of around 30,000 CPU cores which filter events at up to 100 kHz input rate.
A costing framework is built into the high level trigger; this enables detailed monitoring of the system and allows data-driven predictions to be made
utilising specialist datasets. This talk will present an overview of how ATLAS collects in-situ monitoring data on both...
We will report on the first year of the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) which is targeting the creation of a distributed Ceph storage infrastructure coupled together with software-defined networking to provide high-performance access for well-connected locations on any participating campus. The project’s goal is to provide a single scalable, distributed storage...
We present rootJS, an interface making it possible to seamlessly integrate ROOT 6 into applications written for Node.js, the JavaScript runtime platform increasingly commonly used to create high-performance Web applications. ROOT features can be called both directly from Node.js code and by JIT-compiling C++ macros. All rootJS methods are invoked asynchronously and support callback functions,...
At the RAL Tier-1 we have been deploying production services on both bare metal and a variety of virtualisation platforms for many years. Despite the significant simplification of configuration and deployment of services due to the use of a configuration management system, maintaining services still requires a lot of effort. Also, the current approach of running services on static machines...
HEP applications perform an excessive amount of allocations/deallocations within short time intervals which results in memory churn, poor locality and performance degradation. These issues have been known for a decade, but due to the complexity of software frameworks and the large amount of allocations (which are in the order of billions for a single job), up until recently no efficient...
The Geant4 Collaboration released a new generation of the Geant4 simulation toolkit (version 10) in December 2013 and reported its new features at CHEP 2015. Since then, the Collaboration continues to improve its physics and computing performance and usability. This presentation will survey the major improvements made since version 10.0. On the physics side, it includes fully revised multiple...
We present a system deployed in the summer of 2015 for the automatic assignment of production and reprocessing workflows for simulation and detector data in the frame of the Computing Operation of the CMS experiment at the CERN LHC. Processing requests involves a number of steps in the daily operation, including transferring input datasets where relevant and monitoring them, assigning work to...
The ATLAS experiment at CERN is planning a second phase of upgrades to prepare for the "High Luminosity LHC", a 4th major run due to start in 2026. In order to deliver an order of magnitude more data than previous runs, 14 TeV protons will collide with an instantaneous luminosity of 7.5 × 10^34 cm^-2 s^-1, resulting in much higher pileup and data rates than the current experiment was designed to...
The recent progress in parallel hardware architectures with deeper
vector pipelines or many-cores technologies brings opportunities for
HEP experiments to take advantage of SIMD and SIMT computing models.
Launched in 2013, the GeantV project studies performance gains in
propagating multiple particles in parallel, improving instruction
throughput and data locality in HEP event simulation....
A status of recent developments of the DELPHES C++ fast detector simulation framework will be given. New detector cards for the LHCb detector and prototypes for future e+ e- (ILC, FCC-ee) and p-p colliders at 100 TeV (FCC-hh) have been designed. The particle-flow algorithm has been optimised for high multiplicity environments such as high luminosity and boosted regimes. In addition, several...
The HEP prototypical systems at the Supercomputing conferences each year have served to illustrate the ongoing state of the art developments in high throughput, software-defined networked systems important for future data operations at the LHC and for other data intensive programs. The Supercomputing 2015 SDN demonstration revolved around an OpenFlow ring connecting 7 different booths and the...
The CMS Global Pool, based on HTCondor and glideinWMS, is the main computing resource provisioning system for all CMS workflows, including analysis, Monte Carlo production, and detector data reprocessing activities. Total resources at Tier-1 and Tier-2 sites pledged to CMS exceed 100,000 CPU cores, and another 50,000-100,000 CPU cores are available opportunistically, pushing the needs of the...
After the Phase-I upgrade and onward, the Front-End Link eXchange (FELIX) system will be the interface between the data handling system and the detector front-end electronics and trigger electronics at the ATLAS experiment. FELIX will function as a router between custom serial links and a commodity switch network which will use standard technologies (Ethernet or Infiniband) to communicate with...
As the ATLAS Experiment prepares to move to a multi-threaded framework
(AthenaMT) for Run3, we are faced with the problem of how to migrate 4
million lines of C++ source code. This code has been written over the
past 15 years and has often been adapted, re-written or extended to meet
the changing requirements and circumstances of LHC data taking. The
code was developed by different authors, many of...
In today's world of distributed scientific collaborations, there are many challenges to providing reliable inter-domain network infrastructure. Network operators use a combination of
active monitoring and trouble tickets to detect problems, but these are often ineffective at identifying issues that impact wide-area network users. Additionally, these approaches do not scale to wide area...
The LHCb detector will be upgraded for the LHC Run 3 and will be readout at 40 MHz, with major implications on the software-only trigger and offline computing. If the current computing model is kept, the data storage capacity and computing power required to process data at this rate, and to generate and reconstruct equivalent samples of simulated events, will exceed the current capacity by a...
The FabrIc for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientific Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of differing size, scope, and physics area. The FIFE...
Some data analysis methods typically used in econometric studies and in ecology have been evaluated and applied in physics software environments. They concern the evolution of observables through objective identification of change points and trends, and measurements of inequality, diversity and evenness across a data set. Within each one of these analysis areas, several statistical tests and...
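As a concrete illustration of two such measures, independent of the specific physics application, the Gini coefficient and the Shannon evenness of a sample can be computed as follows:

    import numpy as np

    def gini(values):
        """Gini coefficient: 0 = perfect equality, approaching 1 = maximal inequality."""
        v = np.sort(np.asarray(values, dtype=float))
        n = len(v)
        cum = np.cumsum(v)
        return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

    def shannon_evenness(counts):
        """Shannon entropy normalised by its maximum (log of the number of categories)."""
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()
        return -np.sum(p * np.log(p)) / np.log(len(p))

    print(gini([1, 1, 1, 1]), gini([0, 0, 0, 10]))   # 0.0 and 0.75
    print(shannon_evenness([25, 25, 25, 25]))         # 1.0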
The Open Science Grid (OSG) relies upon the network as a critical part of the distributed infrastructures it enables. In 2012 OSG added a new focus area in networking with a goal of becoming the primary source of network information for its members and collaborators. This includes gathering, organizing and providing network metrics to guarantee effective network usage and prompt detection and...
The SciDAC-Data project is a DOE funded initiative to analyze and exploit two decades of information and analytics that have been collected, by the Fermilab Data Center, on the organization, movement, and consumption of High Energy Physics data. The project is designed to analyze the analysis patterns and data organization that have been used by the CDF, DØ, NOvA, Minos, Minerva and other...
The fraction of internet traffic carried over IPv6 continues to grow rapidly. IPv6 support from network hardware vendors and carriers is pervasive and becoming mature. A network infrastructure upgrade often offers sites an excellent window of opportunity to configure and enable IPv6.
There is a significant overhead when setting up and maintaining dual stack machines, so where possible...
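A small sketch of the kind of check that helps during such a migration: resolving a host on a dual-stack machine and seeing whether IPv6 addresses are returned alongside IPv4 (the host name is only an example; network access is assumed):

    import socket

    def resolve(host, port=443):
        """List the address families and addresses the resolver returns for a host."""
        results = []
        for family, _, _, _, sockaddr in socket.getaddrinfo(host, port,
                                                            proto=socket.IPPROTO_TCP):
            kind = "IPv6" if family == socket.AF_INET6 else "IPv4"
            results.append((kind, sockaddr[0]))
        return results

    print(resolve("www.example.org"))   # e.g. [('IPv6', '2606:...'), ('IPv4', '93.184...')]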
The ALICE Collaboration and the ALICE O$^2$ project have carried out detailed studies for a new online computing facility planned to be deployed for Run 3 of the Large Hadron Collider (LHC) at CERN. Some of the main aspects of the data handling concept are partial reconstruction of raw data organized in so-called time frames and, based on that information, reduction of the data rate without...
High Energy Physics experiments have long had to deal with huge amounts of data. Other fields of study are now being faced with comparable volumes of experimental data and have similar requirements to organize access by a distributed community of researchers. Fermilab is partnering with the Simons Foundation Autism Research Initiative (SFARI) to adapt Fermilab’s custom HEP data management...
The second generation of the ATLAS production system called ProdSys2 is a
distributed workload manager that runs daily hundreds of thousands of jobs,
from dozens of different ATLAS specific workflows, across more than
a hundred heterogeneous sites. It achieves high utilization by combining
dynamic job definition based on many criteria, such as input and output
size, memory requirements and...
The Compressed Baryonic Matter experiment (CBM) is a next-generation heavy-ion experiment to be operated at the FAIR facility, currently under construction in Darmstadt, Germany. A key feature of CBM is the very high interaction rates, exceeding those of contemporary nuclear collision experiments by several orders of magnitude. Such interaction rates forbid a conventional, hardware-triggered...
The Electron-Ion Collider (EIC) is envisioned as the
next-generation U.S. facility to study quarks and gluons in
strongly interacting matter. Developing the physics program for
the EIC, and designing the detectors needed to realize it,
requires a plethora of software tools and multifaceted analysis
efforts. Many of these tools have yet to be developed or need to
...
The LHCb experiment will undergo a major upgrade during the second long shutdown (2018 - 2019). The upgrade will concern both the detector and the Data Acquisition (DAQ) system, which will be rebuilt in order to optimally exploit the foreseen higher event rate. The Event Builder (EB) is the key component of the DAQ system which gathers data from the sub-detectors and builds up the whole event. The EB...
The goal of this comparison is to summarize the state-of-the-art techniques of deep learning, which is boosted by modern GPUs. Deep learning, also known as deep structured learning or hierarchical learning, is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers composed of multiple...
We report the current status of the CMS full simulation. For Run 2 CMS is using Geant4 10.0p02 built in sequential mode; about 8 billion events were produced in 2015. In 2016 any extra production will be done using the same production version. For development, Geant4 10.0p03 with CMS private patches, built in multi-threaded mode, has been established. We plan to use the newest Geant4 10.2 for 2017...
We present an implementation of the ATLAS High Level Trigger that provides parallel execution of trigger algorithms within the ATLAS multithreaded software framework, AthenaMT. This development will enable the ATLAS High Level Trigger to meet future challenges due to the evolution of computing hardware and upgrades of the Large Hadron Collider, LHC, and ATLAS Detector. During the LHC...
The LHC is the world's most powerful particle accelerator, colliding protons at a centre-of-mass energy of 13 TeV. As the
energy and frequency of collisions has grown in the search for new physics, so too has demand for computing resources needed for
event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline
reconstruction workflows at...
With the increased load and pressure on required computing power brought by the higher luminosity in LHC during Run2, there is a need to utilize opportunistic resources not currently dedicated to the Compact Muon Solenoid (CMS) collaboration. Furthermore, these additional resources might be needed on demand. The Caltech group together with the Argonne Leadership Computing Facility (ALCF) are...
RapidIO (http://rapidio.org/) technology is a packet-switched high-performance fabric, which has been under active development since 1997. Originally meant to be a front side bus, it developed into a system level interconnect which is today used in all 4G/LTE base stations world wide. RapidIO is often used in embedded systems that require high reliability, low latency and scalability in a...
The ATLAS EventIndex has been running in production since mid-2015,
reliably collecting information worldwide about all produced events and storing
them in a central Hadoop infrastructure at CERN. A subset of this information
is copied to an Oracle relational database for fast access.
The system design and its optimization serve event picking, from requests for
a few events up to scales of...
HPC network technologies like Infiniband, TrueScale or OmniPath provide low-
latency and high-throughput communication between hosts, which makes them
attractive options for data-acquisition systems in large-scale high-energy
physics experiments. Like HPC networks, data acquisition networks are local
and include a well specified number of systems. Unfortunately traditional...
The LHCb experiment stores around 10^11 collision events per year. A typical physics analysis deals with a final sample of up to 10^7 events. Event preselection algorithms (lines) are used for data reduction. They are run centrally and check whether an event is useful for a particular physics analysis. The lines are grouped into streams. An event is copied to all the streams its lines belong to,...
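A simplified sketch of this stream model (the line and stream names are invented): each line belongs to a stream, and an event is written to every stream containing at least one line that accepted it, so events firing lines in several streams are duplicated:

    # Hypothetical mapping of preselection lines to streams
    line_to_stream = {
        "B2KKK_line": "Bhadron",
        "B2pipipi_line": "Bhadron",
        "D2hhh_line": "Charm",
        "Jpsi2mumu_line": "Dimuon",
    }

    def streams_for_event(fired_lines):
        """Return the set of streams an event is copied to, given its fired lines."""
        return {line_to_stream[l] for l in fired_lines if l in line_to_stream}

    print(streams_for_event(["B2KKK_line", "D2hhh_line"]))  # {'Bhadron', 'Charm'}: stored twice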
The ATLAS Simulation infrastructure has been used to produce upwards of 50 billion proton-proton collision events for analyses
ranging from detailed Standard Model measurements to searches for exotic new phenomena. In the last several years, the
infrastructure has been heavily revised to allow intuitive multithreading and significantly improved maintainability. Such a
massive update of a...
In the midst of the multi- and many-core era, the computing models employed by
HEP experiments are evolving to embrace the trends of new hardware technologies.
As the computing needs of present and future HEP experiments, particularly those
at the Large Hadron Collider, grow, adoption of many-core architectures and
highly-parallel programming models is essential to prevent degradation...
The ATLAS experiment at the high-luminosity LHC will face a five-fold
increase in the number of interactions per collision relative to the ongoing
Run 2. This will require a proportional improvement in rejection power at
the earliest levels of the detector trigger system, while preserving good signal efficiency.
One critical aspect of this improvement will be the implementation of
precise...
Changes in the trigger menu, the online algorithmic event selection of the ATLAS experiment at the LHC, made in response to luminosity and detector changes, are followed by adjustments in the monitoring system. This is done to ensure that the collected data is useful, and can be properly reconstructed at Tier-0, the first level of the computing grid. During Run 1, ATLAS deployed monitoring updates...
PanDA, the Production and Distributed Analysis workload management system, has been developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of the projects using PanDA beyond HEP and the Grid has drawn attention from other compute intensive...
The Canadian Advanced Network For Astronomical Research (CANFAR)
is a digital infrastructure that has been operational for the last
six years.
The platform allows astronomers to store, collaborate, distribute and
analyze large astronomical datasets. We have implemented multi-site storage and
in collaboration with an HEP group at University of Victoria, multi-cloud processing.
CANFAR is deeply...
In recent years there has been increasing use of HPC facilities for HEP experiments. This has initially focussed on less I/O intensive workloads such as generator-level or detector simulation. We now demonstrate the efficient running of I/O-heavy ‘analysis’ workloads for the ATLAS and ALICE collaborations on HPC facilities at NERSC, as well as astronomical image analysis for DESI.
To do...
Software for the next generation of experiments at the Future Circular Collider (FCC) should by design efficiently exploit the available computing resources, and therefore support for parallel execution is a particular requirement. The simulation package of the FCC Common Software Framework (FCCSW) makes use of the Gaudi parallel data processing framework and external packages commonly used in...
MonALISA, which stands for Monitoring Agents using a Large Integrated Services Architecture, has been developed over the last fourteen years by Caltech and its partners with the support of the CMS software and computing program. The framework is based on Dynamic Distributed Service Architecture and is able to provide complete monitoring, control and global optimization services for complex...
The High Luminosity LHC (HL-LHC) will deliver luminosities of up to 5x10^34 cm^-2 s^-1, with an average of about 140-200 overlapping proton-proton collisions per bunch crossing. These extreme pileup conditions can significantly degrade the ability of trigger systems to cope with the resulting event rates. A key component of the HL-LHC upgrade of the CMS experiment is a Level-1 (L1) track...
Southeast University Science Operation Center (SEUSOC) is one of the computing centers of the Alpha Magnetic Spectrometer (AMS-02) experiment. It provides 2000 CPU cores for AMS scientific computing and a dedicated 1Gbps Long Fat Network (LFN) for AMS data transmission between SEU and CERN. In this paper, the workflows of SEUSOC Monte Carlo (MC) production are discussed in...
For some physics processes studied with the ATLAS detector, a more
accurate simulation in some respects can be achieved by including real
data into simulated events, with substantial potential improvements in the CPU,
disk space, and memory usage of the standard simulation configuration,
at the cost of significant database and networking challenges.
Real proton-proton background events can be...
Exascale computing resources are roughly a decade away and will be capable of 100 times more computing than current supercomputers. In the last year, Energy Frontier experiments crossed a milestone of 100 million core-hours used at the Argonne Leadership Computing Facility, Oak Ridge Leadership Computing Facility, and NERSC. The Fortran-based leading-order parton generator called Alpgen was...
The ATLAS Distributed Data Management (DDM) system has evolved drastically in the last two years with the Rucio software fully
replacing the previous system before the start of LHC Run-2. The ATLAS DDM system now manages more than 200 petabytes spread over 130
storage sites and can handle file transfer rates of up to 30 Hz. In this talk, we discuss our experience acquired in...
Physics analysis at the Compact Muon Solenoid (CMS) requires both a vast production of simulated events and an extensive processing of the data collected by the experiment.
Since the end of LHC Run 1 in 2012, CMS has produced over 20 billion simulated events, from 75 thousand processing requests organised in one hundred different campaigns, which emulate different configurations of...
Micropattern gaseous detector (MPGD) technologies, such as GEMs or MicroMegas, are particularly suitable for precision tracking and triggering in high rate environments. Given their relatively low production costs, MPGDs are an exemplary candidate for the next generation of particle detectors. Having acknowledged these advantages, both the ATLAS and CMS collaborations at the LHC are exploiting...
The long-standing problem of reconciling the cosmological evidence of the existence of dark matter with the lack of any clear experimental observation of it has recently revived the idea that the new particles are not directly connected with the Standard Model gauge fields, but only through mediator fields or "portals", connecting our world with new "secluded" or "hidden" sectors. One...
With processor architecture evolution, the HPC market has undergone a paradigm shift. The adoption of low-cost, Linux-based clusters extended HPC’s reach from its roots in modeling and simulation of complex physical systems to a broad range of industries, from biotechnology, cloud computing, computer analytics and big data challenges to manufacturing sectors. In this perspective, the near...
The ATLAS Event Service (ES) has been designed and implemented for efficient
running of ATLAS production workflows on a variety of computing platforms, ranging
from conventional Grid sites to opportunistic, often short-lived resources, such
as spot market commercial clouds, supercomputers and volunteer computing.
The Event Service architecture allows real time delivery of fine grained...
Distributed data processing in High Energy and Nuclear Physics (HENP) is a prominent example of big data analysis. Having petabytes of data being processed at tens of computational sites with thousands of CPUs, standard job scheduling approaches either do not address well the problem complexity or are dedicated to one specific aspect of the problem only (CPU, network or storage). As a result, ...
Over the past two years, the operations at INFN-CNAF have undergone significant changes.
The adoption of configuration management tools such as Puppet, and the constant increase of dynamic and cloud infrastructures, have led us to investigate a new monitoring approach.
Our aim is the centralization of the monitoring service at CNAF through a scalable and highly configurable monitoring...
Many experiments in the field of accelerator-based science are actively running at the High Energy Accelerator Research Organization (KEK) in Japan, using the SuperKEKB and J-PARC accelerators. At KEK, the computing demand from the various experiments for data processing, analysis and MC simulation is steadily increasing. It is not only for the case with high-energy...
This contribution gives a report on the remote evaluation of the pre-production Intel Omni-Path (OPA) interconnect hardware and software performed by the RHIC & ATLAS Computing Facility (RACF) at BNL in the December 2015 to February 2016 period, using a 32-node “Diamond” cluster with a single Omni-Path Host Fabric Interface (HFI) installed on each node and a single 48-port Omni-Path switch with the non-blocking...
IceProd is a data processing and management framework developed by the IceCube Neutrino Observatory for processing of Monte Carlo simulations, detector data, and analysis levels. It runs as a separate layer on top of grid and batch systems. This is accomplished by a set of daemons which process job workflow, maintaining configuration and status information on the job before, during, and after...
The low flux of the ultra-high energy cosmic rays (UHECR) at the highest energies provides a challenge to answer the long standing question about their origin and nature. Even lower fluxes of neutrinos with energies above 10^22 eV are predicted in certain Grand-Unifying-Theories (GUTs) and e.g. models for super-heavy dark matter (SHDM). The significant increase in detector volume required to...
Many physics and performance studies with the ATLAS detector at the Large Hadron Collider require very large samples of simulated events, and producing these using the full GEANT4 detector simulation is highly CPU intensive.
Often, a very detailed detector simulation is not needed, and in these cases fast simulation tools can be used
to reduce the calorimeter simulation time by a few orders...
The ALICE experiment at CERN was designed to study the properties of the strongly-interacting hot and dense matter created in heavy-ion collisions at the LHC energies. The computing model of the experiment currently relies on the hierarchical Tier-based structure, with a top-level Grid site at CERN (Tier-0, also extended to Wigner) and several globally distributed datacenters at national and...
In the ideal limit of infinite resources, multi-tenant applications are able to scale in/out on a Cloud driven only by their functional requirements. A large Public Cloud may be a reasonable approximation of this condition, where tenants are normally charged a posteriori for their resource consumption. On the other hand, small scientific computing centres usually work in a saturated regime...
The Computing Center of the Institute of Physics (CC IoP) of the Czech Academy of Sciences serves a broad spectrum of users with various computing needs. It runs WLCG Tier-2 center for the ALICE and the ATLAS experiments; the same group of services is used by astroparticle physics projects the Pierre Auger Observatory (PAO) and the Cherenkov Telescope Array (CTA). OSG stack is installed for...
The complex geometry of the whole detector of the ATLAS experiment at LHC is currently stored only in custom online databases, from which it is built on-the-fly on request. Accessing the online geometry guarantees accessing the latest version of the detector description, but requires the setup of the full ATLAS software framework "Athena", which provides the online services and the tools to...
The INFN Section of Turin hosts a middle-size multi-tenant cloud infrastructure optimized for scientific computing.
A new approach exploiting the features of VMDIRAC and aiming to allow for dynamic automatic instantiation and destruction of Virtual Machines from different tenants, in order to maximize the global computing efficiency of the infrastructure, has been designed, implemented and...
The ATLAS software infrastructure facilitates efforts of more than 1000
developers working on the code base of 2200 packages with 4 million C++
and 1.4 million python lines. The ATLAS offline code management system is
a powerful, flexible framework for processing requests for new package
versions, probing code changes in the Nightly Build System, migration to
new platforms and compilers,...
ATLAS is a high energy physics experiment at the Large Hadron Collider located at CERN.
During the so-called Long Shutdown 2 period, scheduled for late 2018, ATLAS will undergo
several modifications and upgrades of its data acquisition system in order to cope with the
higher luminosity requirements. As part of these activities, a new read-out chain will be built
for the New Small Wheel muon...
Distributed computing infrastructures require automatic tools to strengthen, monitor and analyze the security behavior of computing devices. These tools should inspect monitoring data such as resource usage, log entries, traces and even processes' system calls. They also should detect anomalies that could indicate the presence of a cyber-attack. Besides, they should react to attacks without...
The engineering design of a particle detector is usually performed in a
Computer Aided Design (CAD) program, and simulation of the detector's performance
can be done with a Geant4-based program. However, transferring the detector
design from the CAD program to Geant4 can be laborious and error-prone.
SW2GDML is a tool that reads a design in the popular SolidWorks CAD
program and...
The Compact Muon Solenoid (CMS) experiment makes a vast use of alignment and calibration measurements in several data processing workflows: in the High Level Trigger, in the processing of the recorded collisions and in the production of simulated events for data analysis and studies of detector upgrades. A complete alignment and calibration scenario is factored in approximately three-hundred...
The Trigger and Data Acquisition system of the ATLAS detector at the Large Hadron
Collider at CERN is composed of a large number of distributed hardware and software
components (about 3000 machines and more than 25000 applications) which, in a coordinated
manner, provide the data-taking functionality of the overall system.
During data taking runs, a huge flow of operational data is produced...
Volunteer computing has the potential to provide significant additional computing capacity for the LHC experiments.
One of the challenges with exploiting volunteer computing is to support a global community of volunteers that provides heterogeneous resources.
However, HEP applications require more data input and output than the CPU intensive applications that are typically used by other...
As demand for widely accessible storage capacity increases and usage is on the rise, steady IO performance is desired but tends to suffer within multi-user environments. Typical deployments use standard hard drives as the cost per GB is quite low. On the other hand, HDD-based solutions for storage are not known to scale well with process concurrency, and soon enough a high rate of IOPS creates a...
GooFit, a GPU-friendly framework for doing maximum-likelihood fits, has been extended in functionality to do a full amplitude analysis of scalar mesons decaying into four final states via various combinations of intermediate resonances. Recurring resonances in different amplitudes are recognized and only calculated once, to save memory and execution time. As an example, this tool can be used...
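The caching idea can be illustrated outside GooFit with a simple memoization of resonance lineshapes, so that a resonance appearing in several amplitudes is evaluated only once per parameter set (the names and the toy Breit-Wigner form here are illustrative, not the GooFit implementation):

    import functools
    import numpy as np

    @functools.lru_cache(maxsize=None)
    def breit_wigner(mass, width, s_key):
        """Toy lineshape, cached per (mass, width, evaluation grid)."""
        s = np.frombuffer(s_key)             # recover the invariant-mass-squared grid
        return mass * width / (mass**2 - s - 1j * mass * width)

    s_grid = np.linspace(0.3, 3.0, 5)
    key = s_grid.tobytes()                   # hashable key for the cached grid
    amp1 = breit_wigner(0.775, 0.149, key)   # rho(770) used in amplitude 1
    amp2 = breit_wigner(0.775, 0.149, key)   # same resonance reused: no recomputation
    print(amp1 is amp2)                      # True, returned from the cache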
The AMS data production uses different programming modules for job submission, execution and management, as well as for validation of produced data. The modules communicate with each other using CORBA interface. The main module is the AMS production server, a scalable distributed service which links all modules together starting from job submission request and ending with writing data to disk...
Efficient administration of computing centres requires advanced tools for the monitoring and front-end interface of their infrastructure. The large-scale distributed grid systems, like the Worldwide LHC Computing Grid (WLCG) and ATLAS computing, offer many existing web pages and information sources indicating the status of the services, systems, requests and user jobs at grid sites. These...
The pilot model employed by the ATLAS production system has been in use for many years. The model has proven to be a success, with many
advantages over push models. However one of the negative side-effects of using a pilot model is the presence of 'empty pilots' running
on sites, consuming a small amount of walltime and not running a useful payload job. The impact on a site can be significant,...
A new analysis category based on g4tools was added in Geant4 release 9.5 with the aim of providing users with a lightweight analysis tool available as part of the Geant4 installation without the need to link to an external analysis package. It has progressively replaced the usage of external tools based on AIDA (Abstract Interfaces for Data Analysis) in all Geant4 examples. Frequent questions...
Simulation of particle-matter interactions in complex geometries is one of
the main tasks in high energy physics (HEP) research. Geant4 is the most
commonly used tool to accomplish it.
An essential aspect of the task is an accurate and efficient handling
of particle transport and crossing volume boundaries within a
predefined (3D) geometry.
At the core of the Geant4 simulation toolkit,...
The distributed computing system in the Institute of High Energy Physics (IHEP), China, is based on the DIRAC middleware. It integrates about 2000 CPU cores and 500 TB of storage contributed by 16 distributed sites. These sites are of various types, such as cluster, grid, cloud and volunteer computing. This system went into production status in 2012. Now it supports multi-VO operation and serves three HEP...
Previous research has shown that it is relatively easy to apply a simple shim to conventional WLCG storage interfaces, in order to add Erasure coded distributed resilience to data.
One issue with simple EC models is that, while they can recover from losses without needing additional full copies of data, recovery often involves reading all of the distributed chunks of the file (and their...
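The effect can be seen already with the simplest erasure code, a single XOR parity chunk: rebuilding one lost data chunk requires reading every surviving chunk (a toy sketch, not the production EC scheme):

    import functools
    import operator

    def xor_bytes(chunks):
        """Bytewise XOR of equal-length byte strings."""
        return bytes(functools.reduce(operator.xor, col) for col in zip(*chunks))

    data = [b"AAAA", b"BBBB", b"CCCC"]   # three data chunks on three servers
    parity = xor_bytes(data)              # fourth chunk on a fourth server

    # Chunk 1 is lost: rebuilding it needs every remaining chunk to be read.
    recovered = xor_bytes([data[0], data[2], parity])
    print(recovered == data[1])           # True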
Maintainability is a critical issue for large scale, widely used software systems, characterized by a long life cycle. It is of paramount importance for a software toolkit, such as Geant4, which is a key instrument for research and industrial applications in many fields, not limited to high energy physics.
Maintainability is related to a number of objective metrics associated with...
Consolidation towards more computing at flat budgets, beyond what pure chip technology
can offer, is a requirement for the full scientific exploitation of the future data from the
Large Hadron Collider. One consolidation measure is to exploit cloud infrastructures whenever
they are financially competitive. We report on the technical solutions used and the performance
achieved running...
The ATLAS Experiment at the LHC has been recording data from proton-proton collisions at 13 TeV
center-of-mass energy since spring 2015. The ATLAS collaboration has set up, updated
and optimized a fast physics monitoring framework (TADA) to automatically perform a broad
range of validation and to scan for signatures of new physics in the rapidly growing data.
TADA is designed to provide fast...
The ATLAS Metadata Interface (AMI) is a mature application with more than 15 years of existence.
Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for
metadata aggregation and cataloguing. We briefly describe the architecture, the main services
and the benefits of using AMI in big collaborations, especially for high energy physics.
We focus on the...
The ATLAS experiment explores new hardware and software platforms that, in the future,
may be more suited to its data intensive workloads. One such alternative hardware platform
is the ARM architecture, which is designed to be extremely power efficient and is found
in most smartphones and tablets.
CERN openlab recently installed a small cluster of ARM 64-bit evaluation prototype servers....
The LHCb Vertex Locator (VELO) is a silicon strip semiconductor detector operating at just 8 mm from the LHC beams. Its 172,000 strips are read at a frequency of 1 MHz and processed by off-detector FPGAs, followed by a PC cluster that reduces the event rate to about 10 kHz. During the second run of the LHC, which lasts from 2015 until 2018, the detector performance will undergo continued...
The exploitation of volunteer computing resources has become a popular practice in the HEP computing community because of the huge amount of potential computing power it provides. In recent HEP experiments, grid middleware has been used to organize the services and the resources; however, it relies heavily on X.509 authentication, which is at odds with the untrusted nature of volunteer...
In this paper we explain how the C++ code quality is managed in ATLAS using a range of tools from compile-time through to run time testing and reflect on the substantial progress made in the last two years largely through the use of static analysis tools such as Coverity®, an industry-standard tool which enables quality comparison with general open source C++ code. Other available code...
CBM is a heavy-ion experiment at the future FAIR facility in
Darmstadt, Germany. Featuring self-triggered front-end electronics and
free-streaming read-out, event selection will be done exclusively by
the First Level Event Selector (FLES). Designed as an HPC cluster,
its task is the online analysis and selection of
the physics data at a total input data rate exceeding 1 TByte/s. To
allow...
The CMS experiment collects and analyzes large amounts of data coming from high energy particle collisions produced by the Large Hadron Collider (LHC) at CERN. This involves a huge amount of real and simulated data processing that needs to be handled in batch-oriented platforms. The CMS Global Pool of computing resources provides more than 100K dedicated CPU cores and another 50K to 100K CPU cores from...
CMS deployed a prototype infrastructure based on Elasticsearch that stores all ClassAds from the global pool. This includes detailed information on IO, CPU, datasets, etc. for all analysis as well as production jobs. We will present initial results from analyzing this wealth of data, describe lessons learned, and discuss plans for the future to derive operational benefits from analyzing this...
One of the primary objectives of the research on GEMs at CERN is the testing and simulation of prototypes, the manufacturing of large-scale GEM detectors, and their installation into CMS detector sections at the outer layer, where only highly energetic muons are detected. When a muon traverses a GEM detector, it ionizes the gas molecules, generating a freely moving electron that starts...
One of the difficulties experimenters encounter when using a modular event-processing framework is determining the appropriate configuration for the workflow they intend to execute. A typical solution is to provide documentation external to the C++ code source that explains how a given component of the workflow is to be configured. This solution is fragile, because the documentation and the...
Throughout the first year of LHC Run 2, ATLAS Cloud Computing has undergone
a period of consolidation, characterized by building upon previously established systems,
with the aim of reducing operational effort, improving robustness, and reaching higher scale.
This paper describes the current state of ATLAS Cloud Computing.
Cloud activities are converging on a common contextualization...
The Belle II experiment is the upgrade of the highly successful Belle experiment located at the KEKB asymmetric-energy e+e- collider at KEK in Tsukuba, Japan. The Belle experiment collected e+e- collision data at or near the centre-of-mass energies corresponding to $\Upsilon(nS)$ ($n\leq 5$) resonances between 1999 and 2010 with the total integrated luminosity of 1 ab$^{-1}$. The data...
The LHC has planned a series of upgrades culminating in the High Luminosity LHC (HL-LHC) which will have
an average luminosity 5-7 times larger than the nominal Run-2 value. The ATLAS Tile Calorimeter (TileCal) will
undergo an upgrade to accommodate the HL-LHC parameters. The TileCal read-out electronics will be redesigned,
introducing a new read-out strategy.
The photomultiplier signals...
CERN has been archiving data on tapes in its Computer Center for decades and its archive system is now holding more than 135 PB of HEP data in its premises on high density tapes.
For the last 20 years, tape areal bit density has been doubling every 30 months, closely following HEP data growth trends. During this period, bits on the tape magnetic substrate have been shrinking exponentially;...
Data Flow Simulation of the ALICE Computing System with OMNET++
Rifki Sadikin, Furqon Hensan Muttaqien, Iosif Legrand, Pierre Vande Vyvre for the ALICE Collaboration
The ALICE computing system will be entirely upgraded for Run 3 to address the major challenge of sampling the full 50 kHz Pb-Pb interaction rate, increasing the present limit by a factor of 100. We present, in this...
This contribution reports on the feasibility of executing data intensive workflows on Cloud infrastructures. In order to assess this, the metric ETC = Events/Time/Cost is formed, which quantifies the different workflow and infrastructure configurations that are tested against each other.
In these tests ATLAS reconstruction jobs are run, examining the effects of overcommitting (more parallel...
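A worked illustration of the ETC figure of merit defined above may help; the numbers below are hypothetical, not measured values from these tests:

```python
# Hypothetical numbers, purely illustrative: compare two cloud configurations by
# the ETC = Events / Time / Cost figure of merit described in the abstract.
def etc(events: int, wall_time_h: float, cost_eur: float) -> float:
    return events / wall_time_h / cost_eur

config_a = etc(events=50_000, wall_time_h=10.0, cost_eur=120.0)   # e.g. no overcommit
config_b = etc(events=50_000, wall_time_h=13.0, cost_eur=80.0)    # e.g. 2x overcommit
print(f"ETC A = {config_a:.1f}, ETC B = {config_b:.1f} events per hour per euro")
# A higher ETC means more events processed per unit time and cost.
```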
We review and demonstrate the design of efficient data transfer nodes (DTNs), from the perspectives of the highest throughput over both local and wide area networks, as well as the highest performance per unit cost. A careful system-level design is required for the hardware, firmware, OS and software components. Furthermore, additional tuning of these components, and the identification and...
The ATLAS Metadata Interface (AMI) is a mature application with more than 15 years of existence.
Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem
for metadata aggregation and cataloguing. AMI is used by the ATLAS production system,
therefore the service must guarantee a high level of availability. We describe our monitoring system
and the...
With many parts of the world having run out of IPv4 address space and the Internet Engineering Task Force (IETF) deprecating IPv4, the use of and migration to IPv6 is becoming a pressing issue. A significant amount of effort has already been expended by the HEPiX IPv6 Working Group (http://hepix-ipv6.web.cern.ch/) on testing dual-stacked hosts and IPv6-only CPU resources. The Queen Mary grid...
Nowadays, High Energy Physics experiments produce a large amount of data. These data are stored in massive storage systems, which need to balance cost, performance and manageability. HEP is a typical data-intensive application, processing a lot of data to achieve scientific discoveries. A hybrid storage system including SSD (Solid-State Drive) and HDD (Hard Disk Drive) layers...
Monte Carlo (MC) simulation production plays an important part in the physics analysis of the Alpha Magnetic Spectrometer (AMS-02) experiment. To facilitate metadata retrieval for data analysis among the millions of database records, we developed a monitoring tool to analyze and visualize the production status and progress. In this paper, we discuss the workflow of the...
ALICE (A Large Ion Collider Experiment) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN Large Hadron Collider (LHC). A major upgrade of the experiment is planned for 2020. In order to cope with a data rate 100 times higher and with the continuous readout of the Time Projection Chamber (TPC), it is necessary to...
The growing use of private and public clouds, and volunteer computing are driving significant changes in the way large parts of the distributed computing for our communities are carried out. Traditionally HEP workloads within WLCG were almost exclusively run via grid computing at sites where site administrators are responsible for and have full sight of the execution environment. The...
The long standing problem of reconciling the cosmological evidence of the existence of dark matter with the lack of any clear experimental observation of it has recently revived the idea that the new particles are not directly connected with the Standard Model gauge fields, but only through mediator fields or "portals" connecting our world with new "secluded" or "hidden" sectors. One...
The trigger system of the ATLAS detector at the LHC is a combination of hardware, firmware and software, associated to various sub-detectors that must seamlessly cooperate in order to select 1 collision of interest out of every 40,000 delivered by the LHC every millisecond. This talk will discuss the challenges, workflow and organization of the ongoing trigger software development, validation...
The new generation of high energy physics (HEP) experiments has been producing gigantic amounts of data. Storing and accessing those data with high performance challenges the availability, scalability, and I/O performance of the underlying massive storage system. At the same time, a series of research efforts focusing on big data has become more and more active, and the research about metadata...
Requests for computing resources from LHC experiments are constantly
mounting, and so is their peak usage. Since dimensioning
a site to handle the peak usage times is impractical due to
constraints on resources that many publicly-owned computing centres
have, opportunistic usage of resources from external, even commercial
cloud providers is becoming more and more interesting, and is even...
The CMS experiment at LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning systems. Given the scale of the global queue in CMS, the operators found it increasingly difficult to monitor the pool to find problems and fix them. The operators had to rely on several different web pages, with several different levels of information, and sifting tirelessly...
CRAB3 is a tool used by more than 500 users all over the world for distributed Grid analysis of CMS data. Users can submit sets of Grid jobs with similar requirements (tasks) with a single user request. CRAB3 uses a client-server architecture, where a lightweight client, a server, and ancillary services work together and are maintained by CMS operators at CERN.
As with most complex...
The computing infrastructures serving the LHC experiments have been
designed to cope at most with the average amount of data recorded. The
usage peaks, as already observed in Run-I, may however generate large
backlogs, thus delaying the completion of the data reconstruction and
ultimately the data availability for physics analysis. In order to
cope with the production peaks, the LHC...
The use of opportunistic cloud resources by HEP experiments has significantly increased over the past few years. Clouds that are owned or managed by the HEP community are connected to the LHCONE network or the research network with global access to HEP computing resources. Private clouds, such as those supported by non-HEP research funds, are generally connected to the international...
The algorithms and infrastructure of the CMS offline software are under continuous change in order to adapt to a changing accelerator, detector and computing environment. In this presentation, we discuss the most important technical aspects of this evolution, the corresponding gains in performance and capability, and the prospects for continued software improvement in the face of challenges...
Ceph based storage solutions and especially object storage systems based on it are now well recognized and widely used across the HEP/NP community. Both object storage and block storage layers of Ceph are now supporting production ready services for HEP/NP experiments at many research organizations across the globe, including CERN and Brookhaven National Laboratory (BNL), and even the Ceph...
Researchers at the Google Brain team released their second-generation deep learning library, TensorFlow, as an open-source package under the Apache 2.0 license in November 2015. Google had already deployed the first-generation library, DistBelief, in various systems such as Google Search, advertising systems, speech recognition systems, Google Images, Google Maps, Street View, Google...
The High Luminosity LHC (HL-LHC) is a project to increase the luminosity of the Large Hadron Collider to 5×10^34 cm^-2 s^-1. The CMS experiment is planning a major upgrade in order to cope with an expected average number of overlapping collisions per bunch crossing of 140. The dataset sizes will increase by several orders of magnitude, and so will the requests for larger computing...
We present a new experiment management system for the SND detector at the VEPP-2000 collider (Novosibirsk). An essential part of it is operator access to the experimental databases (configuration, conditions and metadata).
The system is designed with a client-server architecture. A user interacts with it via a web interface. The server side includes several logical layers: user...
The BESIII experiment located in Beijing is an electron-positron collision experiment to study Tau-Charm physics. Now in its middle age, BESIII has accumulated more than 1 PB of raw data, and a distributed computing system based on DIRAC has been built up and in production since 2012 to deal with peak demands. Nowadays clouds have become a popular way to provide resources among BESIII...
The high precision experiment PANDA is specifically designed to shed new light on the structure and properties of hadrons. PANDA is a fixed-target antiproton-proton experiment and will be part of the Facility for Antiproton and Ion Research (FAIR) in Darmstadt, Germany. When measuring total cross sections or determining the properties of intermediate states very precisely, e.g. via the energy...
Processing of the large amount of data produced by the ATLAS experiment requires fast and reliable access to what we call Auxiliary Data Files (ADF). These files, produced by Combined Performance, Trigger and Physics groups, contain conditions, calibrations, and other derived data used by the ATLAS software. In ATLAS this data has, thus far for historical reasons, been collected and accessed...
Accurate simulation of calorimeter response for high energy electromagnetic
particles is essential for the LHC experiments. Detailed simulation of the
electromagnetic showers using Geant4 is however very CPU intensive and
various fast simulation methods were proposed instead. The frozen shower
simulation substitutes the full propagation of the showers for energies
below 1 GeV by showers...
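The frozen-shower idea can be sketched schematically as follows; the library content, energy binning and function names here are hypothetical placeholders, not the ATLAS implementation:

```python
import random

# Schematic illustration of the frozen-shower idea: once an electromagnetic particle
# falls below a threshold (1 GeV, per the abstract), its full propagation is replaced
# by a pre-simulated shower drawn from a library. Library contents are invented.
ENERGY_THRESHOLD_GEV = 1.0
shower_library = {                      # energy bin (GeV) -> list of canned showers,
    0.25: [[(0.0, 0.0, 0.21)]],         # each shower = list of (dx, dy, e_dep) hits
    0.50: [[(0.0, 0.1, 0.45), (0.1, 0.0, 0.04)]],
    0.75: [[(0.1, 0.1, 0.70)]],
}

def detailed_simulation(energy_gev, x, y):
    return [(x, y, energy_gev)]          # stand-in for the expensive full shower

def simulate_particle(energy_gev, x, y):
    if energy_gev < ENERGY_THRESHOLD_GEV:
        # pick the closest energy bin and a random stored shower, then
        # translate its hits to the particle position (no full tracking)
        bin_e = min(shower_library, key=lambda b: abs(b - energy_gev))
        shower = random.choice(shower_library[bin_e])
        return [(x + dx, y + dy, e) for dx, dy, e in shower]
    return detailed_simulation(energy_gev, x, y)   # above threshold: full simulation

print(simulate_particle(0.6, x=10.0, y=-2.0))
```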
The current tier-0 processing at CERN is done on two managed sites, the CERN computer centre and the Wigner computer centre. With the proliferation of public cloud resources at increasingly competitive prices, we have been investigating how to transparently increase our compute capacity to include these providers. The approach taken has been to integrate these resources using our existing...
Throughout the last decade the Open Science Grid (OSG) has been fielding requests from user communities, resource owners, and funding agencies to provide information about utilization of OSG resources. Requested data include traditional “accounting” - core-hours utilized - as well as user’s certificate Distinguished Name, their affiliations, and field of science. The OSG accounting service,...
It is well known that submitting jobs to the grid and transferring the
resulting data are not trivial tasks, especially when users are required
to manage their own X.509 certificates. Asking users to manage their
own certificates means that they need to keep the certificates secure,
remember to renew them periodically, frequently create proxy
certificates, and make them available to...
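As a small illustration of one of these chores, the sketch below warns when a user certificate is close to expiry; it assumes the third-party cryptography package is available, and the certificate path is a placeholder:

```python
# Small helper sketch: warn when a grid user certificate is close to expiry.
# Uses the third-party 'cryptography' package; the certificate path is a placeholder.
from datetime import datetime
from cryptography import x509

def days_until_expiry(cert_path: str) -> int:
    with open(cert_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    return (cert.not_valid_after - datetime.utcnow()).days

remaining = days_until_expiry("usercert.pem")       # placeholder path
if remaining < 30:
    print(f"Certificate expires in {remaining} days - time to renew")
```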
grid-control is an open source job submission tool that supports common HEP workflows.
Since 2007 it has been used by a number of HEP analyses to process tasks which routinely reach the order of tens of thousands of jobs.
The tool is very easy to deploy, either from its repository or the python package index (pypi). The project aims at being lightweight and portable. It can run in...
The Belle II experiment at the SuperKEKB e+e- accelerator is preparing to take first collision data next year. For the success of the experiment it is essential to have information about varying conditions available in the simulation, reconstruction, and analysis code.
The interface to the conditions data in the client code was designed to make life for developers as easy as possible....
Argonne provides a broad portfolio of computing resources to researchers. Since 2011 we have been providing a cloud computing resource to researchers, primarily using Openstack. Over the last year we’ve been working to better support containers in the context of HPC. Several of our operating environments now leverage a combination of the three technologies which provides infrastructure...
The Scientific Computing Department of the STFC runs a cloud service for internal users and various user communities. The SCD Cloud is configured using a configuration management system called Aquilon. Many of the virtual machine images are also created/configured using Aquilon. These are not unusual; however, our integrations also allow Aquilon to be altered by the cloud. For instance, creation...
IPv4 network addresses are running out and the deployment of IPv6 networking in many places is now well underway. Following the work of the HEPiX IPv6 Working Group, a growing number of sites in the Worldwide Large Hadron Collider Computing Grid (WLCG) have deployed dual-stack IPv6/IPv4 services. The aim of this is to support the use of IPv6-only clients, i.e. worker nodes, virtual machines or...
Hybrid systems are emerging as an efficient solution in the HPC arena, with an abundance of approaches for integration of accelerators into the system (i.e. GPU, FPGA). In this context, one of the most important features is the chance of being able to address the accelerators, whether they be local or off-node, on an equal footing. Correct balancing and high performance in how the network...
We present an overview of Data Processing and Data Quality (DQ) Monitoring for the ATLAS Tile Hadronic
Calorimeter. Calibration runs are monitored from a data quality perspective and used as a cross-check for physics
runs. Data quality in physics runs is monitored extensively and continuously. Any problems are reported and
immediately investigated. The DQ efficiency achieved was 99.6% in 2012...
The SDN Next Generation Integrated Architecture (SDN-NGenIA) program addresses some of the key challenges facing the present and next generations of science programs in HEP, astrophysics, and other fields whose potential discoveries depend on their ability to distribute, process and analyze globally distributed petascale to exascale datasets.
The SDN-NGenIA system under development by the...
A large part of the programs of hadron physics experiments deals with the search for new conventional and exotic hadronic states, e.g. hybrids and glueballs. In a majority of analyses a Partial Wave Analysis (PWA) is needed to identify possible exotic states and to classify known states. Of special interest is the comparison or combination of data from multiple experiments. Therefore, a...
This paper describes GridPP's Vacuum Platform for managing virtual machines (VMs), which has been used to run production workloads for WLCG, other HEP experiments, and some astronomy projects. The platform provides a uniform interface between VMs and the sites they run at, whether the site is organised as an Infrastructure-as-a-Service cloud system such as OpenStack with a push model, or an...
Over the past several years, rapid growth of data has affected many fields of science. This has often resulted in the need for overhauling or exchanging the tools and approaches in the disciplines’ data life cycles, allowing the application of new data analysis methods and facilitating improved data sharing.
The project Large-Scale Data Management and Analysis (LSDMA) of the German Helmholtz...
The LHCb collaboration is one of the four major experiments at the Large Hadron Collider at CERN. Petabytes of data are generated by the detectors and Monte-Carlo simulations. The LHCb Grid interware LHCbDIRAC is used to make data available to all collaboration members around the world. The data is replicated to the Grid sites in different locations. However, disk storage on the Grid is...
Performing efficient resource provisioning is a fundamental aspect for any resource provider. Local Resource Management Systems (LRMS) have been used in data centers for decades in order to obtain the best usage of the resources, providing their fair usage and partitioning for the users. In contrast, current cloud schedulers are normally based on the immediate allocation of resources on a...
Limits on power dissipation have pushed CPUs to grow in parallel processing capabilities rather than clock rate, leading to the rise of "manycore" or GPU-like processors. In order to achieve the best performance, applications must be able to take full advantage of vector units across multiple cores, or some analogous arrangement on an accelerator card. Such parallel performance is becoming a...
The Cherenkov Telescope Array (CTA) – an array of many tens of Imaging Atmospheric Cherenkov Telescopes deployed on an unprecedented scale – is the next-generation instrument in the field of very high energy gamma-ray astronomy. An average data stream of about 0.9 GB/s for about 1300 hours of observation per year is expected, therefore resulting in 4 PB of raw data per year and a total of 27...
The INFN project KM3NeT-Italy, supported with Italian PON (National Operative Programs) funding, has designed a distributed Cherenkov neutrino telescope for collecting photons emitted along the path of the charged particles produced in neutrino interactions. The detector consists of 8 vertical structures, called towers, instrumented with a total of 672 Optical Modules (OMs), and its...
The reconstruction of charged particles trajectories is a crucial task for most particle physics
experiments. The high instantaneous luminosity achieved at the LHC leads to a high number
of proton-proton collisions per bunch crossing, which has put the track reconstruction
software of the LHC experiments through a thorough test. Preserving track reconstruction
performance under...
The axion is a dark matter candidate and is believed to offer a solution to the strong CP problem in QCD [1]. The CULTASK (CAPP Ultra-Low Temperature Axion Search in Korea) experiment is an axion search experiment being performed at the Center for Axion and Precision Physics Research (CAPP), Institute for Basic Science (IBS) in Korea. Based on Sikivie’s method [2], CULTASK uses a resonant cavity...
More than one thousand physicists analyse data collected by the ATLAS experiment at the Large Hadron Collider (LHC) at CERN through 150 computing facilities around the world. Efficient distributed analysis requires optimal resource usage and the interplay of several
factors: robust grid and software infrastructures, and system capability to adapt to different workloads. The continuous...
Over the last seven years the software stack of the next generation B factory experiment Belle II has grown to over 400,000 lines of C++ and python code, counting only the part included in offline software releases. There are several thousand commits to the central repository by about 100 individual developers per year. To keep a coherent software stack of high quality such that it can be...
SWAN is a novel service to perform interactive data analysis in the cloud. SWAN allows users to write and run their data analyses with only a web browser, leveraging the widely-adopted Jupyter notebook interface. The user code, executions and data live entirely in the cloud. SWAN makes it easier to produce and share results and scientific code, access scientific software, produce tutorials and...
Open City Platform (OCP) is an industrial research project funded by the Italian Ministry of University and Research, started in 2014. It intends to research, develop and test new technological solutions that are open, interoperable and usable on-demand in the field of Cloud Computing, along with new sustainable organizational models for the public administration, to innovate, with scientific results,...
In particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually which is time-consuming and often leads to undocumented relations between particular workloads.
We present a generic analysis design pattern...
As a new approach to resource management, virtualization technology is more and more widely applied in the high-energy physics field. A virtual computing cluster based on OpenStack was built at IHEP, with HTCondor as the job queue management system. An accounting system which can record the resource usage of different experiment groups in detail was also developed. There are two types of the...
The Deep Underground Neutrino Experiment (DUNE) will employ a uniquely large (40kt) Liquid Argon Time Projection chamber as the main component of its Far Detector. In order to validate this design and characterize the detector performance an ambitious experimental program (called "protoDUNE") has been created which includes a beam test of a large-scale DUNE prototype at CERN. The amount of...
With the LHC Run2, end user analyses are increasingly challenging for both users and resource providers.
On the one hand, boosted data rates and more complex analyses favor and require larger data volumes to be processed.
On the other hand, efficient analyses and resource provisioning require fast turnaround cycles.
This puts the scalability of analysis infrastructures to new...
The reconstruction and identification of charmed hadron decays provides an important tool for the study of heavy quark behavior in the Quark Gluon Plasma. Such measurements require high resolution to topologically identify decay daughters at vertices displaced <100 microns from the primary collision vertex, placing stringent demands on track reconstruction software. To enable these...
The LArIAT Liquid Argon Time Projection Chamber (TPC) in a Test Beam experiment explores the interaction of charged particles such as pions, kaons, electrons, muons and protons within the active liquid argon volume of the TPC detector. The LArIAT experiment started data collection at the Fermilab Test Beam Facility (FTBF) in April 2015 and continues to run in 2016. LArIAT provides important...
The VecGeom geometry library is a relatively recent effort aiming to provide
a modern and high performance geometry service for particle-detector simulation
in hierarchical detector geometries common to HEP experiments.
One of its principal targets is the effective use of vector SIMD hardware
instructions to accelerate geometry calculations for single-track as well
as multiple-track...
Over the past few years, Grid Computing technologies have reached a high
level of maturity. One key aspect of this success has been the development and adoption of newer Compute Elements to interface the external Grid users with local batch systems. These new Compute Elements allow for better handling of jobs requirements and a more precise management of diverse local resources.
However,...
The IceCube Neutrino Observatory is a cubic kilometer neutrino telescope located at the Geographic South Pole. IceCube collects 1 TB of data every day. An online filtering farm processes this data in real time and selects 10% to be sent via satellite to the main data center at the University of Wisconsin-Madison. IceCube has two year-round on-site operators. New operators are hired every year,...
One of the large challenges of future particle physics experiments is the trend to run without a first-level hardware trigger. The typical data rates easily exceed hundreds of GBytes/s, which is far too much to be stored permanently for offline analysis. Therefore a strong data reduction has to be achieved by selecting only those data which are physically interesting. This implies that all...
Clouds and Virtualization are typically used in computing centers to satisfy diverse needs: different operating systems, software releases or fast servers/services delivery. On the other hand solutions relying on Linux kernel capabilities such as Docker are well suited for applications isolation and software developing. In our previous work (Docker experience at INFN-Pisa Grid Data Center*) we...
ATLAS track reconstruction code is continuously evolving to match the demands from the increasing instantaneous luminosity of LHC, as well as the increased centre-of-mass energy. With the increase in energy, events with dense environments, e.g. the cores of jets or boosted tau leptons, become much more abundant. These environments are characterised by charged particle separations on the order...
The Toolkit for Multivariate Analysis (TMVA) is a component of the ROOT data analysis framework and is widely used for classification problems. For example, TMVA might be used for the binary classification problem of distinguishing signal from background events.
The classification methods included in TMVA are standard, well-known machine learning techniques which can be implemented in other...
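As a stand-in illustration of the same signal-versus-background binary classification task (this is not TMVA itself; the toy variables and dataset are invented), a boosted-decision-tree style classifier expressed with scikit-learn might look like:

```python
# Illustrative stand-in (not TMVA): the signal-vs-background binary classification
# task expressed with scikit-learn on synthetic toy data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5000
# two made-up discriminating variables; signal is shifted with respect to background
signal     = rng.normal(loc=[1.0, 0.5], scale=1.0, size=(n, 2))
background = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
X = np.vstack([signal, background])
y = np.concatenate([np.ones(n), np.zeros(n)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```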
One of the STAR experiment's modular Messaging Interface and Reliable Architecture framework (MIRA) integration goals is to provide seamless and automatic connections with the existing control systems. After an initial proof of concept and operation of the MIRA system as a parallel data collection system for online use and real-time monitoring, the STAR Software and Computing group is now...
The Cloud Area Padovana has been running for almost two years. This is an OpenStack-based scientific cloud, spread across two different sites: the INFN Padova Unit and the INFN Legnaro National Labs.
The hardware resources have been scaled horizontally and vertically, by upgrading some hypervisors and by adding new ones: currently it provides about 1100 cores.
Some in-house developments were...
Containers remain a hot topic in computing, with new use cases and tools appearing every day. Basic functionality such as spawning containers seems to have settled, but topics like volume support or networking are still evolving. Solutions like Docker Swarm, Kubernetes or Mesos provide similar functionality but target different use cases, exposing distinct interfaces and APIs.
The CERN...
This contribution reports on solutions, experiences and recent developments with the dynamic, on-demand provisioning of remote computing resources for analysis and simulation workflows. Local resources of a physics institute are extended by private and commercial cloud sites, ranging from the inclusion of desktop clusters over institute clusters to HPC centers.
Rather than relying on...
Gravitational wave (GW) events can have several possible progenitors, including binary black hole mergers, cosmic string cusps, core-collapse supernovae, black hole-neutron star mergers, and neutron star-neutron star mergers. The latter three are expected to produce an electromagnetic signature that would be detectable by optical and infrared
telescopes. To that end, the LIGO-Virgo...
We investigate the application of a combination of Monte Carlo Tree Search, hierarchical space decomposition, Hough Transform techniques and
parallel computing to the problem of line detection and shape recognition in general.
Paul Hough introduced in 1962 a method for detecting lines in binary images. Extended in the 1970s to the detection of space forms, what
came to be known as the Hough Transform...
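A compact sketch of the classical Hough transform voting scheme for lines in a binary image (independent of the tree-search and space-decomposition extensions studied in this work) is given below:

```python
# Compact sketch of the classical Hough transform for lines in a binary image:
# each foreground pixel votes for all (theta, rho) lines passing through it.
import numpy as np

def hough_lines(image: np.ndarray, n_theta: int = 180):
    ys, xs = np.nonzero(image)                         # foreground pixel coordinates
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(*image.shape)))
    accumulator = np.zeros((2 * diag, n_theta), dtype=np.int32)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        accumulator[rhos, np.arange(n_theta)] += 1     # one vote per theta
    return accumulator, thetas, diag

# Toy image containing the diagonal line y = x
img = np.zeros((50, 50), dtype=np.uint8)
np.fill_diagonal(img, 1)
acc, thetas, diag = hough_lines(img)
rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
print(f"best line: rho = {rho_idx - diag}, theta = {np.degrees(thetas[theta_idx]):.0f} deg")
```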
The all-silicon design of the tracking system of the CMS experiment provides excellent resolution for charged tracks and an efficient tagging of jets. As the CMS tracker, and in particular its pixel detector, underwent repairs and experienced changed conditions with the start of the LHC Run-II in 2015, the position and orientation of each of the 15148 silicon strip and 1440 silicon pixel...
In order to face the LHC luminosity increase planned for the next years, new high-throughput network mechanisms interfacing the detector readout to the software trigger computing nodes are being developed in several CERN experiments.
Adopting many-core computing architectures such as Graphics Processing Units (GPUs) or the Many Integrated Core (MIC) architecture would allow a drastic reduction in the size...
The development of scientific computing is increasingly moving to web and mobile applications. All these clients need high-quality implementations of accessing heterogeneous computing resources provided by clusters, grid computing or cloud computing. We present a web service called SCEAPI and describe how it can abstract away many details and complexities involved in the use of scientific...
With the imminent upgrades to the LHC and the consequent increase of the amount and complexity of data collected by the experiments, CERN's computing infrastructures will be facing a large and challenging demand of computing resources. Within this scope, the adoption of cloud computing at CERN has been evaluated and has opened the doors for procuring external cloud services from providers,...
Events visualisation in ALICE - current status and strategy for Run 3
Jeremi Niedziela for the ALICE Collaboration
A Large Ion Collider Experiment (ALICE) is one of the four big experiments running at the Large Hadron Collider (LHC), which focuses on the study of the Quark-Gluon Plasma (QGP) being produced in heavy-ion collisions.
The ALICE Event Visualisation Environment (AliEVE) is...
General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded ATLAS High Level Trigger farm. We have developed a demonstrator including GPGPU implementations of Inner Detector and Muon tracking and Calorimeter clustering within the ATLAS software framework. ATLAS is a general purpose particle physics experiment located on the LHC collider at...
The exponentially increasing need for high speed data transfer is driven by big data, cloud computing together with the needs of data intensive science, High Performance Computing (HPC), defense, the oil and gas industry etc. We report on the Zettar ZX software that has been developed since 2013 to meet these growing needs by providing high performance data transfer and encryption in a...
DAMPE is a powerful space telescope launched in December 2015, able to detect electrons and photons in a wide energy range (5 GeV to 10 TeV) and with unprecedented energy resolution. The silicon tracker is a crucial component of the detector, able to determine the direction of detected particles and trace the origin of incoming gamma rays. This contribution covers the reconstruction software of...
In this presentation, the data preparation workflows for Run 2 are
presented. Online data quality uses a new hybrid software release
that incorporates the latest offline data quality monitoring software
for the online environment. This is used to provide fast feedback in
the control room during a data acquisition (DAQ) run, via a
histogram-based monitoring framework as well as the online...
ALICE (A Large Ion Collider Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN.
The High Level Trigger (HLT) is an online compute farm which reconstructs events measured by the ALICE detector in real-time.
The most compute-intensive part is the reconstruction of particle trajectories called tracking and the most important detector for tracking is the...
INDIGO-DataCloud (INDIGO for short, https://www.indigo-datacloud.eu) is a project started in April 2015, funded under the EC Horizon 2020 framework program. It includes 26 European partners located in 11 countries and addresses the challenge of developing open source software, deployable in the form of a data/computing platform, aimed to scientific communities and designed to be deployed on...
The CERN Computer Security Team is assisting teams and individuals at CERN who want to address security concerns related to their computing endeavours. For projects in the early stages, we help incorporate security in system architecture and design. For software that is already implemented, we do penetration testing. For particularly sensitive components, we perform code reviews. Finally, for...
In 2019 the Large Hadron Collider will undergo upgrades in order to increase the luminosity by a factor of two compared to today's nominal luminosity. Currently the CMS software parallelization strategy is oriented towards scheduling one event per thread. However, tracking timing performance depends factorially on the pileup, leading the current approach to increased latency. When designing a HEP...
At the beginning, HEP experiments made use of photographical images both to record and store experimental data and to illustrate their findings. Then the experiments evolved and needed to find ways to visualize their data. With the availability of computer graphics, software packages to display event data and the detector geometry started to be developed. Here a brief history of event displays...
The main goal of the project is to demonstrate the ability to use HTTP data
federations in a manner analogous to today's AAA infrastructure used by
the CMS experiment. An initial testbed at Caltech has been built and
changes in the CMS software (CMSSW) are being implemented in order to
improve HTTP support. A set of machines has already been set up at the Caltech
Tier2 in order to improve the...
The INDIGO-DataCloud project's ultimate goal is to provide a sustainable European software infrastructure for science, spanning multiple computer centers and existing public clouds.
The participating sites form a set of heterogeneous infrastructures, some running OpenNebula, some running OpenStack. There was the need to find a common denominator for the deployment of both the required PaaS...
Modern web browsers are powerful and sophisticated applications that support an ever-wider range of uses. One such use is rendering high-quality, GPU-accelerated, interactive 2D and 3D graphics in an HTML canvas. This can be done via WebGL, a JavaScript API based on OpenGL ES. Applications delivered via the browser have several distinct benefits for the developer and user. For example, they...
Data federations have become an increasingly common tool for large collaborations such as CMS and ATLAS to efficiently distribute large data files. Unfortunately, these typically come with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The...
For over a decade, X.509 proxy certificates have been used in High Energy Physics (HEP) to authenticate users and guarantee their membership in Virtual Organizations, on which subsequent authorization, e.g. for data access, is based. Although the established infrastructure has worked well and provided sufficient security, the implementation of procedures and the underlying software is often seen as...
The increase in instantaneous luminosity, number of interactions per bunch crossing and detector granularity will pose an interesting challenge for the event reconstruction and the High Level Trigger system in the CMS experiment at the High Luminosity LHC (HL-LHC), as the amount of information to be handled will increase by 2 orders of magnitude. In order to reconstruct the Calorimetric...
In the competitive 'market' for large-scale storage solutions, EOS has been showing its excellence in the multi-Petabyte high-concurrency regime. It has also shown a disruptive potential in powering the CERNBox service in providing sync&share capabilities and in supporting innovative analysis environments along the storage of LHC data. EOS has also generated interest as generic storage...
JUNO (Jiangmen Underground Neutrino Observatory) is a multi-purpose neutrino experiment designed to measure the neutrino mass hierarchy and mixing parameters. JUNO is estimated to be in operation in 2019 with 2PB/year raw data rate. The IHEP computing center plans to build up virtualization infrastructure to manage computing resources in the coming years and JUNO is selected to be one of the...
Efficient and precise reconstruction of the primary vertex in
an LHC collision is essential in both the reconstruction of the full
kinematic properties of a hard-scatter event and of soft interactions as a
measure of the amount of pile-up. The reconstruction of primary vertices in
the busy, high pile-up environment of Run-2 of the LHC is a challenging
task. New methods have been developed by...
In view of Run 3 (2020) the LHCb experiment is planning a major upgrade to fully read out events at the 40 MHz collision rate, in order to greatly increase the statistics of the collected samples and go further in precision beyond Run 2. An unprecedented amount of data will be produced, which will be fully reconstructed in real time to perform fast selection and categorization of interesting events....
ParaView [1] is a high performance visualization application not widely used in HEP. It is a long-standing open source project led by Kitware [2] that involves several DOE and DOD laboratories, and it has been adopted by many DOE supercomputing centers and other sites. ParaView is unique in speed and efficiency, using state-of-the-art techniques developed by the academic visualization community...
When first looking at converting a part of our site’s grid infrastructure into a cloud based system in late 2013 we needed to ensure the continued accessibility of all of our resources during a potentially lengthy transition period.
Moving a limited number of nodes to the cloud proved ineffective as users expected a significant number of cloud resources to be available to justify the effort...
Randomly restoring files from tape degrades read performance, primarily due to frequent tape mounts. The high latency of time-consuming tape mounts and dismounts is a major issue when accessing massive amounts of data from tape storage. BNL's mass storage system currently holds more than 80 PB of data on tape, managed by HPSS. To restore files from HPSS, we make use of a scheduler...
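The general idea of such scheduling can be sketched as batching requests per tape and ordering them by position, so each cartridge is mounted once and read sequentially; the request records below are invented and this is not the actual BNL/HPSS logic:

```python
# Sketch of the general idea (not the actual HPSS scheduler): batch pending restore
# requests by tape and order them by position on tape, so each cartridge is mounted
# once and read sequentially instead of being mounted for every individual file.
from collections import defaultdict

requests = [                                     # hypothetical pending requests
    {"file": "/hpss/run7/evt_001", "tape": "T0012", "position": 503},
    {"file": "/hpss/run9/evt_114", "tape": "T0045", "position": 12},
    {"file": "/hpss/run7/evt_002", "tape": "T0012", "position": 17},
]

by_tape = defaultdict(list)
for req in requests:
    by_tape[req["tape"]].append(req)

for tape, reqs in by_tape.items():
    batch = sorted(reqs, key=lambda r: r["position"])   # sequential read order
    print(tape, [r["file"] for r in batch])             # one mount, many files
```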
Reproducibility is a fundamental piece of the scientific method and increasingly complex problems demand ever wider collaboration between scientists. To make research fully reproducible and accessible to collaborators a researcher has to take care of several aspects: research protocol description, data access, preservation of the execution environment, workflow pipeline, and analysis script...
This talk will present the result of recent developments to support new users from the Large Scale Survey Telescope (LSST) group on the GridPP DIRAC instance. I will describe a workflow used for galaxy shape identification analyses whilst highlighting specific challenges as well as the solutions currently being explored. The result of this work allows this community to make best use of...
The 2020 upgrade of the LHCb detector will vastly increase the rate of collisions the Online system needs to process in software, in order to filter events in real time. 30 million collisions per second will pass through a selection chain, where each step is executed conditional to its prior acceptance.
The Kalman Filter is a fit applied to all reconstructed tracks which, due to its time...
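For readers unfamiliar with the fit, a minimal one-dimensional Kalman filter toy (not the LHCb implementation, which operates on full track states) illustrates the predict/update structure whose throughput is being optimized:

```python
# Minimal 1D Kalman filter toy: predict/update steps for a constant-velocity state
# (position, slope) measured at successive detector planes. All numbers are invented.
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])     # state transition (unit step between planes)
H = np.array([[1.0, 0.0]])                 # we only measure position
Q = 1e-4 * np.eye(2)                       # process noise (e.g. multiple scattering)
R = np.array([[0.05]])                     # measurement noise (hit resolution)

x = np.array([[0.0], [0.0]])               # initial state estimate
P = np.eye(2)                              # initial covariance

for z in [0.9, 2.1, 2.9, 4.2, 5.0]:        # toy hit positions along the track
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the measurement z
    y = np.array([[z]]) - H @ x            # residual
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

print("fitted position, slope:", x.ravel())
```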
The LHCb detector at the LHC is a general purpose detector in the forward region with a focus on reconstructing decays of c- and b-hadrons. For Run II of the LHC, a new trigger strategy with a real-time reconstruction, alignment and calibration was developed and employed. This was made possible by implementing an offline-like track reconstruction in the high level trigger. However, the ever...
The Pacific Research Platform is an initiative to interconnect Science DMZs between campuses across the West Coast of the United States over a 100 Gbps network. The LHC @ UC is a proof-of-concept pilot project that focuses on interconnecting 6 University of California campuses. It is spearheaded by computing specialists from the UCSD Tier 2 Center in collaboration with the San Diego...
The Durham High Energy Physics Database (HEPData) has been built up over the past four decades as a unique open-access repository for scattering data from experimental particle physics. It comprises data points from plots and tables underlying over eight thousand publications, some of which are from the Large Hadron Collider (LHC) at CERN.
HEPData has been rewritten from the ground up...
The CRAYFIS experiment proposes the use of private mobile phones as a ground-based detector for Ultra-High-Energy Cosmic Rays. Interacting with the Earth's atmosphere, these produce extensive particle showers which can be detected by the cameras on mobile phones. A typical shower contains minimally-ionizing particles such as muons. As they interact with the CMOS detector, they leave low-energy tracks that sometimes...
Experimental Particle Physics has been at the forefront of analyzing the world’s largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called “Big Data” technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles...
Precise modelling of detectors in simulations is the key to the understanding of their performance, which, in turn, is a prerequisite for the proper design choice and, later, for the achievement of valid physics results. In this report,
we describe the implementation of the Silicon Tracking System (STS), the main tracking device of the CBM experiment, in the CBM software environment. The STS...
Reproducibility is an essential component of the scientific process.
It is often necessary to check whether multiple runs of the same software
produce the same result. This may be done to validate whether a new machine
produces correct results on old software, whether new software produces
correct results on an old machine, or to compare the equality of two different approaches to the...
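A minimal sketch of such a check, comparing two output directories file by file via checksums (directory names are placeholders; bitwise equality is the strictest notion of "same result", and some workflows may instead need tolerance-based comparison):

```python
# Small sketch: compare the outputs of two runs of the same software by checksum.
import hashlib
from pathlib import Path

def checksums(directory: str) -> dict:
    # map relative file path -> SHA-256 digest of its contents
    return {
        p.relative_to(directory).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(directory).rglob("*")) if p.is_file()
    }

run_a, run_b = checksums("output_run_a"), checksums("output_run_b")   # placeholder paths
for name in sorted(set(run_a) | set(run_b)):
    if run_a.get(name) != run_b.get(name):
        print("differs or missing:", name)
```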
We describe the development and deployment of a distributed campus computing infrastructure consisting of a single job submission portal linked to multiple local campus resources, as well as the wider computational fabric of the Open Science Grid (OSG). Campus resources consist of existing OSG-enabled clusters and clusters with no previous interface to the OSG. Users accessing the single...
Following the European Strategy for Particle Physics update of 2013, the study explores different designs of circular colliders for the post-LHC era. Reaching unprecedented energies and luminosities requires understanding system reliability behaviour from the concept phase onwards and designing for availability and sustainable operation. The study explores industrial approaches to model and simulate the...
The Global Science experimental Data hub Center (GSDC) at the Korea Institute of Science and Technology Information (KISTI), located at Daejeon in South Korea, is the unique data center in the country that supports, with its computing resources, fundamental research fields dealing with large-scale data. For historical reasons it has run the Torque batch system, while recently it has started running HTCondor for...
Moore’s Law has defied our expectations and remained relevant in the semiconductor industry in the past 50 years, but many believe it is only a matter of time before an insurmountable technical barrier brings about its eventual demise. Many in the computing industry are now developing post-Moore’s Law processing solutions based on new and novel architectures. An example is the Micron...
The Large Hadron Collider beauty (LHCb) experiment at CERN specializes in investigating the slight differences between matter and antimatter by studying the decays of beauty or bottom (B) and charm (D) hadrons. The detector has been recording data from proton-proton collisions since 2010. The data preservation (DP) project at LHCb ensures preservation of the experimental and simulated (Monte...
High-energy particle physics (HEP) has advanced greatly over recent years and current plans for the future foresee even more ambitious targets and challenges that have to be coped with. Amongst the many computer technology R&D areas, simulation of particle detectors stands out as the most time consuming part of HEP computing. An intensive R&D and programming effort is required to exploit the...
The CMS experiment has implemented a computing model where distributed monitoring infrastructures are collecting any kind of data and metadata about the performance of the computing operations. This data can be probed further by harnessing Big Data analytics approaches and discovering patterns and correlations that can improve the throughput and the efficiency of the computing model.
CMS...
ALICE (A Large Ion Collider Experiment) is a detector system
optimized for the study of heavy-ion collisions at the
CERN LHC. The ALICE High Level Trigger (HLT) is a computing
cluster dedicated to the online reconstruction, analysis and
compression of experimental data. The High-Level Trigger receives
detector data via serial optical links into custom PCI-Express
based FPGA...
The Belle II experiment will generate very large data samples. In order to reduce the time for data analyses, loose selection criteria will be used to create files rich in samples of particular interest for a specific data analysis (data skims). Even so, many of the resultant skims will be very large, particularly for highly inclusive analyses. The Belle II collaboration is investigating the...
HEP software today is a rich and diverse domain in itself and exists within the mushrooming world of open source software. As HEP software developers and users we can be more productive and effective if our work and our choices are informed by a good knowledge of what others in our community have created or found useful. The HEP Software and Computing Knowledge Base, hepsoftware.org, was...
CMS has tuned its simulation program and chosen a specific physics model of Geant4 by comparing the simulation results with dedicated test beam experiments. CMS continues to validate the physics models inside Geant4 using the test beam data as well as collision data. Several physics lists (collection of physics models) inside the most recent version of Geant4 provide good agreement of the...
The goal of the “INFN-RETINA” R&D project is to develop and implement a parallel computational methodology that allows to reconstruct events with an extremely high number (>100) of charged-particle tracks in pixel and silicon strip detectors at 40 MHz, thus matching the requirements for processing LHC events at the full crossing frequency.
Our approach relies on a massively parallel...
Big Data technologies have proven to be very useful for storage, processing and visualization of derived
metrics associated with ATLAS distributed computing (ADC) services. Log file data and database records, and
metadata from a diversity of systems have been aggregated and indexed to create an analytics platform for
ATLAS ADC operations analysis. Dashboards, wide area data access cost...
Purpose
The aim of this work is the full simulation and measurement of a GEMPix (Gas Electron Multiplier) detector for a possible application as a monitor for beam verification at the CNAO Center (National Center for Oncological Hadrontherapy).
A triple GEMPix detector read out by 4 Timepix chips could provide beam monitoring, dose verification and quality checks with good resolution...
Traditionally, the RHIC/ATLAS Computing Facility (RACF) at Brookhaven National Laboratory has only maintained High Throughput Computing (HTC) resources for our HEP/NP user community. We've been using HTCondor as our batch system for many years, as this software is particularly well suited for managing HTC processor farm resources. Recently, the RACF has also begun to design/administrate some...
LHC data analyses consist of workflows that utilize a diverse set of software tools to produce physics results. The different set of tools range from large software frameworks like Gaudi/Athena to single-purpose scripts written by the analysis teams. The analysis steps that lead to a particular physics result are often not reproducible without significant assistance from the original authors....
In order to estimate the capabilities of a computing slot with limited processing time, it is necessary to know its “power” with rather good precision. This allows, for example, a pilot job to match a task for which the required CPU work is known, or to define the number of events to be processed knowing the CPU work per event. Otherwise one always runs the risk that the task is aborted because...
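A worked illustration of the matching arithmetic described above; the benchmark unit and all numbers are hypothetical, not an actual pilot configuration:

```python
# Worked illustration of the matching logic described above; the benchmark unit
# and all numbers are hypothetical, not a real pilot/workload configuration.
slot_power        = 12.0        # benchmark units of CPU work per second for this slot
slot_time_left_s  = 6 * 3600    # remaining wall time in the queue slot
work_per_event    = 900.0       # benchmark units of CPU work needed per event
safety_factor     = 0.8         # keep a margin so the job is not killed at the limit

usable_work = slot_power * slot_time_left_s * safety_factor
n_events    = int(usable_work // work_per_event)
print(f"submit a payload of at most {n_events} events to this slot")
```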
The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment.
PanDA managed resources are distributed worldwide, on hundreds of computing sites, with thousands of physicists accessing hundreds of Petabytes of data, and the rate of data processing already exceeds an Exabyte per year.
While...
High-energy physics experiments rely on reconstruction of the trajectories of particles produced at the interaction point. This is a challenging task, especially in the high track multiplicity environment generated by p-p collisions at the LHC energies. A typical event includes hundreds of signal examples (interesting decays) and a significant amount of noise (uninteresting examples).
This...
The advent of microcontrollers with sufficient CPU power and with analog and digital peripherals makes it possible to design a complete acquisition system on a single chip. The existence of a worldwide data infrastructure such as the internet allows one to envision distributed networks of detectors capable of processing and sending data or responding to settings commands.
The internet infrastructure allows us to do...
The ATLAS Distributed Computing (ADC) group established a new Computing Run Coordinator (CRC)
shift at the start of LHC Run2 in 2015. The main goal was to rely on a person with a good overview
of the ADC activities to ease the ADC experts' workload. The CRC shifter keeps track of ADC tasks
related to their fields of expertise and responsibility. At the same time, the shifter maintains...
The connection of diverse and sometimes non-Grid enabled resource types to the CMS Global Pool, which is based on HTCondor and glideinWMS, has been a major goal of CMS. These resources range in type from a high-availability, low latency facility at CERN for urgent calibration studies, called the CAF, to a local user facility at the Fermilab LPC, allocation-based computing resources at NERSC...
The ATLAS Forward Proton (AFP) detector upgrade project consists of two forward detectors located at 205 m and 217 m on each side of the ATLAS experiment. The aim is to measure momenta and angles of diffractively scattered protons. In 2016 two detector stations on one side of the ATLAS interaction point have been installed and are being commissioned.
The detector infrastructure and necessary...
The Visual Physics Analysis (VISPA) project defines a toolbox for accessing software via the web. It is based on the latest web technologies and provides a powerful extension mechanism that enables a wide range of applications to be interfaced. Beyond basic applications such as a code editor, a file browser, or a terminal, it meets the demands of sophisticated experiment-specific use cases that focus...
A modern high energy physics analysis code is complex. As it has for decades, it must handle high speed data I/O, corrections to physics objects applied at the last minute, and multi-pass scans to calculate corrections. An analysis has to accommodate multi-100 GB dataset sizes, multi-variate signal/background separation techniques, larger collaborative teams, and reproducibility and data...
The MasterCode collaboration (http://cern.ch/mastercode) is concerned with the investigation of supersymmetric models that go beyond the current status of the Standard Model of particle physics. It involves teams from CERN, DESY, Fermilab, SLAC, CSIC, INFN, NIKHEF, Imperial College London, King's College London, the Universities of Amsterdam, Antwerpen, Bristol, Minnesota and ETH...
The rapid increase of data volume from the experiments running at the Large Hadron Collider (LHC) has prompted national physics groups to evaluate new data handling and processing solutions. Russian grid sites and universities' clusters scattered over a large area have taken on the task of uniting their resources for future production work, at the same time giving an opportunity to support large physics...
Memory has become a critical parameter for many HEP applications and, as a consequence, some experiments have already had to move from single-core to multi-core jobs. However, in the case of LHC experiment software, benchmark studies have shown that many applications are able to run with a much lower memory footprint than what is actually allocated. In certain cases even half of the allocated memory being...
The volume of data produced in HEP is growing, as is the volume of data that must be kept for long periods. A large volume of data – big data – is already distributed around the planet: data storage now integrates storage resources from many data centres located far from each other. This means that the methods and approaches used to organize and manage the...
HazelNut is a block-based Hierarchical Storage System, in which logical data blocks are migrated among storage tiers to achieve better I/O performance. In order to choose the blocks to migrate, the data-block I/O process is traced to collect enough information for the migration algorithms. There are many ways to trace the I/O process and implement block migration. However, how to choose trace metrics and ...
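To make the trace-driven migration idea concrete, here is a hedged sketch (hypothetical metric and names, not the HazelNut implementation): rank logical blocks by a simple hotness metric derived from an I/O trace and pick candidates to promote to the faster tier.

```python
from collections import Counter

def hot_blocks(trace, top_n):
    """trace: iterable of (block_id, op) tuples collected by the I/O tracer."""
    hotness = Counter(block_id for block_id, _op in trace)   # access count per block
    return [blk for blk, _count in hotness.most_common(top_n)]

trace = [(7, "read"), (3, "read"), (7, "write"), (7, "read"), (3, "read"), (9, "read")]
print(hot_blocks(trace, top_n=2))   # -> [7, 3]
```

In practice the metric could also weight writes, recency or sequentiality, which is exactly the choice of trace metrics the contribution discusses.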
High luminosity operations of the LHC are expected to deliver proton-proton collisions to experiments with the average number of pp interactions per bunch crossing reaching 200.
Reconstruction of charged particle tracks in this environment is
computationally challenging.
At CMS, charged particle tracking in the outer silicon tracker detector
is among the largest contributors to the overall CPU...
JavaScript ROOT (JSROOT) aims to provide ROOT-like graphics in web browsers. JSROOT supports reading of binary and JSON ROOT files, and drawing of ROOT classes like histograms (TH1/TH2/TH3), graphs (TGraph), functions (TF1) and many others. JSROOT implements a user interface for THttpServer-based applications.
With the version 4 of JSROOT, many improvements and new features are...
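As a minimal illustration of the THttpServer use case (a PyROOT sketch, assuming a ROOT installation with the built-in HTTP server enabled; the port and folder name are arbitrary choices), a histogram published this way is rendered and refreshed by JSROOT in the browser:

```python
import time
import ROOT

serv = ROOT.THttpServer("http:8080")          # embedded HTTP engine, browse at http://localhost:8080
h = ROOT.TH1F("h", "Example;x;entries", 100, -5, 5)
serv.Register("/demo", h)                     # histogram appears under the /demo folder

for _ in range(1000):
    h.FillRandom("gaus", 100)                 # keep filling so the browser view updates
    ROOT.gSystem.ProcessEvents()              # let the server handle incoming requests
    time.sleep(0.1)
```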
The increases in both luminosity and center of mass energy of the LHC in Run 2 impose more stringent requirements on the accuracy of the Monte Carlo simulation. An important element in this is the inclusion of matrix elements with high parton multiplicity and NLO accuracy, with the corresponding increase in computing requirements for the matrix element generation step posing a significant...
The LHCb experiment relies on LHCbDIRAC, an extension of DIRAC, to drive its offline computing. This middleware provides a development framework and a complete set of components for building distributed computing systems. These components are currently installed and run on virtual machines (VM) or bare-metal hardware. Due to the increased workload, high availability is becoming more and...
Within the ATLAS detector, the Trigger and Data Acquisition system is responsible for the online processing of data streamed from the detector during collisions at the Large Hadron Collider at CERN. The online farm comprises ~4000 servers processing the data read out from ~100 million detector channels through multiple trigger levels. Configuring these servers is not an easy task,...
MCBooster is a header-only, C++11-compliant library for the generation of large samples of phase-space Monte Carlo events on massively parallel platforms. It was released on GitHub in the spring of 2016. The library core algorithms implement the Raubold-Lynch method; they are able to generate the full kinematics of decays with up to nine particles in the final state. The library supports the...
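MCBooster itself is a C++11 library; as a hedged plain-Python illustration (not the MCBooster API), the sketch below shows the elementary two-body decay step that the Raubold-Lynch method chains over intermediate masses to build the full n-body kinematics:

```python
import math
import random

def two_body_decay(M, m1, m2):
    """Decay a parent of mass M at rest isotropically into daughters of masses m1, m2."""
    # Daughter momentum magnitude in the parent rest frame (standard kinematic formula).
    p = math.sqrt((M**2 - (m1 + m2)**2) * (M**2 - (m1 - m2)**2)) / (2.0 * M)
    cos_t = random.uniform(-1.0, 1.0)
    phi = random.uniform(0.0, 2.0 * math.pi)
    sin_t = math.sqrt(1.0 - cos_t**2)
    px, py, pz = p * sin_t * math.cos(phi), p * sin_t * math.sin(phi), p * cos_t
    e1, e2 = math.hypot(p, m1), math.hypot(p, m2)   # E = sqrt(p^2 + m^2)
    return (e1, px, py, pz), (e2, -px, -py, -pz)

# Example: D0 -> K- pi+ (masses in GeV); multi-body decays chain such steps.
print(two_body_decay(1.86484, 0.493677, 0.139570))
```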
High energy physics experiments are implementing highly parallel solutions for event processing on resources that support
concurrency at multiple levels. These range from the inherent large-scale parallelism of HPC resources to the multiprocessing and
multithreading needed for effective use of multi-core and GPU-augmented nodes.
Such modes of processing, and the efficient opportunistic use of...
Traditional T2 grid sites still process large amounts of data flowing from the LHC and elsewhere. More flexible technologies, such as virtualisation and containerisation, are rapidly changing the landscape, but the right migration paths to these sunlit uplands are not well defined yet. We report on the innovations and pressures that are driving these changes and we discuss their pros and cons....
The Compact Muon Solenoid (CMS) experiment makes extensive use of alignment and calibration measurements in several crucial workflows: in the event selection at the High Level Trigger (HLT), in the processing of the recorded collisions and in the production of simulated events. A suite of services addresses the key requirements for the handling of the alignment and calibration conditions such as:...
As more detailed and complex simulations are required in different application domains, there is much interest in adapting the code for parallel and multi-core architectures. Parallelism can be achieved by tracking many particles at the same time. This work presents MPEXS, a CUDA implementation of the core Geant4 algorithm used for the simulation of electro-magnetic interactions (electron,...
In a large Data Center, such as a LHC Tier-1, where the structure of the Local Area Network and Cloud Computing Systems varies on a daily basis, network management has become more and more complex.
In order to improve the operational management of the network, this article presents a real-time network topology auto-discovery tool named Netfinder.
The information required for effective...
Monitoring of IT infrastructure and services is essential to maximize availability and minimize disruption, by detecting failures and developing issues to allow rapid intervention.
The HEP group at Liverpool have been working on a project to modernize local monitoring infrastructure (previously provided using Nagios and ganglia) with the goal of increasing coverage, improving visualization...
In this paper, we'll talk about our experiences with different data storage technologies within the ATLAS Distributed Data Management
system, and in particular about object-based storage. Object-based storage differs in many respects from traditional file-system
storage and offers a highly scalable, simple storage solution that has become the most common choice for the cloud. First, we describe the needed changes
in...
With the demand for more computing power and the widespread use of parallel and distributed computing, applications are looking for message-based transport solutions for fast, stateless communication. There are many solutions already available, with competitive performance but varying APIs, making it difficult to support all of them. Trying to find a solution to this problem, we decided to...
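The general idea of hiding several transports behind one API can be sketched as follows (an illustration only, not the library this contribution describes; the class names are hypothetical and pyzmq is used as one concrete backend):

```python
import zmq

class Transport:                               # the single API applications code against
    def send(self, data: bytes): ...
    def receive(self) -> bytes: ...

class ZmqSender(Transport):
    """One possible backend: a ZeroMQ PUSH socket."""
    def __init__(self, endpoint="tcp://localhost:5555"):
        self.sock = zmq.Context.instance().socket(zmq.PUSH)
        self.sock.connect(endpoint)
    def send(self, data):
        self.sock.send(data)

class ZmqReceiver(Transport):
    """Matching backend: a ZeroMQ PULL socket."""
    def __init__(self, endpoint="tcp://*:5555"):
        self.sock = zmq.Context.instance().socket(zmq.PULL)
        self.sock.bind(endpoint)
    def receive(self):
        return self.sock.recv()
```

A different transport (shared memory, RDMA, plain TCP) would simply provide another pair of classes behind the same two methods.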
Managing resource allocation in a Cloud based data center serving multiple virtual organizations is a challenging issue. In fact, while batch systems are able to allocate resources to different user groups according to specific shares imposed by the data center administrators, without a static partitioning of such resources, this is not so straightforward in the most common Cloud frameworks,...
This work combines metric trees with parallel computing on both multi-GPU and distributed-memory architectures, applied to multi-million- or even billion-body simulations.
Metric trees are data structures for indexing multidimensional sets of points in arbitrary metric spaces. First proposed by Jeffrey K. Uhlmann [1] as a structure to efficiently solve neighbourhood queries, they have...
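For orientation, here is a small serial sketch of a metric (vantage-point) tree in the spirit of Uhlmann's proposal; it uses only the distance function, so it works in arbitrary metric spaces. This is an illustration, not the parallel GPU implementation described in the contribution.

```python
import random

class VPTree:
    def __init__(self, points, dist):
        self.dist = dist
        self.root = self._build(list(points))

    def _build(self, pts):
        if not pts:
            return None
        vp = pts.pop(random.randrange(len(pts)))        # choose a vantage point
        if not pts:
            return (vp, 0.0, None, None)
        dists = [self.dist(vp, p) for p in pts]
        mu = sorted(dists)[len(dists) // 2]             # median distance splits the set
        inner = [p for p, d in zip(pts, dists) if d <= mu]
        outer = [p for p, d in zip(pts, dists) if d > mu]
        return (vp, mu, self._build(inner), self._build(outer))

    def neighbourhood(self, q, r, node="root"):
        """Return all stored points within distance r of q."""
        node = self.root if node == "root" else node
        if node is None:
            return []
        vp, mu, inner, outer = node
        d = self.dist(q, vp)
        found = [vp] if d <= r else []
        if d - r <= mu:                                 # query ball overlaps inner shell
            found += self.neighbourhood(q, r, inner)
        if d + r > mu:                                  # query ball overlaps outer shell
            found += self.neighbourhood(q, r, outer)
        return found

# Example with a Euclidean metric in 2D.
euclid = lambda a, b: ((a[0] - b[0])**2 + (a[1] - b[1])**2) ** 0.5
tree = VPTree([(random.random(), random.random()) for _ in range(1000)], euclid)
print(tree.neighbourhood((0.5, 0.5), 0.05))
```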
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider (LHC) assembles events at a rate of 100 kHz. It transports event data at an aggregate throughput of ~100 GB/s to the high-level trigger (HLT) farm. The CMS DAQ system has been completely rebuilt during the first long shutdown of the LHC in 2013/14. The new DAQ architecture is based on state-of-the-art...
The computing power of most modern commodity computers is far from being fully exploited by standard usage patterns.
The work we present describes the development and setup of a virtual computing cluster based on Docker containers used as worker nodes. The facility is based on Plancton[1]: a lightweight fire-and-forget background service that spawns and controls a local pool of Docker...
The Alpha Magnetic Spectrometer (AMS) on board the International Space Station (ISS) requires a large amount of computing power for data production and Monte Carlo simulation. A large fraction of the computing resources has been contributed by computing centers within the AMS collaboration. AMS has 12 "remote" computing centers outside of the Science Operation Center at CERN, with different...
A major challenge for data production at the IceCube Neutrino Observatory is connecting a large set of small clusters together to form a larger computing grid. Most of these clusters do not provide a Grid interface. Using a local account on each submit machine, HTCondor glideins can be submitted to virtually any type of scheduler. The glideins then connect back to a main...
The Alignment, Calibrations and Databases group at the CMS Experiment delivers Alignment and Calibration Conditions Data to a large set of workflows which process recorded event data and produce simulated events. The current infrastructure for releasing and consuming Conditions Data was designed in the two years of the first LHC long shutdown to respond to use cases from the preceding...
Cppyy provides fully automatic Python/C++ language bindings and in so doing
covers a vast number of use cases. Use of conventions and known common
patterns in C++ (such as smart pointers, STL iterators, etc.) allow us to
make these C++ constructs more "pythonistic." We call these treatments
"pythonizations", as the strictly bound C++ code is turned into bound code
that has a Python "feel."...
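A minimal example of such automatic binding (using cppyy's documented cppdef and gbl entry points; the function defined here is just an illustration):

```python
import cppyy

# Declare some C++ on the fly; cppyy binds it automatically.
cppyy.cppdef(r"""
#include <vector>
std::vector<int> squares(int n) {
    std::vector<int> v;
    for (int i = 0; i < n; ++i) v.push_back(i * i);
    return v;
}
""")

for x in cppyy.gbl.squares(5):   # the STL vector behaves like a Python iterable
    print(x)
```

The iteration over the returned std::vector is exactly the kind of "pythonization" described above: the bound C++ container acquires a Python feel without any manual wrapper code.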
AFP, the ATLAS Forward Proton detector upgrade project, consists of two
forward detectors at 205 m and 217 m on each side of the ATLAS
experiment at the LHC. The new detectors aim to measure momenta and
angles of diffractively scattered protons. In 2016 two detector stations
on one side of the ATLAS interaction point have been installed and are
being commissioned.
The front-end electronics...
The current LHCb trigger system consists of a hardware level, which reduces the LHC bunch-crossing rate of 40 MHz to 1 MHz, a rate at which the entire detector is read out. At a second level, implemented in a farm of around 20k parallel-processing CPUs, the event rate is reduced to around 12.5 kHz. The LHCb experiment plans a major upgrade of the detector and DAQ system in the LHC long shutdown...
The ongoing integration of clouds into the WLCG raises the need for detailed health and performance monitoring of the virtual resources in order to prevent degraded service and interruptions due to undetected failures. When working at scale, the existing monitoring diversity can lead to a metric overflow whereby the operators need to manually collect and correlate data from...
The endcap time-of-flight (TOF) detector of the BESIII experiment at BEPCII was upgraded based on multigap resistive plate chamber technology. During the 2015-2016 data taking the TOF system achieved a total time resolution of 65 ps for electrons in Bhabha events. Details of the reconstruction and calibration procedures, detector alignment and performance with data will be described.
The STAR Heavy Flavor Tracker (HFT) was designed to provide high-precision tracking for the identification of charmed hadron decays in heavy ion collisions at RHIC. It consists of three independently mounted subsystems, providing four precision measurements along the track trajectory, with the goal of pointing decay daughters back to vertices displaced by <100 microns from the primary event...
Cloud computing can make the configuration of IT resources flexible and reduce hardware costs; it can also provide computing services according to actual need. We are applying this computing model to the Chinese Spallation Neutron Source (CSNS) computing environment. From the research and practice perspectives, we first review the application status of cloud computing in High Energy Physics experiments...
Multi-VO support based on DIRAC has been set up to provide workload and data management for several high energy physics experiments at IHEP. The distributed computing platform has 19 heterogeneous sites, including cluster, grid and cloud resources. The heterogeneous resources belong to different Virtual Organizations. Due to the scale and heterogeneity, it is complicated to monitor and manage these resources...
One of the biggest challenges with a large-scale data management system is to ensure consistency between the global file catalogue
and what is physically on all storage elements.
To tackle this issue, the Rucio software which is used by the ATLAS Distributed Data Management system has been extended to
automatically handle lost or unregistered files (aka Dark Data). This system automatically...
The AliEn file catalogue is a global unique namespace providing mapping between a UNIX-like logical name structure and the corresponding physical files distributed over 80 storage elements worldwide. Powerful search tools and hierarchical metadata information are an integral part of the system and are used by the Grid jobs as well as local users to store and access all files on the Grid storage...
The University of Notre Dame (ND) CMS group operates a modest-sized Tier-3 site suitable for local, final-stage analysis of CMS data. However, through the ND Center for Research Computing (CRC), Notre Dame researchers have opportunistic access to roughly 25k CPU cores of computing and a 100 Gb/s WAN network link. To understand the limits of what might be possible in this scenario, we...
Geant4 is a toolkit for the simulation of the passage of particles through matter. Its areas of application include high energy, nuclear and accelerator physics as well as studies in medical and space science.
The Geant4 collaboration regularly performs validation and regression tests through its development cycle. A validation test compares results obtained with a specific Geant4 version...
The expected growth in HPC capacity over the next decade makes such resources attractive for meeting future computing needs of HEP/NP experiments, especially as their cost is becoming comparable to traditional clusters. However, HPC facilities rely on features such as specialized operating systems and hardware to enhance performance, which makes them difficult to use without significant changes...
SWIFT is a compiled object-oriented language similar in spirit to C++ but with the coding simplicity of a scripting language. Built with the LLVM compiler framework used within Xcode 6 and later versions, SWIFT features interoperability with C, Objective-C, and C++ code, truly comprehensive debugging and documentation features, and a host of language features that make for rapid and effective...
This paper introduces the storage strategy and tools of the science data of the Alpha Magnetic Spectrometer (AMS) at Science Operation Center (SOC) at CERN.
The AMS science data includes flight data, reconstructed data and simulation data, as well as their metadata. The data volume is 1070 TB per year of operation and has currently reached 5086 TB in total. We have two storage levels:...
Operational and other pressures have led to WLCG experiments moving increasingly to a stratified model for Tier-2 resources, where "fat" Tier-2s ("T2Ds") and "thin" Tier-2s ("T2Cs") provide different levels of service.
In the UK, this distinction is also encouraged by the terms of the current GridPP5 funding model. In anticipation of this, testing has been performed on the implications, and...
We review the concept of support vector machines before proceeding to discuss examples of their use in a number of scenarios. Using the Toolkit for Multivariate Analysis (TMVA) implementation we discuss examples relevant to HEP including background suppression for H->tau+tau- at the LHC. The use of several different kernel functions and performance benchmarking is discussed.
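As an illustration of comparing kernel functions on a signal/background classification task, the following sketch uses scikit-learn rather than the TMVA implementation discussed in the contribution, with a toy two-feature dataset standing in for real analysis variables:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
signal = rng.normal(loc=1.0, scale=1.0, size=(500, 2))       # toy "signal" features
background = rng.normal(loc=-1.0, scale=1.5, size=(500, 2))  # toy "background" features
X = np.vstack([signal, background])
y = np.array([1] * 500 + [0] * 500)

# Benchmark a few kernel functions by cross-validated accuracy.
for kernel in ("linear", "rbf", "poly"):
    scores = cross_val_score(SVC(kernel=kernel, gamma="scale"), X, y, cv=5)
    print(f"{kernel:6s} mean accuracy: {scores.mean():.3f}")
```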
The Large Hadron Collider at CERN restarted in 2015 with a higher
centre-of-mass energy of 13 TeV. The instantaneous luminosity is expected
to increase significantly in the coming
years. An upgraded Level-1 trigger system is being deployed in the CMS
experiment in order to maintain the same efficiencies for searches and
precision measurements as those achieved in
the previous run. This system...
CERN currently manages the largest data archive in the HEP domain; over 135 PB of custodial data is archived across 7 enterprise tape libraries containing more than 20,000 tapes and using over 80 tape drives. Archival storage at this scale requires a leading-edge monitoring infrastructure that acquires live and lifelong metrics from the hardware in order to assess and proactively identify...
The ATLAS collaboration has recently setup a number of citizen science projects which have a strong IT component and could not have been envisaged without the growth of general public computing resources and network connectivity: event simulation through volunteer computing, algorithms improvement via Machine Learning challenges, event display analysis on citizen science platforms, use of...
The LHC has been providing pp collisions with record luminosity and energy since the start of Run 2 in 2015. In the ATLAS experiment the Trigger and Data Acquisition system has been upgraded to deal with the increased event rates. The dataflow element of the system is distributed across hardware and software and is responsible for buffering and transporting event data from the Readout system...
The LHC, at design capacity, has a bunch-crossing rate of 40 MHz whereas the ATLAS experiment at the LHC has an average recording rate of about 1000 Hz. To reduce the rate of events but still maintain a high efficiency of selecting rare events such as physics signals beyond the Standard Model, a two-level trigger system is used in ATLAS. Events are selected based on physics signatures such as...
An overview of the CMS Data Analysis School (CMSDAS) model and experience is provided. CMSDAS is the official school that CMS organizes every year in the US, in Europe and in Asia to train students, Ph.D. students and young post-docs in physics analysis. It consists of two days of short exercises on physics object reconstruction and identification and 2.5 days of long exercises on physics...
The Czech National Grid Infrastructure is operated by MetaCentrum, a CESNET department responsible for coordinating and managing activities related to distributed computing. CESNET as the Czech National Research and Education Network (NREN) provides many e-infrastructure services, which are used by 94% of the scientific and research community in the Czech Republic. Computing and storage...
In the sociology of small- to mid-sized (O(100) collaborators) experiments, the issue of data collection and storage is sometimes perceived as a residual problem for which well-established solutions are known. Still, the DAQ system can be one of the few forces that drive towards the integration of otherwise loosely coupled detector systems. As such it may be hard to complete with
off-the-shelf...
The Data and Software Preservation for Open Science (DASPOS) collaboration has developed an ontology for describing particle physics analyses. The ontology, a series of data triples, is designed to describe the dataset, selection cuts, and measured quantities for an analysis. The ontology specification, written in the Web Ontology Language (OWL), is designed to be interpreted by many pre-existing...
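The flavour of such a triple-based description can be sketched with rdflib using a made-up namespace and example literals (not the actual DASPOS OWL vocabulary):

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/analysis#")   # hypothetical namespace, not the DASPOS one
g = Graph()

g.add((EX.MyAnalysis, RDF.type, EX.Analysis))
g.add((EX.MyAnalysis, EX.usesDataset, Literal("example_dimuon_dataset")))
g.add((EX.MyAnalysis, EX.hasSelectionCut, Literal("muon_pt > 25 GeV")))
g.add((EX.MyAnalysis, EX.measures, Literal("dimuon invariant mass")))

print(g.serialize(format="turtle"))              # the triples, ready for an OWL/RDF toolchain
```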
The growth in size and geographical distribution of scientific collaborations, while enabling researchers to achieve ever higher and bolder results, also poses new technological challenges, one of these being the additional effort required to analyse and troubleshoot network flows that travel for thousands of miles, traversing a number of different network domains. While the day-to-day multi-domain...
LHCb Grid access is based on the LHCbDirac system. It provides access to data and computational resources to researchers at different geographical locations. The Grid has a hierarchical topology with multiple sites distributed over the world. The sites differ from each other in their number of CPUs, amount of disk storage and connection bandwidth. These parameters are essential for the...
I/O optimizations, along with the vertical and horizontal elasticity of an application, are essential to achieve linear scalability of data-processing performance. However, deploying these three critical concepts in a unified software environment presents a challenge, and as a result most existing data-processing frameworks rely on external solutions to address them. For example, in a multicore...
The ATLAS Trigger & Data Acquisition project was started almost twenty years ago with the aim of providing a scalable distributed data collection system for the experiment. While the software dealing with physics dataflow was implemented by directly using low level communication protocols, like TCP and UDP, the control and monitoring infrastructure services for the system were implemented on...
The Compact Muon Solenoid (CMS) experiment makes extensive use of alignment and calibration measurements in several data processing workflows. Such measurements are produced either by automated workflows or by analysis tasks carried out by the experts in charge. Very frequently, experts want to inspect and exchange with others in CMS the time evolution of a given calibration, or want to monitor the...
The Resource Manager is one of the core components of the Data Acquisition system of the ATLAS experiment at the LHC. The Resource Manager marshals the right for applications to access resources which may exist in multiple but limited copies, in order to avoid conflicts due to program faults or operator errors.
The access to resources is managed in a manner similar to what a lock manager...
SuperKEKB, a next-generation B factory, has been completed in Japan as an upgrade of the KEKB e+e- collider. It is currently running with the BEAST II detector, whose purpose is to understand the interactions and background events at the beam collision region in preparation for the 2018 launch of the Belle II detector. Overall, SuperKEKB is expected to deliver a rich data set for the...
The ZEUS data preservation (ZEUS DP) project assures
continued access to the analysis software, experimental data and
related documentation.
The ZEUS DP project supports the possibility to derive valuable
scientific results from the ZEUS data in the future.
The implementation of the data preservation is discussed in the
context of contemporary data analyses and of planning of...
Daily operation of a large-scale experimental setup is a challenging task, both in terms of maintenance and of monitoring. In this work we describe an approach to an automated data quality system. Based on machine learning methods, it can be trained online on data manually labeled by human experts. The trained model can assist data quality managers by filtering obvious cases (both good and bad) and...
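A hedged sketch of this pattern (hypothetical per-run features and thresholds, scikit-learn rather than the authors' framework): train on expert-labeled runs, auto-flag only confident cases, and route the rest to a human.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy per-run summary features and expert labels (1 = good, 0 = bad).
X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

clf = GradientBoostingClassifier().fit(X, y)

def auto_flag(run_features, low=0.1, high=0.9):
    """Flag only confident cases; ambiguous runs go to the data-quality shifter."""
    p_good = clf.predict_proba([run_features])[0, 1]
    if p_good > high:
        return "good"
    if p_good < low:
        return "bad"
    return "send to human expert"

print(auto_flag(X[0]))
```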
SHiP is a new fixed-target experiment at the CERN SPS accelerator. The goal of the experiment is to search for hidden particles predicted by models of Hidden Sectors. The purpose of the SHiP Spectrometer Tracker is to reconstruct the tracks of charged particles from the decay of neutral New Physics objects with high efficiency, while rejecting background events. The problem is to...
Electron, muon and photon triggers covering transverse energies from a few GeV to several TeV are essential for signal selection in a wide variety of ATLAS physics analyses to study Standard Model processes and to search for new phenomena. Final states including leptons and photons had, for example, an important role in the discovery and measurement of the Higgs particle. Dedicated triggers...
CERN’s enterprise Search solution “CERN Search” provides a central search solution for users and CERN service providers. A total of about 20 million public and protected documents from a wide range of document collections is indexed, including Indico, TWiki, Drupal, SharePoint, JACOW, E-group archives, EDMS, and CERN Web pages.
In spring 2015, CERN Search was migrated to a new...
The Queen Mary University of London grid site's Lustre file system has recently undergone a major upgrade from version 1.8 to the most recent 2.8 release, and the capacity increased to over 3 PB. Lustre is an open source, POSIX compatible, clustered file system presented to the Grid using the StoRM Storage Resource Manager. The motivation and benefits of upgrading including hardware and...
The international Muon Ionization Cooling Experiment (MICE) is designed to demonstrate the principle of muon ionisation cooling for the first time, for application to a future Neutrino Factory or Muon Collider. The experiment is currently under construction at the ISIS synchrotron at the Rutherford Appleton Laboratory, UK. As presently envisaged, the programme is divided into three Steps:...
The storage ring for the Muon g-2 experiment is composed of twelve custom vacuum chambers designed to interface with tracking and calorimeter detectors. The irregular shape and complexity of the chamber design made implementing these chambers in a GEANT simulation with native solids difficult. Instead, we have developed a solution that uses the CADMesh libraries to convert STL files from 3D...
ALICE (A Large Ion Collider Experiment) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider).
ALICE has been successfully collecting physics data of Run 2 since spring 2015. In parallel, preparations for a major upgrade of the computing system, called O2 (Online-Offline) and scheduled for...
RHIC & ATLAS Computing Facility (RACF) at BNL is a 15000 sq. ft. facility hosting the IT equipment of the BNL ATLAS WLCG Tier-1 site, offline farms for the STAR and PHENIX experiments operating at the Relativistic Heavy Ion Collider (RHIC), BNL Cloud installations, various Open Science Grid (OSG) resources, and many other physics research oriented IT installations of a smaller scale. The...
The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the...
The ATLAS experiment is one of four detectors located at the Large Hadron Collider (LHC) at CERN. Its detector control system (DCS) stores the slow control data acquired within the back-end of distributed WinCC OA applications. The data can be retrieved from an Oracle relational database for future analysis, debugging and detector development.
The ATLAS DCS Data Viewer (DDV) is a...
In order to patch web servers and web application in a timely manner, we first need to know which software packages are used, and where. But, a typical web stack is composed of multiple layers, including the operating system, web server, application server, programming platform and libraries, database server, web framework, content management system etc. as well as client-side tools. Keeping...
We present the novel Analysis Workflow Management (AWM) that provides users with the tools and competences of professional large scale workflow systems. The approach presents a paradigm shift from executing parts of the analysis to defining the analysis.
Within AWM an analysis consists of steps. For example, a step defines to run a certain executable for multiple files of an input data...
When we first introduced the XRootD storage system to the LHC, we needed a filesystem interface so that an XRootD system could function as a Grid Storage Element. The result was XRootDfs, a FUSE-based mountable POSIX filesystem. It glues all the data servers in an XRootD storage system together and presents them as a single, POSIX-compliant, multi-user networked filesystem. XRootD's unique redirection...
The Yet Another Rapid Readout (YARR) system is a DAQ system designed for the readout of current-generation ATLAS Pixel FE-I4 and next-generation ATLAS ITk chips. It utilises a commercial off-the-shelf PCIe FPGA card as a reconfigurable I/O interface, which acts as a simple gateway to pipe all data from the pixel chips via the high-speed PCIe connection into the host system's memory. Relying on...