The advent of microcontrollers with sufficient CPU power and with analog and digital peripherals makes it possible to design a complete acquisition system in one chip. The existence of a worldwide data infrastructure such as the internet allows us to envision a distributed network of detectors capable of processing and sending data or responding to settings commands.
The internet infrastructure allows us to do...
The ATLAS Distributed Computing (ADC) group established a new Computing Run Coordinator (CRC)
shift at the start of LHC Run2 in 2015. The main goal was to rely on a person with a good overview
of the ADC activities to ease the ADC experts' workload. The CRC shifter keeps track of ADC tasks
related to their fields of expertise and responsibility. At the same time, the shifter maintains...
The connection of diverse and sometimes non-Grid enabled resource types to the CMS Global Pool, which is based on HTCondor and glideinWMS, has been a major goal of CMS. These resources range in type from a high-availability, low latency facility at CERN for urgent calibration studies, called the CAF, to a local user facility at the Fermilab LPC, allocation-based computing resources at NERSC...
High energy physics experiments produce huge amounts of raw data, but because network resources are shared, there is no guarantee of the available bandwidth for each experiment, which may cause link competition problems. On the other hand, with the development of cloud computing technologies, IHEP has established a cloud platform based on OpenStack which can ensure...
The ATLAS Forward Proton (AFP) detector upgrade project consists of two forward detectors located at 205 m and 217 m on each side of the ATLAS experiment. The aim is to measure momenta and angles of diffractively scattered protons. In 2016 two detector stations on one side of the ATLAS interaction point have been installed and are being commissioned.
The detector infrastructure and necessary...
LHC Run3 and Run4 represent an unprecedented challenge for HEP computing in terms of both data volume and complexity. New approaches are needed for how data is collected and filtered, processed, moved, stored and analyzed if these challenges are to be met with a realistic budget. To develop innovative techniques we are fostering relationships with industry leaders. CERN openlab is a...
The Visual Physics Analysis (VISPA) project defines a toolbox for accessing software via the web. It is based on the latest web technologies and provides a powerful extension mechanism that makes it possible to interface a wide range of applications. Beyond basic applications such as a code editor, a file browser, or a terminal, it meets the demands of sophisticated experiment-specific use cases that focus...
A modern high energy physics analysis code is complex. As it has for decades, it must handle high speed data I/O, corrections to physics objects applied at the last minute, and multi-pass scans to calculate corrections. An analysis has to accommodate multi-100 GB dataset sizes, multi-variate signal/background separation techniques, larger collaborative teams, and reproducibility and data...
The MasterCode collaboration (http://cern.ch/mastercode) is concerned with the investigation of supersymmetric models that go beyond the current status of the Standard Model of particle physics. It involves teams from CERN, DESY, Fermilab, SLAC, CSIC, INFN, NIKHEF, Imperial College London, King's College London, the Universities of Amsterdam, Antwerpen, Bristol, Minnesota and ETH...
Rapid increase of data volume from the experiments running at the Large Hadron Collider (LHC) prompted national physics groups to evaluate new data handling and processing solutions. Russian grid sites and universities’ clusters scattered over a large area aim at the task of uniting their resources for future productive work, at the same time giving an opportunity to support large physics...
Memory has become a critical parameter for many HEP applications and as a consequence some experiments have already had to move from single- to multicore jobs. However, in the case of LHC experiment software, benchmark studies have shown that many applications are able to run with a much lower memory footprint than what is actually allocated. In certain cases even half of the allocated memory being...
Data quality monitoring (DQM) in high-energy physics (HEP) experiments is essential and widely implemented in most large experiments. It provides important real-time information during the commissioning and production phases that allows the early identification of potential issues and eases their resolution.
Performant solutions for online monitoring already exist for large experiments...
In 2016 the Large Hadron Collider (LHC) will continue to explore the physics at the high-energy frontier. The integrated luminosity is expected to be about 25 fb$^{-1}$ in 2016 with the estimated peak luminosity of around 1.1 $\times$ 10$^{34}$ cm$^{-2}$ s$^{-1}$ and the peak mean pile-up of about 30. The CMS experiment will upgrade its hardware-based Level-1 trigger system to keep its...
EOS, the CERN open-source distributed disk storage system, provides the high-performance storage solution for HEP analysis and the back-end for various work-flows. Recently EOS became the back-end of CERNBox, the cloud synchronisation service for CERN users.
EOS can be used to take advantage of wide-area distributed installations: for the last few years CERN EOS has used a common deployment...
The volume of data produced in HEP is growing, and so is the volume of data that must be kept for long periods. In fact, this large volume of data (big data) is distributed around the planet. In other words, we are now in a situation where data storage integrates storage resources from many data centers located far from each other. That means the methods and approaches used to organize and manage the...
HazelNut is a block-based Hierarchical Storage System, in which logical data blocks are migrated among storage tiers to achieve better I/O performance. In order to choose the blocks to migrate, the data block I/O process is traced to collect enough information for the migration algorithms. There are many ways to trace the I/O process and implement block migration. However, how to choose trace metrics and ...
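To make the idea of trace-driven block migration concrete, the following Python sketch shows one possible promotion policy (promote the most frequently accessed blocks up to a budget). It is only an illustration under assumed inputs, not HazelNut's actual algorithm; the block IDs, trace format and budget are placeholders.

    # Minimal sketch (not HazelNut's algorithm): choose blocks to promote to a
    # faster tier based on access counts collected from an I/O trace.
    from collections import Counter

    def choose_blocks_to_promote(trace, budget):
        """trace: iterable of block IDs seen in the I/O trace;
        budget: number of blocks the fast tier can still accept."""
        access_counts = Counter(trace)
        # Promote the most frequently accessed ("hottest") blocks first.
        return [block for block, _ in access_counts.most_common(budget)]

    # Toy trace of block accesses, with room for two more blocks in the fast tier.
    trace = [7, 3, 7, 9, 3, 7, 1, 3, 7]
    print(choose_blocks_to_promote(trace, budget=2))   # -> [7, 3]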
The HELIX NEBULA Science Cloud (HNSciCloud) project (presented in general by another contribution) is run by a consortium of ten procurers and two other partners; it is funded partly by the European Commission, has a total volume of 5.5 MEUR and runs from January 2016 to June 2018. By its nature as a pre-commercial procurement (PCP) project, it addresses needs that are not covered by any...
High luminosity operations of the LHC are expected to deliver
proton-proton collisions to experiments with the average number of pp
interactions per bunch crossing reaching 200.
Reconstruction of charged particle tracks in this environment is
computationally challenging.
At CMS, charged particle tracking in the outer silicon tracker detector
is among the largest contributors to the overall CPU...
New discoveries by a new generation of high-energy physics experiments cannot be separated from massive data processing and analysis. The BESIII experiment at the Institute of High Energy Physics (IHEP) in Beijing, China, studies physics in the tau-charm energy region from 2 GeV to 4.6 GeV; it is a typical data-intensive experiment requiring mass storage and efficient computing...
JavaScript ROOT (JSROOT) aims to provide ROOT-like graphics in web browsers. JSROOT supports reading of binary and JSON ROOT files, and drawing of ROOT classes like histograms (TH1/TH2/TH3), graphs (TGraph), functions (TF1) and many others. JSROOT implements a user interface for THttpServer-based applications.
With the version 4 of JSROOT, many improvements and new features are...
The offline software of the ATLAS experiment at the LHC
(Large Hadron Collider) serves as the platform for
detector data reconstruction, simulation and analysis.
It is also used in the detector trigger system to
select LHC collision events during data taking.
ATLAS offline software consists of several million lines of
C++ and Python code organized in a modular design of
more than 2000...
The increases in both luminosity and center of mass energy of the LHC in Run 2 impose more stringent requirements on the accuracy of the Monte Carlo simulation. An important element in this is the inclusion of matrix elements with high parton multiplicity and NLO accuracy, with the corresponding increase in computing requirements for the matrix element generation step posing a significant...
The long term preservation and sharing of scientific data is becoming nowadays an integral part of any new scientific project. In High Energy Physics experiments (HEP) this is particularly challenging, given the large amount of data to be preserved and the fact that each experiment has its own specific computing model. In the case of HEP experiments that have already concluded the data taking...
Used as lightweight virtual machines or as enhanced chroot environments, Linux containers, and in particular the Docker abstraction over them, are more and more popular in the virtualization communities.
LHCb Core Software team decided to investigate how to use Docker containers to provide stable and reliable build environments for the different supported platforms, including the obsolete...
Because of user demand and to support new development workflows based on code review and multiple development streams, LHCb decided to port the source code management from Subversion to Git, using the CERN GitLab hosting service.
Although tools exist for this kind of migration, LHCb specificities and development models required careful planning of the migration, development of migration...
The LHCb experiment relies on LHCbDIRAC, an extension of DIRAC, to drive its offline computing. This middleware provides a development framework and a complete set of components for building distributed computing systems. These components are currently installed and run on virtual machines (VMs) or bare-metal hardware. Due to the increased workload, high availability is becoming more and...
LStore was developed to satisfy the ever-growing need for
cost-effective, fault-tolerant, distributed storage. By using erasure
coding for fault-tolerance, LStore has an
order of magnitude lower probability of data loss than traditional
3-replica storage while incurring 1/2 the storage overhead. LStore
was integrated with the Data Logistics Toolkit (DLT) to introduce
LStore to a wider...
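The quoted overhead and data-loss comparison can be illustrated with a small back-of-the-envelope calculation. The sketch below assumes a 10+6 erasure code and independent device failures with an arbitrary per-device probability; these are illustrative assumptions, not LStore's actual coding parameters or failure model.

    # Rough comparison of 3-replica storage vs a k+m erasure code under the
    # simplifying assumption of independent device failures with probability p.
    # The 10+6 code and the value of p are illustrative, not LStore's settings.
    from math import comb

    def loss_prob_replica(p, copies=3):
        # With n-fold replication, data is lost only if every copy is lost.
        return p ** copies

    def loss_prob_erasure(p, k=10, m=6):
        # A k+m code tolerates the loss of up to m of its n = k + m fragments.
        n = k + m
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1, n + 1))

    p = 0.01   # assumed per-device failure probability
    print("3-replica: overhead 3.0x, loss probability %.1e" % loss_prob_replica(p))
    print("10+6 code: overhead %.1fx, loss probability %.1e" % (16 / 10, loss_prob_erasure(p)))

With these assumed numbers the erasure-coded layout stores roughly half as many bytes per byte of user data while its loss probability is several orders of magnitude lower, in line with the trade-off described above.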
Within the ATLAS detector, the Trigger and Data Acquisition system is responsible for the online processing of data streamed from the detector during collisions at the Large Hadron Collider at CERN. The online farm comprises ~4000 servers processing the data read out from ~100 million detector channels through multiple trigger levels. Configuring these servers is not an easy task,...
MCBooster is a header-only, C++11-compliant library for the generation of large samples of phase-space Monte Carlo events on massively parallel platforms. It was released on GitHub in the spring of 2016. The library core algorithms implement the Raubold-Lynch method; they are able to generate the full kinematics of decays with up to nine particles in the final state. The library supports the...
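MCBooster itself is a C++11 library implementing the Raubold-Lynch N-body method on parallel platforms; as a much smaller illustration of phase-space generation, here is a hedged Python sketch for the two-body case only (isotropic decay in the parent rest frame). The masses and sample size are arbitrary examples, and this is not MCBooster's API.

    # Illustrative two-body phase-space generation in the parent rest frame.
    import numpy as np

    def two_body_decay(M, m1, m2, n_events, rng=np.random.default_rng()):
        # Breakup momentum from the standard two-body kinematics formula.
        p = np.sqrt((M**2 - (m1 + m2)**2) * (M**2 - (m1 - m2)**2)) / (2.0 * M)
        cos_theta = rng.uniform(-1.0, 1.0, n_events)      # isotropic direction
        phi = rng.uniform(0.0, 2.0 * np.pi, n_events)
        sin_theta = np.sqrt(1.0 - cos_theta**2)
        px = p * sin_theta * np.cos(phi)
        py = p * sin_theta * np.sin(phi)
        pz = p * cos_theta
        E1 = np.sqrt(p**2 + m1**2)
        E2 = np.sqrt(p**2 + m2**2)
        # The second daughter carries the opposite three-momentum.
        return (E1, px, py, pz), (E2, -px, -py, -pz)

    # Example masses (GeV): a D0-like parent decaying to a kaon and a pion.
    d1, d2 = two_body_decay(M=1.86484, m1=0.49368, m2=0.13957, n_events=5)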
The European project INDIGO-DataCloud aims at developing an advanced computing and data platform. It provides advanced PaaS functionalities to orchestrate the deployment of Long-Running Services (LRS) and the execution of jobs (workloads) across multiple sites through a federated AAI architecture.
The multi-level and multi-site orchestration and scheduling capabilities of the INDIGO PaaS...
High energy physics experiments are implementing highly parallel solutions for event processing on resources that support
concurrency at multiple levels. These range from the inherent large-scale parallelism of HPC resources to the multiprocessing and
multithreading needed for effective use of multi-core and GPU-augmented nodes.
Such modes of processing, and the efficient opportunistic use of...
Any time you modify an implementation within a program, or change the compiler version or operating system, you should also do regression testing. You can do regression testing by rerunning existing tests against the changes to determine whether this breaks anything that worked prior to the change, and by writing new tests where necessary. At LHCb we have a huge codebase which is maintained by many...
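The core idea can be pictured with a minimal pytest-style sketch: recompute a quantity with the current build and compare it against a reference value stored from a previous release. This is only a generic illustration, not LHCb's actual testing framework; the function, reference value and tolerance are invented for the example.

    # Minimal regression-test sketch (run with pytest): compare current output
    # against a stored reference so that compiler/OS/implementation changes
    # that alter results are flagged. All values here are illustrative.
    import json
    import math

    def invariant_mass(e, px, py, pz):
        return math.sqrt(e**2 - px**2 - py**2 - pz**2)

    def test_invariant_mass_regression(tmp_path):
        reference_file = tmp_path / "reference.json"
        reference_file.write_text(json.dumps({"mass": 0.48734}))   # stored earlier
        reference = json.loads(reference_file.read_text())["mass"]
        current = invariant_mass(0.7, 0.1, 0.2, 0.45)
        # Fail if the new build changes the result beyond a small tolerance.
        assert abs(current - reference) < 1e-4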
Collaborative services and tools are essential for any (HEP) experiment.
They help to integrate global virtual communities by allowing members to share
and exchange relevant information by way of web-based
services.
Typical examples are public and internal web pages, wikis, mailing list
services, issue tracking systems, services for meeting organization and
document and authorship...
Traditional T2 grid sites still process large amounts of data flowing from the LHC and elsewhere. More flexible technologies, such as virtualisation and containerisation, are rapidly changing the landscape, but the right migration paths to these sunlit uplands are not well defined yet. We report on the innovations and pressures that are driving these changes and we discuss their pros and cons....
The Compact Muon Solenoid (CMS) experiment makes a vast use of alignment and calibration measurements in several crucial workflows: in the event selection at the High Level Trigger (HLT), in the processing of the recorded collisions and in the production of simulated events. A suite of services addresses the key requirements for the handling of the alignment and calibration conditions such as:...
Monitoring the quality of the data, DQM, is crucial in a high-energy physics experiment to ensure the correct functioning of the apparatus during data taking. DQM at LHCb is carried out in two phases. The first one is performed on-site, in real time, using unprocessed data directly from the LHCb detector, while the second, also performed on-site, requires the reconstruction of the data...
As more detailed and complex simulations are required in different application domains, there is much interest in adapting the code for parallel and multi-core architectures. Parallelism can be achieved by tracking many particles at the same time. This work presents MPEXS, a CUDA implementation of the core Geant4 algorithm used for the simulation of electro-magnetic interactions (electron,...
In a large Data Center, such as a LHC Tier-1, where the structure of the Local Area Network and Cloud Computing Systems varies on a daily basis, network management has become more and more complex.
In order to improve the operational management of the network, this article presents a real-time network topology auto-discovery tool named Netfinder.
The information required for effective...
The CernVM File System today is commonly used to host and distribute application software stacks. In addition to this core task, recent developments expand the scope of the file system into two new areas. Firstly, CernVM-FS emerges as a good match for container engines to distribute the container image contents. Compared to native container image distribution (e.g. through the ``Docker...
Monitoring of IT infrastructure and services is essential to maximize availability and minimize disruption, by detecting failures and developing issues to allow rapid intervention.
The HEP group at Liverpool have been working on a project to modernize local monitoring infrastructure (previously provided using Nagios and ganglia) with the goal of increasing coverage, improving visualization...
In this paper, we discuss our experiences with different data storage technologies within the ATLAS Distributed Data Management
system, and in particular with object-based storage. Object-based storage differs in many respects from traditional file system
storage and offers a highly scalable and simple storage solution that is the most common choice for the cloud. First, we describe the needed changes
in...
The offline software for the CMS Level-1 trigger provides a reliable bitwise emulation of the high-speed custom FPGA-based hardware at the foundation of the CMS data acquisition system. The staged upgrade of the trigger system requires flexible software that accurately reproduces the system at each stage using recorded running conditions. The high intensity of the upgraded LHC necessitates new...
With the demand for more computing power and the widespread use of parallel and distributed computing, applications are looking for message-based transport solutions for fast, stateless communication. There are many solutions already available, with competing performances, but with varying APIs, making it difficult to support all of them. Trying to find a solution to this problem we decided to...
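As a hedged illustration of the kind of message-based, stateless transport being discussed (and not necessarily the project's own choice of API), the following Python sketch uses ZeroMQ (pyzmq), one widely used option; the endpoint and payload are arbitrary.

    # Minimal ZeroMQ PUSH/PULL pair illustrating message-based transport.
    import zmq

    context = zmq.Context()

    sender = context.socket(zmq.PUSH)
    sender.bind("tcp://127.0.0.1:5555")

    receiver = context.socket(zmq.PULL)
    receiver.connect("tcp://127.0.0.1:5555")

    sender.send(b"event payload")     # fire-and-forget, stateless send
    print(receiver.recv())            # b'event payload'

Other transports expose similar but incompatible APIs, which is exactly the proliferation that motivates a common abstraction layer.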
Managing resource allocation in a Cloud based data center serving multiple virtual organizations is a challenging issue. In fact, while batch systems are able to allocate resources to different user groups according to specific shares imposed by the data center administrators, without a static partitioning of such resources, this is not so straightforward in the most common Cloud frameworks,...
The CERN Web Frameworks team has deployed OpenShift Origin to facilitate deployment of web applications and improve resource efficiency. OpenShift leverages Docker containers and Kubernetes orchestration to provide a Platform-as-a-service solution oriented for web applications. We will review use cases and how OpenShift was integrated with other services such as source control, web site...
The PANDA experiment, one of the four scientific pillars of the FAIR facility currently under construction in Darmstadt, Germany, is a next-generation particle detector that will study collisions of antiprotons with beam momenta of 1.5–15 GeV/c on a fixed proton target.
Because of the broad physics scope and the similar signature of signal and background events in the energy region of...
This work combines metric trees and parallel computing on both multi-GPU and distributed memory architectures, applied to
multi-million or even billion-body simulations.
Metric trees are data structures for indexing multidimensional sets of points in arbitrary metric spaces. First proposed by Jeffrey
K. Uhlmann [1], as a structure to efficiently solve neighbourhood queries, they have...
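For readers unfamiliar with metric trees, a small hedged illustration of a neighbourhood query is shown below using scikit-learn's BallTree, a metric-tree relative that accepts arbitrary metrics; this single-node sketch is far from the paper's multi-GPU and distributed-memory implementation.

    # Small-scale neighbourhood query with a metric tree (scikit-learn BallTree),
    # shown only to illustrate the data structure; data are random toy "bodies".
    import numpy as np
    from sklearn.neighbors import BallTree

    rng = np.random.default_rng(0)
    points = rng.random((10000, 3))              # toy 3-D positions

    tree = BallTree(points, metric="euclidean")  # any valid metric can be used
    dist, idx = tree.query(points[:5], k=4)      # 3 nearest neighbours + self
    print(idx[0], dist[0])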
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider (LHC) assembles events at a rate of 100 kHz. It transports event data at an aggregate throughput of ~100 GB/s to the high-level trigger (HLT) farm. The CMS DAQ system has been completely rebuilt during the first long shutdown of the LHC in 2013/14. The new DAQ architecture is based on state-of-the-art...
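Taken together, the quoted rate and throughput imply an average event size of roughly 1 MB; a one-line check under the approximate figures given above:

    # Back-of-the-envelope check of the implied average event size.
    event_rate_hz = 100e3          # 100 kHz Level-1 accept rate
    throughput_bytes = 100e9       # ~100 GB/s aggregate throughput
    print(throughput_bytes / event_rate_hz / 1e6, "MB per event (approx.)")  # ~1 MB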
Graphical Processing Units (GPUs) represent one of the most sophisticated
and versatile parallel computing architectures available that are nowadays
entering the High Energy Physics field. GooFit is an open source tool
interfacing ROOT/RooFit to the CUDA platform on nVidia GPUs (it also
supports OpenMP). Specifically it acts as an interface between the MINUIT
minimization algorithm and a...
The computing power of most modern commodity computers is far from being fully exploited by standard usage patterns.
The work we present describes the development and setup of a virtual computing cluster based on Docker containers used as worker nodes. The facility is based on Plancton[1]: a lightweight fire-and-forget background service that spawns and controls a local pool of Docker...
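The underlying idea, a background loop that keeps a pool of worker containers alive, can be sketched with the Docker SDK for Python. This is only an illustration of the concept, not Plancton's code; the image name, label and pool size are placeholders.

    # Keep-N-workers-running loop using the Docker SDK for Python (sketch only).
    import time
    import docker

    POOL_SIZE = 4
    IMAGE = "busybox"                    # placeholder worker image
    LABEL = {"role": "worker"}

    client = docker.from_env()

    while True:
        running = client.containers.list(filters={"label": "role=worker",
                                                   "status": "running"})
        for _ in range(POOL_SIZE - len(running)):
            # Each container runs a dummy payload and exits; a real worker node
            # would instead start a pilot job or a batch client.
            client.containers.run(IMAGE, "sleep 60", detach=True, labels=LABEL)
        time.sleep(30)                   # fire-and-forget: re-check periodically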
The Alpha Magnetic Spectrometer (AMS) on board the International Space Station (ISS) requires a large amount of computing power for data production and Monte Carlo simulation. A large fraction of the computing resources has been contributed by computing centers within the AMS collaboration. AMS has 12 “remote” computing centers outside of the Science Operation Center at CERN, with different...
A major challenge for data production at the IceCube Neutrino Observatory presents itself in connecting a large set of small clusters together to form a larger computing grid. Most of these clusters do not provide a Grid interface. Using a local account on each submit machine, HTCondor glideins can be submitted to virtually any type of scheduler. The glideins then connect back to a main...
The Alignment, Calibrations and Databases group at the CMS Experiment delivers Alignment and Calibration Conditions Data to a large set of workflows which process recorded event data and produce simulated events. The current infrastructure for releasing and consuming Conditions Data was designed in the two years of the first LHC long shutdown to respond to use cases from the preceding...
Cppyy provides fully automatic Python/C++ language bindings and so doing
covers a vast number of use cases. Use of conventions and known common
patterns in C++ (such as smart pointers, STL iterators, etc.) allow us to
make these C++ constructs more "pythonistic." We call these treatments
"pythonizations", as the strictly bound C++ code is turned into bound code
that has a Python "feel."...
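A small hedged example of the automatic bindings and STL "pythonizations" described above; the C++ snippet is defined inline purely for illustration.

    # cppyy binds C++ declared at run time; STL containers come back with
    # Python behaviour (iteration, len, ...). The C++ code is an illustration.
    import cppyy

    cppyy.cppdef("""
    #include <vector>

    std::vector<double> squares(int n) {
        std::vector<double> v;
        for (int i = 1; i <= n; ++i) v.push_back(i * i);
        return v;
    }
    """)

    values = cppyy.gbl.squares(4)       # a bound std::vector<double>
    print(len(values), list(values))    # pythonized: 4 [1.0, 4.0, 9.0, 16.0]
    print(sum(values))                  # iterable like any Python sequence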
AFP, the ATLAS Forward Proton detector upgrade project consists of two
forward detectors at 205 m and 217 m on each side of the ATLAS
experiment at the LHC. The new detectors aim to measure momenta and
angles of diffractively scattered protons. In 2016 two detector stations
on one side of the ATLAS interaction point have been installed and are
being commissioned.
The front-end electronics...
The current LHCb trigger system consists of a hardware level, which reduces the LHC bunch-crossing rate of 40 MHz to 1 MHz, a rate at which the entire detector is read out. In a second level, implemented in a farm of around 20k parallel-processing CPUs, the event rate is reduced to around 12.5 kHz. The LHCb experiment plans a major upgrade of the detector and DAQ system in the LHC long shutdown...
The ongoing integration of clouds into the WLCG raises the need for a detailed health and performance monitoring of the virtual resources in order to prevent problems of degraded service and interruptions due to undetected failures. When working in scale, the existing monitoring diversity can lead to a metric overflow whereby the operators need to manually collect and correlate data from...
In this work we report on recent progress of the Geant4 electromagnetic (EM) physics sub-packages. A number of new interfaces and models recently introduced are already used in LHC applications and may be useful for any type of simulation.
To improve usability, a new set of User Interface (UI) commands and corresponding C++ interfaces have been added for easier configuration of EM physics. In...
The endcap time of flight (TOF) detector of the BESIII experiment at the BEPCII was upgraded based on multigap resistive plate chamber technology. During the 2015-2016 data taking the TOF system achieved a total time resolution of 65 ps for electrons in Bhabha events. Details of the reconstruction and calibration procedures, detector alignment and performance with data will be described.
The STAR Heavy Flavor Tracker (HFT) was designed to provide high-precision tracking for the identification of charmed hadron decays in heavy ion collisions at RHIC. It consists of three independently mounted subsystems, providing four precision measurements along the track trajectory, with the goal of pointing decay daughters back to vertices displaced by <100 microns from the primary event...
Cloud computing can make the configuration of IT resources flexible and reduce hardware costs; it can also provide computing services according to actual needs. We are applying this computing model to the Chinese Spallation Neutron Source (CSNS) computing environment. From the research and practice aspects, firstly, the application status of cloud computing in High Energy Physics experiments...
IhepCloud is a multi-user virtualization platform based on OpenStack Icehouse and deployed in November 2014. The platform provides multiple types of virtual machines, such as test VMs, UIs and WNs, and is part of the local computing system. There are 21 physical machines and 120 users on this platform, and about 300 virtual machines running on it.
Upgrading IhepCloud from Icehouse to Kilo is difficult,...
Multi-VO support based on DIRAC has been set up to provide workload and data management for several high energy physics experiments at IHEP. The distributed computing platform has 19 heterogeneous sites including Cluster, Grid and Cloud sites. The heterogeneous resources belong to different Virtual Organizations. Due to their scale and heterogeneity, it is complicated to monitor and manage these resources...
One of the biggest challenges with a large-scale data management system is to ensure consistency between the global file catalog
and what is physically on all storage elements.
To tackle this issue, the Rucio software which is used by the ATLAS Distributed Data Management system has been extended to
automatically handle lost or unregistered files (aka Dark Data). This system automatically...
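The core of such a consistency check can be pictured as a set comparison between a catalogue dump and a storage dump. The sketch below is a generic illustration of that idea, not Rucio's actual auditing code, and the file lists are placeholders.

    # Generic catalogue-vs-storage comparison: files present only on storage are
    # "dark data", files present only in the catalogue are lost.
    def compare_dumps(catalogue_files, storage_files):
        catalogue = set(catalogue_files)
        storage = set(storage_files)
        dark_data = storage - catalogue      # on disk but unknown to the catalogue
        lost_files = catalogue - storage     # registered but missing on disk
        return dark_data, lost_files

    dark, lost = compare_dumps(
        catalogue_files=["/data/a.root", "/data/b.root"],
        storage_files=["/data/b.root", "/data/c.root"],
    )
    print("dark:", dark)   # {'/data/c.root'}
    print("lost:", lost)   # {'/data/a.root'}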
With the current distributed data management system for ATLAS, called Rucio, all user interactions, e.g. the Rucio command line
tools or the ATLAS workload management system, communicate with Rucio through the same REST-API. This common interface makes it
possible to interact with Rucio using a lot of different programming languages, including Javascript. Using common web...
The AliEn file catalogue is a global unique namespace providing mapping between a UNIX-like logical name structure and the corresponding physical files distributed over 80 storage elements worldwide. Powerful search tools and hierarchical metadata information are integral part of the system and are used by the Grid jobs as well as local users to store and access all files on the Grid storage...
The University of Notre Dame (ND) CMS group operates a modest-sized Tier-3 site suitable for local, final-stage analysis of CMS data. However, through the ND Center for Research Computing (CRC), Notre Dame researchers have opportunistic access to roughly 25k CPU cores of computing and a 100 Gb/s WAN network link. To understand the limits of what might be possible in this scenario, we...
A central timing (CT) is a dedicated system responsible for driving an accelerator's behaviour. It allows operation teams to interactively select and schedule cycles. While executing a scheduled cycle a CT sends out events which provide (a) precise synchronization and (b) information on what to do, to all equipment operating an accelerator. The events are also used to synchronize accelerators...
Simulation has been used for decades in various areas of computing science, such as network protocol design and microprocessor design. By comparison, current practice in storage simulation is in its infancy. We are therefore building a simulator with SimGrid to simulate the storage part of an application. Cluefs is a lightweight utility to collect data on the I/O events induced by an application...
Beam manipulation of high- and very-high-energy particle beams is a hot topic in accelerator physics. Coherent effects of ultra-relativistic particles in bent crystals allow the steering of particle trajectories thanks to the strong electrical field generated between atomic planes. Recently, a collimation experiment with bent crystals was carried out at the CERN-LHC [1], paving the way to the...
Geant4 is a toolkit for the simulation of the passage of particles through matter. Its areas of application include high energy, nuclear and accelerator physics as well as studies in medical and space science.
The Geant4 collaboration regularly performs validation and regression tests through its development cycle. A validation test compares results obtained with a specific Geant4 version...
The expected growth in HPC capacity over the next decade makes such resources attractive for meeting future computing needs of HEP/NP experiments, especially as their cost is becoming comparable to traditional clusters. However, HPC facilities rely on features like specialized operating systems and hardware to enhance performance that make them difficult to be used without significant changes...
SWIFT is a compiled object-oriented language similar in spirit to C++ but with the coding simplicity of a scripting language. Built with the LLVM compiler framework used within Xcode 6 and later versions, SWIFT features interoperability with C, Objective-C, and C++ code, truly comprehensive debugging and documentation features, and a host of language features that make for rapid and effective...
This paper introduces the storage strategy and tools of the science data of the Alpha Magnetic Spectrometer (AMS) at Science Operation Center (SOC) at CERN.
The AMS science data includes flight data, reconstructed data and simulation data, as well as their metadata. The data volume is 1070 TB per year of operation, and has currently reached 5086 TB in total. We have two storage levels:...
Operational and other pressures have led to WLCG experiments moving increasingly to a stratified model for Tier-2 resources, where "fat" Tier-2s ("T2Ds") and "thin" Tier-2s ("T2Cs") provide different levels of service.
In the UK, this distinction is also encouraged by the terms of the current GridPP5 funding model. In anticipation of this, testing has been performed on the implications, and...
We review the concept of support vector machines before proceeding to discuss examples of their use in a number of scenarios. Using the Toolkit for Multivariate Analysis (TMVA) implementation we discuss examples relevant to HEP including background suppression for H->tau+tau- at the LHC. The use of several different kernel functions and performance benchmarking is discussed.
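The abstract refers to the TMVA implementation; as a language-agnostic illustration of the same idea, the hedged sketch below trains an RBF-kernel support vector machine on toy signal and background samples using scikit-learn (not TMVA); all features, sample sizes and hyperparameters are arbitrary.

    # Toy signal/background separation with an RBF-kernel SVM (scikit-learn,
    # shown for brevity; the study described above uses the TMVA toolkit).
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    signal = rng.normal(loc=1.0, scale=0.7, size=(500, 2))       # toy features
    background = rng.normal(loc=-1.0, scale=1.0, size=(500, 2))
    X = np.vstack([signal, background])
    y = np.concatenate([np.ones(500), np.zeros(500)])

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
    print("test accuracy: %.2f" % clf.score(X_test, y_test))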
The Large Hadron Collider at CERN restarted in 2015 with a higher
centre-of-mass energy of 13 TeV. The instantaneous luminosity is expected
to increase significantly in the coming
years. An upgraded Level-1 trigger system is being deployed in the CMS
experiment in order to maintain the same efficiencies for searches and
precision measurements as those achieved in
the previous run. This system...
CERN currently manages the largest data archive in the HEP domain; over 135PB of custodial data is archived across 7 enterprise tape libraries containing more than 20,000 tapes and using over 80 tape drives. Archival storage at this scale requires a leading edge monitoring infrastructure that acquires live and lifelong metrics from the hardware in order to assess and proactively identify...
The ATLAS collaboration has recently setup a number of citizen science projects which have a strong IT component and could not have been envisaged without the growth of general public computing resources and network connectivity: event simulation through volunteer computing, algorithms improvement via Machine Learning challenges, event display analysis on citizen science platforms, use of...
The LHC has been providing pp collisions with record luminosity and energy since the start of Run 2 in 2015. In the ATLAS experiment the Trigger and Data Acquisition system has been upgraded to deal with the increased event rates. The dataflow element of the system is distributed across hardware and software and is responsible for buffering and transporting event data from the Readout system...
The LHC, at design capacity, has a bunch-crossing rate of 40 MHz whereas the ATLAS experiment at the LHC has an average recording rate of about 1000 Hz. To reduce the rate of events but still maintain a high efficiency of selecting rare events such as physics signals beyond the Standard Model, a two-level trigger system is used in ATLAS. Events are selected based on physics signatures such as...
An overview of the CMS Data Analysis School (CMSDAS) model and experience is provided. CMSDAS is the official school that CMS organizes every year in the US, in Europe and in Asia to train students, Ph.D. candidates and young post-docs in physics analysis. It consists of two days of short exercises on physics object reconstruction and identification and 2.5 days of long exercises on physics...
The Czech National Grid Infrastructure is operated by MetaCentrum, a CESNET department responsible for coordinating and managing activities related to distributed computing. CESNET as the Czech National Research and Education Network (NREN) provides many e-infrastructure services, which are used by 94% of the scientific and research community in the Czech Republic. Computing and storage...
In the sociology of small- to mid-sized (O(100) collaborators) experiments the issue of data collection and storage is sometimes felt as a residual problem for which well-established solutions are known. Still, the DAQ system can be one of the few forces that drive towards the integration of otherwise loosely coupled detector systems. As such it may be hard to complete with
off-the-shelf...
The Data and Software Preservation for Open Science (DASPOS) collaboration has developed an ontology for describing particle physics analyses. The ontology, a series of data triples, is designed to describe dataset, selection cuts, and measured quantities for an analysis. The ontology specification, written in the Web Ontology Language (OWL), is designed to be interpreted by many pre-existing...
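To illustrate what describing an analysis as data triples looks like in practice, the sketch below builds a tiny RDF graph with rdflib. The namespace, class and property names are invented for the example and are not the actual DASPOS/OWL vocabulary.

    # Sketch of describing an analysis as RDF triples with rdflib; the
    # vocabulary below is a placeholder, not the real DASPOS ontology.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/analysis#")   # placeholder namespace
    g = Graph()

    analysis = EX["HiggsToTauTau"]
    g.add((analysis, RDF.type, EX.Analysis))
    g.add((analysis, EX.usesDataset, Literal("Run2015_TauTau_AOD")))
    g.add((analysis, EX.hasSelectionCut, Literal("pT(tau) > 20 GeV")))
    g.add((analysis, EX.measures, Literal("signal strength mu")))

    print(g.serialize(format="turtle"))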
The growth in size and geographical distribution of scientific collaborations, while enabling researchers to achieve ever higher and bolder results, also poses new technological challenges, one of these being the additional effort needed to analyse and troubleshoot network flows that travel thousands of miles, traversing a number of different network domains. While the day-to-day multi-domain...
In order to generate the huge number of Monte Carlo events that will be required by the ATLAS experiment over the next several runs, a very fast simulation is critical. Fast detector simulation alone, however, is insufficient: with very high numbers of simultaneous proton-proton collisions expected in Run 3 and beyond, the digitization (detector response emulation) and event reconstruction...
Contemporary distributed computing infrastructures (DCIs) are not easily and securely accessible by common users. Computing environments are typically hard to integrate due to interoperability problems resulting from the use of different authentication mechanisms, identity negotiation protocols and access control policies. Such limitations have a big impact on the user experience making it...
High-throughput computing requires resources to be allocated so that jobs can be run. In a highly distributed environment that may be comprised of multiple levels of queueing, it may not be certain where, what and when jobs will run. It is therefore desirable to first acquire the resource before assigning it a job. This late-binding approach has been implemented in resources managed by batch...
LHCb Grid access is based on the LHCbDirac system. It provides access to data and computational resources for researchers in different geographical locations. The Grid has a hierarchical topology with multiple sites distributed over the world. The sites differ from each other by their number of CPUs, amount of disk storage and connection bandwidth. These parameters are essential for the...
Within the HEPiX virtualization group and the WLCG MJF Task Force, a mechanism has been developed which provides access to detailed information about the current host and the current job to the job itself. This allows user payloads to access meta information, independent of the current batch system or virtual machine model. The information can be accessed either locally via the filesystem on a...
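A hedged sketch of how a payload might read such information via the filesystem, assuming the MACHINEFEATURES and JOBFEATURES environment variables point at directories containing one value per file; the key names used below are examples only.

    # Reading machine/job features from the filesystem (sketch; key names are
    # illustrative, and missing directories or keys are treated as "unknown").
    import os

    def read_feature(env_var, key):
        directory = os.environ.get(env_var)
        if not directory:
            return None                       # feature directory not provided
        path = os.path.join(directory, key)
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return None

    print("job slots on this machine:", read_feature("MACHINEFEATURES", "jobslots"))
    print("wall-clock limit (s):     ", read_feature("JOBFEATURES", "wall_limit_secs"))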
I/O optimizations, along with the vertical and horizontal elasticity of an application, are essential to achieve linear scalability of data processing performance. However, deploying these three critical concepts in a unified software environment presents a challenge, and as a result most existing data processing frameworks rely on external solutions to address them. For example, in a multicore...
The ATLAS Trigger & Data Acquisition project was started almost twenty years ago with the aim of providing a scalable distributed data collection system for the experiment. While the software dealing with physics dataflow was implemented by directly using low level communication protocols, like TCP and UDP, the control and monitoring infrastructure services for the system were implemented on...
The Compact Muon Solenoid (CMS) experiment makes a vast use of alignment and calibration measurements in several data processing workflows. Such measurements are produced either by automated workflows or by analysis tasks carried out by experts in charge. Very frequently, experts want to inspect and exchange with others in CMS the time evolution of a given calibration, or want to monitor the...
The Alpha Magnetic Spectrometer (AMS) on board the International Space Station (ISS) requires a large amount of computing power for data production and Monte Carlo simulation. Recently the AMS Offline software was ported to the IBM Blue Gene/Q architecture. The supporting software/libraries which have been successfully ported include: ROOT 5.34, GEANT4.10, CERNLIB, and AMS offline data...
The Resource Manager is one of the core components of the Data Acquisition system of the ATLAS experiment at the LHC. The Resource Manager marshals the right for applications to access resources which may exist in multiple but limited copies, in order to avoid conflicts due to program faults or operator errors.
The access to resources is managed in a manner similar to what a lock manager...
SuperKEKB, a next generation B factory, has finished being constructed in Japan as an upgrade of the KEKB e+e- collider. Currently it is running with the BEAST II detector, whose purpose is to understand the interaction and background events at the beam collision region in preparation for the 2018 launch of the Belle II detector. Overall SuperKEKB is expected to deliver a rich data set for the...
The ZEUS data preservation (ZEUS DP) project assures
continued access to the analysis software, experimental data and
related documentation.
The ZEUS DP project supports the possibility to derive valuable
scientific results from the ZEUS data in the future.
The implementation of the data preservation is discussed in the
context of contemporary data analyses and of planning of...
Daily operation of a large-scale experimental setup is a challenging task, both in terms of maintenance and monitoring. In this work we describe an approach for an automated Data Quality system. Based on Machine Learning methods, it can be trained online on data manually labeled by human experts. The trained model can assist data quality managers by filtering obvious cases (both good and bad) and...
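The filtering idea can be sketched as follows: train a classifier on expert-labelled runs, auto-flag the confident cases, and defer uncertain ones to humans. The features, toy labels, model choice and thresholds below are all illustrative assumptions, not the system described above.

    # "Filter the obvious, defer the uncertain": toy sketch of ML-assisted DQM.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(2)
    X_labelled = rng.random((200, 5))                  # per-run monitoring features
    y_labelled = (X_labelled[:, 0] > 0.5).astype(int)  # 1 = good, 0 = bad (toy labels)

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_labelled, y_labelled)

    X_new = rng.random((10, 5))
    proba_good = model.predict_proba(X_new)[:, 1]      # column 1 = class "good"
    for i, p in enumerate(proba_good):
        if p > 0.95:
            verdict = "auto-flag GOOD"
        elif p < 0.05:
            verdict = "auto-flag BAD"
        else:
            verdict = "send to human expert"
        print(f"run {i}: p(good)={p:.2f} -> {verdict}")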
Software development in high energy physics follows the open-source
software (OSS) approach and relies heavily on software being developed
outside the field. Creating a consistent and working stack out of 100s
of external, interdependent packages on a variety of platforms is a
non-trivial task. Within HEP, multiple technical solutions exist to
configure and build those stacks (so-called...
As a robust and scalable storage system, dCache has always allowed the number of storage nodes and user accessible endpoints to be scaled horizontally, providing several levels of fault tolerance and high throughput. Core management services like the POSIX name space and central load balancing components, however, are only vertically scalable. This greatly limits the scalability of the core...
SHiP is a new fixed-target experiment at the CERN SPS accelerator. The goal of the experiment is to search for hidden particles predicted by models of Hidden Sectors. The purpose of the SHiP Spectrometer Tracker is to reconstruct the tracks of charged particles from the decay of neutral New Physics objects with high efficiency, while rejecting background events. The problem is to...
Electron, muon and photon triggers covering transverse energies from a few GeV to several TeV are essential for signal selection in a wide variety of ATLAS physics analyses to study Standard Model processes and to search for new phenomena. Final states including leptons and photons had, for example, an important role in the discovery and measurement of the Higgs particle. Dedicated triggers...
CERN’s enterprise Search solution “CERN Search” provides a central search solution for users and CERN service providers. A total of about 20 million public and protected documents from a wide range of document collections is indexed, including Indico, TWiki, Drupal, SharePoint, JACOW, E-group archives, EDMS, and CERN Web pages.
In spring 2015, CERN Search was migrated to a new...
The Queen Mary University of London grid site's Lustre file system has recently undergone a major upgrade from version 1.8 to the most recent 2.8 release, and the capacity increased to over 3 PB. Lustre is an open source, POSIX compatible, clustered file system presented to the Grid using the StoRM Storage Resource Manager. The motivation and benefits of upgrading including hardware and...
The international Muon Ionization Cooling Experiment (MICE) is designed to demonstrate the principle of muon ionisation cooling for the first time, for application to a future Neutrino Factory or Muon Collider. The experiment is currently under construction at the ISIS synchrotron at the Rutherford Appleton Laboratory, UK. As presently envisaged, the programme is divided into three Steps:...
The storage ring for the Muon g-2 experiment is composed of twelve custom vacuum chambers designed to interface with tracking and calorimeter detectors. The irregular shape and complexity of the chamber design made implementing these chambers in a GEANT simulation with native solids difficult. Instead, we have developed a solution that uses the CADMesh libraries to convert STL files from 3D...
ALICE (A Large Ion Collider Experiment) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider).
ALICE has been successfully collecting physics data of Run 2 since spring 2015. In parallel, preparations for a major upgrade of the computing system, called O2 (Online-Offline) and scheduled for...
Docker is a container technology that provides a way to "wrap up a
piece of software in a complete filesystem that contains everything it
needs to run" [1]. We have experimented with Docker to investigate its
utility in three broad realms: (1) allowing existing complex software
to run in very different environments from that in which the software
was built (such as Cori, NERSC's newest...
CPU cycles for small experiments and projects can be scarce, thus making use of
all available resources, whether dedicated or opportunistic, is
mandatory. While enabling uniform access to the LCG computing elements (ARC,
CREAM), the DIRAC grid interware was not able to use OSG computing elements
(GlobusCE, HTCondor-CE) without dedicated support at the grid site through so
called...
RHIC & ATLAS Computing Facility (RACF) at BNL is a 15000 sq. ft. facility hosting the IT equipment of the BNL ATLAS WLCG Tier-1 site, offline farms for the STAR and PHENIX experiments operating at the Relativistic Heavy Ion Collider (RHIC), BNL Cloud installations, various Open Science Grid (OSG) resources, and many other physics research oriented IT installations of a smaller scale. The...
The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the...
The ATLAS experiment is one of four detectors located at the Large Hadron Collider (LHC) at CERN. Its detector control system (DCS) stores the slow control data acquired by the back-end of distributed WinCC OA applications in an Oracle relational database, from which the data can be retrieved for future analysis, debugging and detector development.
The ATLAS DCS Data Viewer (DDV) is a...
In order to patch web servers and web applications in a timely manner, we first need to know which software packages are used, and where. But, a typical web stack is composed of multiple layers, including the operating system, web server, application server, programming platform and libraries, database server, web framework, content management system etc. as well as client-side tools. Keeping...
Windows Terminal Servers provide application gateways for various parts of the CERN accelerator complex, used by hundreds of CERN users every day. The combination of new tools such as Puppet, HAProxy and Microsoft System Center suite enable automation of provisioning workflows to provide a terminal server infrastructure that can scale up and down in an automated manner. The orchestration does...
We present the novel Analysis Workflow Management (AWM) that provides users with the tools and competences of professional large scale workflow systems. The approach presents a paradigm shift from executing parts of the analysis to defining the analysis.
Within AWM an analysis consists of steps. For example, a step defines to run a certain executable for multiple files of an input data...
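One way to picture "defining the analysis" as declared steps with dependencies is the hedged Python sketch below; it mimics the idea only and is not the AWM API, and the step names, commands and runner are invented for illustration.

    # Sketch: an analysis expressed as declared steps with dependencies,
    # executed in dependency order (placeholder print instead of real execution).
    from dataclasses import dataclass, field

    @dataclass
    class Step:
        name: str
        command: str                                   # e.g. an executable + arguments
        inputs: list = field(default_factory=list)     # names of prerequisite steps

    def run_analysis(steps):
        done = set()
        pending = {s.name: s for s in steps}
        while pending:
            progressed = False
            for name, step in list(pending.items()):
                if all(dep in done for dep in step.inputs):
                    print(f"running {name}: {step.command}")
                    done.add(name)
                    del pending[name]
                    progressed = True
            if not progressed:
                raise RuntimeError("circular dependency among steps")

    steps = [
        Step("skim", "skim.exe input_*.root"),
        Step("fit", "fit.exe skim.root", inputs=["skim"]),
        Step("plots", "plot.exe fit.root", inputs=["fit"]),
    ]
    run_analysis(steps)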
When we first introduced the XRootD storage system to the LHC, we needed a filesystem interface so that an XRootD system could function as a Grid Storage Element. The result was XRootDfs, a FUSE-based mountable POSIX filesystem. It glues all the data servers in an XRootD storage system together and presents them as a single, POSIX-compliant, multi-user networked filesystem. XRootD's unique redirection...
The Yet Another Rapid Readout (YARR) system is a DAQ system designed for the readout of current generation ATLAS Pixel FE-I4 and next generation ATLAS ITk chips. It utilises a commercial off-the-shelf PCIe FPGA card as a reconfigurable I/O interface, which acts as a simple gateway to pipe all data from the pixel chips via the high speed PCIe connection into the host system's memory. Relying on...