The software suite required to support a modern high energy physics experiment is typically made up of many experiment-specific packages in addition to a large set of external packages. The developer-level build system has to deal with external package discovery, versioning, build variants, user environments, etc. We find that various systems for handling these requirements divide the problem...
The ALICE experiment at CERN was designed to study the properties of the strongly-interacting hot and dense matter created in heavy-ion collisions at the LHC energies. The computing model of the experiment currently relies on the hierarchical Tier-based structure, with a top-level Grid site at CERN (Tier-0, also extended to Wigner) and several globally distributed datacenters at national and...
In the ideal limit of infinite resources, multi-tenant applications are able to scale in/out on a Cloud driven only by their functional requirements. A large Public Cloud may be a reasonable approximation of this condition, where tenants are normally charged a posteriori for their resource consumption. On the other hand, small scientific computing centres usually work in a saturated regime...
Application performance is often assessed using the Performance Monitoring Unit (PMU) capabilities present in modern processors. One popular tool that can read the PMU's performance counters is the Linux-perf tool. pmu-tools is a toolkit built around Linux-perf that provides a more powerful interface to the different PMU events and gives a more abstracted view of them. Unfortunately...
OpenStack is an open source cloud computing project that is enjoying wide popularity. More and more organizations and enterprises deploy it to provide their private cloud services. However, most organizations and enterprises cannot achieve unified user management and access control for their cloud services, since the authentication and authorization systems of cloud providers are generic and they...
The Belle II experiment can take advantage of data federation technologies to simplify access to distributed datasets and file replicas. The increasing adoption of the HTTP and WebDAV protocols by sites enables the creation of lightweight solutions that give an aggregated view of the distributed storage.
In this work, we study the possible usage of the Dynafed software, developed at CERN, for the...
Virtual machines have many attractive features: flexibility, easy control and customized system environments. More and more organizations and enterprises are deploying virtualization technology and cloud computing to build their distributed systems. Cloud computing is widely used in the high energy physics field. In this presentation, we introduce an integration of virtual machines with HTCondor,...
The Computing Center of the Institute of Physics (CC IoP) of the Czech Academy of Sciences serves a broad spectrum of users with various computing needs. It runs a WLCG Tier-2 centre for the ALICE and ATLAS experiments; the same set of services is used by the astroparticle physics projects Pierre Auger Observatory (PAO) and Cherenkov Telescope Array (CTA). The OSG stack is installed for...
With the era of big data emerging, Hadoop has become the de facto standard for big data processing. However, it is still difficult to run High Energy Physics (HEP) applications efficiently on an HDFS platform, for two reasons. Firstly, random access to event data is not supported by HDFS. Secondly, it is difficult to adapt HEP applications to the Hadoop data...
The complex geometry of the whole detector of the ATLAS experiment at LHC is currently stored only in custom online databases, from which it is built on-the-fly on request. Accessing the online geometry guarantees accessing the latest version of the detector description, but requires the setup of the full ATLAS software framework "Athena", which provides the online services and the tools to...
The INFN Section of Turin hosts a middle-size multi-tenant cloud infrastructure optimized for scientific computing.
A new approach exploiting the features of VMDIRAC and aiming to allow for dynamic automatic instantiation and destruction of Virtual Machines from different tenants, in order to maximize the global computing efficiency of the infrastructure, has been designed, implemented and...
The use of the WebDAV protocol to access large storage areas is becoming popular in the High Energy Physics community. All the main Grid and Cloud storage solutions provide such an interface; in this scenario, tuning the storage systems and evaluating their performance become crucial aspects for promoting the adoption of these protocols within the Belle II community.
In this work, we present the...
The ATLAS software infrastructure facilitates the efforts of more than 1000 developers working on a code base of 2200 packages with 4 million C++ and 1.4 million Python lines. The ATLAS offline code management system is a powerful, flexible framework for processing requests for new package versions, probing code changes in the Nightly Build System, migrating to new platforms and compilers,...
ATLAS is a high energy physics experiment at the Large Hadron Collider located at CERN. During the so-called Long Shutdown 2 period scheduled for late 2018, ATLAS will undergo several modifications and upgrades of its data acquisition system in order to cope with the higher luminosity requirements. As part of these activities, a new read-out chain will be built for the New Small Wheel muon...
Distributed computing infrastructures require automatic tools to strengthen, monitor and analyze the security behavior of computing devices. These tools should inspect monitoring data such as resource usage, log entries, traces and even processes' system calls. They should also detect anomalies that could indicate the presence of a cyber-attack. Besides, they should react to attacks without...
This paper reports on the activities aimed at improving the architecture and performance of the ATLAS EventIndex implementation in Hadoop. The EventIndex contains tens of billions of event records, each consisting of ~100 bytes, all having the same probability of being searched or counted. Data formats represent one important area for optimizing the performance and storage footprint of...
The engineering design of a particle detector is usually performed in a
Computer Aided Design (CAD) program, and simulation of the detector's performance
can be done with a Geant4-based program. However, transferring the detector
design from the CAD program to Geant4 can be laborious and error-prone.
SW2GDML is a tool that reads a design in the popular SolidWorks CAD
program and...
The Compact Muon Solenoid (CMS) experiment makes extensive use of alignment and calibration measurements in several data processing workflows: in the High Level Trigger, in the processing of the recorded collisions and in the production of simulated events for data analysis and studies of detector upgrades. A complete alignment and calibration scenario is factored into approximately three-hundred...
The Trigger and Data Acquisition system of the ATLAS detector at the Large Hadron
Collider at CERN is composed of a large number of distributed hardware and software
components (about 3000 machines and more than 25000 applications) which, in a coordinated
manner, provide the data-taking functionality of the overall system.
During data taking runs, a huge flow of operational data is produced...
Volunteer computing has the potential to provide significant additional computing capacity for the LHC experiments.
One of the challenges with exploiting volunteer computing is to support a global community of volunteers that provides heterogeneous resources.
However, HEP applications require more data input and output than the CPU intensive applications that are typically used by other...
Deploying a complex application on a Cloud-based infrastructure can be a challenging task. Among other things, the complexity can derive from the software components the application relies on, from requirements coming from the use cases (e.g. high availability of the components, autoscaling, disaster recovery), and from the skills of the users that have to run the application.
Using an orchestration...
As demand for widely accessible storage capacity increases and usage is on the rise, steady I/O performance is desired but tends to suffer in multi-user environments. Typical deployments use standard hard drives, as the cost per GB is quite low. On the other hand, HDD-based storage solutions are not known to scale well with process concurrency, and soon enough a high rate of IOPS creates a...
The variety of the ATLAS Distributed Computing infrastructure requires a central information
system to define the topology of computing resources and to store the different parameters and
configuration data which are needed by the various ATLAS software components.
The ATLAS Grid Information System (AGIS) is the system designed to integrate configuration
and status ...
GooFit, a GPU-friendly framework for doing maximum-likelihood fits, has been extended in functionality to do a full amplitude analysis of scalar mesons decaying into four final states via various combinations of intermediate resonances. Recurring resonances in different amplitudes are recognized and only calculated once, to save memory and execution time. As an example, this tool can be used...
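As a minimal illustration of the caching idea described above (not GooFit's actual API, which is a C++/CUDA framework), the sketch below evaluates each distinct resonance lineshape only once per event sample and reuses it in every amplitude that contains it; the names and the simple Breit-Wigner form are assumptions for illustration only.

```python
# Illustrative sketch: evaluate each distinct resonance lineshape once
# and reuse it across amplitudes (names and lineshape are hypothetical).

def breit_wigner(s, mass, width):
    """Simple Breit-Wigner lineshape evaluated at invariant mass squared s."""
    return 1.0 / complex(mass * mass - s, -mass * width)

class ResonanceCache:
    """Caches lineshape values so recurring resonances are computed only once."""
    def __init__(self, events):
        self.events = events          # list of invariant-mass-squared values
        self._cache = {}

    def lineshape(self, mass, width):
        key = (mass, width)
        if key not in self._cache:
            self._cache[key] = [breit_wigner(s, mass, width) for s in self.events]
        return self._cache[key]

def total_intensity(cache, amplitudes):
    """Coherent sum of amplitudes; each amplitude is (complex coefficient, mass, width)."""
    intensities = []
    for i in range(len(cache.events)):
        total = sum(c * cache.lineshape(m, w)[i] for c, m, w in amplitudes)
        intensities.append(abs(total) ** 2)
    return intensities

# Usage: the f0(980) lineshape below appears in two amplitudes but is evaluated once.
events = [0.95, 1.02, 1.10]            # toy m^2 values in GeV^2
amps = [(1.0 + 0.0j, 0.990, 0.055),    # f0(980) in amplitude A
        (0.4 + 0.3j, 0.990, 0.055),    # f0(980) again in amplitude B (cached)
        (0.7 - 0.2j, 1.275, 0.185)]    # f2(1270)
print(total_intensity(ResonanceCache(events), amps))
```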
The AMS data production uses different programming modules for job submission, execution and management, as well as for validation of produced data. The modules communicate with each other using a CORBA interface. The main module is the AMS production server, a scalable distributed service which links all modules together, starting from the job submission request and ending with writing data to disk...
Efficient administration of computing centres requires advanced tools for the monitoring and front-end interface of their infrastructure. The large-scale distributed grid systems, like the Worldwide LHC Computing Grid (WLCG) and ATLAS computing, offer many existing web pages and information sources indicating the status of the services, systems, requests and user jobs at grid sites. These...
The IT Storage group at CERN develops the software responsible for archiving to tape the custodial copy of the physics data generated by the LHC experiments. Physics Run 3 will start in 2021 and will introduce two major challenges for which the tape archive software must evolve. Firstly, the software will need to make more efficient use of tape drives in order to sustain the predicted data...
A Job Accounting Tool for IHEP Computing
The computing services running at the IHEP computing centre support several HEP experiments as well as bio-medicine studies. They provide 120,000 CPU cores across three local clusters and a Tier-2 grid site. A private cloud with 1000 CPU cores has been established to cover experiment peak requirements. Besides, the computing centre operates several remote clusters as...
The pilot model employed by the ATLAS production system has been in use for many years. The model has proven to be a success, with many
advantages over push models. However, one of the negative side-effects of using a pilot model is the presence of 'empty pilots' running
on sites, consuming a small amount of walltime and not running a useful payload job. The impact on a site can be significant,...
A new analysis category based on g4tools was added in Geant4 release 9.5 with the aim of providing users with a lightweight analysis tool available as part of the Geant4 installation without the need to link to an external analysis package. It has progressively replaced the usage of external tools based on AIDA (Abstract Interfaces for Data Analysis) in all Geant4 examples. Frequent questions...
Simulation of particle-matter interactions in complex geometries is one of
the main tasks in high energy physics (HEP) research. Geant4 is the most
commonly used tool to accomplish it.
An essential aspect of the task is an accurate and efficient handling
of particle transport and crossing volume boundaries within a
predefined (3D) geometry.
At the core of the Geant4 simulation toolkit,...
The distributed computing system at the Institute of High Energy Physics (IHEP), China, is based on the DIRAC middleware. It integrates about 2000 CPU cores and 500 TB of storage contributed by 16 distributed sites. These sites are of various types, such as cluster, grid, cloud and volunteer computing. The system went into production in 2012. It now supports multiple VOs and serves three HEP...
Previous research has shown that it is relatively easy to apply a simple shim to conventional WLCG storage interfaces in order to add erasure-coded distributed resilience to data.
One issue with simple EC models is that, while they can recover from losses without needing additional full copies of data, recovery often involves reading all of the distributed chunks of the file (and their...
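As a toy illustration of why recovery is read-intensive, the sketch below uses a single-parity (k+1) layout: rebuilding one lost chunk requires reading all k surviving chunks. Real deployments use more general codes such as Reed-Solomon, but the read pattern is analogous; the layout, chunk size and helper names are assumptions.

```python
# Toy single-parity erasure code: losing one chunk forces a read of all others.
def split_into_chunks(data: bytes, k: int):
    """Split data into k equal-length chunks (padded with zero bytes)."""
    size = -(-len(data) // k)                      # ceiling division
    data = data.ljust(size * k, b"\0")
    return [data[i * size:(i + 1) * size] for i in range(k)]

def parity(chunks):
    """XOR parity chunk over a list of equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

def recover(surviving_chunks):
    """Rebuild the single missing chunk: every surviving chunk must be read."""
    return parity(surviving_chunks)

data_chunks = split_into_chunks(b"example physics file payload", k=4)
p = parity(data_chunks)
lost_index = 2
survivors = [c for i, c in enumerate(data_chunks) if i != lost_index] + [p]
assert recover(survivors) == data_chunks[lost_index]
print("rebuilt chunk", lost_index, "by reading", len(survivors), "chunks")
```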
Likelihood ratio tests are a well-established technique for statistical inference in HEP. Because of the complicated detector response, we usually cannot evaluate the likelihood function directly. Instead, we typically build templates based on (Monte Carlo) samples from a simulator (or generative model). However, this approach does not scale well to high-dimensional observations.
We describe...
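To make the template approach concrete, the following sketch builds one-dimensional histogram templates from samples of a toy simulator under two hypotheses and evaluates a binned log-likelihood ratio on observed data; the Gaussian toy model and binning are assumptions, and it is precisely this per-dimension binning that becomes impractical for high-dimensional observations.

```python
# Binned template likelihood ratio from simulated samples (toy 1D example).
import math
import random

def make_template(simulate, n_samples, edges):
    """Histogram simulator output into probability-per-bin (with a small floor)."""
    counts = [0] * (len(edges) - 1)
    for _ in range(n_samples):
        x = simulate()
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [max(c, 0.5) / total for c in counts]   # floor avoids log(0)

def log_likelihood(data, template, edges):
    logl = 0.0
    for x in data:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                logl += math.log(template[i])
                break
    return logl

random.seed(1)
edges = [i * 0.25 for i in range(-16, 17)]          # 32 bins on [-4, 4]
t0 = make_template(lambda: random.gauss(0.0, 1.0), 100_000, edges)   # hypothesis H0
t1 = make_template(lambda: random.gauss(0.5, 1.0), 100_000, edges)   # hypothesis H1
observed = [random.gauss(0.5, 1.0) for _ in range(200)]
print("log LR =", log_likelihood(observed, t1, edges) - log_likelihood(observed, t0, edges))
```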
Many Grid sites have the need to reduce operational manpower, and running a storage element consumes a large amount of effort. In addition, setting up a new Grid site including a storage element involves a steep learning curve and a large investment of time. For these reasons, so-called storage-less sites are becoming more popular as a way to provide Grid computing resources with...
Maintainability is a critical issue for large scale, widely used software systems, characterized by a long life cycle. It is of paramount importance for a software toolkit, such as Geant4, which is a key instrument for research and industrial applications in many fields, not limited to high energy physics.
Maintainability is related to a number of objective metrics associated with...
Consolidation towards more computing at flat budgets, beyond what pure chip technology can offer, is a requirement for the full scientific exploitation of the future data from the Large Hadron Collider. One consolidation measure is to exploit cloud infrastructures whenever they are financially competitive. We report on the technical solutions used and the performance achieved running...
The ATLAS Experiment at the LHC has been recording data from proton-proton collisions at 13 TeV centre-of-mass energy since spring 2015. The ATLAS collaboration has set up, updated and optimized a fast physics monitoring framework (TADA) to automatically perform a broad range of validations and to scan for signatures of new physics in the rapidly growing data. TADA is designed to provide fast...
The ATLAS Metadata Interface (AMI) is a mature application with more than 15 years of existence.
Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for
metadata aggregation and cataloguing. We briefly describe the architecture, the main services
and the benefits of using AMI in big collaborations, especially for high energy physics.
We focus on the...
The ATLAS experiment explores new hardware and software platforms that, in the future,
may be more suited to its data intensive workloads. One such alternative hardware platform
is the ARM architecture, which is designed to be extremely power efficient and is found
in most smartphones and tablets.
CERN openlab recently installed a small cluster of ARM 64-bit evaluation prototype servers....
The ATLAS Distributed Data Management system stores more than 180PB of physics data across more than 130 sites globally. Rucio, the
new data management system of the ATLAS collaboration, has now been successfully operated for over a year. However, with the
forthcoming resumption of data taking for Run 2 and its expected workload and utilization, more automated and advanced methods...
The LHCb Vertex Locator (VELO) is a silicon strip semiconductor detector operating at just 8 mm from the LHC beams. Its 172,000 strips are read out at a frequency of 1 MHz and processed by off-detector FPGAs followed by a PC cluster that reduces the event rate to about 10 kHz. During the second run of the LHC, which lasts from 2015 until 2018, the detector performance will undergo continued...
The exploitation of volunteer computing resources has become a popular practice in the HEP computing community because of the huge amount of potential computing power they provide. In recent HEP experiments, grid middleware has been used to organize the services and resources; however, it relies heavily on X.509 authentication, which conflicts with the untrusted nature of volunteer...
Performance measurements and monitoring are essential for the efficient use of computing resources. In a commercial cloud environment an exhaustive resource profiling has additional benefits due to the intrinsic variability of the virtualised environment. In this context, resource profiling via synthetic benchmarking quickly allows issues to be identified and mitigated. Ultimately it provides...
In this paper we explain how the C++ code quality is managed in ATLAS using a range of tools from compile-time through to run time testing and reflect on the substantial progress made in the last two years largely through the use of static analysis tools such as Coverity®, an industry-standard tool which enables quality comparison with general open source C++ code. Other available code...
This contribution introduces a new dynamic data placement agent for the ATLAS distributed data management system. This agent is
designed to pre-place potentially popular data to make it more widely available. It uses data from a variety of sources. Those
include input datasets and sites workload information from the ATLAS workload management system, network metrics from different
sources like...
CBM is a heavy-ion experiment at the future FAIR facility in Darmstadt, Germany. Featuring self-triggered front-end electronics and free-streaming read-out, event selection will be done exclusively by the First Level Event Selector (FLES). Designed as an HPC cluster, its task is the online analysis and selection of the physics data at a total input data rate exceeding 1 TByte/s. To allow...
CERN Document Server (CDS) is the CERN Institutional Repository, playing a key role in the storage, dissemination and archival of all research material published at CERN, as well as multimedia and some administrative documents. As CERN's document hub, it brings together submission and publication workflows dedicated to the CERN experiments, but also to the video and photo teams, to the...
OpenAFS is the legacy solution for a variety of use cases at CERN, most notably home-directory services. OpenAFS has been used as the primary shared file-system for Linux (and other) clients for more than 20 years, but despite an excellent track record the project's age and architectural limitations are becoming more evident. We are now working to offer an alternative solution based on...
A new approach to providing scientific computing services is currently investigated at CERN. It combines solid existing components and services (EOS Storage, CERNBox Cloud Sync&Share layer, ROOT Analysis Framework) with rising new technologies (Jupyter Notebooks) to create a unique environment for Interactive Data Science, Scientific Computing and Education Applications.
EOS is the main disk...
The CMS experiment collects and analyzes large amounts of data coming from high energy particle collisions produced by the Large Hadron Collider (LHC) at CERN. This involves a huge amount of real and simulated data processing that needs to be handled in batch-oriented platforms. The CMS Global Pool of computing resources provides more than 100K dedicated CPU cores and another 50K to 100K CPU cores from...
CMS deployed a prototype infrastructure based on Elasticsearch that stores all ClassAds from the Global Pool. This includes detailed information on I/O, CPU, datasets, etc. for all analysis as well as production jobs. We will present initial results from analyzing this wealth of data, describe lessons learned, and plans for the future to derive operational benefits from analyzing this...
One of the primary objectives of the research on GEMs at CERN is the testing and simulation of prototypes, manufacturing of large-scale GEM detectors and installation into CMS detector sections at the outer layer, where only highly energetic muons are detected. When a muon traverses a GEM detector, it ionizes the gas molecules, generating a freely moving electron that starts...
One of the difficulties experimenters encounter when using a modular event-processing framework is determining the appropriate configuration for the workflow they intend to execute. A typical solution is to provide documentation external to the C++ code source that explains how a given component of the workflow is to be configured. This solution is fragile, because the documentation and the...
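A common remedy, sketched below in Python for illustration, is to let each module declare its own configuration schema in code, so that documentation and defaults cannot drift away from the source; the schema class, field names and helper functions here are hypothetical and not any particular framework's actual API.

```python
# Sketch: a module carries its own configuration description, so documentation
# (names, types, defaults, help text) is generated from the code itself.
from dataclasses import dataclass, fields

@dataclass
class TrackFitterConfig:
    """Configuration for a hypothetical track-fitting module."""
    max_iterations: int = 10      # maximum fit iterations per track
    chi2_cut: float = 25.0        # reject track candidates above this chi2
    use_field_map: bool = True    # use the measured magnetic field map

def describe(config_cls) -> str:
    """Render human-readable documentation directly from the declared schema."""
    lines = [config_cls.__doc__ or config_cls.__name__]
    for f in fields(config_cls):
        type_name = getattr(f.type, "__name__", f.type)
        lines.append(f"  {f.name} ({type_name}): default = {f.default}")
    return "\n".join(lines)

def validate(config_cls, user_settings: dict):
    """Reject unknown keys instead of silently ignoring them."""
    known = {f.name for f in fields(config_cls)}
    unknown = set(user_settings) - known
    if unknown:
        raise ValueError(f"unknown configuration keys: {sorted(unknown)}")
    return config_cls(**user_settings)

print(describe(TrackFitterConfig))
print(validate(TrackFitterConfig, {"chi2_cut": 16.0}))
```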
Throughout the first year of LHC Run 2, ATLAS Cloud Computing has undergone
a period of consolidation, characterized by building upon previously established systems,
with the aim of reducing operational effort, improving robustness, and reaching higher scale.
This paper describes the current state of ATLAS Cloud Computing.
Cloud activities are converging on a common contextualization...
The Belle II experiment is the upgrade of the highly successful Belle experiment located at the KEKB asymmetric-energy e+e- collider at KEK in Tsukuba, Japan. The Belle experiment collected e+e- collision data at or near the centre-of-mass energies corresponding to the $\Upsilon(nS)$ ($n\leq 5$) resonances between 1999 and 2010, with a total integrated luminosity of 1 ab$^{-1}$. The data...
PANDA is a planned experiment at FAIR (Darmstadt, Germany) with a cooled antiproton beam in the momentum range 1.5 to 15 GeV/c, allowing a wide physics program in nuclear and particle physics. It is the only experiment worldwide that combines a solenoid field (B = 2 T) and a dipole field (B = 2 Tm) in a fixed-target topology in that energy regime. The tracking system of PANDA involves the...
We introduce the convergence research cluster for dark matter, supported by the National Research Council of Science and Technology in Korea. The goal is to build a nationwide research cluster of institutes, from accelerator-based physics to astrophysics, based on computational science using infrastructures at KISTI (Korea Institute of Science and Technology Information) and KASI (Korea...
The LHC has planned a series of upgrades culminating in the High Luminosity LHC (HL-LHC) which will have
an average luminosity 5-7 times larger than the nominal Run-2 value. The ATLAS Tile Calorimeter (TileCal) will
undergo an upgrade to accommodate the HL-LHC parameters. The TileCal read-out electronics will be redesigned,
introducing a new read-out strategy.
The photomultiplier signals...
CERN has been archiving data on tapes in its Computer Center for decades and its archive system is now holding more than 135 PB of HEP data in its premises on high density tapes.
For the last 20 years, tape areal bit density has been doubling every 30 months, closely following HEP data growth trends. During this period, bits on the tape magnetic substrate have been shrinking exponentially;...
Data Flow Simulation of the ALICE Computing System with OMNET++
Rifki Sadikin, Furqon Hensan Muttaqien, Iosif Legrand, Pierre Vande Vyvre for the ALICE Collaboration
The ALICE computing system will be entirely upgraded for Run 3 to address the major challenge of sampling the full 50 kHz Pb-Pb interaction rate, increasing the present limit by a factor of 100. We present, in this...
This contribution reports on the feasibility of executing data intensive workflows on Cloud infrastructures. In order to assess this, the metric ETC = Events/Time/Cost is formed, which quantifies the different workflow and infrastructure configurations that are tested against each other.
In these tests, ATLAS reconstruction jobs are run, examining the effects of overcommitting (more parallel...
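As a small worked example of the ETC figure of merit (the configurations and numbers below are invented purely for illustration), ETC divides the number of processed events by the product of wall time and cost, so an overcommitted configuration can come out ahead on ETC even if its individual jobs run slower:

```python
# Worked toy example of the ETC = Events / (Time * Cost) figure of merit.
# All numbers are hypothetical, chosen only to illustrate the comparison.
def etc(events: int, wall_time_hours: float, cost: float) -> float:
    return events / (wall_time_hours * cost)

configurations = {
    # name: (events processed, wall time [h], cost [currency units])
    "no overcommit (8 jobs on 8 vCPUs)":  (8_000, 10.0, 4.0),
    "overcommit   (12 jobs on 8 vCPUs)": (12_000, 13.0, 4.0),
}

for name, (events, hours, cost) in configurations.items():
    print(f"{name}: ETC = {etc(events, hours, cost):.1f} events per hour per cost unit")
```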
dCache is a distributed multi-tiered data storage system widely used
by High Energy Physics and other scientific communities. It natively
supports a variety of storage media including spinning disk, SSD and
tape devices. Data migration between different media tiers is handled
manually or automatically based on policies. In order to provide
different levels of quality of...
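The policy-driven migration mentioned above can be pictured with the toy rule below, which assigns files to an SSD, disk or tape tier from their age and access frequency; the tier names and thresholds are purely illustrative assumptions, not dCache's actual policy language.

```python
# Toy policy engine for media-tier migration (thresholds are illustrative only).
from dataclasses import dataclass

@dataclass
class FileRecord:
    name: str
    days_since_last_access: int
    accesses_last_30_days: int
    current_tier: str             # "ssd", "disk" or "tape"

def target_tier(f: FileRecord) -> str:
    """Hot files stay on SSD, warm files on disk, cold files migrate to tape."""
    if f.accesses_last_30_days >= 20:
        return "ssd"
    if f.days_since_last_access <= 90:
        return "disk"
    return "tape"

def migration_plan(catalogue):
    """List (file, from, to) for every file whose tier should change."""
    return [(f.name, f.current_tier, target_tier(f))
            for f in catalogue if target_tier(f) != f.current_tier]

catalogue = [
    FileRecord("run123.raw", days_since_last_access=200, accesses_last_30_days=0, current_tier="disk"),
    FileRecord("hot.ntuple", days_since_last_access=1, accesses_last_30_days=50, current_tier="disk"),
]
print(migration_plan(catalogue))   # run123.raw -> tape, hot.ntuple -> ssd
```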
We review and demonstrate the design of efficient data transfer nodes (DTNs), from the perspectives of the highest throughput over both local and wide area networks, as well as the highest performance per unit cost. A careful system-level design is required for the hardware, firmware, OS and software components. Furthermore, additional tuning of these components, and the identification and...
The deployment of OpenStack Magnum at CERN has given us the possibility to manage container orchestration engines such as Docker and Kubernetes as first-class resources in OpenStack.
In this poster we will show the work done to exploit a Docker Swarm cluster deployed via Magnum to set up a Docker infrastructure running FTS (the WLCG file transfer service). FTS has been chosen as one of the...
The ATLAS Metadata Interface (AMI) is a mature application with more than 15 years of existence.
Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem
for metadata aggregation and cataloguing. AMI is used by the ATLAS production system,
therefore the service must guarantee a high level of availability. We describe our monitoring system
and the...
With many parts of the world having run out of IPv4 address space and the Internet Engineering Task Force (IETF) deprecating IPv4, the use of and migration to IPv6 is becoming a pressing issue. A significant amount of effort has already been expended by the HEPiX IPv6 Working Group (http://hepix-ipv6.web.cern.ch/) on testing dual-stacked hosts and IPv6-only CPU resources. The Queen Mary grid...
Nowadays, High Energy Physics experiments produce a large amount of data. These data are stored in massive storage systems, which need to balance cost, performance and manageability. HEP is a typical data-intensive application, processing a lot of data to achieve scientific discoveries. A hybrid storage system including SSD (Solid-State Drive) and HDD (Hard Disk Drive) layers...
Monte Carlo (MC) simulation production plays an important part in the physics analysis of the Alpha Magnetic Spectrometer (AMS-02) experiment. To facilitate metadata retrieval for data analysis among the millions of database records, we developed a monitoring tool to analyze and visualize the production status and progress. In this paper, we discuss the workflow of the...
ALICE (A Large Ion Collider Experiment) is the heavy-ion detector designed to study the physics of strongly interacting matter and the quark-gluon plasma at the CERN Large Hadron Collider (LHC). A major upgrade of the experiment is planned for 2020. In order to cope with a data rate 100 times higher and with the continuous readout of the Time Projection Chamber (TPC), it is necessary to...
The growing use of private and public clouds, and volunteer computing are driving significant changes in the way large parts of the distributed computing for our communities are carried out. Traditionally HEP workloads within WLCG were almost exclusively run via grid computing at sites where site administrators are responsible for and have full sight of the execution environment. The...
The long-standing problem of reconciling the cosmological evidence for the existence of dark matter with the lack of any clear experimental observation of it has recently revived the idea that the new particles are not directly connected with the Standard Model gauge fields, but only through mediator fields or ''portals'' connecting our world with new ''secluded'' or ''hidden'' sectors. One...
The trigger system of the ATLAS detector at the LHC is a combination of hardware, firmware and software, associated to various sub-detectors that must seamlessly cooperate in order to select 1 collision of interest out of every 40,000 delivered by the LHC every millisecond. This talk will discuss the challenges, workflow and organization of the ongoing trigger software development, validation...
The new generation of high energy physics (HEP) experiments has been producing gigantic amounts of data. How to store and access those data with high performance challenges the availability, scalability and I/O performance of the underlying massive storage system. At the same time, a series of studies focusing on big data has become more and more active, and research about metadata...
Binary decision trees are a widely used tool for the supervised classification of high-dimensional data, for example among particle physicists. We present our proposal for a supervised binary divergence decision tree with a nested separation method based on kernel density estimation. A key insight we provide is clustering driven by only a few selected physical variables. The proper selection...
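The core ingredient, choosing a cut from kernel density estimates of the signal and background distributions in a single variable, can be pictured with the toy sketch below; the Gaussian kernel, fixed bandwidth and toy data are assumptions, and the full method adds the divergence criterion and nested separation described in the contribution.

```python
# Sketch: choose a one-variable cut from Gaussian kernel density estimates of
# the signal and background distributions (toy data, fixed bandwidth).
import math
import random

def kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
    return density

def best_cut(signal, background, bandwidth=0.3, n_scan=200):
    """Scan the variable range and pick the cut maximizing KDE-estimated separation."""
    f_sig, f_bkg = kde(signal, bandwidth), kde(background, bandwidth)
    lo = min(signal + background) - 1.0
    hi = max(signal + background) + 1.0
    step = (hi - lo) / n_scan
    centres = [lo + step * (i + 0.5) for i in range(n_scan)]
    p_sig = [f_sig(x) * step for x in centres]   # approximate probability per cell
    p_bkg = [f_bkg(x) * step for x in centres]
    def score(k):
        # Cut at cell boundary k, keeping signal above it: efficiency + rejection.
        return 0.5 * (sum(p_sig[k:]) + sum(p_bkg[:k]))
    k_best = max(range(1, n_scan), key=score)
    return lo + step * k_best

random.seed(2)
signal = [random.gauss(1.0, 0.5) for _ in range(500)]
background = [random.gauss(-1.0, 0.8) for _ in range(500)]
print("chosen cut:", round(best_cut(signal, background), 3))
```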
Load balancing is one of the technologies enabling the deployment of large-scale applications on cloud resources. At CERN we have developed a DNS load balancer as a cost-effective way to do this for applications accepting DNS timing dynamics and not requiring memory. We serve 378 load-balanced aliases with two small VMs acting as master and slave. These aliases are based on 'delegated' DNS zones the...
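The load-balancing loop itself can be pictured with the toy sketch below: periodically probe the alias members, pick the least-loaded ones, and publish only those in the delegated zone; the probe, host names and selection rule are assumptions for illustration, not the actual CERN implementation.

```python
# Toy DNS load-balancing loop: publish the least-loaded hosts behind an alias.
import random
import time

def probe_load(host: str) -> float:
    """Stand-in for the real health/load probe (e.g. an SNMP metric); random here."""
    return random.uniform(0.0, 100.0)

def select_members(hosts, n_best=2):
    """Return the n_best least-loaded hosts to be published for the alias."""
    loads = {h: probe_load(h) for h in hosts}
    return sorted(loads, key=loads.get)[:n_best]

def update_alias(alias: str, members):
    """Stand-in for pushing new A/AAAA records into the delegated DNS zone."""
    print(f"{alias} -> {', '.join(members)}")

hosts = ["node1.example.org", "node2.example.org", "node3.example.org"]
for _ in range(3):                       # one iteration per refresh period
    update_alias("myservice.example.org", select_members(hosts))
    time.sleep(0.1)                      # the real service would wait minutes
```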
Requests for computing resources from the LHC experiments are constantly mounting, and so is their peak usage. Since dimensioning a site to handle the peak usage times is impractical due to the resource constraints that many publicly-owned computing centres have, opportunistic usage of resources from external, even commercial, cloud providers is becoming more and more interesting, and is even...
The CMS experiment at LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning systems. Given the scale of the global queue in CMS, the operators found it increasingly difficult to monitor the pool to find problems and fix them. The operators had to rely on several different web pages, with several different levels of information, and sifting tirelessly...
CRAB3 is a tool used by more than 500 users all over the world for distributed Grid analysis of CMS data. Users can submit sets of Grid jobs with similar requirements (tasks) with a single user request. CRAB3 uses a client-server architecture, where a lightweight client, a server, and ancillary services work together and are maintained by CMS operators at CERN.
As with most complex...
The computing infrastructures serving the LHC experiments have been designed to cope at most with the average amount of data recorded. Usage peaks, as already observed in Run-I, may however generate large backlogs, thus delaying the completion of the data reconstruction and ultimately the data availability for physics analysis. In order to cope with the production peaks, the LHC...
EMMA is a framework designed to build a family of configurable systems, with emphasis on extensibility and flexibility. It is based on a loosely coupled, event driven architecture. The architecture relies on asynchronous communicating components as a basis for decomposition of the system.
EMMA is embracing a fine-grained, component-based architecture, which produces a network of...
The use of opportunistic cloud resources by HEP experiments has significantly increased over the past few years. Clouds that are owned or managed by the HEP community are connected to the LHCONE network or the research network with global access to HEP computing resources. Private clouds, such as those supported by non-HEP research funds are generally connected to the international...
Traditional cluster computing resources can only partly meet the demand for massive data processing in High Energy Physics (HEP) experiments, and volunteer computing remains a potential resource for this domain. It collects the idle CPU time of desktop computers. A Desktop Grid is the infrastructure that aggregates multiple volunteer computers into a larger-scale heterogeneous...
Within the WLCG project, EOS is being evaluated as a platform to demonstrate efficient deployment of geographically distributed storage. The aim of distributed storage deployments is to reduce the number of individual end-points for the LHC experiments (>100 today) and to minimize the required effort for small storage sites. The split of the meta-data and data components in EOS makes it possible to operate one regional...
The Simulation at Point1 project is successfully running traditional ATLAS simulation jobs on the trigger and data acquisition high level trigger resources. The pool of available resources changes dynamically and quickly; therefore, we need to be very effective in exploiting the available computing cycles.
We will present our experience with using the Event Service that provides...
CERN Print Services include over 1000 printers and multi-function devices as well as a centralised print shop. Every year, some 12 million pages are printed. We will present the recent evolution of CERN print services, both from the technical perspective (automated web-based configuration of printers, Mail2Print) and the service management perspective.
The algorithms and infrastructure of the CMS offline software are under continuous change in order to adapt to a changing accelerator, detector and computing environment. In this presentation, we discuss the most important technical aspects of this evolution, the corresponding gains in performance and capability, and the prospects for continued software improvement in the face of challenges...
Ceph based storage solutions and especially object storage systems based on it are now well recognized and widely used across the HEP/NP community. Both object storage and block storage layers of Ceph are now supporting production ready services for HEP/NP experiments at many research organizations across the globe, including CERN and Brookhaven National Laboratory (BNL), and even the Ceph...
Since its original commissioning in 2008, the LHCb data acquisition system has seen several fundamental architectural changes. The original design had a single, continuous stream of data in mind, going from the read-out boards through a software trigger straight to a small set of files written in parallel. Over the years the enormous increase in available storage capacity has made it possible...
Over the last two years, a small team of developers worked on an extensive rewrite of the Indico application based on a new technology stack. The result, Indico 2.0, leverages open source packages in order to provide a web application that is not only more feature-rich but, more importantly, builds on a solid foundation of modern technologies and patterns.
Indico 2.0 has the peculiarity of...
After two years of maintenance and upgrade, the Large Hadron Collider (LHC) has started its second four-year run. In the meantime, the CMS experiment at the LHC has also undergone two years of maintenance and upgrade, especially in the Data Acquisition and online computing cluster, where the system was largely redesigned and replaced. Various aspects of the supporting computing...
Researchers at the Google Brain team released their second-generation deep learning library, TensorFlow, as an open-source package under the Apache 2.0 license in November 2015. Google had already deployed the first-generation library, DistBelief, in various systems such as Google Search, advertising systems, speech recognition systems, Google Images, Google Maps, Street View, Google...
The High Luminosity LHC (HL-LHC) is a project to increase the luminosity of the Large Hadron Collider to 5×10^34 cm^-2 s^-1. The CMS experiment is planning a major upgrade in order to cope with an expected average number of overlapping collisions per bunch crossing of 140. The dataset sizes will increase by several orders of magnitude, and so will the request for larger computing...
We present a new experiment management system for the SND detector at the VEPP-2000 collider (Novosibirsk). An important part of it is operator access to the experimental databases (configuration, conditions and metadata).
The system is designed in a client-server architecture. A user interacts with it via a web interface. The server side includes several logical layers: user...
The BESIII experiment, located in Beijing, is an electron-positron collision experiment to study tau-charm physics. Now in its middle age, BESIII has accumulated more than 1 PB of raw data, and a distributed computing system based on DIRAC has been built and in production since 2012 to deal with peak demands. Nowadays clouds have become a popular way to provide resources among BESIII...
The high-precision experiment PANDA is specifically designed to shed new light on the structure and properties of hadrons. PANDA is a fixed-target antiproton-proton experiment and will be part of the Facility for Antiproton and Ion Research (FAIR) in Darmstadt, Germany. When measuring total cross sections or determining the properties of intermediate states very precisely, e.g. via the energy...
Simulated samples of various physics processes are a key ingredient
within analyses to unlock the physics behind LHC collision data. Samples
with more and more statistics are required to keep up with the
increasing amounts of recorded data. During sample generation,
significant computing time is spent on the reconstruction of charged
particle tracks from energy deposits which additionally...
Charmonium is one of the most interesting, yet most challenging observables for the CBM experiment. CBM will try to measure
charmonium in the di-muon decay channel in heavy-ion collisions close to or even below the kinematic threshold for elementary interactions. The expected signal yield is consequently extremely low - less than one in a million collisions. CBM as a high-rate experiment shall...
In October 2015, CERN’s core website was moved to a new address, [http://home.cern][1], marking the launch of the brand new top-level domain .cern. In combination with a formal governance and registration policy, the IT infrastructure needed to be extended to accommodate the hosting of web sites in this new top-level domain. We will present the technical implementation in the framework...
Processing of the large amount of data produced by the ATLAS experiment requires fast and reliable access to what we call Auxiliary Data Files (ADF). These files, produced by Combined Performance, Trigger and Physics groups, contain conditions, calibrations, and other derived data used by the ATLAS software. In ATLAS this data has, thus far for historical reasons, been collected and accessed...
For almost 10 years, CERN has been providing live webcasts of events using Adobe Flash technology. This year is finally the year that Flash died at CERN! At CERN we closely follow the broadcast industry and always try to provide our users with the same experience as they have on other commercial streaming services. With Flash being slowly phased out on most of the streaming...
Accurate simulation of calorimeter response for high energy electromagnetic
particles is essential for the LHC experiments. Detailed simulation of the
electromagnetic showers using Geant4 is however very CPU intensive and
various fast simulation methods were proposed instead. The frozen shower
simulation substitutes the full propagation of the showers for energies
below $1$~GeV by showers...
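The substitution step can be sketched as follows: when a particle falls below the energy threshold, its full propagation is replaced by a pre-simulated shower drawn from a library and scaled to the particle's energy; the library format, threshold handling and deposit structure below are illustrative assumptions only, not the actual ATLAS implementation.

```python
# Sketch of frozen-shower substitution: below a threshold, replace full
# transport by a pre-simulated shower from a library (all details illustrative).
import random

THRESHOLD_GEV = 1.0

# Hypothetical library: lists of (relative longitudinal depth, energy fraction).
shower_library = {
    "electron": [
        [(0.1, 0.15), (0.3, 0.40), (0.6, 0.30), (0.9, 0.15)],
        [(0.1, 0.10), (0.3, 0.35), (0.6, 0.35), (0.9, 0.20)],
    ]
}

def full_simulation(particle):
    """Stand-in for the expensive step-by-step Geant4 shower development."""
    return [(d / 10.0, particle["energy"] / 10.0) for d in range(10)]

def frozen_shower(particle):
    """Pick a library shower and scale its fractions to the particle energy."""
    template = random.choice(shower_library[particle["type"]])
    return [(depth, frac * particle["energy"]) for depth, frac in template]

def simulate(particle):
    if particle["energy"] < THRESHOLD_GEV:
        return frozen_shower(particle)        # cheap substitution
    return full_simulation(particle)          # full propagation

print(simulate({"type": "electron", "energy": 0.4}))   # uses the frozen shower
print(simulate({"type": "electron", "energy": 5.0}))   # uses full simulation
```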
The current tier-0 processing at CERN is done on two managed sites, the CERN computer centre and the Wigner computer centre. With the proliferation of public cloud resources at increasingly competitive prices, we have been investigating how to transparently increase our compute capacity to include these providers. The approach taken has been to integrate these resources using our existing...
The LHCb Software Framework Gaudi was initially designed and developed almost twenty years ago, when computing was very different from today. It has also been used by a variety of other experiments, including ATLAS, Daya Bay, GLAST, HARP, LZ, and MINERVA. Although it has been actively developed all these years, stability and backward compatibility have been favoured, reducing the...
After an initial R&D stage of prototyping portable performance for particle transport simulation, the GeantV project reaches a new phase where the different components such as kernel libraries, scheduling, geometry and physics are rapidly developing. The increase in complexity is accelerated by the multiplication of demonstrator examples and tested platforms, while trying to maintain a...
Throughout the last decade the Open Science Grid (OSG) has been fielding requests from user communities, resource owners, and funding agencies to provide information about the utilization of OSG resources. Requested data include traditional “accounting” - core-hours utilized - as well as users' certificate Distinguished Names, their affiliations, and fields of science. The OSG accounting service,...
It is well known that submitting jobs to the grid and transferring the
resulting data are not trivial tasks, especially when users are required
to manage their own X.509 certificates. Asking users to manage their
own certificates means that they need to keep the certificates secure,
remember to renew them periodically, frequently create proxy
certificates, and make them available to...
Grid Site Availability Evaluation and Monitoring at CMS
The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) uses distributed grid computing to store, process, and analyze the vast quantity of scientific data recorded every year.
The computing resources are grouped into sites and organized in a tiered structure. A tier consists of sites in various countries...
grid-control is an open source job submission tool that supports common HEP workflows.
Since 2007 it has been used by a number of HEP analyses to process tasks which routinely reach the order of tens of thousands of jobs.
The tool is very easy to deploy, either from its repository or the Python Package Index (PyPI). The project aims at being lightweight and portable. It can run in...
The Belle II experiment at the SuperKEKB e+e- accelerator is preparing for taking first collision data next year. For the success of the experiment it is essential to have information about varying conditions available in the simulation, reconstruction, and analysis code.
The interface to the conditions data in the client code was designed to make life for developers as easy as possible....
At the Large Hadron Collider, numerous physics processes expected within the standard model and theories beyond it give rise to very high momentum particles decaying to multihadronic final states. Development of algorithms for efficient identification of such “boosted” particles while rejecting the background from multihadron jets from light quarks and gluons can greatly aid in the sensitivity...
The online farm of the ATLAS experiment at the LHC, consisting of nearly 4000 PCs with various characteristics, provides configuration and control of the detector and performs the collection, processing, selection and conveyance of event data from the front-end electronics to mass storage.
The status and health of every host must be constantly monitored to ensure the correct and reliable...
Argonne provides a broad portfolio of computing resources to researchers. Since 2011 we have been providing a cloud computing resource to researchers, primarily using Openstack. Over the last year we’ve been working to better support containers in the context of HPC. Several of our operating environments now leverage a combination of the three technologies which provides infrastructure...
The Scientific Computing Department of STFC runs a cloud service for internal users and various user communities. The SCD Cloud is configured using a configuration management system called Aquilon. Many of the virtual machine images are also created/configured using Aquilon. These are not unusual; however, our integrations also allow Aquilon to be altered by the cloud. For instance, creation...
IPv4 network addresses are running out and the deployment of IPv6 networking in many places is now well underway. Following the work of the HEPiX IPv6 Working Group, a growing number of sites in the Worldwide Large Hadron Collider Computing Grid (WLCG) have deployed dual-stack IPv6/IPv4 services. The aim of this is to support the use of IPv6-only clients, i.e. worker nodes, virtual machines or...
Hybrid systems are emerging as an efficient solution in the HPC arena, with an abundance of approaches for integrating accelerators into the system (e.g. GPU, FPGA). In this context, one of the most important features is the ability to address the accelerators, whether they be local or off-node, on an equal footing. Correct balancing and high performance in how the network...
We present an overview of Data Processing and Data Quality (DQ) Monitoring for the ATLAS Tile Hadronic
Calorimeter. Calibration runs are monitored from a data quality perspective and used as a cross-check for physics
runs. Data quality in physics runs is monitored extensively and continuously. Any problems are reported and
immediately investigated. The DQ efficiency achieved was 99.6% in 2012...
The SDN Next Generation Integrated Architecture (SDN-NGeNIA) program addresses some of the key challenges facing the present and next generations of science programs in HEP, astrophysics, and other fields whose potential discoveries depend on their ability to distribute, process and analyze globally distributed petascale to exascale datasets.
The SDN-NGeNIA system under development by the...
A large part of the programs of hadron physics experiments deals with the search for new conventional and exotic hadronic states such as hybrids and glueballs. In a majority of analyses a Partial Wave Analysis (PWA) is needed to identify possible exotic states and to classify known states. Of special interest is the comparison or combination of data from multiple experiments. Therefore, a...
This paper describes GridPP's Vacuum Platform for managing virtual machines (VMs), which has been used to run production workloads for WLCG, other HEP experiments, and some astronomy projects. The platform provides a uniform interface between VMs and the sites they run at, whether the site is organised as an Infrastructure-as-a-Service cloud system such as OpenStack with a push model, or an...