In position-sensitive detectors with segmented readout (pixels or strips), charged particles generally activate several adjacent readout channels. The first step in the reconstruction of the hit position is thus to identify clusters of active channels associated with one particle crossing the detector. In conventionally triggered systems, where the association of raw data to events is given by...
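As an illustration of this first step, a minimal clustering sketch in Python, assuming the channels above threshold are given as integer strip indices (the function name and input format are illustrative, not taken from any specific reconstruction framework):

```python
def find_clusters(active_channels):
    """Group active readout channels into clusters of adjacent channels.

    active_channels: iterable of integer channel indices above threshold.
    Returns a list of clusters, each a sorted list of adjacent channel indices.
    """
    clusters = []
    current = []
    for ch in sorted(set(active_channels)):
        if current and ch != current[-1] + 1:
            clusters.append(current)   # gap found: close the previous cluster
            current = []
        current.append(ch)
    if current:
        clusters.append(current)
    return clusters

# Example: channels 4,5,6 form one cluster, 10,11 another, 20 a single-channel cluster
print(find_clusters([5, 4, 6, 11, 10, 20]))   # [[4, 5, 6], [10, 11], [20]]
```

The hit position would then typically be estimated from each cluster, for example as a charge-weighted mean of its channels.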
To study the performance of the Micro Vertex Detector (MVD), a fully modularized framework has been developed. The main goals of this framework have been easy adaptability to new sensor specifications or changes in the geometry, while at the same time meeting tight constraints on performance and memory usage.
To achieve these goals a framework has been built which...
The Historic Data Quality Monitor (HDQM) of the CMS experiment is a framework developed by the Tracker group of the CMS collaboration that permits web-based monitoring of the time evolution of measurements (S/N ratio, cluster size, etc.) in the Tracker silicon micro-strip and pixel detectors. In addition, it provides a flexible way to implement HDQM for the other detector systems...
The offline software framework of the ATLAS experiment (Athena) consists of many small components of various types like Algorithm, Tool or Service. To assemble these components into an executable application for event processing, a dedicated configuration step is necessary. The configuration of a particular job depends on the workflow (simulation, reconstruction, high-level trigger, overlay,...
Muon reconstruction for ALICE is currently done entirely offline. In Run 3 it is supposed to move online, with ALICE running in continuous readout at a minimum bias Pb-Pb interaction rate of 50 kHz.
There are numerous obstacles to getting the muon software to achieve the required performance, with the muon cluster finder being replaced and moved to run on a GPU inside the new O2 computing...
We introduce SWiF - Simplified Workload-intuitive Framework - a workload-centric, application programming framework designed to simplify the large-scale deployment of FPGAs in end-to-end applications. SWiF intelligently mediates access to shared resources by orchestrating the distribution and scheduling of tasks across a heterogeneous mix of FPGA and CPU resources in order to improve...
The LHC experiments produce petabytes of data each year, which must be stored, processed and analyzed. This requires a significant amount of storage and computing resources. In addition, the requirements on these resources are increasing with each LHC running period.
In order to predict the resource usage requirements of the ALICE Experiment for a particular LHC Run...
The Message Queue architecture is an asynchronous communication scheme that provides an attractive solution for certain scenarios in the distributed computing model. The introduction of an intermediate component (the queue) between the interacting processes decouples the end-points, making the system more flexible and providing high scalability and redundancy. The message queue...
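To illustrate the decoupling provided by the intermediate queue, here is a minimal sketch using only Python's standard library (the in-process queue and message contents are illustrative; a real deployment would use a dedicated broker):

```python
import queue
import threading

q = queue.Queue()          # the intermediate component decoupling the end-points

def producer():
    for i in range(5):
        q.put(f"message-{i}")   # the producer only knows about the queue
    q.put(None)                 # sentinel to signal completion

def consumer():
    while True:
        msg = q.get()           # the consumer only knows about the queue
        if msg is None:
            break
        print("processed", msg)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```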
The GridKa Tier 1 data and computing center hosts a significant share of WLCG processing resources. Providing these resources to all major LHC and other VOs requires an efficient, scalable and reliable cluster management. To satisfy this, GridKa has recently migrated its batch resources from CREAM-CE and PBS to ARC-CE and HTCondor. This contribution discusses the key highlights of the adoption...
Modern workload management systems that are responsible for central data production and processing in High Energy and Nuclear Physics experiments have highly complicated architectures and require a specialized control service for balancing resources and processing components. Such a service represents a comprehensive set of analytical tools, management utilities and monitoring views aimed at...
IaaS clouds have brought greater flexibility in managing computing infrastructures, enabling us to mix different computing environments (e.g. Grid systems, web servers and even personal desktop-like systems) in the form of virtual machines (VMs) on the same hardware. The new paradigm automatically introduced an efficiency increase caused by switching from using single-task dedicated...
ALICE (A Large Ion Collider Experiment) is one of the four big experiments at the Large Hadron Collider (LHC). For ALICE Run 3 there will be a major upgrade of several detectors as well as the compute infrastructure, with a combined Online-Offline computing system (O2) to support continuous readout at much higher data rates than before (3 TB/s). The ALICE Time Projection Chamber...
AlphaTwirl is a Python library that loops over event data and summarizes them into multi-dimensional categorical (binned) data as data frames. Event data, the input to AlphaTwirl, are data with one entry (or row) per event: for example, data in a ROOT TTree with one entry per collision event of an LHC experiment. Event data are often large -- too large to be loaded in memory -- because they have...
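The idea of summarizing per-event data into binned, categorical data frames can be sketched as follows. This is a conceptual illustration with pandas and hypothetical branch names ("njets", "met"), not the AlphaTwirl API itself; in practice the event data would be read in chunks from a ROOT TTree because the full set may not fit in memory:

```python
import numpy as np
import pandas as pd

# Hypothetical event data: one row per collision event
rng = np.random.default_rng(0)
events = pd.DataFrame({
    "njets": rng.integers(0, 6, size=100_000),
    "met":   rng.exponential(50.0, size=100_000),  # missing transverse energy in GeV
})

# Summarize into multi-dimensional categorical (binned) data: counts per (njets, met bin)
met_bins = [0, 25, 50, 100, 200, np.inf]
summary = (events
           .assign(met_bin=pd.cut(events["met"], met_bins))
           .groupby(["njets", "met_bin"], observed=True)
           .size()
           .reset_index(name="n_events"))
print(summary.head())
```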
The ATLAS experiment records data from the proton-proton collisions produced by the Large Hadron Collider (LHC). The Tile Calorimeter is the hadronic sampling calorimeter of ATLAS in the region |eta| < 1.7. It uses iron absorbers and scintillators as active material. Jointly with the other calorimeters it is designed for reconstruction of hadrons, jets, tau-particles and missing transverse...
Many areas of academic research are increasingly catching up with the LHC experiments
when it comes to data volumes, and just as in particle physics they require large data sets to be moved between analysis locations.
The LHC experiments have built a global e-Infrastructure in order to handle hundreds of
petabytes of data and massive compute requirements. Yet, there is nothing particle physics...
We investigate novel approaches using Deep Learning (DL) for efficient execution of workflows on distributed resources. Specifically, we studied the use of DL for job performance prediction, performance classification, and anomaly detection to improve the utilization of the computing resources.
- Performance prediction:
  - capture performance of workflows on multiple resources
  -...
The ATLAS Distributed Computing (ADC) Project is responsible for the off-line processing of data produced by the ATLAS experiment at the Large Hadron Collider (LHC) at CERN. It facilitates data and workload management for ATLAS computing on the Worldwide LHC Computing Grid (WLCG).
ADC Central Services operations (CSops) is a vital part of ADC, responsible for the deployment and configuration...
PanDA (Production and Distributed Analysis) is the workload management system for ATLAS across the Worldwide LHC Computing Grid. While analysis tasks are submitted to PanDA by over a thousand users following personal schedules (e.g. PhD or conference deadlines), production campaigns are scheduled by a central Physics Coordination group based on the organization’s calendar. The Physics...
PowerPC and high performance computers (HPC) are important resources for computing in the ATLAS experiment. The future LHC data processing will require more resources than Grid computing, currently using approximately 100,000 cores at well over 100 sites, can provide. Supercomputers are extremely powerful as they use the resources of hundreds of thousands of CPUs joined together. However their...
The Czech national HPC center IT4Innovations located in Ostrava provides two HPC systems, Anselm and Salomon. The Salomon HPC is amongst the hundred most powerful supercomputers on Earth since its commissioning in 2015. Both clusters were tested for usage by the ATLAS experiment for running simulation jobs. Several thousand core hours were allocated to the project for tests, but the main aim...
In 2018 the Belle II detector will begin collecting data from $e^+e^-$ collisions at the SuperKEKB electron-positron collider at the High Energy Accelerator Research Organization (KEK, Tsukuba, Japan). Belle II aims to collect a data sample 50 times larger than the previous generation of B-Factories, taking advantage of the SuperKEKB design luminosity of $8\times10^{35}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$.
It is...
Creating software releases is one of the more tedious occupations in the life of
a software developer. For this purpose we have tried to automate as many as
possible of the repetitive tasks involved, from collecting the commits to
running the software. For this simplification we rely in large part on free
collaborative services built around GitHub: issue tracking, code review (GitHub),...
One of the main challenges the CMS collaboration must overcome during the phase-2 upgrade is the radiation damage to the detectors from the high integrated luminosity of the LHC and the very high pileup. The LHC will produce collisions at a rate of about 5x10^9/s. The particles emerging from these collisions and the radioactivity they induce will cause significant damage to the detectors and...
CERN's current Backup and Archive Service hosts 11 PB of data in more than 2.1 billion files. We have over 500 clients which back up or restore an average of 80 TB of data each day. At the current growth rate, we expect to have about 13 PB by the end of 2018.
In this contribution we present CERN's Backup and Archive Service based on IBM Spectrum Protect (previously known as Tivoli Storage...
The SHiP experiment is a new general-purpose fixed-target experiment designed to complement collider experiments in the search for new physics. A 400 GeV/c proton beam from the CERN SPS will be dumped on a dense target to accumulate $2\times10^{20}$ protons on target in five years.
A crucial part of the experiment is the active muon shield, which allows the detector to operate at a very high...
Over the last seven years the software stack of the next generation B factory experiment Belle II has grown to over one million lines of C++ and Python code, counting only the part included in offline software releases. This software is used by many physicists for their analysis, many of whom will be students with no prior experience in HEP software. A beginner-friendly and up-to-date...
Several data samples from the Belle II experiment will be available to the general public as part of the experiment's outreach activities. Belle2Lab is designed as an interactive graphical user interface to reconstructed particles, offering users basic particle selection tools. The tool is based on the Blockly JavaScript graphical code generator and can be run in an HTML5-capable browser. It allows...
I describe a novel interactive virtual reality visualization of subatomic particle physics, designed as an educational tool for learning about and exploring the subatomic particle collision events of the Belle II experiment. The visualization is designed for untethered, locomotive virtual reality, allowing multiple simultaneous users to walk naturally through a virtual model of the Belle II...
Tape is an excellent choice for archival storage because of its capacity, cost per GB and long retention intervals, but its main drawback is the slow access time due to the sequential nature of the medium. Modern enterprise tape drives now support Recommended Access Ordering (RAO), which is designed to improve recall/retrieval times.
BNL's mass storage system currently holds more than 100 PB of...
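A minimal sketch of the idea behind ordered recalls, assuming each queued file is known by a tape identifier and an abstract position on that tape (real RAO uses drive-provided tape geometry, which is not modelled here):

```python
from collections import defaultdict

def order_recalls(requests):
    """Group recall requests by tape and sort them by position on tape.

    requests: list of (tape_id, position_on_tape, file_name) tuples.
    Returns the requests in an order that avoids back-and-forth tape motion.
    """
    by_tape = defaultdict(list)
    for tape_id, position, name in requests:
        by_tape[tape_id].append((position, name))
    ordered = []
    for tape_id in sorted(by_tape):
        for position, name in sorted(by_tape[tape_id]):  # sequential reads per tape
            ordered.append((tape_id, position, name))
    return ordered

requests = [("T1", 900, "run42.root"), ("T2", 10, "run7.root"), ("T1", 50, "run3.root")]
print(order_recalls(requests))
```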
Many of the workflows in the CMS offline operation are designed around the concept of acquisition of a run: a period of data-taking with stable detector and accelerator conditions. The capability of integrating statistics across several runs is an asset for statistically limited monitoring and calibration workflows. Crossing run boundaries requires careful evaluation of the conditions of the...
AMI (ATLAS Metadata Interface) is a generic ecosystem for metadata
aggregation, transformation and cataloguing. Often, it is interesting
to share up-to-date metadata with other content services such as wikis.
Here, we describe the cross-domain solution implemented in the AMI Web
Framework: a system of embeddable controls, communicating with the
central AMI service and based on the AJAX and...
The analysis and understanding of resources utilization in shared infrastructures, such as cloud environments, is crucial in order to provide better performance, administration and capacity planning.
The management of resource usage of the OpenStack-based cloud infrastructures hosted at INFN-Padova, the Cloud Area Padovana and the INFN-PADOVA-STACK instance of the EGI Federated Cloud, started...
CERN Document Server (CDS, cds.cern.ch) is the CERN Institutional Repository based on the Invenio open source digital repository framework. It is a heterogeneous repository, containing more than 2 million records, including research publications, audiovisual material, images, and the CERN archives. Its mission is to store and preserve all the content produced at CERN as well as to make it...
The observation of neutrino oscillations provides evidence of physics beyond the Standard Model, and the precise measurement of those oscillations remains an essential goal for the field of particle physics. The NOvA experiment is a long-baseline neutrino experiment composed of two finely-segmented liquid-scintillator detectors located off-axis from the NuMI muon-neutrino beam having as its...
The Large Hadron Collider (LHC) at CERN Geneva has entered the Run 2 era, colliding protons at a center of mass energy of 13 TeV at high instantaneous luminosity. The Compact Muon Solenoid (CMS) is a general-purpose particle detector experiment at the LHC. The CMS Electromagnetic Calorimeter (ECAL) has been designed to achieve excellent energy and position resolution for electrons and photons....
In 2017 the Large Hadron Collider (LHC) at CERN provided an astonishing 50 fb-1 of proton-proton collisions at a center of mass energy of 13 TeV. The Compact Muon Solenoid (CMS) detector was able to record 90.3% of this data. During this period, the CMS Electromagnetic Calorimeter (ECAL), based on 75000 scintillating PbWO4 crystals and a silicon and lead preshower, has continued...
LHC Run 2 began in April 2015 with the restart of collisions in the CERN Large Hadron Collider. From the perspective of offline event reconstruction, the most relevant detector updates appeared in 2017: the restructuring of the pixel detector, with an additional layer closer to the beams, and the improved photodetectors and readout chips for the hadron calorimeter, which will...
The central production system of CMS is utilizing the LHC grid and effectively about 200 thousand cores, over about a hundred computing centers worldwide. Such a wide and unique distributed computing system is bound to sustain a certain rate of failures of various types. These are appropriately addressed with site administrators a posteriori. With up to 50 different campaigns ongoing...
For over a year and a half we ran a CERN-wide trial of collaborative authoring platforms, understanding how the CERN community authors and co-authors, gathering the user needs and requirements and evaluating the available options. As a result, the Overleaf and ShareLaTeX cloud platforms are now fully available to the CERN Community. First, we will explain our user-centered approach...
The LHC delivers an unprecedented number of proton-proton collisions
to its experiments. In kinematic regimes first studied by earlier
generations of collider experiments, the limiting factor in probing more
deeply for new physics can be the online and offline
computing, and offline storage, requirements for the recording and
analysis of this data. In this contribution, we describe a...
CERN is using an increasing number of DNS-based load balanced aliases (currently over 600). We explain the Go based concurrent implementation of the Load Balancing Daemon (LBD), how it is being progressively deployed using Puppet and how concurrency greatly improves scalability, ultimately allowing a single master-slave couple of OpenStack VMs to serve all LB aliases. We explain the Lbclient...
In preparation for Run 3 of the LHC, the ATLAS experiment is migrating
its offline software to use a multithreaded framework, which will allow
multiple events to be processed simultaneously. This implies that the
handling of non-event, time-dependent (conditions) data,
such as calibrations and geometry, must also be extended to allow
for multiple versions of such data to exist...
Containerization is a lightweight form of virtualization that allows reproducibility and isolation, responding to a number of long-standing use cases in running the ATLAS software on the grid. The development of Singularity, in particular with the capability to run as a standalone executable, allows containers to be integrated in the ATLAS (and other experiments') submission framework....
Foundational software libraries such as ROOT are under intense pressure to avoid software regression, including performance regressions. Continuous performance benchmarking, as a part of continuous integration and other code quality testing, is an industry best-practice to understand how the performance of a software product evolves over time. We present a framework, built from industry best...
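One common ingredient of such continuous benchmarking is an automated check that flags a regression when a new measurement exceeds the historical baseline by more than a tolerance. A minimal sketch, with an illustrative 10% threshold and made-up timings rather than any framework-specific policy:

```python
import statistics

def is_regression(history, new_value, tolerance=0.10):
    """Flag a performance regression if the new measurement is more than
    `tolerance` (fractional) slower than the median of previous runs."""
    baseline = statistics.median(history)
    return new_value > baseline * (1.0 + tolerance)

# Hypothetical wall-clock times (seconds) of one benchmark over past builds
history = [12.1, 11.9, 12.3, 12.0, 12.2]
print(is_regression(history, 12.4))   # False: within 10% of the median
print(is_regression(history, 14.0))   # True: likely regression, fail the CI check
```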
The LHC has planned a series of upgrades culminating in the High Luminosity LHC (HL-LHC) which will have an average luminosity 5-7 times larger than the design LHC value. The Tile Calorimeter (TileCal) is the hadronic sampling calorimeter installed in the central region of the ATLAS detector. It uses iron absorbers and scintillators as active material. TileCal will undergo a substantial...
The DUNE Collaboration is pursuing an experimental program (named protoDUNE)
which involves a beam test of two large-scale prototypes of the DUNE Far Detector
at CERN in 2018. The volume of data to be collected by the protoDUNE-SP (the single-phase detector) will amount to a few petabytes and the sustained rate of data sent to mass
storage will be in the range of a few hundred MB per second....
Since the current data infrastructure of the HEP experiments is based on GridFTP, most computing centres have adapted and based their access to the data on X.509 certificates. This is an issue for smaller experiments that do not have the resources to train their researchers in the complexities of X.509 certificates and that would clearly prefer an approach based on username/password.
On the...
Various sites providing storage for experiments in high energy particle physics and photon science deploy dCache as a flexible and modern large scale storage system. As such, dCache is a complex and elaborate software framework, which needs test-driven development in order to ensure a smooth and bug-free release cycle. So far, tests for dCache are performed on dedicated hosts emulating the...
The Dynamic Deployment System (DDS) is a tool-set that automates and significantly simplifies a deployment of user-defined processes and their dependencies on any resource management system (RMS) using a given topology. DDS is a part of the ALFA framework.
A number of basic concepts are taken into account in DDS. DDS implements a single responsibility principle command line tool-set and API....
The Belle II detector will begin its data taking phase in 2018. Featuring a state of the art vertex detector with innovative pixel sensors, it will record collisions of e+e- beams from the SuperKEKB accelerator which is slated to provide luminosities 40x higher than KEKB.
This large amount of data will come at the price of an increased beam background, as well as an operating point providing...
CMS Tier 3 centers, frequently located at universities, play an important role in the physics analysis of CMS data. Although different computing resources are often available at universities, meeting all requirements to deploy a valid Tier 3 able to run CMS workflows can be challenging in certain scenarios. For instance, providing the right operating system (OS) with access to the CERNVM File...
We investigate the automatic deployment and scaling of grid infrastructure components as virtual machines in OpenStack. To optimize the CVMFS usage per hypervisor, we study different approaches to share CVMFS caches and cache VMs between multiple client VMs.
For monitoring, we study container solutions and extend these to monitor non-containerized applications within cgroups resource...
It is difficult to promote cyber security measures in research institutes, especially in a DMZ network that allows connections from outside networks. This difficulty mainly comes from two kinds of variety: the various requirements of the servers operated by each research group, and the divergent skill levels among server administrators. Uniform measures rarely fit the management of those...
Besides their increasing complexity and the variety of provided resources and services, large data-centers nowadays often belong to a distributed network and need non-conventional monitoring tools. This contribution describes the implementation of a monitoring system able to provide active support for problem solving to the system administrators.
The key components are information collection and...
There is a growing need to incorporate sustainable software practices into High Energy Physics. Widely supported tools offering source code management, continuous integration, unit testing and software quality assurance can greatly help improve standards. However, for resource-limited projects there is an understandable inertia in deviating effort to cover systems maintenance and application...
The Standard Model of particle physics has been extensively refined. However, new physics beyond the Standard Model, such as dark matter, requires a thousand to a million times more simulated events than Standard Model processes. This demands further software development, especially of simulation toolkits. In addition, computing is evolving. It requires the development of the...
INFN Corporate Cloud (INFN-CC) is a geographically distributed private cloud infrastructure, based on OpenStack, that has recently been deployed in three of the major INFN data-centres in Italy. INFN-CC has a twofold purpose: on one hand, its fully redundant architecture and its resiliency characteristics make it the perfect environment for providing critical network services for the...
The experience gained in several years of storage system administration has shown that the WLCG distributed grid infrastructure performs very well for the needs of the LHC experiments. However, an excessive number of storage sites leads to inefficiencies in system administration because of the need for experienced manpower at each site and the increased burden on the central...
Dynafed is a system that allows the creation of flexible and seamless storage federations out of participating sites that expose WebDAV, HTTP, S3 or Azure interfaces. The core components have been considered stable for a few years, and the recent focus has been on supporting various important initiatives willing to exploit the potential of Cloud storage in the context of Grid computing for various...
Replicability and efficiency of data processing on the same data samples are a major challenge for the analysis of data produced by HEP experiments. High-level data analyzed by end-users are typically produced as a subset of the whole experiment data sample in order to study interesting selections of data (streams). For standard applications, streams may eventually be copied from servers and analyzed on...
This work is devoted to the creation of the first module of the data processing center at the Joint Institute for Nuclear Research for modeling and processing experiments. The issues related to handling the enormous data flow from the LHC experimental installations and the problems of distributed storage are considered. The article presents a hierarchical diagram of the network farm and a...
WLCG, a Grid computing technology used by CERN researchers, is based on two kinds of middleware. One of them, UMD middleware, is widely used in many European research groups to build a grid computing environment. The most widely used system in the UMD middleware environment was the combination of CREAM-CE and the batch job manager "torque". In recent years, however, there have been many...
Transfer Time To Complete (T³C) is a new extension for the data management system Rucio that allows predictions to be made about the duration of a file transfer. The extension has a modular architecture which allows predictions to be based on models ranging from simple to sophisticated, depending on the available data and computation power. The ability to predict file transfer times with reasonable...
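As an illustration of the simplest kind of model that could plug into such a modular setup, a linear fit of transfer duration versus file size on a single link might look like the following sketch (the data and the per-link split are hypothetical, not the Rucio/T³C implementation):

```python
import numpy as np

# Hypothetical past transfers on one source-destination link:
# file sizes in GB and measured durations in seconds.
sizes = np.array([1.0, 2.5, 4.0, 8.0, 16.0])
durations = np.array([15.0, 28.0, 41.0, 75.0, 140.0])

# Simple model: duration = overhead + size / effective_rate (least-squares fit)
slope, intercept = np.polyfit(sizes, durations, deg=1)

def predict_duration(size_gb):
    """Predicted transfer time in seconds for a file of the given size."""
    return intercept + slope * size_gb

print(f"predicted time for a 10 GB file: {predict_duration(10.0):.1f} s")
```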
CNAF is the national center of INFN for IT services. The Tier-1 data center operated at CNAF provides computing and storage resources mainly to scientific communities such as those working on the four LHC experiments and 30 more experiments in which INFN is involved.
In past years, every CNAF department used to choose its preferred tools for monitoring, accounting and alerting. In...
VISPA (Visual Physics Analysis) is a web-platform that enables users to work on any SSH reachable resource using just their web-browser. It is used successfully in research and education for HEP data analysis.
The emerging JupyterLab is an ideal choice for a comprehensive, browser-based, and extensible work environment and we seek to unify it with the efforts of the VISPA project. The primary...
The CERN IT Communication Systems group is in charge of providing various wired and wireless communication services across the laboratory. Among them, the group designs, installs and manages a large complex of networks: external connectivity, the data-centre network (serving central services and the WLCG), the campus network (providing connectivity to users on site), and last but not least...
The current level of flexibility reached by Cloud providers enables physicists to take advantage of extra resources to extend the distributed computing infrastructure supporting High Energy Physics experiments. However, the discussion about the optimal usage of such resources is still ongoing. Moreover, because each Cloud provider offers its own interfaces, API set and different...
We describe how the Blackett facility at the University of Manchester
High Energy Physics group has been extended to provide Docker container and
cloud platforms as part of the UKT0 initiative. We show how these new
technologies can be managed using the facility's existing fabric
management based on Puppet and Foreman. We explain how use of the
facility has evolved beyond its origins as a WLCG...
A key aspect of pilot-based grid operations is the pilot (glidein) factories. Proper and efficient use of any central building block of the grid infrastructure is essential for operations, and glideinWMS factories are no exception. The monitoring package for glideinWMS factory monitoring was originally developed when the factories were serving a couple of VOs and tens of sites. Nowadays, with...
A small Cloud infrastructure for scientific computing likely operates in a saturated regime, which imposes constraints on the free auto-scaling of applications. Tenants typically pay a priori for a fraction of the overall resources. Within this business model, an advanced scheduling strategy is needed in order to optimize the data centre occupancy.
FaSS, a Fair Share Scheduler service for...
The future heavy ion experiment CBM at the FAIR facility will study the QCD phase diagram in the region of high baryon chemical potential at relatively moderate temperatures, where a complex structure is predicted by modern theories. In order to detect possible signatures of these structures, the physics program of the experiment includes a comprehensive study of extremely rare probes like...
IceCube is a cubic kilometer neutrino detector located at the south pole. IceCube’s simulation and production processing requirements far exceed the number of available CPUs and GPUs in house. Collaboration members commit resources in the form of cluster time at institutions around the world. IceCube also signs up for allocations from large clusters in the United States like XSEDE. All of...
During the next major shutdown from 2019-2021, the ATLAS experiment at the LHC at CERN will adopt the Front-End Link eXchange (FELIX) system as the interface between the data acquisition, detector control and TTC (Timing, Trigger and Control) systems and new or updated trigger and detector front-end electronics. FELIX will function as a router between custom serial links from front end ASICs...
Fermilab is developing the Frontier Experiments RegistRY (FERRY) service that provides a centralized repository for the access control and job management attributes such as batch and storage access policies, quotas, batch priorities and NIS attributes for cluster configuration. This paper describes FERRY architecture, deployment and integration with services that consume the stored...
IceCube is a cubic kilometer neutrino detector located at the south pole. Data are processed and filtered in a data center at the south pole. After transfer to a data warehouse in the north, data are further refined through multiple levels of selection and reconstruction to reach analysis samples. So far, the production and curation of these analysis samples has been handled in an ad-hoc way...
g4tools is a collection of pure header classes intended to be a technical low level layer of the analysis category introduced in Geant4 release 9.5 to help Geant4 users to manage their histograms and ntuples in various file formats. In g4tools bundled with the latest Geant4 release (10.4, December 2017), we introduced a new HDF5 IO driver for histograms and column wise paged ntuples as well as...
Efficient handling of large data volumes has become a necessity in today's world. It is driven by the desire to get more insight from the data and to gain a better understanding of user trends, which can be transformed into economic incentives (profits, cost reduction and various optimizations of data workflows and pipelines). In this talk we discuss how modern technologies are transforming a well...
One of the key factors for the successful development of a physics Monte-Carlo is the ability to properly organize regression testing and validation. Geant4, a world-standard toolkit for HEP detector simulation, is one such example that requires thorough validation. The CERN/SFT group, which contributes to the development, testing, deployment and support of the toolkit, is also responsible for...
Central Exclusive Production (CEP) is a class of diffractive processes studied at the Large Hadron Collider that offers a very clean experimental environment for probing the low energy regime of Quantum Chromodynamics.
As with any other analysis in High Energy Physics, it requires a large amount of simulated Monte Carlo data, which is usually created by means of so-called MC event generators....
A user once said: with PAW I had the impression of doing physics, with ROOT I have the impression of typing C++. Then why not return to doing physics?! We will present how gopaw is built, putting particular accent on its portability, its way of handling multiple file formats (including ROOT/IO and HDF5), its unified graphics based on the inlib/sg scene graph manager (see CHEP 2013 for softinex) and its...
ATLAS has developed and previously presented a new computing architecture, the Event Service, that allows real-time delivery of fine-grained workloads which process
dispatched events (or event ranges) and immediately stream outputs.
The principal aim was to profit from opportunistic resources such as commercial
cloud, supercomputing, and volunteer computing, and otherwise unused cycles on...
The long standing problem of reconciling the cosmological evidence of the existence of dark matter with the lack of any clear experimental observation of it, has recently revived the idea that the new particles are not directly connected with the Standard Model gauge fields, but only through mediator fields or "portals", connecting our world with new "secluded" or "hidden" sectors. One of the...
The LAN and WAN development of DE-KIT will be shown from the very beginning to the current status. DE-KIT is the German Tier-1 center collaborating with the Large Hadron Collider (LHC) at CERN. This includes the local area network capacity level ramp up from 10Gbps over 40 Gbps to 100 Gbps as well as the wide area connections. It will be demonstrated how the deployed setup serves the current...
The higher energy and luminosity from the LHC in Run 2 have put increased pressure on CMS computing resources. Extrapolating to even higher luminosities (and thus higher event complexities and trigger rates) beyond Run 3, it becomes clear that simply scaling up the current model of CMS computing alone will become economically unfeasible. High Performance Computing (HPC) facilities, widely...
Software is an essential component of the experiments in High Energy Physics. Because it is upgraded on relatively short timescales, software provides flexibility, but at the same time it is susceptible to issues introduced during the development process, which calls for systematic testing. We present recent improvements to LHCbPR, the framework implemented at LHCb to measure physics and...
HammerCloud is a framework to commission, test, and benchmark ATLAS computing resources and components of various distributed systems with realistic full-chain experiment workflows. HammerCloud contributes to ATLAS Distributed Computing (ADC) Operations and automation efforts, providing automated resource exclusion and recovery tools that help re-focus operational manpower to areas which...
Scheduling multi-core workflows in a global HTCondor pool is a multi-dimensional problem whose solution depends on the requirements of the job payloads, the characteristics of available resources, and the boundary conditions such as fair share and prioritization imposed on the job matching to resources. Within the context of a dedicated task force, CMS has increased significantly the...
Over 8000 Windows PCs are actively used on the CERN site for tasks ranging from controlling the accelerator facilities to processing invoices. PCs are managed through CERN's Computer Management Framework and Group Policies, with configurations deployed based on machine sets and a lot of autonomy left to the end-users. While the generic central configuration works well for the majority of the...
The online farm of the ATLAS experiment at the LHC, consisting of
nearly 4000 PCs with various characteristics, provides configuration
and control of the detector and performs the collection, processing,
selection, and conveyance of event data from the front-end electronics
to mass storage.
Different aspects of the farm management are already accessible via
several tools. The status and...
Input data for applications that run in cloud computing centres can be stored at remote repositories, typically with multiple copies of the most popular data stored at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. In this approach, the closest copy of the data is used based on geographical or other...
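The "closest copy" idea can be sketched as a simple cost-based selection over the known replicas of a file; the site names, URLs and cost metric below are purely illustrative:

```python
def pick_closest_replica(replicas, cost):
    """Return the replica URL with the lowest access cost.

    replicas: mapping of site name -> replica URL.
    cost: mapping of site name -> access cost (e.g. measured latency in ms,
          or a geographical distance); lower is better.
    """
    best_site = min(replicas, key=lambda site: cost.get(site, float("inf")))
    return replicas[best_site]

replicas = {"SiteA": "root://sitea.example/store/f.root",
            "SiteB": "root://siteb.example/store/f.root"}
cost = {"SiteA": 120.0, "SiteB": 15.0}        # e.g. round-trip latency in ms
print(pick_closest_replica(replicas, cost))   # chooses SiteB's copy
```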
High-Performance Computing (HPC) and other research cluster computing resources provided by universities can be useful supplements to the collaboration’s own WLCG computing resources for data analysis and production of simulated event samples. The shared HPC cluster "NEMO" at the University of Freiburg has been made available to local ATLAS users through the provisioning of virtual machines...
The Information Technology department at CERN has been using ITIL Service Management methodologies and ServiceNow since early 2011. In recent years, several developments have been accomplished regarding the data centre and service monitoring, as well as status management.
ServiceNow has been integrated with the data centre monitoring infrastructure, via GNI (General Notification...
IceCube is a cubic kilometer neutrino detector located at the south pole. Data handling has been managed by three separate applications: JADE, JADE North, and JADE Long Term Archive (JADE-LTA). JADE3 is the new version of JADE that merges these diverse data handling applications into a configurable data handling pipeline (“LEGO® Block JADE”). The reconfigurability of JADE3 has enabled...
The new version of JSROOT provides full implementation of the ROOT binary I/O, now including TTree. Powerful JSROOT.TreeDraw functionality provides a simple way to inspect complex data in web browsers directly, without need to involve ROOT-based code.
JSROOT is now fully integrated into the Node.js environment. Without binding to any C++ code, one gets direct access to all kinds of ROOT data....
Prometheus is a leading open source monitoring and alerting tool. Prometheus's local storage is limited in its scalability and durability, but it integrates very well with other solutions which provide robust long term storage. This talk will cover two solutions which interface excellently and do not require us to deal with HBase: KairosDB and Chronix. The intended audience is people who...
The ATLAS EventIndex has been in operation since the beginning of LHC Run 2 in 2015. Like all software projects, its components have been constantly evolving and improving in performance. The main data store in Hadoop, based on MapFiles and HBase, can work for the rest of Run 2 but new solutions are explored for the future. Kudu offers an interesting environment, with a mixture of BigData and...
In the past, several scaling tests have been performed on the HTCondor batch system regarding its job scheduling capabilities. In this talk we report on a first set of scalability measurements of the file transfer capabilities of the HTCondor batch system. Motivated by the GLUEX experiment needs we evaluate the limits and possible use of HTCondor as a solution to transport the output of jobs...
The design of the CMS detector is specially optimized for muon measurements and includes gas-ionization detector technologies to make up the muon system. Cathode strip chambers (CSC) with both tracking and triggering capabilities are installed in the forward region. The first stage of muon reconstruction deals with information from within individual muon chambers and is thus called local...
In the last few years the European Union has launched several initiatives aiming to support the development of a European-based HPC industrial/academic ecosystem made up of scientific and data analysis application experts, software developers and computer technology providers. In this framework the ExaNeSt and EuroExa projects respectively funded in H2020 research framework programs call...
One of the most important aspects of data processing at LHC experiments is the particle identification (PID) algorithm. In LHCb, several different sub-detector systems provide PID information: the Ring Imaging Cherenkov detectors, the hadronic and electromagnetic calorimeters, and the muon chambers. The charged-particle PID based on the sub-detector responses is treated as a machine learning problem...
Current computing paradigms often involve concepts like microservices, containerisation and, of course, Cloud Computing.
Scientific computing facilities, however, are usually conservatively managed through plain batch systems and as such can cater to only a limited range of use cases. On the other hand, scientific computing needs are in general orthogonal to each other in several dimensions.
We...
In the latest years, CNAF worked at a project of Long Term Data Preservation (LTDP) for the CDF experiment, that ran at Fermilab after 1985. A part of this project has the goal of archiving data produced during Run I into recent and reliable storage devices, in order to preserve their availability for further access through not obsolete technologies. In this paper, we report and explain the...
The CMS muon system presently consists of three detector technologies equipping different regions of the spectrometer. Drift Tube chambers (DT) are installed in the muon system barrel, while Cathode Strip Chambers (CSC) cover the end-caps; both serve as tracking and triggering detectors. Moreover, Resistive Plate Chambers (RPC) complement DT and CSC in barrel and end-caps respectively and are...
The Cloud Area Padovana (CAP) is, since 2014, a scientific IaaS cloud, spread across two different sites: the INFN Padova Unit and the INFN Legnaro National Labs. It provides about 1100 logical cores and 50 TB of storage. The entire computing facility, owned by INFN, satisfies the computational and storage demands of more than 100 users afferent to about 30 research projects, mainly related to...
Most supercomputers provide computing resources that are shared between users and projects, with utilization determined by predefined policies, load and quotas. The efficiency of resource utilization per user or project depends on factors such as the particular supercomputer's policies and the dynamic workload generated by users' activities. The load on a resource is...
The CMS muon system presently consists of three detector technologies equipping different regions of the spectrometer. Drift Tube chambers (DT) are installed in the muon system barrel, while Cathode Strip Chambers (CSC) cover the end-caps; both serve as tracking and triggering detectors. Moreover, Resistive Plate Chambers (RPC) complement DT and CSC in barrel and end-caps respectively and are...
At the start of 2017, GridPP deployed VacMon, a new monitoring system
suitable for recording and visualising the usage of virtual machines and
containers at multiple sites. The system uses short JSON messages
transmitted by logical machine lifecycle managers such as Vac and
Vcycle. These are directed to a VacMon logging service which records the
messages in an ElasticSearch database. The...
We want to propose here a smooth migration plan for ROOT in order to have, by 2040 at least and at last, an acceptable histogram class (a goal clearly not stated in the HSF common white paper for HL-LHC for 2020), but also to have by then a rock-solid foundation for a good part of this toolkit (IO, plotting, graphics, UI, math, etc...). The proposal is going to be technical because it is centred on a...
Starting with Upgrade 1 in 2021, LHCb will move to a purely software-based trigger system. Therefore, the new trigger strategy is to process events at the full rate of 30 MHz. Given that the increase of CPU performance has slowed down in recent years, the predicted performance of the software trigger currently falls short of the necessary 30 MHz throughput. To cope with this shortfall, LHCb's...
For a successful experiment, it is of utmost importance to provide a consistent detector description originating from a single source of information. This is also the main motivation behind DD4hep, which addresses detector description in a broad sense including the geometry and the materials used in the device, and additionally parameters describing, e.g., the detection techniques, constants...
Muons with high momentum -- above 500 GeV/c -- are an important constituent of new physics signatures in many models. Run-2 of the LHC is greatly increasing ATLAS's sensitivity to such signatures thanks to an ever-larger dataset of such particles. The ATLAS Muon Spectrometer chamber alignment contributes significantly to the uncertainty of the reconstruction of these high-momentum objects. The...
Gas Electron Multiplier (GEM) based detectors have been used in many applications since their introduction in 1997. Large areas of GEM are foreseen in several experiments such as the future upgrade of the CMS muon detection system, where triple GEM based detectors will be installed and operated. During the assembly and operation, GEM foils are stretched in order to keep the vertical distance...
Various workflows used by ATLAS Distributed Computing (ADC) are now using object stores as a convenient storage resource via boto S3 libraries. The load and performance requirement varies widely across the different workflows and for heavier cases it has been useful to understand the limits of the underlying object store implementation. This work describes the performance of various object...
The CBM experiment is a future fixed-target experiment at FAIR/GSI (Darmstadt, Germany). It is being designed to study heavy-ion collisions at extremely high interaction rates of up to 10 MHz. Therefore, the experiment will use a very novel concept of data processing based on free streaming triggerless front-end electronics. In CBM time-stamped data will be collected into a readout buffer in a...
In view of the LHC Run3 starting in 2021, the ALICE experiment is preparing a major upgrade including the construction of an entirely new inner silicon tracker (the Inner Tracking System) and a complete renewal of its Online and Offline systems (O²).
In this context, one of the requirements for a prompt calibration of external detectors and a fast offline data processing is to run online the...
The upcoming LHC Run 3 brings new challenges for the ALICE online reconstruction which will be used also for the offline data processing in the O2 (combined Online-Offline) framework. To improve the accuracy of the existing online algorithms they need to be enhanced with all the necessary offline features, while still satisfying speed requirements of the synchronous data processing.
Here we...
We describe the central operation of the ATLAS distributed computing system. The majority of compute intensive activities within ATLAS are carried out on some 350,000 CPU cores on the Grid, augmented by opportunistic usage of significant HPC and volunteer resources. The increasing scale, and challenging new payloads, demand fine-tuning of operational procedures together with timely...
The University of Adelaide has invested several million dollars in the Phoenix HPC facility. Phoenix features a large number of GPUs, which were
critical to its entry in the June 2016 Top500 supercomputing list. The status of high performance computing in Australia relative to other nations
poses a unique challenge to researchers, in particular those involved in computationally intensive...
Since the start of 2017, the RAL Tier-1’s Echo object store has been providing disk storage to the LHC experiments. Echo provides access via both the GridFTP and XRootD protocols. GridFTP is primarily used for WAN transfers between sites while XRootD is used for data analysis.
Object stores and those using erasure coding in particular are designed to efficiently serve entire objects which...
The processing of ATLAS event data requires access to conditions data which is stored in database systems. This data includes, for example alignment, calibration, and configuration information which may be characterized by large volumes, diverse content, and/or information which evolves over time as refinements are made in those conditions. Additional layers of complexity are added by the...
The LLVM community advances its C++ Modules technology, providing an I/O-efficient, on-disk code representation capable of reducing build times and peak memory usage. A significant amount of effort was invested in teaching ROOT and its toolchain to operate with clang's implementation of C++ Modules. Currently, C++ Modules files are used by: cling to avoid header re-parsing; rootcling to...
Following the deployment of OpenShift Origin by the CERN Web Frameworks team in 2016, this Platform-as-a-Service “PaaS” solution oriented for web applications has become rapidly a key component of the CERN Web Services infrastructure. We will present the evolution of the PaaS service since its introduction, detailed usage trends and statistics, its integration with other CERN services and the...
Lattice QCD (LQCD) is a well-established non-perturbative approach to solving the quantum chromodynamics (QCD) theory of quarks and gluons. It is understood that future LQCD calculations will require exascale computing capacities and workload management system (WMS) in order to manage them efficiently.
In this talk we will discuss the use of the PanDA WMS for LQCD simulations. The PanDA WMS...
With the planned addition of the tracking information in the Level 1 trigger in CMS for the HL-LHC, the algorithms for Level 1 trigger can be completely reconceptualized. Following the example for offline reconstruction in CMS to use complementary subsystem information and mitigate pileup, we explore the feasibility of using Particle Flow-like and pileup per particle identification techniques...
A core component of particle tracking algorithms in LHC experiments is the Kalman Filter. Its capability to iteratively model dynamics (linear or non-linear) in noisy data makes it powerful for state estimation and extrapolation in a combinatorial track builder (the CKF). In practice, the CKF computational cost scales quadratically with the detector occupancy and will become a heavy burden on...
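For orientation, a one-dimensional Kalman filter measurement update of the kind repeated at every detector layer in a combinatorial track builder can be sketched as follows; this is a generic textbook update, not the experiments' implementation:

```python
def kalman_update(x_pred, P_pred, measurement, R):
    """One-dimensional Kalman filter measurement update.

    x_pred, P_pred : predicted state and its variance at this detector layer
    measurement, R : measured value (e.g. hit position) and its variance
    Returns the updated state, its variance, and the chi2 of the residual.
    """
    residual = measurement - x_pred
    S = P_pred + R                      # residual covariance
    K = P_pred / S                      # Kalman gain
    x_new = x_pred + K * residual
    P_new = (1.0 - K) * P_pred
    chi2 = residual ** 2 / S            # used to accept/reject candidate hits in a CKF
    return x_new, P_new, chi2

print(kalman_update(x_pred=1.20, P_pred=0.04, measurement=1.26, R=0.01))
```

In the CKF, this update (together with a chi2-based hit selection) is attempted for every compatible hit on every candidate track, which is where the quadratic scaling with detector occupancy mentioned above comes from.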
Starting from 2017, during CMS Phase-I, the increased accelerator luminosity with the consequently increased number of simultaneous proton-proton collisions (pile-up) will pose significant new challenges for the CMS experiment. The main goal of the HLT is to apply a specific set of physics selection algorithms and to accept the events with the most interesting physics content. To cope with the...
The Alpha Magnetic Spectrometer (AMS) is a high energy physics experiment installed and operating on board the International Space Station (ISS) since May 2011 and expected to last through 2024 and beyond. More than 50 million CPU hours have been delivered for AMS Monte Carlo simulations using NERSC and ALCF facilities in 2017. The details of porting the AMS software to the 2nd...
Systems of linear algebraic equations (SLEs) with heptadiagonal (HD), pentadiagonal (PD) and tridiagonal (TD) coefficient matrices arise in many scientific problems. Three symbolic algorithms for solving SLEs with HD, PD and TD coefficient matrices are considered. The only assumption on the coefficient matrix is nonsingularity. These algorithms are implemented using the GiNaC library of C++...
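For the tridiagonal case, the idea can be illustrated numerically with the classic Thomas algorithm; this is a standard numerical sketch for orientation only, whereas the contribution's algorithms are symbolic, implemented with the GiNaC C++ library, and assume only nonsingularity:

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b, super-diagonal c
    and right-hand side d, using the Thomas algorithm (no pivoting)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 3x3 example: [[2,1,0],[1,3,1],[0,1,2]] x = [3,5,3]  ->  x = [1, 1, 1]
print(solve_tridiagonal(a=[0.0, 1.0, 1.0], b=[2.0, 3.0, 2.0],
                        c=[1.0, 1.0, 0.0], d=[3.0, 5.0, 3.0]))
```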
The LHCb experiment uses a custom made C++ detector and geometry description toolkit, integrated with the Gaudi framework, designed in the early 2000s when the LHCb software was first implemented. With the LHCb upgrade scheduled for 2021, it is necessary for the experiment to review this choice to adapt to the evolution of software and computing (need to support multi-threading, importance of...
The accurate calculation of the power usage effectiveness (PUE) is the most important factor when trying to analyse the overall efficiency of the power consumption in a big data center. In the INFN CNAF Tier-1, a new monitoring infrastructure serving as a Building Management System (BMS) has been implemented in recent years using the Schneider StruxureWare Building Operation (SBO) software. During this...
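For reference, the PUE is defined as the ratio of the total power drawn by the facility to the power delivered to the IT equipment,

$\mathrm{PUE} = P_{\mathrm{total\ facility}} / P_{\mathrm{IT\ equipment}}$,

so a value close to 1 means that almost no power goes into cooling and other overheads. For example, a facility drawing 1.5 MW in total while its IT equipment consumes 1.2 MW has a PUE of 1.5/1.2 = 1.25 (numbers chosen purely for illustration).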
Despite their frequent use, the hadronic models implemented in Geant4 have shown severe limitations in reproducing the measured yield of secondaries in ion interactions below 100 MeV/A, in terms of production rates, angular and energy distributions [1,2,3]. We will present a benchmark of the Geant4 models with double-differential cross sections and angular distributions of the secondary...
The CMS experiment has an HTCondor Global Pool, composed of more than 200K CPU cores available for Monte Carlo production and the analysis of data. The submission of user jobs to this pool is handled by either CRAB3, the standard workflow management tool used by CMS users to submit analysis jobs requiring event processing of large amounts of data, or by CMS Connect, a service focused on final...
ALFA is a modern software framework for simulation, reconstruction and analysis of particle physics experiments. ALFA provides building blocks for highly parallelized processing pipelines required by the next generation of experiments, e.g. the upgraded ALICE detector or the FAIR experiments. The FairMQ library in ALFA provides the means to easily create actors (so-called devices) that...
The "File Transfer Service" (FTS) has been proven capable of satisfying the equirements – in terms of functionality, reliability and volume – of three major LHC experiments: ATLAS, CMS and LHCb.
We believe that small experiments, or individual scientists, can also benefit from the advantages of FTS and integrate it into their frameworks, allowing them to effectively outsource the complexities of data...
GlideinWMS is a workload management system that allows different scientific communities, or Virtual Organizations (VOs), to share computing resources distributed over independent sites. A dynamically sized pool of resources is created by different VO-independent glideinWMS pilot factories, based on the requests made by the several VO-dependent glideinWMS frontends. For example, the CMS VO...
This work describes the technique of remote data access from computational jobs on the ATLAS data grid. In comparison to traditional data movement and stage-in approaches it is well suited for data transfers which are asynchronous with respect to the job execution. Hence, it can be used for optimization of data access patterns based on various policies. In this study, remote data access is...
At IHEP, computing resources are contributed by different experiments including BES, JUNO, DYW, HXMT, etc. The resources were divided into different partitions to satisfy the dedicated data processing requirements of each experiment. IHEP had a local Torque/Maui cluster with 50 queues serving more than 10 experiments. The separate resource partitions led to imbalanced resource loads. Sometimes, BES...
The Muon to Central Trigger Processor Interface (MUCTPI) of the ATLAS experiment at the
Large Hadron Collider (LHC) at CERN is being upgraded for the next run of the LHC in order
to use optical inputs and to provide full-precision information for muon candidates to the
topological trigger processor (L1TOPO) of the Level-1 trigger system. The new MUCTPI is
implemented as a single ATCA blade...
CERN IT department is providing production services to run container technologies. Given that, the IT-DB team, responsible to run the Java based platforms, has started a new project to move the WebLogic deployments from virtual or bare metal servers to containers: Docker together with Kubernetes allow us to improve the overall productivity of the team, reducing operations time and speeding up...
In early 2016 CERN IT created a new project to consolidate and centralise Elasticsearch instances across the site, with the aim of offering a new production-quality IT service to experiments and departments. We'll present the solutions we adapted for securing the system using only open source tools, which allows us to consolidate up to 20 different use cases on a single Elasticsearch cluster.
The Tier-1 for CMS was created at JINR in 2015. It is important to keep an eye on the Tier-1 center at all times in order to maintain its performance. One monitoring system is based on Nagios: it monitors the center on several levels: engineering infrastructure, network and hardware. It collects many metrics, creates plots and determines statuses such as HDD state, temperatures, loads...
The Italian Tier-1 center is mainly focused on LHC and physics experiments in general. Recently we tried to widen our area of activity and established a collaboration with the University of Bologna to set up an area inside our computing center for hosting experiments with high security and privacy requirements on stored data. The first experiment we are going to host is Harmony, a...
The high data rates expected for the next generation of particle physics experiments (e.g.: new experiments at FAIR/GSI and the upgrade of CERN experiments) call for dedicated attention with respect to design of the needed computing infrastructure. The common ALICE-FAIR framework ALFA is a modern software layer, that serves as a platform for simulation, reconstruction and analysis of particle...
SHiP is a new proposed fixed-target experiment at the CERN SPS accelerator. The goal of the experiment is to search for hidden particles predicted by models of Hidden Sectors. The purpose of the SHiP Spectrometer Tracker is to reconstruct tracks of charged particles from the decay of neutral New Physics objects with high efficiency. Efficiency of the track reconstruction depends on the...
The goal of the project is to improve the computing network topology and performance of the China IHEP Data Center, taking into account the growing numbers of hosts, experiments and computing resources. Analysing the computing performance of the IHEP Data Center in order to optimize its distributed data processing system is a hard problem due to the large scale and complexity of the shared...
Full MC simulation is a powerful tool for designing new detectors and guiding the construction of new prototypes.
Improved micro-structure technology has led to the rise of Micro-Pattern Gas Detectors (MPGDs), whose main features are flexible geometry, high rate capability, excellent spatial resolution and reduced radiation length. A new detector layout, the Fast Timing MPGD (FTM), could combine...
Software is an essential and rapidly evolving component of modern high energy physics research. The ability to be agile and take advantage of new and updated packages from the wider data science community allows physicists to efficiently utilise the data available to them. However, these packages often introduce complex dependency chains and evolve rapidly, introducing specific, and...
Since the beginning of the WLCG Project, the Spanish ATLAS computing centres have contributed reliable and stable resources as well as personnel to the ATLAS Collaboration.
Our contribution to the ATLAS Tier-1 and Tier-2 computing resources (disk and CPU) over the last 10 years has been around 5%, even though the Spanish contribution to the ATLAS detector construction as well as the number...
The track finding procedure is one of the key steps of event reconstruction in high energy physics experiments. Track finding algorithms combine hits into tracks and reconstruct the trajectories of particles flying through the detector. The procedure is extremely time consuming because of the large combinatorics involved, so calculation speed is crucial in heavy-ion experiments,...
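For readers unfamiliar with why the combinatorics dominate, the following toy Python sketch shows the simplest form of combinatorial track building, chaining hits layer by layer by nearest-neighbour search; the layered-detector model, names and cut value are assumptions for illustration, not the algorithm of any specific experiment.

```python
# Toy combinatorial track finder: chain hits across detector layers by
# nearest-neighbour search. Layered model, names and cut are illustrative only.
def find_tracks(layers, max_dist=1.0):
    """layers: list of lists of (x, y) hits, ordered from inner to outer layer."""
    tracks = []
    used = [set() for _ in layers]          # hits already assigned, per layer
    for seed in layers[0]:                  # seed a candidate from each inner hit
        track, last = [seed], seed
        for ilayer in range(1, len(layers)):
            candidates = [(j, hit) for j, hit in enumerate(layers[ilayer])
                          if j not in used[ilayer]]
            if not candidates:
                break
            # Closest unused hit in the next layer.
            j, best = min(candidates,
                          key=lambda c: (c[1][0] - last[0]) ** 2 + (c[1][1] - last[1]) ** 2)
            if (best[0] - last[0]) ** 2 + (best[1] - last[1]) ** 2 > max_dist ** 2:
                break                       # dead end: abandon this candidate
            track.append(best)
            used[ilayer].add(j)
            last = best
        if len(track) == len(layers):       # keep only full-length candidates
            tracks.append(track)
    return tracks

# Two straight "tracks" plus one noise hit per layer.
layers = [[(0.0, 0.0), (5.0, 0.0), (9.0, 9.0)],
          [(0.1, 1.0), (5.1, 1.0), (2.0, 9.0)],
          [(0.2, 2.0), (5.2, 2.0), (7.0, 9.0)]]
print(find_tracks(layers, max_dist=1.5))    # finds the two straight candidates
```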
SPT-3G, the third generation camera on the South Pole Telescope (SPT), was deployed in the 2016-2017 Austral summer season. The SPT is a 10-meter telescope located at the geographic South Pole and designed for observations in the millimeter-wave and submillimeter-wave regions of the electromagnetic spectrum. The SPT is primarily used to study the Cosmic Microwave Background (CMB). The upgraded...
The ATLAS experiment is operated daily by many users and experts working concurrently on several aspects of the detector.
The safe and optimal access to the various software and hardware resources of the experiment is guaranteed by a role-based access control system (RBAC) provided by the ATLAS Trigger and Data Acquisition (TDAQ) system. The roles are defined by an inheritance hierarchy....
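A minimal sketch of how an inheritance hierarchy of roles can be resolved into effective permissions is shown below; the role and permission names are hypothetical and this is not the ATLAS TDAQ implementation.

```python
# Minimal sketch of role-based access control with an inheritance hierarchy.
# Role and permission names are hypothetical.
class Role:
    def __init__(self, name, permissions=None, parents=None):
        self.name = name
        self.permissions = set(permissions or [])
        self.parents = list(parents or [])

    def all_permissions(self):
        """Collect permissions of this role and every inherited (parent) role."""
        perms = set(self.permissions)
        for parent in self.parents:
            perms |= parent.all_permissions()
        return perms

observer = Role("observer", {"read_monitoring"})
shifter  = Role("shifter",  {"start_run", "stop_run"}, parents=[observer])
expert   = Role("expert",   {"modify_configuration"},  parents=[shifter])

def is_allowed(role, action):
    return action in role.all_permissions()

print(is_allowed(expert, "read_monitoring"))   # True, inherited via shifter -> observer
print(is_allowed(observer, "start_run"))       # False
```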
Events containing muons in the final state are an important signature
for many analyses being carried out at the Large Hadron Collider
(LHC), including both standard model measurements and searches for new
physics. To be able to study such events, it is required to have an
efficient and well-understood muon trigger. The ATLAS muon trigger
consists of a hardware based system (Level 1), as well...
The Online Luminosity software of the ATLAS experiment has been upgraded in the last two years to improve scalability, robustness, and redundancy and to increase automation keeping Run-3 requirements in mind.
The software package is responsible for computing the instantaneous and integrated luminosity for particle collisions at the ATLAS interaction point at the Large Hadron Collider (LHC)....
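For orientation, the standard relations behind such a computation are L_inst = mu * n_b * f_rev / sigma_inel and L_int = sum(L_inst * dt); the short Python example below evaluates them with illustrative, LHC-like numbers and is not the ATLAS Online Luminosity code.

```python
# Worked example (illustrative numbers) of the standard luminosity relations:
#   L_inst = mu * n_b * f_rev / sigma_inel
#   L_int  = sum(L_inst * dt) over luminosity blocks.
MB_TO_CM2 = 1e-27          # 1 millibarn in cm^2

def instantaneous_luminosity(mu, n_bunches, f_rev_hz, sigma_inel_mb):
    """Return L_inst in cm^-2 s^-1 from the average pile-up mu."""
    return mu * n_bunches * f_rev_hz / (sigma_inel_mb * MB_TO_CM2)

# Hypothetical LHC-like parameters.
f_rev = 11245.0            # LHC revolution frequency [Hz]
sigma_inel = 80.0          # inelastic pp cross-section [mb], approximate
lumi_blocks = [(35.0, 60.0), (34.5, 60.0), (34.0, 60.0)]  # (mu, duration in s)

L_int = 0.0
for mu, dt in lumi_blocks:
    L_inst = instantaneous_luminosity(mu, n_bunches=2448, f_rev_hz=f_rev,
                                      sigma_inel_mb=sigma_inel)
    L_int += L_inst * dt   # integrated luminosity in cm^-2

print(f"Integrated luminosity: {L_int / 1e33:.1f} nb^-1")  # 1 nb^-1 = 1e33 cm^-2
```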
The ATLAS experiment records about 1 kHz of physics collisions, starting from an LHC design bunch crossing rate of 40 MHz. To reduce the large background rate while maintaining a high selection efficiency for rare and Beyond-the-Standard-Model physics events, a two-level trigger system is used.
Events are selected based on physics signatures, such as the presence
of energetic leptons,...
Physics analyses at the LHC require accurate simulations of the detector response and the event selection processes. The accurate simulation of the trigger response is crucial for determining the overall selection efficiencies and signal sensitivities. For the generation and reconstruction of simulated event data, the most recent software releases are used to ensure the best agreement between...
In HEP experiments at the LHC, database applications often become complex, reflecting the ever more demanding requirements of the researchers. The ATLAS experiment has several Oracle DB clusters with over 216 database schemas, each with its own set of database objects. To monitor them effectively, we designed a modern and portable application with exceptionally good characteristics. Some of them...
ITMO University (ifmo.ru) is developing a cloud of geographically distributed data centers. "Geographically distributed" means data centers (DCs) located in different places, hundreds or thousands of kilometers apart. The use of geographically distributed data centers promises a number of advantages for end users, such as the opportunity to add additional DCs and service...
The High Performance Computing (HPC) domain aims to optimize code in order to exploit the latest multicore and parallel technologies, including specific processor instructions. In this computing framework, portability and reproducibility are key concepts. A way to handle these requirements is to use Linux containers. These "light virtual machines" allow applications to be encapsulated within their...
In January 2017, a consortium of European companies, research labs, universities, and education networks started the “Up to University” project (Up2U). Up2U is a 3-year EU-funded project that aims at creating a bridge between high schools and higher education. Up2U addresses both the technological and methodological gaps between secondary school and higher education by (a.) provisioning the...
A tape system usually comprises many tape drives, several thousand or even tens of thousands of cartridges, robots, software applications and the machines which run these applications. All involved components are able to log failures and statistical data. However, correlating them is a laborious and ambiguous process, and a wrong interpretation can easily result in a wrong decision. A single...
The GridKa center serves the ALICE, ATLAS, CMS, LHCb and Belle II experiments as one of the biggest WLCG Tier-1 centers worldwide, providing compute and storage resources. It is operated by the Steinbuch Centre for Computing at the Karlsruhe Institute of Technology in Germany. In this presentation, we will describe the current status of the compute, online and offline storage resources and we will...
Online selection is an essential step to collect the most interesting collisions among the very large number of events delivered by the ATLAS detector at the Large Hadron Collider (LHC). The Fast TracKer (FTK) is a hardware-based track finder for the ATLAS trigger system that rapidly identifies important physics processes through their track-based signatures in the Inner Detector pixel and...
The INFN scientific computing infrastructure is composed of more than 30 sites, ranging from CNAF (the Tier-1 for LHC and main data center for nearly 30 other experiments) and 9 LHC Tier-2s to ~20 smaller sites, including LHC Tier-3s and non-LHC experiment farms.
A comprehensive review of the installed resources, together with plans for the near future, has been collected during the second half of...
The Production and Distributed Analysis system (PanDA) is a pilot-based workload management system that was originally designed for the ATLAS Experiment at the LHC to operate on grid sites. Since the coming LHC data taking runs will require more resources than grid computing alone can provide, the various LHC experiments are engaged in an ambitious program to extend the computing model to...
Modern experiments demand a powerful and efficient Data Acquisition System (DAQ). The intelligent, FPGA-based Data Acquisition System (iFDAQ) of the COMPASS experiment at CERN is composed of many processes communicating with each other. The DIALOG library provides a communication mechanism between processes and establishes a communication layer for each of them. It has been introduced to the...
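To illustrate the general idea of a message-based communication layer between DAQ processes, the sketch below uses only the Python standard library; the endpoint, authentication key and messages are invented for the demo, and it does not reflect the DIALOG API itself.

```python
# Conceptual sketch of message passing between two processes using the Python
# standard library only. Endpoint, auth key and messages are hypothetical.
import time
from multiprocessing import Process
from multiprocessing.connection import Listener, Client

ADDRESS = ("localhost", 6000)      # hypothetical endpoint

def receiver():
    with Listener(ADDRESS, authkey=b"demo") as listener:
        with listener.accept() as conn:
            while True:
                msg = conn.recv()
                if msg == "STOP":
                    break
                print("received:", msg)

def sender():
    with Client(ADDRESS, authkey=b"demo") as conn:
        conn.send({"type": "status", "payload": "spill started"})
        conn.send("STOP")

if __name__ == "__main__":
    p = Process(target=receiver)
    p.start()
    time.sleep(0.5)                # crude start-up synchronisation for the demo
    sender()
    p.join()
```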
JAliEn (Java-AliEn) is ALICE's next-generation Grid framework, which will be used for top-level management of the distributed computing resources during LHC Run 3 and onward. While preserving an interface familiar to the ALICE users, its performance and scalability are an order of magnitude better than those of the currently used system.
To enhance the JAliEn security, we have developed the...
Cloud computing has become a routine tool for scientists in many domains. The JINR cloud infrastructure provides JINR users with computational resources for performing various scientific calculations. In order to speed up the achievement of scientific results, a JINR cloud service for parallel applications was developed. It consists of several components and implements a flexible and modular architecture...
The ProtoDUNE-SP is a single-phase liquid argon time projection chamber (LArTPC) prototype for the Deep Underground Neutrino Experiment (DUNE). Signals from 15,360 electronic channels are received by 60 Reconfigurable Cluster Elements (RCEs), which are processing elements designed at SLAC for a wide range of applications and are based upon the "system-on-chip" Xilinx Zynq family of FPGAs....
The HEP community has voted strongly with its feet to adopt ROOT as the current de facto analysis toolkit. It is used to write out and store our RAW data and our reconstructed data, and to drive our analysis. Almost all modern data models in particle physics are written in ROOT. New tools from industry are, however, making an appearance in particle physics analysis, driven by the massive interest...
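As one example of such industry-style tooling working directly on ROOT data, the snippet below reads a ROOT TTree into NumPy arrays with uproot; the file, tree and branch names are placeholders.

```python
# Reading a ROOT TTree into NumPy arrays with uproot; names are placeholders.
import uproot

with uproot.open("events.root") as f:        # hypothetical file
    tree = f["Events"]                        # hypothetical tree name
    arrays = tree.arrays(["pt", "eta", "phi"], library="np")

print(arrays["pt"][:10])                      # first ten entries as a NumPy array
```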
Up until September 2017, LHCb Online was running on a non-redundant Puppet 3.5 Master/Server architecture. As a result, we had problems with outages, both planned and unplanned, as well as with scalability (How do you run 3000 nodes at the same time? How do you even run 100 without bringing down the Puppet Master?). On top of that, Puppet 5.0 had been released, so we were by then running two versions...
The CernVM File System (CernVM-FS) provides a scalable and reliable software distribution service implemented as a POSIX read-only filesystem in user space (FUSE). It was originally developed at CERN to assist High Energy Physics (HEP) collaborations in deploying software on the worldwide distributed computing infrastructure for data processing applications. Files are stored remotely as...
I describe the charged-track extrapolation and muon-identification modules in the Belle II data-analysis code framework (basf2). These modules use GEANT4E to extrapolate reconstructed charged tracks outward from the Belle II Central Drift Chamber into the outer particle-identification detectors, the electromagnetic calorimeter, and the K-long and muon detector (KLM). These modules propagate...
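A heavily hedged sketch of a basf2 steering path that would run extrapolation and muon identification after reconstruction is given below; the module names ('Ext', 'Muid', 'Gearbox', 'Geometry') and file names are assumptions for illustration, and the basf2 documentation should be consulted for the authoritative configuration.

```python
# Hedged basf2 steering sketch; module and file names are assumptions.
import basf2

main = basf2.create_path()
main.add_module('RootInput', inputFileName='reconstructed_events.root')  # placeholder
main.add_module('Gearbox')        # detector parameters (assumed to be needed here)
main.add_module('Geometry')
main.add_module('Ext')            # GEANT4E-based extrapolation of charged tracks
main.add_module('Muid')           # muon identification in the outer detectors
main.add_module('RootOutput', outputFileName='with_muid.root')           # placeholder
basf2.process(main)
```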
The Baryonic Matter at Nuclotron (BM@N) experiment represents the first phase of the Nuclotron-based Ion Collider fAcility (NICA) mega-science project at the Joint Institute for Nuclear Research. It is a fixed-target experiment built for studying nuclear matter under conditions of extreme density and temperature.
The tracking system of the BM@N experiment consists of three main detector systems:...
We describe the development of a tool (Trident) that uses a three-pronged approach to analysing node utilisation while aiming to be user friendly. The three areas of focus are data I/O, CPU cores and memory.
Compute applications running on a batch system node will stress different parts of the node over time. It is usual to look at metrics such as CPU load average and memory consumed. However,...
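A generic sampling loop over these three focus areas, written with psutil, might look like the sketch below; it illustrates the kind of metrics involved and is not the Trident tool itself.

```python
# Generic sampling loop over I/O, per-core CPU load and memory using psutil.
import time
import psutil

def sample(n_samples=3, pause_s=5):
    for _ in range(n_samples):
        cpu = psutil.cpu_percent(interval=1, percpu=True)   # per-core load [%]
        mem = psutil.virtual_memory()                        # memory usage
        io = psutil.disk_io_counters()                       # cumulative disk I/O
        print("cpu per core [%]:", cpu)
        print(f"memory used: {mem.used / 2**30:.1f} GiB of {mem.total / 2**30:.1f} GiB")
        print(f"disk read: {io.read_bytes / 2**20:.0f} MiB, "
              f"written: {io.write_bytes / 2**20:.0f} MiB")
        time.sleep(pause_s)

if __name__ == "__main__":
    sample()
```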
One of the major challenges for the Compact Muon Solenoid (CMS) experiment is the task of reducing the event rate from roughly 40 MHz down to a more manageable 1 kHz while keeping as many interesting physics events as possible. This is accomplished through the use of a Level-1 (L1) hardware-based trigger as well as a software-based High-Level Trigger (HLT). Monitoring and understanding the output...
Hadronic signatures are critical to the ATLAS physics program, and are used extensively for both Standard Model measurements and searches for new physics. These signatures include generic quark and gluon jets, as well as jets originating from b-quarks or the decay of massive particles (such as electroweak bosons or top quarks). Additionally, missing transverse momentum from non-interacting...
The ATLAS Distributed Computing system uses the Frontier system to access the Conditions, Trigger, and Geometry database data stored in the Oracle Offline Database at CERN by means of the HTTP protocol. All ATLAS computing sites use Squid web proxies to cache the data, greatly reducing the load on the Frontier servers and the databases. One feature of the Frontier client is that in the event...
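As a generic illustration of HTTP access through a list of caching proxies with a simple fallback, consider the sketch below; the proxy and server URLs are placeholders and this is not the actual Frontier client implementation.

```python
# Generic illustration: HTTP request through caching proxies with fallback.
# All URLs are placeholders.
import requests

SERVER_URL = "http://frontier.example.cern.ch/query"        # placeholder
PROXIES = ["http://squid1.site.example:3128",                # placeholders
           "http://squid2.site.example:3128"]

def fetch_payload(params):
    last_error = None
    for proxy in PROXIES:
        try:
            r = requests.get(SERVER_URL, params=params,
                             proxies={"http": proxy}, timeout=10)
            r.raise_for_status()
            return r.content
        except requests.RequestException as err:
            last_error = err        # try the next proxy in the list
    raise RuntimeError(f"all proxies failed: {last_error}")
```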
We report the status of the CMS full simulation for Run 2. Initially, Geant4 10.0p02 was used in sequential mode, and about 16 billion events were produced for the analysis of 2015-2016 data. In 2017, the CMS detector was updated: a new pixel tracking detector was installed, the hadronic calorimeter electronics were modified, and extra muon detectors were added. Corresponding modifications were introduced in the...
CVMFS helps ATLAS in distributing software to the Grid, and isolating software lookup to batch nodes’ local filesystems. But CVMFS is rarely available in HPC environments. ATLAS computing has experimented with "fat" containers, and later developed an environment to produce such containers for both Shifter and Singularity. The fat containers include most of the recent ATLAS software releases,...
The Queen Mary University of London Grid site has investigated the use of its Lustre file system to support Hadoop workflows using the newly open-sourced Hadoop adaptor for Lustre. Lustre is an open-source, POSIX-compatible, clustered file system often used in high performance computing clusters and is often paired with the SLURM batch system, as it is at Queen Mary. Hadoop is an open-source...
The use of machine learning techniques for classification is well established. They are applied widely to improve the signal-to-noise ratio and the sensitivity of searches for new physics at colliders. In this study I explore the use of machine learning for optimizing the output of high precision experiments by selecting the variables most sensitive to the quantity being measured. The precise...
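One common way to rank input variables by their sensitivity to a measured quantity, shown here purely as a hedged illustration on synthetic data, is to use feature importances from a tree ensemble in scikit-learn; the variable names and data are invented and this is not the method of the study itself.

```python
# Rank candidate variables by random-forest feature importance on toy data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 4))                       # four candidate variables
target = 2.0 * x[:, 0] + 0.5 * x[:, 2] + rng.normal(scale=0.1, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(x, target)

for name, importance in zip(["var0", "var1", "var2", "var3"],
                            model.feature_importances_):
    print(f"{name}: {importance:.3f}")   # var0 and var2 should rank highest
```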
Containers are becoming ubiquitous within the WLCG with CMS announcing a requirement for Singularity at supporting sites in 2018. The ubiquity of containers means it is now possible to reify configuration along with applications as a single easy to deploy unit rather than via a myriad of configuration management tools such as Puppet, Ansible or Salt. This allows more use of industry devops...
ZFS is a powerful storage management technology combining filesystem, volume management and software raid technology into a single solution. The WLCG Tier2 computing at Edinburgh was an early adopter of ZFS on Linux, with this technology being used to manage all of our storage systems including servers with aging components. Our experiences of ZFS deployment have been shared with the Grid...
As most are fully aware, cybersecurity attacks are an ever-growing problem as larger parts of our lives take place on-line. Distributed digital infrastructures are no exception and action must be taken to both reduce the security risk and to handle security incidents when they inevitably happen. These activities are carried out by the various e-Infrastructures and it has become very clear in...
Built upon the XRootD Proxy Cache (Xcache), we developed additional features to adapt it to the ATLAS distributed computing and data environment, especially its data management system Rucio, to help improve the cache hit rate, as well as features that make Xcache easy to use, similar to the way the Squid cache is used with the HTTP protocol. We packaged the software in CVMFS and in Singularity...
XRootD is a distributed, low-latency file access system with its own communication protocol and a scalable, plugin-based architecture. It is the primary data access framework for the high-energy physics community and the backbone of the EOS service at CERN.
In order to bring the potential of Erasure Coding (EC) to the XRootD/EOS ecosystem, an effort has been undertaken to implement a native EC...
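As a minimal illustration of the erasure-coding idea, the toy example below adds a single XOR parity stripe so that any one lost data stripe can be rebuilt; real EC layouts such as the Reed-Solomon codes typically used in storage systems tolerate multiple losses, and this is not the native XRootD/EOS implementation.

```python
# Toy single-parity erasure code: split a block into data stripes plus one
# XOR parity stripe, then rebuild one missing data stripe from the rest.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(block: bytes, n_data: int):
    """Split a block into n_data equal stripes plus one XOR parity stripe."""
    stripe_len = -(-len(block) // n_data)             # ceiling division
    stripes = [block[i * stripe_len:(i + 1) * stripe_len].ljust(stripe_len, b"\0")
               for i in range(n_data)]
    parity = stripes[0]
    for s in stripes[1:]:
        parity = xor_bytes(parity, s)
    return stripes, parity

def recover(stripes, parity, lost_index):
    """Rebuild one missing data stripe from the surviving stripes and parity."""
    rebuilt = parity
    for i, s in enumerate(stripes):
        if i != lost_index:
            rebuilt = xor_bytes(rebuilt, s)
    return rebuilt

data_stripes, p = encode(b"example payload for erasure coding demo", n_data=4)
assert recover(data_stripes, p, lost_index=2) == data_stripes[2]
```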
XRootD has been established as a standard for WAN data access in HEP and HENP. Site-specific features, like those existing at GSI, have historically been hard to implement with native methods. XRootD allows custom replacement of basic functionality through the use of plug-ins; XRootD clients have supported this since version 4.0. In this contribution, our XRootD-based...
IBM Spectrum Protect (ISP) software, one of the leading solutions in data protection, contributes to the data management infrastructure operated at CNAF, the central computing and storage facility of INFN (Istituto Nazionale di Fisica Nucleare, the Italian National Institute for Nuclear Physics). It is used to manage about 44 petabytes of scientific data produced by the LHC (Large Hadron Collider at...