To accomplish its mission, the European Organization for Nuclear Research (CERN, Switzerland) is committed to the continuous development of its personnel through a systematic and sustained learning culture that aims to keep the knowledge and competences of the personnel in line with the evolving needs of the Organisation.
With this goal in mind, CERN supports learning in its broadest sense and...
RooFit and RooStats, the toolkits for statistical modelling in ROOT, are used in most searches and measurements at the Large Hadron Collider, as well as at B factories. The large datasets to be collected in Run 3 will enable measurements with higher precision, but will require faster data processing to keep fitting times stable.
In this talk, a redesign of RooFit’s internal dataflow will be...
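As a rough illustration of the kind of fit whose evaluation speed is at stake (a minimal sketch, not the redesigned implementation itself; the BatchMode option is assumed to be available in recent ROOT releases):

    # Minimal PyROOT sketch: an unbinned Gaussian fit with RooFit.
    # BatchMode switches to the vectorised likelihood-evaluation path that the
    # redesigned dataflow targets (availability depends on the ROOT version).
    import ROOT

    x     = ROOT.RooRealVar("x", "x", -10, 10)
    mean  = ROOT.RooRealVar("mean", "mean", 0, -10, 10)
    sigma = ROOT.RooRealVar("sigma", "sigma", 2, 0.1, 10)
    gauss = ROOT.RooGaussian("gauss", "gauss", x, mean, sigma)

    data = gauss.generate(ROOT.RooArgSet(x), 10000)        # toy dataset
    gauss.fitTo(data, ROOT.RooFit.BatchMode(True))         # vectorised evaluation, if supported
    mean.Print()
    sigma.Print()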
Based on work in the ROOTLINQ project, we’ve re-written a functional declarative analysis language in Python. With a declarative language, the physicist specifies what they want to do with the data, rather than how they want to do it. Then the system translates the intent into actions. Using declarative languages would have numerous benefits for the LHC community, ranging from analysis...
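As a self-contained, purely illustrative sketch of the declarative idea (the names below are hypothetical and not the project's actual API), a query object can record the physicist's intent and leave the execution strategy to a backend:

    # Hypothetical sketch: a query records *what* to compute; a backend decides *how*.
    class Query:
        def __init__(self, steps=None):
            self.steps = steps or []

        def where(self, predicate):
            return Query(self.steps + [("where", predicate)])

        def select(self, expression):
            return Query(self.steps + [("select", expression)])

        def execute(self, events):
            # Trivial local backend; a real system could instead translate the
            # recorded steps into C++, batch jobs, or distributed workflows.
            out = events
            for kind, fn in self.steps:
                out = [fn(e) for e in out] if kind == "select" else [e for e in out if fn(e)]
            return out

    events = [{"njets": 3, "met": 42.0}, {"njets": 1, "met": 7.5}]
    met_values = (Query()
                  .where(lambda e: e["njets"] >= 2)    # intent: events with at least two jets
                  .select(lambda e: e["met"])          # intent: their missing energy
                  .execute(events))
    print(met_values)   # [42.0]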
The GitLab continuous integration system (http://gitlab.com) is an invaluable tool for software developers to test and validate their software. LHCb analysts have also been using it to validate physics software tools and data analysis scripts, but this usage faced issues differing from standard software testing, as it requires a significant amount of CPU resources and credentials to access...
A present-day detection system for charged tracks in particle physics experiments is typically composed of two or more types of detectors. Global track finding across these sub-detectors is therefore an important topic. This contribution describes a global track finding algorithm based on the Hough Transform for a detection system consisting of a Cylindrical Gas Electron Multiplier (CGEM) and a Drift...
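A rough sketch of the voting step at the heart of such an algorithm (assuming, for simplicity, straight-line tracks and 2D hits; the actual CGEM plus drift chamber implementation is not shown):

    # Each hit votes for all (theta, rho) line parameters it is compatible with;
    # peaks in the accumulator correspond to track candidates shared by the sub-detectors.
    import numpy as np

    def hough_accumulate(hits, n_theta=180, n_rho=200, rho_max=100.0):
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        acc = np.zeros((n_theta, n_rho), dtype=np.int32)
        for x, y in hits:
            rho = x * np.cos(thetas) + y * np.sin(thetas)              # one curve per hit
            bins = np.clip(((rho + rho_max) / (2 * rho_max) * n_rho).astype(int), 0, n_rho - 1)
            acc[np.arange(n_theta), bins] += 1                         # vote
        return acc, thetas

    # Hits from an inner (CGEM-like) and an outer (drift-chamber-like) layer lying on the
    # same line y = 0.5 * x produce a single clear peak in the accumulator.
    hits = [(1, 0.5), (2, 1.0), (4, 2.0), (8, 4.0), (20, 10.0)]
    acc, thetas = hough_accumulate(hits)
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    print("peak votes:", acc[i, j], "theta:", round(float(thetas[i]), 3))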
The Open Science Grid (OSG) provides a common service for resource providers and scientific institutions, and supports sciences such as High Energy Physics, Structural Biology, and other community sciences. As scientific frontiers expand, so does the need for resources to analyze new data. For example, high energy physics (LHC) sciences foresee an exponential growth in the amount of data...
The ARM platform extends from mobile phones to development boards and servers. Its importance may increase in the future if new, more powerful (server) boards are released. For this reason, CMSSW has been ported to ARM in earlier work.
The CMS software is deployed using CVMFS and the jobs are run inside Singularity containers....
We will present techniques developed in collaboration with the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) and SLATE (NSF Award #1724821) for orchestrating software defined network slices with a goal of building reproducible and reliable computer networks for large data collaborations. With this project we have explored methods of utilizing passive and active measurements to...
BAT.jl, the Julia version of the Bayesian Analysis Toolkit, is a software package which is designed to help solve statistical problems encountered in Bayesian inference. Typical examples are the extraction of the values of the free parameters of a model, the comparison of different models in the light of a given data set, and the test of the validity of a model to represent the data set at...
Many physics analyses using the Compact Muon Solenoid (CMS) detector at the LHC require accurate, high resolution electron and photon energy measurements. Excellent energy resolution is crucial for studies of Higgs boson decays with electromagnetic particles in the final state, as well as searches for very high mass resonances decaying to energetic photons or electrons. The CMS electromagnetic...
During the last few years, the EOS distributed storage system at CERN has seen a steady increase in use, both in terms of traffic volume as well as sheer amount of stored data.
This has brought the unwelcome side effect of stretching the EOS software stack to its design constraints, resulting in frequent user-facing issues and occasional downtime of critical services.
In this paper, we...
The LHCb detector will be upgraded in 2021, and its hardware-level trigger will be replaced by a High Level Trigger 1 software trigger that needs to process the full 30 MHz collision rate. As part of the efforts to create a GPU High Level Trigger 1, tracking algorithms need to be optimized for SIMD architectures in order to achieve high throughput. We present a SPMD (Single Program,...
CERN is launching the Science Gateway, a new scientific education and outreach centre targeting the general public of all ages. Construction is planned to start in 2020 and to be completed in 2022. In addition to Physics exhibits, the Science Gateway will include immersive, hands-on activities that explore Computer Science and Technology. This poster will present the methodology used to...
Conditions databases are an important class of database applications in which the database is used to record the state of a set of quantities as a function of observation time. In High Energy Physics, conditions databases are used to record the state of the detector apparatus during data taking, and the recorded data are then used during the event reconstruction and analysis phases.
At FNAL, we...
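As an illustration of the access pattern such a database serves (a minimal sketch, not FNAL's actual schema or API): each payload carries an interval of validity (IOV), and a lookup returns the payload valid at a given observation time.

    import bisect

    class ConditionsFolder:
        def __init__(self):
            self._since = []      # sorted IOV start times
            self._payloads = []   # payload valid from the matching start time

        def store(self, since, payload):
            i = bisect.bisect_left(self._since, since)
            self._since.insert(i, since)
            self._payloads.insert(i, payload)

        def lookup(self, event_time):
            i = bisect.bisect_right(self._since, event_time) - 1
            if i < 0:
                raise KeyError("no conditions valid at %r" % event_time)
            return self._payloads[i]

    hv = ConditionsFolder()
    hv.store(since=0,    payload={"hv_setpoint_V": 1500})
    hv.store(since=1000, payload={"hv_setpoint_V": 1450})   # setting changed during data taking
    print(hv.lookup(event_time=1200))                        # -> {'hv_setpoint_V': 1450}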
China Spallation Neutron Source (CSNS) is a large science facility that is publicly available to researchers from all over the world. The CSNS data platform is aimed at diverse data and computing support; the design philosophy behind it is data safety, big-data sharing, and user convenience.
In order to manage scientific data, a metadata catalogue based on ICAT is built to manage full...
ATLAS Metadata Interface (AMI) is a generic ecosystem for metadata aggregation, transformation and cataloging, benefiting from about 20 years of feedback in the LHC context. This poster describes the design principles of the Metadata Querying Language (MQL) implemented in AMI, a metadata-oriented domain-specific language that allows querying databases without knowing the relations between tables....
During the third long shutdown of the CERN Large Hadron Collider, the CMS Detector will undergo a major upgrade to prepare for Phase-2 of the CMS physics program, starting around 2026. Upgrade projects will replace or improve detector systems to provide the necessary physics performance under the challenging conditions of high luminosity at the HL-LHC. Among other upgrades, the new CMS...
With the evolution of the WLCG towards opportunistic resource usage and cross-site data access, new challenges for data analysis have emerged in recent years. To enable performant data access without relying on static data locality, distributed caching aims at providing data locality dynamically. Recent work successfully employs various approaches for effective and coherent caching, from...
To address the increase in computational costs and speed requirements for simulation related to the higher luminosity and energy of future accelerators, a number of Fast Simulation tools based on Deep Learning (DL) procedures have been developed. We discuss the features and implementation of an end-to-end framework which integrates DL simulation methods with an existing Full Simulations...
The development of the Interactive Visual Explorer (InVEx), a visual analytics tool for ATLAS computing metadata, includes research of various approaches for data handling both on the server and on the client side. InVEx is implemented as a web-based application which aims to enhance the analytical and visualization capabilities of the existing monitoring tools and to facilitate the process of...
With the explosion in the number of distributed applications, a new dynamic server environment has emerged in which servers are grouped into clusters whose utilization depends on the current demand for the application. To provide reliable and smooth services, it is crucial to detect and fix possible erratic behavior of individual servers in these clusters. Use of standard techniques for this purpose...
Large experiments in high energy physics require efficient and scalable monitoring solutions to digest data of the detector control system. Plotting multiple graphs in the slow control system and extracting historical data for long time periods are resource intensive tasks. The proposed solution leverages the new virtualization, data analytics and visualization technologies such as InfluxDB...
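As a hedged illustration of the ingestion side of such a solution (using the generic influxdb Python client; the measurement and tag names are invented, not taken from the abstract):

    # Write one slow-control reading as a time-series point that dashboards can later query.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, database="slow_control")  # assumed endpoint
    point = {
        "measurement": "chamber_hv",
        "tags": {"subsystem": "tracker", "channel": "HV03"},
        "fields": {"voltage": 1498.7, "current_uA": 0.42},
    }
    client.write_points([point])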
System on Chip (SoC) devices have become popular for custom electronics HEP boards. Advantages include the tight integration of FPGA logic with CPU, and the option for having relatively powerful CPUs, with the potential of running a fully fledged operating system.
In the CMS trigger and data acquisition system, there are already a small number of back-end electronics boards with Xilinx Zynq...
The DUNE Collaboration has successfully implemented and currently operates
an experimental program based at CERN which includes a beam test and an extended
cosmic ray run of two large-scale prototypes of the DUNE Far Detector. The volume of data already collected by the protoDUNE-SP (the single-phase Liquid Argon TPC prototype) amounts to approximately 3PB and the sustained rate of data sent...
The Load Balance Service at CERN handles more than 400 aliases, distributed over more than 2000 nodes. After being in production for more than thirteen years, it has gone through a major redesign over the last two years. Last year, the server part was reimplemented in Go, taking advantage of the concurrency features offered by the language to improve the scaling of the system. This...
Apache Spark is a popular framework for big data analysis. A Spark application is divided into jobs, each triggered by an RDD action; the DAGScheduler then divides each job into stages, and each stage consists of tasks, the units of work that each correspond to one RDD partition.
The task is the smallest unit when Spark...
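A minimal PySpark sketch of these terms (illustrative only): the count() action triggers a job, the DAGScheduler splits it into stages, and each stage runs one task per RDD partition, four in this example.

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "stage-task-demo")
    rdd = sc.parallelize(range(1_000_000), numSlices=4)    # 4 partitions
    squares = rdd.map(lambda v: v * v)                     # lazy transformation, no job yet
    n = squares.count()                                    # action -> one job, 4 tasks
    print(n, "elements in", squares.getNumPartitions(), "partitions")
    sc.stop()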
The BNL SDCC (Scientific Data and Computing Center) recently enabled a centralized identity management solution. SSO authentication is now enabled across multiple IT systems and organizations, including federated login access via CILogon/InCommon, in combination with MFA (Duo) to meet security standards for various applications and services, such as JupyterHub and Invenio, provided to the...
The goal to obtain more precise physics results in current collider experiments drives the plans to significantly increase the instantaneous luminosity collected by the experiments. The increasing complexity of the events due to the resulting increased pileup requires new approaches to triggering, reconstruction, analysis,
and event simulation. The last task leads to a critical problem:...
The second-generation Belle II experiment at the SuperKEKB colliding-beam accelerator in Japan searches for new-physics signatures and studies the behaviour of heavy quarks and leptons produced in electron-positron collisions. The KLM (K-long and Muon) subsystem of Belle II identifies long-lived neutral kaons via hadronic-shower byproducts and muons via their undeflected penetration through...
Triple-GEM detectors are gaseous devices used in high energy physics to measure the path of the particles which cross them. The characterisation of triple-GEM detectors and the estimation of their performance for real data experiments require a complete comprehension of the mechanisms which transform the passage of one particle in the detector into electric signals, and dedicated Monte Carlo...
NOvA is a long-baseline neutrino experiment aiming to study the neutrino oscillation phenomenon in the muon neutrino beam from the NuMI complex at Fermilab (USA). Two identical detectors have been built to measure the initial neutrino flux spectrum at the near site and the oscillated one at an 810 km distance, which significantly reduces many systematic uncertainties. To improve electron neutrino and...
This paper presents the network architecture of the Tier-1 data center at JINR using the modern multichannel data transfer protocol TRILL. The experimental data obtained motivate our further study of the nature of traffic distribution in redundant topologies. Several questions arise. How does the distribution of packet data occur over four (or more) equivalent routes? What happens when the...
ATLAS Metadata Interface (AMI) is a generic ecosystem for metadata aggregation, transformation and cataloging. Benefiting from about 20 years of feedback in the LHC context, the second major version was released in 2018. This poster describes how to install and administrate AMI version 2. A particular focus is given to the registration of existing databases in AMI, the adding of additional...
J. Hollocombe (UKAEA, Culham Science Centre, OX14 3DB), EUROfusion WPISA CPT, EUROfusion WPCD
The ITER Data Model has been created to allow for a common data representation to be used by codes simulating ITER relevant physics. A suite of tools has been created to leverage this data structure called the Integrated Modelling & Analysis Suite (IMAS). As part of an exercise to...
The Jiangmen Underground Neutrino Observatory (JUNO) is primarily designed to measure the neutrino mass hierarchy. The JUNO central detector (CD) will be the world's largest liquid scintillator (LS) detector, with an unprecedented energy resolution of $3\%/\sqrt{E(\mathrm{MeV})}$ and an energy nonlinearity better than 1%. A calibration complex, including Cable Loop System (CLS), Guide Tube...
The JUNO (Jiangmen Underground Neutrino Observatory) is designed to determine the neutrino mass hierarchy and precisely measure oscillation parameters. The JUNO central detector is a 20 kt spherical volume of liquid scintillator (LS) with 35m diameter instrumented with 18,000 20-inch photomultiplier tubes (PMTs). Neutrinos are captured by protons of the target via the inverse beta decay...
In modern physics experiments, data analysis needs considerable computing capacity. Computing resources of a single site are often limited, while distributed computing is often inexpensive and flexible. While several large-scale grid solutions exist, for example DIRAC (Distributed Infrastructure with Remote Agent Control), there are few schemes devoted to solving the problem at small scale. For the...
This work addresses key technological challenges in the preparation of data pipelines for machine learning and deep learning at the scale of interest for HEP. A novel prototype to improve the event filtering system at LHC experiments, based on a classifier trained using deep neural networks, has recently been proposed by T. Nguyen et al. (https://arxiv.org/abs/1807.00083). This presentation covers...
The LHCb software stack has to be run in very different computing environments: the trigger farm at CERN, on the grid, on shared clusters, on software developers' desktops... The old model assumes the availability of CVMFS and relies on custom scripts (a.k.a. LbScripts) to configure the environment to build and run the software. It lacks flexibility and does not allow, for example, running in...
The CMS Collaboration has recently commissioned a new compact data format, named NANOAOD, reducing the per-event compressed size to about 1-2 kB. This is achieved by retaining only high level information on physics objects, and aims at supporting a considerable fraction of CMS physics analyses with a ~20x reduction in disk storage needs. NANOAOD also facilitates the dissemination of analysis...
The Czech Tier-2 center hosted and operated by the Institute of Physics of the Czech Academy of Sciences significantly upgraded its external network connection in 2019. The older edge router Cisco 6509 provided several 10 Gbps connections via a 10 Gigabit Ethernet Fiber Module, of which 2 ports were used for the external LHCONE connection, 1 port for generic internet traffic and 1 port to reach other...
Virtual Monte Carlo (VMC) provides a unified interface to different detector simulation transport engines such as GEANT3 and Geant4. Recently, all VMC packages (the VMC core library, also included in ROOT, Geant3 VMC and Geant4 VMC) have been distributed via the VMC Project GitHub organization. In addition to these VMC related packages, the VMC project also includes the Virtual Geometry Model...
The Weakly Interacting Massive Particle or "WIMP" has been a widely studied solution to the dark matter problem. A plausible scenario is that DM is not made up of a single WIMP species, but that it has a multi-component nature. In this talk I give an overview of recently published work in which we studied direct detection signals in the presence of multi-component WIMP-like DM. I will give an...
The building, testing and deployment of coherent large software stacks is very challenging, in particular when they consist of the diverse set of packages required by the LHC experiments, the CERN Beams department and data analysis services such as SWAN. These software stacks comprise a large number of packages (Monte Carlo generators, machine learning tools, Python modules, HEP specific...
Partial wave analysis is an important tool in hadron physics. Large data sets from experiments at the high-precision frontier require high computational power. To utilize GPU clusters and supercomputer resources with various types of accelerators, we implemented a software framework for partial wave analysis using OpenACC, OpenAccPWA. OpenAccPWA provides convenient approaches for...
The Production Operations Management System (POMS) is a set of software tools which allows production teams and analysis groups across multiple Fermilab experiments to launch, modify and monitor large scale campaigns of related Monte Carlo or data processing jobs.
POMS provides a web service interface that enables automated jobs submission on distributed resources according to customers’...
The HistFactory p.d.f. template [CERN-OPEN-2012-016] is per se independent of its implementation in ROOT, and it is useful to be able to run statistical analysis outside of the ROOT, RooFit, RooStats framework. pyhf is a pure-Python implementation of that statistical model for multi-bin histogram-based analysis and its interval estimation is...
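A minimal sketch of what running the model outside ROOT can look like (assuming a recent pyhf release; the numbers are arbitrary):

    import pyhf

    # One-bin signal-plus-background model with an uncorrelated background uncertainty.
    model = pyhf.simplemodels.uncorrelated_background(
        signal=[5.0], bkg=[10.0], bkg_uncertainty=[3.5]
    )
    data = [13.0] + model.config.auxdata                  # observed counts + auxiliary data
    best_fit = pyhf.infer.mle.fit(data, model)            # maximum-likelihood estimates
    cls_obs = pyhf.infer.hypotest(1.0, data, model, test_stat="qtilde")
    print(best_fit, cls_obs)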
In recent years, the INFN-CNAF team has been working on the Long Term Data Preservation (LTDP) project for the CDF experiment, which was active at Fermilab from 1990 to 2011.
The main aims of the project are to protect the CDF Run-2 data (4 PB) collected between 2001 and 2011 and already stored on CNAF tapes, and to ensure the availability of, and access to, the analysis facility for those data...
As an important spectrometer for the Nuclotron-based Ion Collider fAcility (NICA) accelerator complex at JINR, the MultiPurpose Detector (MPD) is proposed to investigate hot and dense baryonic matter in heavy-ion collisions over a wide range of atomic masses, from Au+Au collisions at a centre-of-mass energy of $\sqrt{s_{NN}} = 11\,\mathrm{GeV}$ (for Au$^{79+}$) to proton-proton collisions with...
The SAGE2 project is a collaboration between industry, data centres and research institutes demonstrating an exascale-ready system based on layered hierarchical storage and a novel object storage technology. The development of this system is based on a significant co-design exercise between all partners, with the research institutes having well established needs for exascale computing...
The Caltech team, in collaboration with network, computer science, and HEP partners at DOE laboratories and universities, is building smart network services ("The Software-defined network for End-to-end Networked Science at Exascale (SENSE) research project") to accelerate scientific discovery.
The overarching goal of SENSE is to enable National Labs and universities to request and...
Current and future end-user analyses and workflows in High Energy Physics demand the processing of growing amounts of data. This plays a major role when looking at the demands in the context of the High-Luminosity LHC. In order to keep the processing time and turn-around cycles as low as possible, analysis clusters optimized with respect to these demands can be used. Since hyperconverged...
Lattice quantum chromodynamics (QCD) has provided great insight into the nature of empty space, but quantum chromodynamics alone does not describe the vacuum in its entirety. Recent developments have introduced Quantum Electrodynamic (QED) effects directly into the generation of lattice gauge field configurations. Using lattice ensembles incorporating fully dynamical QCD and QED effects we are...
This talk describes the deployment of ATLAS offline software in containers for use in production workflows such as simulation and reconstruction. For this purpose we are using Docker and Singularity, which are both lightweight virtualization technologies that can encapsulate software packages inside complete file systems. The deployment of offline releases via containers removes the...
Detector Control Systems (DCS) for modern High-Energy Physics (HEP) experiments are based on complex distributed (and often redundant) hardware and software implementing real-time operational procedures meant to ensure that the detector is always in a "safe" state, while at the same time maximizing the live time of the detector during beam collisions. Display, archival and often analysis of...
The CERN storage architecture is evolving to address Run 3 and Run 4 challenges. CTA and EOS integration requires parallel development of features in both software systems, which need to be synchronized and systematically tested, for each commit in the code base, on a dedicated distributed development infrastructure.
CTA Continuous Integration development initially started as a place to run functional system...
In the High Luminosity LHC, planned to start with Run4 in 2026, the ATLAS experiment will be equipped with the Hardware Track Trigger (HTT) system, a dedicated hardware system able to reconstruct tracks in the silicon detectors with short latency. This HTT will be composed of about 700 ATCA boards, based on new technologies available on the market, like high speed links and powerful FPGAs, as...
ALICE (A Large Ion Collider Experiment) is currently undergoing a major upgrade of its detector, read-out and computing systems for LHC Run 3. A new facility called O2 (Online-Offline) will perform data acquisition and event processing.
To efficiently operate the experiment and the O2 facility a new observability system has been developed. It will provide a complete overview of the overall...
The eXtreme DataCloud (XDC) project is aimed at developing data management services capable of coping with very large data resources, allowing future e-infrastructures to address the needs of next-generation extreme-scale scientific experiments. Started in November 2017, XDC combines the expertise of 8 large European research organisations; the project aims at developing scalable...
Data growth over several years within HEP experiments requires a wider use of storage systems at WLCG Tier centers. It also increases the complexity of storage systems, which involves the expansion of hardware components and further complicates existing software products. Coping with such systems is a non-trivial task that requires highly qualified specialists.
Storing petabytes of...
The Electromagnetic Calorimeter (ECAL) is one of the sub-detectors of the Compact Muon Solenoid (CMS), a general-purpose particle detector at the CERN Large Hadron Collider (LHC). The CMS ECAL Detector Control System (DCS) and the CMS ECAL Safety System (ESS) have supported the detector operations and ensured the detector's integrity since the CMS commissioning phase, more than 10 years ago....
The Virtual Geometry Model (VGM) is a geometry conversion tool, currently providing conversion between Geant4 and ROOT TGeo geometry models. Its design allows the inclusion of another geometry model by implementing a single sub-module instead of writing bilateral converters for all already supported models.
The VGM was last presented at CHEP in 2008 and since then it has been under continuous...
The Tile Calorimeter (TileCal) is a crucial part of the ATLAS detector which jointly with other calorimeters reconstructs hadrons, jets, tau-particles, missing transverse energy and assists in muon identification. It is constructed of alternating iron absorber layers and active scintillating tiles and covers region |eta| < 1.7. The TileCal is regularly monitored by several different systems,...
Support for token-based authentication and authorization has emerged in recent years as a key requirement for storage elements powering WLCG data centers. Authorization tokens represent a flexible and viable alternative to other credential delegation schemes (e.g. proxy certificates) and authorization mechanisms (VOMS) historically used in WLCG, as documented in more detail in other submitted...
Designing new experiments, as well as upgrading ongoing experiments, is a continuous process in experimental high energy physics. Frontier R&D is used to squeeze the maximum physics performance out of cutting-edge detector technologies.
Evaluating the physics performance of a particular configuration includes sketching this configuration in Geant, simulating typical signals and...
The gluon field configurations that form the foundation of every lattice QCD calculation contain a rich diversity of emergent nonperturbative phenomena. Visualisation of these phenomena creates an intuitive understanding of their structure and dynamics. This presentation will illustrate recent advances in observing the chromo-electromagnetic vector fields, their energy and topological charge...
The CernVM File System provides the software and container distribution backbone for most High Energy and Nuclear Physics experiments. It is implemented as a file system in user-space (fuse) module, which permits its execution without any elevated privileges. Yet, mounting the file system in the first place is handled by a privileged suid helper program that is installed by the fuse package on...
In HEP experiments, remote access to control systems is one of the fundamental pillars of efficient operations. At the same time, development of user interfaces with emphasis on usability can be one of the most labor-intensive software tasks to be undertaken in the life cycle of an experiment. While desirable, the development and maintenance of a large variety of interfaces (e.g., desktop...
PyROOT is the name of ROOT’s automatic Python bindings, which allow access from Python to all the ROOT functionality implemented in C++. Thanks to the ROOT type system and the Cling C++ interpreter, PyROOT creates Python proxies for C++ entities on the fly, thus avoiding the need to generate static bindings beforehand.
PyROOT has been enhanced and modernised to meet the demands of the HEP Python...
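A short sketch of what the on-the-fly bindings make possible (standard PyROOT usage, not the new implementation's internals):

    import ROOT

    hist = ROOT.TH1F("h", "example;x;entries", 100, -3, 3)   # C++ TH1F behind a Python proxy
    hist.FillRandom("gaus", 10000)                            # call a C++ member function
    print(hist.GetMean(), hist.GetStdDev())

    # Cling can even JIT-compile new C++ declared from Python and expose it immediately:
    ROOT.gInterpreter.Declare("double square(double x) { return x * x; }")
    print(ROOT.square(3.0))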
Simulation is an important tool in the R&D process of detectors and their optimization. Fine tuning of detector parameters and running conditions can be achieved by means of advanced simulation tools, thus reducing the costs associated with prototyping.
In complex detector geometries, large volumes and at high gas gain, however, this simulation becomes computationally expensive and can run for several...
The end cap time-of-flight (ETOF) at Beijing Spectrometer (BESIII) was upgraded with multi-gap resistive plate chamber technology in order to improve the particle identification capability. The accurate knowledge of the detector real misalignment is important for getting close to the designed time resolution and the expected reconstruction efficiency of the end cap time-of-flight system. The...
HTCondor, with its high scheduling performance, has been widely adopted for HEP clusters. Unlike other schedulers, HTCondor provides only loose management of the worker nodes. We developed a Maintenance Automation Tool, abbreviated “HTCondor MAT”, focusing on dynamic resource management and automatic error handling.
A central database is used to record various attributes of all computing...
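As an illustration of the kind of pool query such a tool can build on (standard HTCondor Python bindings; the health-check condition is an invented example, not MAT's actual logic):

    import htcondor

    collector = htcondor.Collector()                          # default pool collector
    ads = collector.query(
        htcondor.AdTypes.Startd,
        projection=["Machine", "State", "Activity", "LoadAvg"],
    )
    for ad in ads:
        # Flag worker nodes that look unhealthy and may need automated maintenance.
        if ad.get("State") == "Drained" or ad.get("Activity") == "Retiring":
            print("attention needed:", ad.get("Machine"), ad.get("State"), ad.get("Activity"))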
Running a data center is never a trivial job. In addition to daily routine tasks, service operation teams have to provide meaningful information for monitoring, reporting and access-pattern analytics. The dCache production instances at DESY produce gigabytes of billing files per day. However, with the help of modern big-data analysis tools like Apache Spark and Jupyter notebooks, such a task...
Two recent software development projects are described: The first is a framework for generating load for an xrootd based disk caching proxy (known as xcache) and verifying the generated data as delivered by the cache. The second is a service to reduce the effect of network latency on application execution time due to writing files to remote storage via the xrootd protocol. For both projects...
During the long shutdown, ATLAS is preparing several fundamental changes to its offline event processing framework and analysis model. These include moving to multi-threaded reconstruction and simulation and reducing data duplication during derivation analysis by producing a combined mini-xAOD stream. These changes will allow ATLAS to take advantage of the higher luminosity at Run 3 without...
The LHC is expected to increase its center-of-mass energy to 14 TeV and its instantaneous luminosity to 2.4×10³⁴ cm⁻²s⁻¹ for Run 3, scheduled from 2021 to 2023. In order to cope with the high event rate, an upgrade of the ATLAS trigger system is required.
The level-1 Endcap Muon trigger system identifies muons with high transverse momentum by combining data from a fast muon trigger detector,...
An overview of the Conditions Database (DB) structure for the hadronic Tile Calorimeter (TileCal), one of the ATLAS Detector sub-systems, is presented. ATLAS Conditions DB stores the data on the ORACLE backend, and the design and implementation has been developed using COOL (Conditions Objects for LCG) software package as a common persistency solution for the storage and management of the...
The drift chamber is the main tracking detector for high energy physics experiments like BESIII. Deep learning developments in the last few years have shown tremendous improvements in data analysis, especially for object classification and parameter regression. Here we present a first study of deep learning architectures applied to BESIII Monte Carlo data to estimate the track...
Supercomputers and other high performance computing resources can be useful supplements to the BESIII computing resources for simulation production and data analysis. The supercomputer Tianhe-2 ranked No. 1 on the Top500 list six consecutive times from 2013 to 2015. This paper describes the deployment of Singularity containers as well as the integration...
Large scientific data centers have recently begun providing a number of different types of data storage to satisfy the various needs of their users. Users with interactive accounts, for example, might want a POSIX interface for easy access to the data from their interactive machines. Grid computing sites, on the other hand, likely need to provide an X.509-based storage protocol, like SRM and...
The Circular Electron Positron Collider (CEPC) is designed as a future Higgs factory. As in other high energy physics experiments, the offline software consists of many packages. BSM (Bundled Software Manager) was thus created in order to simplify the deployment and usage of software that has many packages and dependencies.
BSM utilizes git as the software repository. Different software versions...
The CMS software system, known as CMSSW, has a generalized conditions, calibration, and geometry data products system called the EventSetup. The EventSetup caches the results of reading or calculating data products according to their 'interval of validity' (IOV), the time period for which that data product is appropriate. With the original single-threaded CMSSW framework, updating only...
CERN IT is reviewing its portfolio of applications, with the aim to incorporate open-source solutions wherever possible. In particular, the Windows-centric DFS file system is replaced by CERNBox for certain use-cases.
Access to storage from Windows managed devices for end-users is largely covered by synchronization clients. However, online access using standard CIFS/SMB protocol is required...
The High Performance Computing (HPC) domain aims to optimize code in order to use the latest multicore and parallel technologies, including specific processor instructions. In this computing framework, portability and reproducibility are key concepts. A way to handle these requirements is to use Linux containers. These "light virtual machines" allow applications to be encapsulated within their...
DESY manages not only one of the largest Tier-2 sites, with about 18 500 CPU cores for Grid workloads, but also about 8000 CPU cores for interactive user analyses. In this presentation, we recapitulate the consolidation of the batch systems in a common HTCondor-based setup and the lessons learned, as both use cases differ in their goals. We will then give an outlook
on the future...
HTCondor has been used to manage the High Throughput Computing (HTC) cluster at IHEP since 2017. Two months later in the same year, a Slurm cluster was set up to run High Performance Computing (HPC) jobs. To provide an accounting service for both the HTCondor and Slurm clusters, it was necessary to develop a unified accounting system, named Cosmos.
However, different job workloads bring different accounting...
In recent years, along with the rapid development of large scientific facilities and e-science worldwide, various cyber security threats have become a noticeable challenge in many data centers for scientific research, such as DDoS attacks, ransomware, crypto-currency mining, data leaks, etc.
Intrusion and abnormality detection by collecting and analyzing security data is an important...
The ATLAS EventIndex Service keeps references to all real and simulated ATLAS events. Hadoop MapFiles and HBase tables are used to store the EventIndex data; a subset of the data is also stored in an Oracle database. Several user interfaces are currently used to access and search the data, ranging from a simple command line interface, through a programmatic API, to sophisticated graphical web services. The...
Belle II is a global collaboration with over 700 physicists from 113 institutes. In order to fuel the physics analyses, a distributed grid of computing clusters consisting of tens of thousands of CPU-cores will house the multiple petabytes of data that will come out of the detector in years to come. However, the task of easily finding the particular datasets of interest to physicists with...
Monitoring is an indispensable tool for the operation of any
large installment of grid or cluster computing, be it high
energy physics or elsewhere. Usually, monitoring is configured
to collect a small amount of data, just enough to enable
detection of abnormal conditions. Once detected, the abnormal
condition is handled by gathering all information from the
affected components....
The Project 8 collaboration aims to measure the absolute neutrino mass or improve on the current limit by measuring the tritium beta decay electron spectrum. We present the current distributed computing model for the Project 8 experiment and requirements for future phases. Project 8 is in its second phase of data taking with a near continuous data rate of 1Gbps. The current computing model...
The Computing Center of the Institute of Physics (CC FZU) of the Czech
Academy of Sciences provides compute and storage capacity to several
physics experiments. Most resources are used by two LHC experiments,
ALICE and ATLAS. In the WLCG, which coordinates computing activities for
the LHC experiments, the computing center is a Tier-2. The rest of
computing resources is used by astroparticle...
The ATLAS Event Streaming Service (ESS) is an approach to preprocess and deliver data for Event Service (ES) that has implemented a fine-grained approach for ATLAS event processing. The ESS allows one to asynchronously deliver only the input events required by ES processing, with the aim to decrease data traffic over WAN and improve overall data processing throughput. A prototype of ESS is...
CloudVeneto.it was initially funded and deployed by INFN in 2014 for serving the computational and storage demands of INFN research projects mainly related to HEP and Nuclear Physics. It is an OpenStack-based scientific cloud with resources spread across two different sites connected with a high speed optical link: the INFN Padova Unit and the INFN Legnaro National Laboratories. The...
With the ongoing decommissioning of the AFS filesystem at CERN, many use cases have been migrated to the EOS storage system at CERN.
To cope with additional requirements, the filesystem interface implemented using FUSE has been rewritten since 2017. The new implementation supports strong security in conventional, VM and container environments. It is in production for the CERNBOX EOS service...
The STFC CASTOR tape service is responsible for the management of over 80 PB of data, including 45 PB generated by the LHC experiments for the RAL Tier-1. In the last few years there have been several disruptive changes that have necessitated, or are necessitating, significant changes to the service. At the end of 2016, Oracle, which provided the tape libraries, drives and media, announced it was leaving the...
With the beginning of LHC Run 3, the upgraded ALICE detector will record Pb-Pb collisions at an interaction rate of 50 kHz using continuous readout, resulting in raw data rates of over 3.5TB/s marking a hundredfold increase over Run 2. Since permanent storage at this rate is unfeasible and exceeds available capacities, a sequence of highly effective compression and data reduction steps is...
ROOT provides, through TMVA, machine learning tools for data analysis at HEP experiments and beyond. However, with the rapidly evolving ecosystem for machine learning, the focus of TMVA is shifting.
In this poster, we present the new developments and strategy of TMVA, which will allow the analyst to integrate seamlessly, and effectively, different workflows in the diversified...
The ATLAS physics program relies on very large samples of simulated events. Most of these samples are produced with GEANT4, which provides a highly detailed and accurate simulation of the ATLAS detector. However, this accuracy comes with a high price in CPU, and the sensitivity of many physics analyses is already limited by the available Monte Carlo statistics and will be even more so in the...
This paper evaluates the utilization of RDMA over Converged Ethernet (RoCE) for the Run3 LHCb event building at CERN. The acquisition system of the detector will collect partial data from approximately 1000 separate detector streams. Total estimated throughput equals 40 terabits per second. Full events will be assembled for subsequent processing and data selection in the filtering farm of the...
Modeling the physics of a detector's response to particle collisions is one of the most CPU intensive and time consuming aspects of LHC computing. With the upcoming high-luminosity upgrade and the need to have even larger simulated datasets to support physics analysis, the development of new faster simulation techniques but with sufficiently accurate physics performance is required. The...
Composite Higgs models (CHMs), in which the Higgs boson is a bound state of an as-yet undetected strongly interacting sector, offer an attractive solution to the hierarchy problem while featuring rich particle phenomenology at the few-TeV scale. Of particular interest is the minimal CHM (MCHM), based on the $SO(5) \to SO(4)$ symmetry breaking pattern. However, the complexity of its parameter...
Analysis languages must, first and foremost, carefully describe how to extract and aggregate data. All analysis languages must be able to make a plot of an event’s Missing Energy, for example. Of course, much more complex queries must also be supported, like making the plot of Missing Energy only for events with at least two jets that satisfy certain requirements. A project was started to try...
In this paper we introduce and study the feasibility of running hybrid analysis pipelines using the REANA reproducible analysis platform. The REANA platform allows researchers to specify declarative computational workflow steps describing the analysis process and to execute the workflow pipelines on remote containerised Kubernetes-orchestrated compute clouds. We have designed an abstract job...
In many countries around the world, the development of national infrastructures for science either has been implemented or are under serious consideration by governments and funding bodies. Current examples include ARDC in Australia, CANARIE in Canada and MTA Cloud in Hungary. These infrastructures provide access to compute and storage to a wide swathe of user communities and represent a...
The VISPA (VISual Physics Analysis) project provides a streamlined work environment for physics analyses and hands-on teaching experiences with a focus on deep learning.
VISPA has already been successfully used in HEP analyses and teaching and is now being further developed into an interactive deep learning platform.
One specific example is to meet knowledge sharing needs in deep learning by...
Within the DOMA working group, the QoS activity is looking at how best to describe innovative technologies and deployments. One scenario that has emerged is providing storage that uses end-of-warranty disks: the cheap (almost free) nature of this storage is offset by a much larger likelihood of data loss. In some situations, this trade-off is acceptable, provided the operational overhead of...
EOS is the key component of the CERN storage strategy and is behind the success of CERNBox, the CERN cloud synchronisation service which allows syncing and sharing files on all major mobile and desktop platforms, aiming to provide offline availability to any data stored in the infrastructure.
CERNBox has seen enormous success within the CERN user community thanks to its ever-increasing...
In modern data centers an effective and efficient monitoring system is a critical asset, yet a continuous concern for administrators. Since its birth, the INFN Tier-1 data center, hosted at CNAF, has used various monitoring tools, all of which have been replaced in recent years by a system common to all CNAF departments (based on Sensu, InfluxDB and Grafana).
Given the complexity of the inter-dependencies of the...
We present an NDN-based XRootD plugin and associated methods which have been built for data access in the CMS and other experiments at the LHC, its status and plans for ongoing development.
Named Data Networking (NDN) is a leading Future Internet Architecture where data in the network is accessed directly by its name rather than the location of the host where it resides. NDN enables the...
The Alpha Magnetic Spectrometer (AMS) is a particle physics experiment installed and operating on board the International Space Station (ISS) since May 2011 and expected to operate through 2024 and beyond. The AMS offline software is used for data reconstruction, Monte Carlo simulation and physics analysis. This paper presents how we manage the offline software, including the version control,...
Accurate particle track reconstruction will be a major challenge for the High Luminosity LHC experiments. Increase in the expected number of simultaneous collisions and the high detector occupancy will make the algorithms extremely demanding in terms of time and computing resources.
The sheer increase in the number of hits would increase the complexity exponentially, however the finite...
Job schedulers in high energy physics require accurate information about the predicted resource consumption of a job to assign jobs to the most reasonable, available resources. For example, job schedulers evaluate information about the runtime, the number of requested cores, or the size of memory and disk space. Users, therefore, specify this information when submitting their jobs and workflows. Yet,...
The Worldwide LHC Computing Grid (WLCG) processes all LHC data and it has been the computing platform that has allowed the discovery of the Higgs Boson. Optimal usage of its resources represents a major challenge. Attempts at simulating this complex and highly non-linear environment did not yield practically usable results. For job submission and management, a satisfactory solution was...
We present an overview of recent changes in the ROOT I/O system that increase its performance and improve its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster Bulk I/O data path, and a few additional techniques have the potential to significantly improve experiments’ software performance.
The need for efficient lossless data...
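As a hedged example of selecting one of the newer algorithms when writing a file from PyROOT (the integer setting is assumed to follow ROOT's 100*algorithm + level convention, e.g. 404 for LZ4 at level 4; check the documentation of your ROOT version):

    import ROOT

    f = ROOT.TFile("lz4_example.root", "RECREATE", "", 404)   # LZ4, level 4 (assumed encoding)
    h = ROOT.TH1F("h", "payload", 100, -3, 3)
    h.FillRandom("gaus", 100000)
    h.Write()
    f.Close()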
Wireless local area network (WLAN) technology is widely used in enterprises and institutions. For user convenience, they often provide a single-SSID access point, with the result that users with different authenticated and authorized identities can connect to the wireless network anytime and anywhere as needed and obtain the same accessible network resources, such as bandwidth,...
In the upcoming LHC Run 3, starting in 2021, the upgraded Time Projection Chamber (TPC) of the ALICE experiment will record minimum bias Pb--Pb collisions in a continuous readout mode at 50 kHz interaction rate. This corresponds to typically 4-5 overlapping collisions in the detector. Despite careful tuning of the new quadruple GEM-based readout chambers, which fulfill the design requirement...
The SuperKEKB collider and the Belle II experiment started Phase III at the beginning of 2019. The run is designed to collect a data sample of up to 50 ab⁻¹ at the collision energy of the Upsilon(4S) resonance over the next decade. The Belle II software library was created to ensure the accuracy and efficiency needed to
accommodate this next-generation B factory experiment.
The central...
Traditionally, High Energy Physics data analysis is based on a model where data are stored in files and analyzed by running multiple analysis processes, each reading one or more of the data files. This process involves repeated data reduction steps that produce smaller files, which is time consuming and leads to data duplication. We propose an alternative approach to data storage and analysis,...
In this work, we focus on assessing the contribution of the initial-state fluctuations of heavy-ion collisions in hydrodynamic simulations. We try to answer the question of whether the hydrodynamic simulation retains the same level of fluctuation in the final state as in the initial stage, or whether, in the alternative scenario, the fluctuations are washed out in the final distribution of...
LHC data is constantly being moved between computing and storage sites to support analysis, processing, and simulation; this is done at a scale that is currently unique within the science community. For example, the CMS experiment on the LHC manages approximately 200PB of data and, on a daily basis, moves 1PB between sites. This talk shows the performance results we have produced of exploring...
TGenBase is a virtual database engine which allows one to communicate with, and store data in, different underlying database management systems such as PostgreSQL, MySQL and SQLite, depending on the configuration. It is universally applicable for any data storage task, such as parameter handling, detector component description, logistics, etc. In addition to usual CRUD (create, read, update, delete), it...
The Belle II experiment is a major upgrade of the e+e- asymmetric collider Belle, expected to produce tens of petabytes of data per year due to the luminosity increase with the SuperKEKB accelerator. The distributed computing system of the Belle II experiment plays a key role, storing and distributing data in a reliable way, to be easily accessed and analyzed by the more than 800...
The DESGW group seeks to identify electromagnetic counterparts of gravitational wave events seen by the LIGO-VIRGO network, such as those expected from binary neutron star mergers or neutron star- black hole mergers. DESGW was active throughout the first two LIGO observing seasons, following up several binary black hole mergers and the first binary neutron star merger, GW170817. We describe...
The success of Convolutional Neural Networks (CNNs) in image classification has prompted efforts to study their use for classifying image data obtained in Particle Physics experiments.
In this poster, I will discuss our efforts to apply CNNs to 3D image data from particle physics experiments to classify signal and background.
In this work, we present an extensive 3D convolutional neural...
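A minimal sketch of such a network (in PyTorch, which is an assumption since the poster does not name the framework), classifying 1-channel 32x32x32 voxel grids into signal or background:

    import torch
    import torch.nn as nn

    class Small3DCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
                nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 8 * 8 * 8, 64), nn.ReLU(),
                nn.Linear(64, 2),               # signal vs. background logits
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    batch = torch.randn(4, 1, 32, 32, 32)       # (events, channels, z, y, x)
    print(Small3DCNN()(batch).shape)            # torch.Size([4, 2])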
Communication among processes is generating considerable interest in the scientific computing community due to the increasing use of distributed memory systems. In the field of high energy physics (HEP), however, little research has been addressed on this topic. More precisely in ROOT I/O, the de facto standard for data persistence in HEP applications, no such feature is provided. In order to...
The CMS experiment at CERN is working to improve the selection capability of the High Level Trigger (HLT) system, in view of the re-start of the collisions for Run 3. One key factor on this scope is to enhance the ability of the Trigger to track the detector evolution during the data taking, along with the LHC Fill cycles. In particular, the HLT performance is sensitive to two areas of...
The KEDR experiment is ongoing at the VEPP-4M e+e- collider at Budker INP in Novosibirsk. The collider centre-of-mass energy covers a wide range from 2 to 11 GeV. Most of the statistics collected so far were taken at the lower end of the energy range, around the charmonium region.
Planned activities at higher energies, up to the bottomonium region, would lead to a significant rise in event recording rates and...
The EOS storage system in use at CERN and several other HEP sites was developed with an access control system, driven by known use cases, that is still in its infancy.
Here we motivate the decision to strive to support the RichACL standard as far as the EOS design allows. We highlight a characteristic that fits particularly well with access control for other applications at CERN, and show...
Despite the success of quantum chromodynamics (QCD) in describing the strong nuclear force, a clear picture of how this theory gives rise to the distinctive properties of confinement and dynamical chiral symmetry breaking at low energy is yet to be found. One of the more promising models used to explain these phenomena in recent times is known as the centre vortex model. In this work we...
In the CERN laboratory, users have access to a large number of different licensed software assets. The landscape of such assets is very heterogeneous, including Windows operating systems, office tools and specialized technical and engineering software. In order to improve management of the licensed software and to better understand the needs of the users, it was decided to develop a Winventory...