- Compact style
- Indico style
- Indico style - inline minutes
- Indico style - numbered
- Indico style - numbered + minutes
- Indico Weeks View
Help us make Indico better by taking this survey! Aidez-nous à améliorer Indico en répondant à ce sondage !
Welcome to the 24th International Conference on Computing in High-Energy and Nuclear Physics. The CHEP conference series addresses the computing, networking and software issues for the world’s leading data‐intensive science experiments that currently analyse hundreds of petabytes of data using worldwide computing resources.
CHEP 2019 will be held in Adelaide, South Australia, between Monday-Friday 4-8 November 2019. The venue for the CHEP 2019 conference is the Adelaide Convention Centre, conveniently located on the picturesque banks of the Torrens Lake in the heart of the city.
The optional pre-conference WLCG & HSF workshop will be held at the North Terrace campus of the University of Adelaide between Saturday-Sunday 2-3 November 2019.
More details about the conference can be found at the conference website: CHEP2019.org
Data acquisition systems (DAQ) for high energy physics experiments utilize complex FPGAs to handle unprecedented high data rates. This is especially true in the first stages of the processing chain. Developing and commissioning these systems becomes more complex as additional processing intelligence is placed closer to the detector, in a distributed way directly on the ATCA blades, in the other hand, sophisticated slow control is as well desirable. In this contribution, we introduce a novel solution for ATCA based systems, which combines the IPMI, a Linux based slow-control software, and an FPGA for custom slow-control tasks in one single Zynq Ultrascale+ (US+) System-on-Chip (SoC) module.
The Zynq US+ SoC provides FPGA logic, high-performance ARM-A53 multi-core processors and two ARM-R5 real-time capable processors. The ARM-R5 cores are used to implement the IPMI/IPMC functionality and communicate via backplane with the shelf manager at power-up. The ARM-R5 are also connected to the power supply (via PMBus), to voltage and current monitors, to clock generators and jitter cleaners (via I2C, SPI). Once full power is enabled from the crate, a Linux based operating system starts on the ARM-A53 cores. The FPGA is used to implement some of the low-level interfaces, including IPBus, or glue-logic. The SoC is the central entry point to the main FPGAs on the motherboard via IPMB and TCP/IP based network interfaces. The communication between the Zynq US+ SoC and the main FPGAs uses the AXI chip-to-chip protocol via MGT pairs keeping infrastructure requirements in the main FPGAs to a minimum.
The data acquisition (DAQ) software for most applications in high energy physics is composed of common building blocks, such as a networking layer, plug-in loading, configuration, and process management. These are often re-invented and developed from scratch for each project or experiment around specific needs. In some cases, time and available resources can be limited and make development requirements difficult or impossible to meet.
Moved by these premises, our team developed an open-source lightweight C++ software framework called DAQling, to be used as the core for the DAQ systems of small and medium-sized experiments and collaborations.
The framework offers a complete DAQ ecosystem, including communication layer based on the widespread ZeroMQ messaging library, configuration management based on the JSON format, control of distributed applications, extendable operational monitoring with web-based visualization, and a set of generic utilities. The framework comes with minimal dependencies, and provides automated host and build environment setup based on the Ansible automation tool. Finally, the end-user code is wrapped in so-called “Modules”, that can be loaded at configuration time, and implement specific roles.
Several collaborations already chose DAQling as the core for their DAQ systems, such as FASER, RD51, and NA61. We will present the framework and project-specific implementations and experiences.
The DAQ system of ProtoDUNE-SP successfully proved its design principles and met the requirements of the beam run of 2018. The technical design of the DAQ system for the DUNE experiment has major differences compared to the prototype due to different requirements and the environment. The single-phase prototype in CERN is the major integration facility for R&D aspects of the DUNE DAQ system. This covers the exploration of additional data processing capabilities and optimization of the FELIX system, which is the chosen TPC readout solution for the DUNE Single Phase supermodules. One of the fundamental differences is that DUNE DAQ relies on self-triggering. Therefore real-time processing of the data stream for hit and trigger primitive finding is essential for the requirement of continuous readout, where Intel AVX register instructions are used for better performance. The supernova burst trigger requires a large and fast buffering technique, where 3D XPoint persistent memory solutions are evaluated and integrated. In order to maximize resource utilization of the FELIX hosting servers, the elimination of the 100Gb network communication stack is desired. This implies the design and development of a single-host application layer, which is a fundamental element of the self-triggering chain.
This paper discusses the evaluation and integration of these developments for the DUNE DAQ, in the ProtoDUNE environment.
After the current LHC shutdown (2019-2021), the ATLAS experiment will be required to operate in an increasingly harsh collision environment. To maintain physics performance, the ATLAS experiment will undergo a series of upgrades during the shutdown. A key goal of this upgrade is to improve the capacity and flexibility of the detector readout system. To this end, the Front-End Link eXchange (FELIX) system has been developed. FELIX acts as the interface between the data acquisition; detector control and TTC (Timing, Trigger and Control) systems; and new or updated trigger and detector front-end electronics. The system functions as a router between custom serial links from front end ASICs and FPGAs to data collection and processing components via a commodity switched network. The serial links may aggregate many slower links or be a single high bandwidth link. FELIX also forwards the LHC bunch-crossing clock, fixed latency trigger accepts and resets received from the TTC system to front-end electronics. FELIX uses commodity server technology in combination with FPGA-based PCIe I/O cards. FELIX servers run a software routing platform serving data to network clients. Commodity servers connected to FELIX systems via the same network run the new multi-threaded Software Readout Driver (SW ROD) infrastructure for event fragment building, buffering and detector-specific processing to facilitate online selection. This presentation will cover the design of FELIX, the SW ROD, and the results of the installation and commissioning activities for the full system in summer 2019.
LHCb is one of the 4 experiments at the LHC accelerator at CERN. During the upgrade phase of the experiment, several new electronic boards and Front End chips that perform the data acquisition for the experiment will be added by the different sub-detectors. These new devices will be controlled and monitored via a system composed of GigaBit Transceiver (GBT) chips that manage the bi-directional slow control traffic to the Slow Control Adapter(s) (SCA) chips. The SCA chips provide multiple field buses to interface the new electronics devices (I2C, GPIO, etc). These devices will need to be integrated in the Experiment Control System (ECS) that drives LHCb. A set of tools was developed that provide an easy integration of the control and monitoring of the devices in the ECS. A server (GbtServ) provides the low level communication layer with the devices via the several user buses in the SCA chip and exposes an interface for control to the experiment SCADA (WinCC OA), the fwGbt component provides the interface between the SCADA and the GbtServ and the fwHw component, a tool that allows the abstraction of the devices models into the ECS. Using the graphical User Interfaces or XML files describing the structure and registers of the devices it creates the necessary model of the hardware as a data structure in the SCADA. It allows then the control and monitoring of the defined registers using their name, without the need to know the details of the hardware behind. The fwHw tool also provides the facility of defining and applying recipes - named sets of configurations - which can be used to easily configure the hardware according to specific needs.
The Project 8 collaboration seeks to measure, or more tightly bound, the mass of the electron antineutrino by applying a novel spectroscopy technique to precision measurement of the tritium beta-decay spectrum. For the current, lab-bench-scale, phase of the project a single digitizer produces 3.2 GB/s of raw data. An onboard FPGA uses digital down conversion to extract three 100 MHz wide (roughly 1.6 keV) frequency regions of interest, and transmits both the time and frequency domain representation of each region over a local network connection for a total of six streams, each at 200 MB/s. Online processing uses the frequency-domain representation to implement a trigger based on excesses of power within a narrow frequency band and extended in time. When the trigger condition is met, the corresponding time-domain data are saved, reducing the total data volume and rate while allowing for more sophisticated offline event reconstruction. For the next phase of the experiment, the channel count will increase by a factor of sixty. Each channel will receive signals from the full source volume, but at amplitudes below what is detectable. Phase-shifted combinations of all channels will coherently enhance the amplitude of signals from a particular sub-volume to detectable levels, and tiling set of such phase shifts will allow the entire source volume to be scanned for events. We present the online processing system which has successfully been deployed for the current, single-channel, phase. We also present the status and design for a many-channel platform.
The Deep Underground Neutrino Experiment (DUNE) is an international effort to build the next-generation neutrino observatory to answer fundamental questions about the nature of elementary particles and their role in the universe. Integral to DUNE is the process of reconstruction, where the raw data from Liquid Argon Time Projection Chambers (LArTPC) are transformed into products that can be used for physics analysis. Experimental data is currently obtained from a prototype of DUNE (ProtoDUNE) that is built as a full scale engineering prototype and uses a beam of charged particles, rather than a neutrino beam to test the detector response. The reconstruction software consumes on average 35% of the computational resources in Fermilab. For DUNE it is expected that reconstruction will play a far greater, and computationally more expensive role, as signal activity will be significantly reduced upon deployment of the neutrino beam. Consequently, identifying signals within the raw data will be a much harder task. Alternative approaches to neutrino signal reconstruction must be investigated in anticipation of DUNE. Machine learning approaches for reconstruction are being investigated, but currently, no end-to-end solution exists. As part of an end-to-end reconstruction solution, we propose an approach using Graph Neural Networks (GNN) to identify signals (i.e. hits) within the raw data. In particular, since the raw data of LarTPCs are both spatial and temporal in nature, Graph Spatial-Temporal Networks (GSTNs), capable of capturing dependency relationships among hits, are promising models. Our solution can be deployed for both online (trigger-level) and offline reconstruction. In this work, we describe the methodology of GNNs (and GSTNs in particular) for neutrino signal reconstruction and the preliminary results.
We use Graph Networks to learn representations of irregular detector geometries and perform on it typical tasks such as cluster segmentation or pattern recognition. Thanks to the flexibility and generality of the graph architecture, this kind of network can be applied to detector of arbitrarly geometry, representing the detector elements through a unique detector identification (e.g., physical position) and the readout value and embedding as vertices in a graph. We apply this idea to tasks related to calorimetry and tracking in LHC-like conditions, investigating original graph architectures to optimise performance and memory footprint.
We study the use of interaction networks to perform tasks related to jet reconstruction. In particular, we consider jet tagging for generic boosted-jet topologies, tagging of large-momentum H$\to$bb decays, and anomalous-jet detection. The achieved performance is compared to state-of-the-art deep learning approaches, based on Convolutional or Recurrent architectures. Unlike these approaches, Interaction Networks allow to reach state-of-the art performance without making assumptions on the underlying data (e.g., detector geometry or resolution, particle ordering criterion, etc.). Given their flexibility, Interaction Networks provide an interesting possibility for deployment-friendly deep learning algorithms for the LHC experiments.
For the High Luminosity LHC, the CMS collaboration made the ambitious choice of a high granularity design to replace the existing endcap calorimeters. The thousands of particles coming from the multiple interactions create showers in the calorimeters, depositing energy simultaneously in adjacent cells. The data are analog to 3D gray-scale image that should be properly reconstructed.
In this talk we will investigate how to localize and identify the thousands of showers in such events with a Deep Neural Network model. This problem is well-known in the Vision domain, it belongs to the challenging class: "Object Detection" which is significantly a harder task than “only” an image classification/regression because of the mixed goals : the cluster/pattern identification (cluster type), its localization (bounding box), and the object segmentation (mask) in the scene.
Our project presents a lot of similarities with the ones treated in Industry but accumulates several technological challenges like the 3D treatment. We will present the Mask R-CNN model which has already proven its efficiency in Industry (for 2D images) and how we extended it to tackle 3D HGCAL data. To conclude we will present the first results of this challenge.
Micro-Pattern Gas Detectors (MPGDs) are the new frontier in between the gas tracking systems. Among them, the triple Gas Electron Multiplier (triple-GEM) detectors are widely used. In particular, cylindrical triple-GEM (CGEM) detectors can be used as inner tracking devices in high energy physics experiments. In this contribution, a new offline software called GRAAL (Gem Reconstruction And Analysis Library) is presented: digitization, reconstruction, alignment algorithms and analysis of the data collected with APV-25 and TIGER ASICs within GRAAL framework are reported. An innovative cluster reconstruction method based on charge centroid, micro-TPC and their merge is discussed, and the detector performances evaluated experimentally for both planar triple-GEM and CGEM prototypes.
The innovative Barrel DIRC (Detection of Internally Reflected Cherenkov light) counter will provide hadronic particle identification (PID) in the central region of the PANDA experiment at the new Facility for Antiproton and Ion Research (FAIR), Darmstadt, Germany. This detector is designed to separate charged pions and kaons with at least 3 standard deviations for momenta up to 3.5 GeV/c covering the polar angle range of 22-140 degree.
An array of microchannel plate photomultiplier tubes is used to detect the location and arrival time of the Cherenkov photons with a position resolution of 2 mm and time precision of about 100 ps. Two reconstruction algorithms have been developed to make optimum use of the observables and to determine the performance of the detector. The "geometrical reconstruction" performs PID by reconstructing the value of the Cherenkov angle and using it in a track-by-track maximum likelihood fit. This method mostly relies on the position of the detected photons in the reconstruction, while the "time imaging" utilizes both, position and time information, and directly performs the maximum likelihood fit using probability density functions determined analytically or from detailed simulations.
Geant4 simulations and data from the particle beams where used to optimize both algorithms in terms of PID performance and reconstruction speed. We will present current status of development and discuss advantages of each algorithm.
Efficient access to distributed computing and storage resources is mandatory for the success of current and future High Energy and Nuclear Physics Experiments. DIRAC is an interware to build and operate distributed computing systems. It provides a development framework and a rich set of services for the Workload, Data and Production Management tasks of large scientific communities. A single DIRAC installation provides a complete solution for the distributed computing of one, or more than one collaboration. The DIRAC Workload Management System (WMS) provides a transparent, uniform interface for managing computing resources. The DIRAC Data Management System (DMS) offers all the necessary tools to ensure data handling operations: it supports transparent access to storage resources based on multiple technologies, and is easily expandable. Distributed Data management can be performed, also using third party services, and operations are resilient with respect to failures. DIRAC is highly customizable and can be easily extended. For these reasons, a vast and heterogeneous set of scientific collaborations have adopted DIRAC as the base for their computing models. Users from different experiments can interact with the system in different ways, depending on their specific tasks, expertise level and previous experience using command line tools, python APIs or Web Portals. The requirements of the diverse DIRAC user communities and hosting infrastructures triggered multiple developments to improve the system usability: examples include the adoption of industry standard authorization and authentication infrastructure solutions, the management of diverse computing resources (cloud, HPC, GPGPU, etc.), the handling of high-intensity work and data flows, but also advanced monitoring and accounting using no-SQL based solutions and message queues. This contribution will highlight DIRAC's current, upcoming and planned capabilities and technologies.
MCwrapper is a set of system that manages the entire Monte Carlo production workflow for GlueX and provides standards for how that Monte Carlo is produced. MCwrapper was designed to be able to utilize a variety of batch systems in a way that is relatively transparent to the user, thus enabling users to quickly and easily produce valid simulated data at home institutions worldwide. Additionally, MCwrapper supports an autonomous system that takes user's project submissions via a custom web application. The system then atomizes the project into individual jobs, matches these jobs to resources, and monitors the jobs status. The entire system is managed by a database which tracks almost all facets of the systems from user submissions to the individual jobs themselves. Users can interact with their submitted projects online via a dashboard or, in the case of testing failure, can modify their project requests from a link contained in an automated email. Beginning in 2018 the GlueX Collaboration began to utilize the Open Science Grid (OSG) to handle a bulk of simulation tasks; these tasks are currently being performed on the OSG automatically via MCwrapper. This talk will outline the entire system of MCwrapper, its use cases, and the unique challenges facing the system.
The Deep Underground Neutrino Experiment (DUNE) will be the world’s foremost neutrino detector when it begins taking data in the mid-2020s. Two prototype detectors, collectively known as ProtoDUNE, have begun taking data at CERN and have accumulated over 3 PB of raw and reconstructed data since September 2018. Particle interaction within liquid argon time projection chambers are challenging to reconstruct, and the collaboration has set up a dedicated Production Processing group to perform centralized reconstruction of the large ProtoDUNE datasets as well as to generate large-scale Monte Carlo simulation. Part of the production infrastructure includes workflow management software and monitoring tools that are necessary to efficiently submit and monitor the large and diverse set of jobs needed to meet the experiment’s goals. We will give a brief overview of DUNE and ProtoDUNE, describe the various types of jobs within the Production Processing group’s purview, and discuss the software and workflow management strategies are currently in place to meet existing demand. We will conclude with a description of our requirements in a workflow management software solution and our planned evaluation process.
Efforts in distributed computing of the CMS experiment at the LHC at CERN are now focusing on the functionality required to fulfill the projected needs for the HL-LHC era. Cloud and HPC resources are expected to be dominant relative to resources provided by traditional Grid sites, being also much more diverse and heterogeneous. Handling their special capabilities or limitations and maintaining global flexibility and efficiency, while also operating at scales much higher than the current capacity, are the major challenges being addressed by the CMS Submission Infrastructure team. This contribution will discuss the risks to the stability and scalability of the CMS HTCondor infrastructure extrapolated to such a scenario, thought to be derived mostly from its growing complexity, with multiple Negotiators and schedulers flocking work to multiple federated pools. New mechanisms for enhanced customization and control over resource allocation and usage, mandatory in this future scenario, will be also presented.
Software tools for detector optimization studies for future experiments need to be efficient and reliable. One important ingredient of the detector design optimization concerns the calorimeter system. Every change of the calorimeter configuration requires a new set of overall calibration parameters which in its turn requires a new calorimeter calibration to be done. An efficient way to perform calorimeter calibration is therefor essential in any detector optimization tool set.
In this contribution, we present the implementation of a calibration system in iLCDirac, which is an extension of the DIRAC grid interware. Our approach provides more direct control over the grid resources to reduce overhead of file download and job initialisation, and provides more flexibility during the calibration process. The service controls the whole chain of a calibration procedure, collects results from finished iterations and redistributes new input parameters among worker nodes. A dedicated agent monitors the health of running jobs and resubmits them if needed. Each calibration has an up-to-date backup which can be used for recovery in case of any disruption in the operation of the service.
As a use case, we will present a study of optimization of the calorimetry system of the CLD detector concept for FCC-ee, which has been adopted from the CLICdet detector model. The detector has been simulated with the DD4hep package and calorimetry performance have been studied with the particle flow package PandoraPFA.
The increase in the scale of LHC computing during Run 3 and Run 4 (HL-LHC) will certainly require radical changes to the computing models and the data processing of the LHC experiments. The working group established by WLCG and the HEP Software Foundation to investigate all aspects of the cost of computing and how to optimise them has continued producing results and improving our understanding of this process. In particular, experiments have developed more sophisticated ways to calculate their resource needs, we have a much more detailed process to
calculate infrastructure costs. This includes studies on the impact of HPC and GPU based resources on meeting the computing demands. We have also developed and perfected tools to quantitatively study the performance of experiments workloads and we are actively collaborating with other activities related to data access, benchmarking and technology cost evolution. In this contribution we expose our recent developments and results and outline the directions of future work.
We will describe a component of the Intelligent Data Delivery Service being developed in collaboration with IRIS-HEP and the LHC experiments. ServiceX is an experiment-agnostic service to enable on-demand data delivery specifically tailored for nearly-interactive vectorized analysis. This work is motivated by the data engineering challenges posed by HL-LHC data volumes and the increasing popularity of python and Spark-based analysis workflows.
ServiceX gives analyzers the ability to query events by dataset metadata. It uses containerized transformations to extract just the data required for the analysis. This operation is collocated with the data lake to avoid transferring unnecessary branches over the WAN. Simple filtering operations are supported to further reduce the amount of data transferred.
Transformed events are cached in a columnar datastore to accelerate delivery of subsequent similar requests. ServiceX will learn commonly related columns and automatically include them in the transformation to increase the potential for cache hits by other users.
Selected events are streamed to the analysis system using an efficient wire protocol that can be readily consumed by a variety of computational frameworks. This reduces time-to-insight for physics analysis by delegating to ServiceX the complexity of event selection, slimming, reformatting, and streaming.
We will report on the status of the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) after its fourth year. OSiRIS is delivering a distributed Ceph storage infrastructure coupled together with software-defined networking to support multiple science domains across Michigan’s three largest research universities. The project’s goal is to provide a single scalable, distributed storage infrastructure that allows researchers at each campus to work collaboratively with other researchers across campus or across institutions. The NSF CC*DNI DIBBs program which funded OSiRIS is seeking solutions to the challenges of multi-institutional collaborations involving large amounts of data and we are exploring the creative use of Ceph and networking to address those challenges.
We will present details on the current status of the project and its various science domain users and use-cases. In the presentation we will cover the various design choices, configuration, tuning and operational challenges we have encountered in providing a multi-institutional Ceph deployment interconnected by a monitored, programmable network fabric. We will conclude with our plans for the final year of the project and its longer term outlook.
The Belle II experiment started taking physics data in March 2019, with an estimated dataset of order 60 petabytes expected by the end of operations in the mid-2020s. Originally designed as a fully integrated component of the BelleDIRAC production system, the Belle II distributed data management (DDM) software needs to manage data across 70 storage elements worldwide for a collaboration of nearly 1000 physicists. By late 2018, this software required significant performance improvements to meet the requirements of physics data taking and was seriously lacking in automation. Rucio, the DDM solution created by ATLAS, was an obvious alternative but required tight integration with BelleDIRAC and a seamless yet non-trivial migration. This contribution describes the work done on both DDM options, the current status of the software running successfully in production and the problems associated with trying to balance long-term operations cost against short term risk.
A new bookkeeping system called Jiskefet is being developed for A Large Ion Collider Experiment (ALICE) during Long Shutdown 2, to be in production until the end of LHC Run 4 (2029).
Jiskefet unifies two functionalities. The first is gathering, storing and presenting metadata associated with the operations of the ALICE experiment. The second is tracking the asynchronous processing of the physics data.
It will replace the existing ALICE Electronic Logbook and AliMonitor, allowing for a technology refresh and the inclusion of new features based on the experience collected during Run 1 and Run 2.
The front end leverages web technologies much in use nowadays such as TypeScript and NodeJS and is adaptive to various clients such as tablets, mobile device and other screens. The back end includes a Swagger based REST API and a relational database.
This paper will describe the current status of the development, the initial experience in detector standalone commissioning setups and the future plans. It will also describe the organization of the work done by various student teams who work on Jiskefet in sequential and parallel semesters and how continuity is guaranteed by using guidelines on coding, documentation and development.
(On behalf of the JUNO collaboration)
Abstract:
The JUNO (Jiangmen Underground Neutrino Observatory) experiment is designed to determine the neutrino mass hierarchy and precisely measure oscillation parameters with an unprecedented energy resolution of 3% at 1MeV. It is composed of a 20kton liquid scintillator central detector equipped with 18000 20” PMTs and 25000 3” PMTs, a water pool with 2000 20” PMTs, and a top tracker. Conditions data, coming from calibration and detector monitoring, are heterogeneous, different type of conditions data has different write rates, data format and data volume. JUNO conditions data management system (JCDMS) is developed to homogeneously treat all these heterogeneous conditions data in order to provide easy management and access with both Restful API and web interfaces, support good scalability and maintenance for long time running. we will present the status and development of JCDMS including the data model, workflows, interfaces, data caching and performance of the system.
The ATLAS model for remote access to database resident information relies upon a limited set of dedicated and distributed Oracle database repositories complemented with the deployment of Frontier system infrastructure on the WLCG. ATLAS clients with network access can get the database information they need dynamically by submitting requests to a squid server in the Frontier network which provides results from its cache or passes new requests along the network to launchpads co-located at one of the Oracle sites (the master Oracle database at CERN or one of the Tier 1 Oracle database replicas). Since the beginning of LHC Run 1, the system has evolved in terms of client, squid, and launchpad optimizations but the distribution model has remained fundamentally unchanged.
On the whole, the system has been broadly successful in providing data to clients with relatively few disruptions even while site databases were down due to redundancy overall. At the same time, its quantitative performance characteristics, such as the global throughput of the system, the load distribution between sites, and the constituent interactions that make up the whole, were largely unknown. But more recently, information has been collected from launchpad and squid logs into an Elastic Search repository which has enabled a wide variety of studies of various aspects of the system.
This presentation will describe dedicated studies of the data collected in Elastic Search over the previous year to evaluate the efficacy of the distribution model. Specifically, we will quantify any advantages that the redundancy of the system offers as well as related aspects such as the geographical dependence of wait times seen by clients in getting a response to its requests. These studies are essential so that during LS2 (the long shutdown between LHC Run 2 and Run 3), we can adapt the system in preparation for the expected increase in the system load in the ramp up to Run 3 operations.
The Future Circular Collider (FCC) is designed to provide unprecedented luminosity and unprecedented centre-of-mass energies. The physics reach and potential of the different FCC options - $e^+e^-$, $pp$, $e^-p$ - has been studied and published in dedicated Conceptual Design Reports (CDRs) published at the end of 2018.
Conceptual detector designs have been developed for such studies and tested with a mixture of fast and full simulations. These investigations have conducted using a common software framework called FCCSW.
In this presentation, after summarising the improvements implemented in FCCSW to achieve the results included in the CDRs, we will present the current development plans to support the continuation of the physics potential and detector concept optimization studies in view of future strategic decisions, in particular for the electron-positron machine.
The diversity of the scientific goals across HEP experiments necessitates unique bodies of software tailored for achieving particular physics results. The challenge, however, is to identify the software that must be unique, and the code that is unnecessarily duplicated, which results in wasted effort and inhibits code maintainability.
Fermilab has a history of supporting and developing software projects that are shared among HEP experiments. Fermilab's scientific computing division currently expends effort in maintaining and developing the LArSoft toolkit, used by liquid argon TPC experiments, as well as the event-processing framework technologies used by LArSoft, CMS, DUNE, and the majority of Fermilab-hosted experiments. As computing needs for DUNE and the HL-LHC become clearer, the computing models are being rethought. This talk will focus on Fermilab's plans for addressing the evolving software landscape as it relates to LArSoft and the event-processing frameworks, and how commonality among experiment software can be achieved while still supporting customizations necessary for a given experiment's physics goals.
The ALFA framework is a joint development between ALICE Online-Offline and FairRoot teams. ALFA has a distributed architecture, i.e. a collection of highly maintainable, testable, loosely coupled, independently deployable processes.
ALFA allows the developer to focus on building single-function modules with well-defined interfaces and operations. The communication between the independent processes is handled by FairMQ transport layer. FairMQ offers multiple implementations of its abstract data transport interface, it integrates some popular data transport technologies like ZeroMQ and nanomsg. But also provides shared memory and RDMA transport (based on libfabric) for high throughput, low latency applications. Moreover, FairMQ allows the single process to use multiple and different transports at the same time.
FairMQ based processes can be controlled and orchestrated via different systems by implementing the corresponding plugin. However, ALFA delivers also the Dynamic Deployment System (DDS) as an independent set of utilities and interfaces, providing a dynamic distribution of different user processes on any Resource Management System (RMS) or a laptop.
ALFA is already being tested and used by different experiments in different stages of data processing as it offers an easy integration of heterogeneous hardware and software. Examples of ALFA usage in different stages of event processing will be presented; in a detector read-out as well as in an online reconstruction and in a pure offline world of detector simulations.
The OpenMP standard is the primary mechanism used at high performance computing facilities to allow intra-process parallelization. In contrast, many HEP specific software (such as CMSSW, GaudiHive, and ROOT) make use of Intel's Threading Building Blocks (TBB) library to accomplish the same goal. In this talk we will discuss our work to compare TBB and OpenMP when used for scheduling algorithms to be run by a HEP style data processing framework (i.e. running hundreds of interdependent algorithms at most once for each event read from the detector). This includes both scheduling of different algorithms to be run concurrently as well as scheduling concurrent work within one algorithm. As part of the discussion we present an overview of the OpenMP threading model. We also explain how we used OpenMP when creating a simplified HEP-like processing framework. Using that simplified framework, and a similar one written using TBB, we will present performance comparisons between TBB and different compiler versions of OpenMP.
The high-level trigger (HLT) of LHCb in Run 3 will have to process 5 TB/s of data, which is about two orders of magnitude larger compared to Run 2. The second stage of the HLT runs asynchronously to the LHC, aiming for a throughput of about 1 MHz. It selects analysis-ready physics signals by O(1000) dedicated selections totaling O(10000) algorithms to achieve maximum efficiency. This poses two problems: correct configuration of the application and low-overhead execution of individual algorithms and evaluation of the decision logic.
A python-based system for configuring the data and control flow of the Gaudi-based application, including all components, is presented. It is designed to be user-friendly by using functions for modularity and removing indirection layers employed previously in Run 2. Robustness is achieved by fully eliminating global state and instead building the data flow graph in a functional manner while keeping configurability of the full call stack.
A prototype of the second HLT stage comprising all recent features including a new scheduling algorithm, a faster data store and the above mentioned configuration system is benchmarked, demonstrating the performance of the framework with the expected application complexity.
As part of the LHCb detector upgrade in 2021, the hardware-level trigger will be removed, coinciding with an increase in luminosity. As a consequence, about 40 Tbit/s of data will be processed in a full-software trigger, a challenge that has prompted the exploration of alternative hardware technologies. Allen is a framework that permits concurrent many-event execution targeting many-core architectures. We present the core infrastructure of this R&D project developed in the context of the LHCb Upgrade I. Data transmission overhead is hidden with a custom memory manager, and GPU resource usage is maximized employing a deterministic scheduler. Our framework is extensible and covers the control flow and data dependency requirements of the LHCb High Level Trigger 1 algorithms. We discuss the framework design, performance and integration aspects of a full realization of a GPU High Level Trigger 1 in LHCb.
Statistical modelling is a key element for High-Energy Physics (HEP) analysis. Currently, most of this modelling is performed with the ROOT/RooFit toolkit which is written in C++ and provides Python bindings which are only loosely integrated into the scientific Python ecosystem. We present zfit, a new alternative to RooFit, written in pure Python. Built on top of TensorFlow (a modern, high level computing library for massive computations), zfit provides a high level interface for advanced model building and fitting. It is also designed to be extendable in a very simple way, allowing the usage of cutting-edge developments from the scientific Python ecosystem in a transparent way. In this talk, the main features of zfit are introduced, and its extension to data analysis, especially in the context of HEP experiments, is discussed.
ROOT provides, through TMVA, machine learning tools for data analysis at HEP experiments and beyond. In this talk, we present recently included features in TMVA and the strategy for future developments in the diversified machine learning landscape. Focus is put on fast machine learning inference, which enables analysts to deploy their machine learning models rapidly on large scale datasets. The new developments are paired with newly designed C++ and Python interfaces supporting modern C++ paradigms and full interoperability in the Python ecosystem.
We present as well a new deep learning implementation for convolutional neural network using the cuDNN library for GPU. We show benchmarking results in term of training time and inference time, when comparing with other machine learning libraries such as Keras/Tensorflow.
SWAN (Service for Web-based ANalysis) is a CERN service that allows users to perform interactive data analysis in the cloud, in a "software as a service" model. The service is a result of the collaboration between IT Storage and Databases groups and EP-SFT group at CERN. SWAN is built upon the widely-used Jupyter notebooks, allowing users to write - and run - their data analysis using only a web browser. SWAN is a data analysis hub: users have immediate access to user storage CERNBox, entire LHC data repository on EOS, software (CVMFS) and computing resources, in a pre-configured, ready-to-use environment. Sharing of notebooks is fully integrated with CERNBox and users can easily access their notebook projects on all devices supported by CERNBox.
In the first quarter of 2019 we have recorded more than 1300 individual users of SWAN, with a majority from all four LHC experiments. Integration of SWAN with CERN Spark clusters is at the core of the new controls data logging system for the LHC. Every month new users discover SWAN through tutorials on data analysis and machine learning.
The SWAN service evolves, driven by the user's needs. In the future SWAN will provide access to GPUs, to the more powerful interface of Jupyterlab - that replaces Jupyter notebooks - and to a more configurable, easier to use and more shareable way of setting the software environment of Projects and notebooks.
This presentation will update the HEP community with the status of this effort and its future direction, together with the general evolution of SWAN.
The Faster Analysis Software Taskforce (FAST) is a small, European group of HEP researchers that have been investigating and developing modern software approaches to improve HEP analyses. We present here an overview of the key product of this effort: a set of packages that allows a complete implementation of an analysis using almost exclusively YAML files. Serving as an analysis description language (ADL), this toolset builds on top of the evolving technologies from the Scikit-HEP and IRIS-HEP projects as well as industry-standard libraries such as Pandas and Matplotlib. Data processing starts with event-level data (the trees) and can proceed by adding variables, selecting events, performing complex user-defined operations and binning data, as defined in the YAML description. The resulting outputs (the tables) are stored as Pandas dataframes or FlatBuffers defined by Aghast, which can be programmatically manipulated. The F.A.S.T. tools can then convert these into plots or inputs for fitting frameworks. No longer just a proof-of-principle, these tools are now being used in CMS analyses, the LUX-ZEPLIN experiment, and by students on several other experiments. In this talk we will showcase these tools through examples, highlighting how they address the different experiments’ needs, and compare them to other similar approaches.
In physics we often encounter high-dimensional data, in the form of multivariate measurements or of models with multiple free parameters. The information encoded is increasingly explored using machine learning, but is not typically explored visually. The barrier tends to be visualising beyond 3D, but systematic approaches for this exist in the statistics literature. I will use examples from particle and astrophysics to show how we can use the “grand tour” for such multidimensional visualisations, for example to explore grouping in high dimension and for visual identification of multivariate outliers. I will then discuss the idea of projection pursuit, i.e. searching the high-dimensional space for “interesting” low dimensional projections, and illustrate how we can detect complex associations between multiple parameters.
RooFit is the statistical modeling and fitting package used in many experiments to extract physical parameters from reduced particle collision data. RooFit aims to separate particle physics model building and fitting (the users' goals) from their technical implementation and optimization in the back-end. In this talk, we outline our efforts to further optimize the back-end by automatically running major parts of user models in parallel on multi-core machines.
A major challenge is that RooFit allows users to define many different types of models, with different types of computational bottlenecks. Our automatic parallelization framework must then be flexible, while still reducing run-time by at least an order of magnitude, preferably more.
We have performed extensive benchmarks and identified at least three bottlenecks that will benefit from parallelization. To tackle these and possible future bottlenecks, we designed a parallelization layer that allows us to parallelize existing classes with minimal effort, but with high performance and retaining as much of the existing class's interface as possible.
The high-level parallelization model is a task-stealing approach. Our multi-process approach uses ZeroMQ socket-based communication. Preliminary results show speed-ups of factor 2 to 20, depending on the exact model and parallelization strategy.
We will integrate our parallelization layer into RooFit in such a way that impact on the end-user interface is minimal. This constraint, together with new features introduced in a concurrent RooFit project on vectorization and dataflow redesign, warrants a redesign of the RooFit internal classes for likelihood evaluation and other test statistics. We will briefly outline the implications of this for users.
The IHEP local cluster is a middle-sized HEP data center which consists of 20’000 CPU slots, hundreds of data servers, 20 PB disk storage and 10 PB tape storage. After data taking of JUNO and LHAASO experiment, the data volume processed at this center will approach 10 PB data per year. Facing the current cluster scale, anomaly detection is a non-trivial task in daily maintenance. Traditional methods such as static thresholding of performance metrics, key words searching in system logs, etc., require expertise of certain software systems, and cannot be easy to transplant. Besides, these methods cannot easily adapt to the changes of workloads and hardware configurations. Anomalies are data points which are either different from the majority of others or different from the expectation of a reliable prediction model in a time series. With a sufficient training sample dataset, machine learning-based anomaly detections which leverage these statistical characteristics can largely avoid the disadvantages of traditional methods. The Ganglia monitoring system at IHEP collects billions of timestamped monitoring data from the cluster every year. It provides sufficient data samples to train machine learning models. In this presentation, we firstly developed a generic anomaly detection framework to facilitate different detection task. It facilities common tasks such as data sample building, retagging and visualization, model calling, deviation measurement and performance measurement in machine learning-based anomaly detection methods. Then, for massive storage system, we developed and trained a spatial anomaly detection model based on Isolation Forest algorithm and a time series anomaly detection model based on LSTM recurrent neural networks to validate our idea. Initial performance comparison of our methods and traditional methods will be provided at the end of the presentation.
A Grid computing site consists of various services including Grid middlewares, such as Computing Element, Storage Element and so on. Ensuring a safe and stable operation of the services is a key role of site administrators. Logs produced by the services provide useful information for understanding the status of the site. However, it is a time-consuming task for site administrators to monitor and analyze the service logs everyday. Therefore, a support framework (gridalert), which detects anomaly logs and alerts to site administrators, has been developed using Machine Learning techniques.
Typical classifications using Machine Learning require pre-defined labels. It is difficult to collect a large amount of anomaly logs to build a Machine Learning model that covers all possible pre-defined anomalies. Therefore, Unsupervised Machine Learning based on clustering algorithms is used in the gridalert to detect anomaly logs. Several clustering algorithms, such as k-means, DBSCAN and IsolationForest, and its parameters have been compared in order to maximize the performance of the anomaly detection for Grid computing site operations. The gridalert has been deployed to Tokyo Tier2 site, which is one of the Worldwide LHC Computing Gird sites, and is used in operation. In this presentation, studies about Machine Learning algorithms for the anomaly detection and our operational experiences of the gridalert will be reported.
The benchmarking and accounting of CPU resources in WLCG has been based on the HEP-SPEC06 (HS06) suite for over a decade. HS06 is stable, accurate and reproducible, but it is an old benchmark and it is becoming clear that its performance and that of typical HEP applications have started to diverge. After evaluating several alternatives for the replacement of HS06, the HEPIX benchmarking WG has chosen to focus on the development of a HEP-specific suite based on actual software workloads of the LHC experiments, rather than on a standard industrial benchmark like the new SPEC CPU 2017 suite.
This presentation will describe the motivation and implementation of this new benchmark suite, which is based on container technologies to ensure portability and reproducibility. This approach is designed to provide a better correlation between the new benchmark and the actual production workloads of the experiments. It also offers the possibility to separately explore and describe the independent architectural features of different computing resource types, which is expected to be increasingly important with the growing heterogeneity of the HEP computing landscape. In particular, an overview of the initial developments to address the benchmarking of non-traditional computing resources such as HPCs and GPUs will also be provided.
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area, in partnership with WLCG, is focused on being the primary source of networking information for its partners and constituents. It was established to ensure sites and experiments can better understand and fix networking issues, while providing an analytics platform that aggregates network monitoring data with higher level workload and data transfer services. This has been facilitated by the global network of the perfSONAR instances that have been commissioned and are operated in collaboration with WLCG Network Throughput Working Group. An additional important update is the inclusion of the newly funded NSF project SAND (Service Analytics and Network Diagnosis) which is focusing on network analytics.
In this talk we'll describe the current state of the network measurement and analytics platform and summarise the activities taken by the working group and our collaborators, focusing mainly on the throughput issues that have been reported and resolved during the recent period with the help of the perfSONAR network. We will also cover the updates on the higher level services that were developed to help bring perfSONAR network to its full potential. This includes the progress being made in providing higher level analytics, alerting and alarming from the rich set of network metrics we are gathering. . Finally, we will discuss and propose potential R&D areas related to improving the network throughput in general as well as prepare the infrastructure for the foreseen major changes in the way network will be provisioned and operated in the future.
Monitoring of the CERN Data Centres and the WLCG infrastructure is now largely based on the MONIT infrastructure provided by CERN IT. This is the result of the migration from several old in-house developed monitoring tools into a common monitoring infrastructure based on open source technologies such as Collectd, Flume, Kafka, Spark, InfluxDB, Grafana and others. The MONIT infrastructure relies on CERN IT services (OpenStack, Puppet, Gitlab, DBOD, etc) and covers the full range of monitoring tasks: metrics and logs collection, alarms generation, data validation and transport, data enrichment and aggregation (where applicable), dashboards visualisation, reports generation, etc. This contribution will present the different services offered by the MONIT infrastructure today, highlight the main monitoring use cases from the CERN Data Centres, WLCG, and Experiments, and analyse the last years experience of moving from legacy well-established custom monitoring tools into a common open source-based infrastructure.
The Centralised Elasticsearch Service at CERN runs the infrastructure to
provide Elasticsearch clusters for more than 100 different use cases.
This contribution presents how the infrastructure is managed, covering the
resource distribution, instance creation, cluster monitoring and user
support. The contribution will present the components that have been identified as
critical in order to share resources and minimize the amount of clusters and
machines needed to run the service.In particular, all the automation for the
instance configuration, including index template management, backups and
Kibana settings, will be explained in detail.
E-mail service is considered as a critical collaboration system. We will share our experience regarding technical and organizational challenges when migrating 40 000 mailboxes from Microsoft Exchange to free and open source software solution: Kopano.
As of March 2019, CERN is no longer eligible for academic licences of Microsoft products. For this reason, CERN IT started a series of task forces to respond to the evolving requirements of the user community with the goal of reducing as much as possible the need for Microsoft licensed software. This exercise was an opportunity to understand better the user requirements for all office applications. Here we focus on MS Project, the dominant PC-based project management software, which has been used at CERN for many years. There were over 1500 installations at CERN when the task force started, with a heterogeneous pool of users in terms of required functionality, area of work and expertise. This paper will present an evaluation of users’ needs and whether they could be fulfilled with cheaper and less advanced solutions for project management and scheduling. Moreover, selected alternatives, their deployment and lessons learned will be described in more detail. Finally, it will present the approach on how to communicate, train and migrate users to the proposed solutions.
In this talk the approach chosen to monitor firstly a world-wide video conference server infrastructure and secondly a wide diversity of audio-visual devices that build up the audio-visual conference room ecosystem at CERN will be presented.
CERN video conference system is a complex ecosystem which is being used by most HEP institutes, together with Swiss Universities through SWITCH. As a proprietary platform, on its on-premise version, Vidyo offers a very limited monitoring. In order to improve support to our user community together with a better understanding of the Vidyo platform for service managers and video conference supporters a set of tools to monitor the system has been developed keeping in mind simplicity, flexibility, maintainability and cost efficiency reusing as much as possible technologies offered by IT services: Elasticsearch stack, Influxdb, Openshift, Kubernetes, Openstack, etc. The result is a set of dashboards that greatly simplify access to information required by CERN IT helpdesk and service managers and that could be provided to the users. Most of the components developed are open source [1,2], and could be reused for services facing similar problems.
With the arrival of IP devices in the Audio-Visual and Conferencing (AVC) equipment, the possibilities to develop an agnostic solution for monitoring this IoT jungle (video encoders, videoconference codecs, screens, projectors, microphones, clocks,..) becomes feasible. After trying with no real success existing commercial products for monitoring, CERN is now developing an opensource solution to effectively monitor/operate the AVC ecosystem using existing opensource components and central services provided by the IT deparment: node-red/mqtt, telegraf/influxdb/grafana, beats/logstash/elasticseach/kibana, openshift, etc.
[1] https://github.com/CERNCDAIC/aggsvidyo
[2] https://github.com/CERNCDAIC/resthttpck
Indico, CERN’s popular open-source tool for event management, is in widespread use among facilities that make up the HEP community. It is extensible through a robust plugin architecture that provides features such as search and video conferencing integration. In 2018, Indico version 2 was released with many notable improvements, but without a full-featured search functionality that could be implemented easily outside of CERN. At both Fermi and Brookhaven National Laboratories, the user community viewed the lack of this popular feature as a significant impediment to deployment of the new software. In the meantime, CERN embarked upon a major redesign of their core search service, one that would also necessitate a rewrite of the Indico search interface. Seeing this pressing need, the two US labs decided to collaborate, with assistance from the CERN development team, on a project to develop the requisite search functionality for the larger user community. The resulting design exploits the simplified schema defined in the new CERN Search micro-service, based on Invenio and Elasticsearch, while still providing a flexible path to implementation for alternative backend search services. It is intended to provide a software package that can be installed easily and used out of the box, by anyone at any site. This presentation will discuss the design choices and architectural challenges, and provide an overview of the deployment and use of these new plugins.
CERNBox is the CERN cloud storage hub for more than 16000 users at CERN. It allows synchronising and sharing files on all major desktop and mobile platforms (Linux, Windows, MacOSX, Android, iOS) providing universal, ubiquitous, online- and offline access to any data stored in the CERN EOS infrastructure. CERNBox also provides integration with other CERN services for big science: visualisation tools, interactive data analysis and real-time collaborative editing.
Over the last two years, CERNBox has evolved from a pure cloud sync and share platform into a collaborative service, to support new applications such as DrawIO for diagrams and organigrams sketching, OnlyOffice and Collabora Online for documents editing, and DXHTML Gantt for project management, as alternatives to traditional desktop applications. Moving to open source applications has the advantage to reduce licensing costs and enables easier integration within the CERN infrastructure.
Leveraging large and diverse set of applications at CERN, we propose a bring-your-own-application model where user groups can easily integrate their specific web-based applications in all available CERNBox workflows. We report on our experience managing such integrations and applicable use-cases, also in a broader scientific context where an emerging community including other institutes and SMEs is evolving the standards for sync & share storage.
Collaborative services are essential for any experiment.
They help to integrate global virtual communities by allowing to share
and exchange relevant information among members.
Typical examples are public and internal web pages, wikis, mailing
list services, issue tracking systems, and services for meeting
organizations and documents.
After reviewing their collaborative services with respect to security,
reliability, availability, and scalability, the Belle II collaboration
decided in 2016 to migrate services and tools into the existing IT
infrastructure at DESY. So far missing services were added and
workflows adapted. As a new development, a membership management
system which serves the needs of a global collaboration as of to-date
with 968 scientists of more than 113 institutions in 25 countries all
around the world, was put into operation in the beginning of 2018.
Almost all essential service of a living collaboration were subject to
modifications, some of them with major or complete changes. Moreover,
an already productive collaboration had to give up accustomed systems
and adopt to new ones with new look-and-feels and more restrictive
security rules.
In the contribution to CHEP2019 we will briefly review the planning
and realization of the migration process and thoroughly discuss
experiences which we gained while supporting the daily work of the
Belle II collaboration.
The INFN Tier-1 located at CNAF in Bologna (Italy) is a major center of the WLCG e-Infrastructure, supporting the 4 major LHC collaborations and more than 30 other INFN-related experiments.
After multiple tests towards elastic expansion of CNAF compute power via Cloud resources (provided by Azure, Aruba and in the framework of the HNSciCloud project), but also building on the experience gained with the production quality extension of the Tier-1 farm on remote owned sites, the CNAF team, in collaboration with experts from the ATLAS, CMS, and LHCb experiments, has been working to put in production a solution of an integrated HTC+HPC system with the PRACE CINECA center, located nearby Bologna. Such extension will be implemented on the Marconi A2 partition, equipped with Intel Knights Landing (KNL) processors. A number of technical challenges were faced and solved in order to successfully run on low RAM nodes, as well as to overcome the closed environment (network, access, software distribution, … ) that HPC systems deploy with respect to standard GRID sites. We show preliminary results from a large scale integration effort, using resources secured via the successful PRACE grant N. 2018194658, for 30 million KNL core hours.
The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA is capable of orchestrating extremely complicated multi-step workflows, and uses Kubernetes clusters both for scheduling and distributing container-based workloads across a cluster of available machines, as well as instantiating and monitoring the concrete workloads themselves.
This work describes the challenges and development efforts involved in extending REANA and the components that were developed in order to enable large scale deployment on High Performance Computing (HPC) resources.
Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we implemented REANA to work with a number of differing workload managers, including both high performance and high throughput, while simultaneously removing REANA's dependence on Kubernetes support at the workers level. Performance results derived from running AI/LFI training workflows on a variety of large HPC sites will be presented.
Nowadays, a number of technology R&D activities has been launched in Europe trying to close the gap with traditional HPC providers like USA and Japan and more recently emerging ones like China.
The EU HPC strategy, funded through EuroHPC initiative, leverages on two different pillars: the first one targets the procurement and the hosting of two/three commercial pre-Exascale systems, in order to provide the HPC user community with world-level class computing systems; the second one aims at boosting industry-research collaboration in order to design a new generation of Exascale systems which is to be mainly based on European technology.
In this framework, analysis and validation of the HPC-enabling technologies is a very critical task and the FETHPC H2020 EuroEXA project (https://euroexa.eu) is prototyping a medium size (but scalable to extreme level) computing platform as proof-of-concept of an EU-designed HPC system.
EuroEXA exploits FPGA devices, with their ensemble of either standard and custom high-performance interfaces, DSP blocks for task acceleration and a huge number of user- assigned logic cells. FPGA adoption allows us to design European innovative IPs such as application-tailored acceleration hardware (for high performances in the computing node) and low latency, high throughput custom network (for scalability).
The EuroEXA computing node is based on a single module hosting Xilinx UltraScale+ FPGAs for application code acceleration hardware, control and network implementation, and, in a later phase, even a new project-designed, ARM-based, low power multi-core chip.
EuroEXA interconnect is an FPGA-based hierarchical hybrid network characterized by direct topology at “blade” level (16 computing nodes on a board) and a custom switch, implementing a mix of full-crossbar and Torus topology, for interconnection with the upper levels.
EuroEXA will also introduce a new high density liquid-cooling technology for blade system and a new multirack modular assembly based on standard shipping containers in order to provide an effective solution for moving, placing and operating large scale EuroEXA system.
A complete and system-optimized programming software stack is under design and a number of scientific, engineering and AI-oriented applications are used to co-design, benchmark and validate the EuroEXA hardware/software solutions.
In this talk, we will introduce the main project motivations and goals, its positioning within the EuroHPC landscape, the status of hardware and software development and the possible synergies with HEP computing requirements.
The Dutch science funding organization NWO is in the process of drafting requirements for the procurement of a future high-performance compute facility. To investigate the requirements for this facility to potentially support high-throughput workloads in addition to traditional high-performance workloads, a broad range of HEP workloads are being functionally tested on the current facility. The requirements obtained from this pilot will be presented, together with technical issues solved and requirements on HPC and HEP software and support.
We present recent work in supporting deep learning for particle physics and cosmology at NERSC, the US Dept. of Energy mission HPC center. We describe infrastructure and software to support both large-scale distributed training across (CPU and GPU) HPC resources and for productive interfaces via Jupyter notebooks. We also detail plans for accelerated hardware for deep learning in the future HPC machines at NERSC, ‘Perlmutter’ and beyond. We demonstrate these capabilities with a characterisation of the emerging deep learning workload running at NERSC. We also present use of these resources to implement specific cutting-edge applications including conditional Generative Adversarial Networks for particle physics and dark-matter cosmology simulations and bayesian inference via probabilistic programming for LHC analyses.
The ALICE Experiment at CERN LHC (Large Hadron Collider) is undertaking a major upgrade during LHC Long Shutdown 2 in 2019-2020, which includes a new computing system called O² (Online-Offline). To ensure the efficient operation of the upgraded experiment and of its newly designed computing system, a reliable, high performance, and automated experiment control system is being developed. The ALICE Experiment Control System (AliECS) is a distributed system based on state of the art cluster management and microservices which have recently emerged in the distributed computing ecosystem. Such technologies will allow the ALICE collaboration to benefit from a vibrant and innovating open source community. This communication describes the AliECS architecture. It provides an in-depth overview of the system's components, features, and design elements, as well as its performance. It also reports on the experience with AliECS as part of ALICE Run 3 detector commissioning setups.
The Data Acquisition (DAQ) system of the Compact Muon Solenoid (CMS) experiment at LHC is a complex system responsible for the data readout, event building and recording of accepted events. Its proper functioning plays a critical role in the data-taking efficiency of the CMS experiment. In order to ensure high availability and recover promptly in the event of hardware or software failure of the subsystems, an expert system, the DAQ Expert, has been developed. It aims at improving the data taking efficiency, reducing the human error in the operations and minimising the on-call expert demand. Introduced in the beginning of 2017, it assists the shift crew and the system experts in recovering from operational faults, streamlining the post mortem analysis and, at the end of Run 2, triggering the fully automatic recoveries without a human intervention. DAQ Expert analyses the real-time monitoring data originating from the DAQ components and the high-level trigger updated every few seconds. It pinpoints the data flow problem and recovers it automatically or after given operator approval. We analyse the CMS downtime in the 2018 run focusing on what was improved with the introduction of automated recoveries; present challenges and design of transforming the expert knowledge to automated recovery jobs. Furthermore, we demonstrate the web-based, ReactJS interfaces that ensure an effective cooperation between the human operators in control room and the automated recovery system. We report on the operational experience with automated recoveries.
The Information Service (IS) is an integral part of the Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment at the Large Hadron Collider (LHC) at CERN. The IS allows online publication of operational monitoring data, and it is used by all sub-systems and sub-detectors of the experiment to constantly monitor their hardware and software components including more than 25000 applications running on more than 3000 computers. The Persistent Back-End for the ATLAS Information System (P-BEAST) service stores all raw operational monitoring data for the lifetime of the experiment and provides programming and graphical interfaces to access them including Grafana dashboards and notebooks based on the CERN SWAN platform. During the ATLAS data taking sessions (for the full LHC Run 2 period) P-BEAST acquired data at an average information update rate of 200 kHz and stored 20 TB of highly compacted and compressed data per year. This paper reports how over six years the P-BEAST became an essential piece of the experiment operations including details of the challenging requirements, the fails and successes of the various attempted implementations, the new types of monitoring data and the results of the time-series database technologies evaluations for the improvements during next LHC Run 3.
The Belle II experiment features a substantial upgrade of the Belle detector and will operate at the SuperKEKB energy-asymmetric $e^+ e^-$ collider at KEK in Tuskuba, Japan. The accelerator successfully completed the first phase of commissioning in 2016 and the Belle II detector saw its first electron-positron collisions in April 2018. Belle II features a newly designed silicon vertex detector based on double-sided strip and DEPFET pixel detectors. A subset of the vertex detector was operated in 2018 to determine background conditions (Phase 2 operation); installation of the full detector was completed early in 2019 and the experiment starts full data taking.
This talk will report on the final arrangement of the silicon vertex detector part of Belle II with focus on on-line and off-line monitoring of detector conditions and data quality, design and use of diagnostic and reference plots, and integration with the software framework of Belle II. Data quality monitoring plots will be discussed with a focus on simulation and acquired cosmic and collision data.
The LHCb high level trigger (HLT) is split in two stages. HLT1 is synchronous with collisions delivered by the LHC and writes its output to a local disk buffer, which is asynchronously processed by HLT2. Efficient monitoring of the data being processed by the application is crucial to promptly diagnose detector or software problems. HLT2 consists of approximately 50000 processes and 4000 histograms are produced by each process. This results in 200 million histograms that need to be aggregated for each of up to a hundred data taking intervals that are being processed simultaneously. This paper presents the multi-level hierarchical architecture of the monitoring infrastructure put in place to achieve this. Network bandwidth is minimised by sending histogram increments and only exchanging metadata when necessary, using a custom lightweight protocol based on boost::serialize. The transport layer is implemented with ZeroMQ, which supports IPC and TCP communication, queue handling, asynchronous request/response and multipart messages. The persistent storage to ROOT is parallelized in order to cope with data arriving from a hundred of data taking intervals being processed simultaneously by HLT2. The performance and the scalability of the current system are presented. We demonstrate the feasibility of such an approach for the HLT1 use case, where real-time feedback and reliability of the infrastructure are crucial. In addition, a prototype of a high-level transport layer based on the stream-processing platform Apache Kafka is shown, which has several advantages over the lower-level ZeroMQ solution.
The ALICE Experiment at CERN LHC (Large Hadron Collider) is undertaking a major upgrade during LHC Long Shutdown 2 in 2019-2020, which includes a new computing system called O² (Online-Offline). The raw data input from the ALICE detectors will then increase a hundredfold, up to 3.4 TB/s. In order to cope with such a large amount of data, a new online-offline computing system, called O2, will be deployed.
One of the key software components of the O2 system will be the data Quality Control (QC) that replaces the existing online Data Quality Monitoring and offline Quality Assurance. It involves the gathering, the analysis by user-defined algorithms and the visualization of monitored data, in both the synchronous and asynchronous parts of the O2 system.
This paper presents the architecture and design, as well as the latest and upcoming features, of the ALICE O2 QC. In particular, we review the challenges we faced developing and scaling the object merging software, the trending and correlation infrastructure and the repository management. We also discuss the ongoing adoption of this tool amongst the ALICE collaboration and the measures taken to develop, in synergy with their respective teams, efficient monitoring modules for the detectors.
The LHCb detector at the LHC is a single forward arm spectrometer dedicated to the study of $b-$ and $c-$ hadron states. During Run 1 and 2, the LHCb experiment has collected a total of 9 fb$^{-1}$ of data, corresponding to the largest charmed hadron dataset in the world and providing unparalleled datatests for studies of CP violation in the $B$ system, hadron spectroscopy and rare decays, not to mention heavy ion and fixed target datasets. The LHCb detector is currently undergoing an upgrade to nearly all parts of the detector to cope with the increased luminosity of Run 3 and beyond. Simulation for the analyses of such datasets is paramount, but is prohibitively slow in generation and reconstruction due to the sheer number of simulated decays needed to match the collected datasets. In this talk, we explore the suite of fast simulations which LHCb has employed to meet the needs of the Run 3 and beyond, including the reuse of the underlying event and parameterized simulations, and the possibility of porting the framework to multithreaded environments.
The ATLAS physics program relies on very large samples of GEANT4 simulated events, which provide a highly detailed and accurate simulation of the ATLAS detector. However, this accuracy comes with a high price in CPU, and the sensitivity of many physics analyses is already limited by the available Monte Carlo statistics and will be even more so in the future. Therefore, sophisticated fast simulation tools are developed. In Run-3 we aim to replace the calorimeter shower simulation for most samples with a new parametrized description of longitudinal and lateral energy deposits, including machine learning approaches, to achieve a fast and accurate description. Looking further ahead, prototypes are being developed using cutting edge machine learning approaches to learn the appropriate calorimeter response, which are expected to improve modeling of correlations within showers. Two different approaches, using Variational Auto-Encoders (VAEs) or Generative Adversarial Networks (GANs), are trained to model the shower simulation. Additional fast simulation tools will replace the inner detector simulation, as well as digitization and reconstruction algorithms, achieving up to two orders of magnitude improvement in speed. In this talk, we will describe the new tools for fast production of simulated events and an exploratory analysis of the deep learning methods.
In High Energy Physics, simulation activity is a key element for theoretical models evaluation and detector design choices. The increase in the luminosity of particle accelerators leads to a higher computational cost when dealing with the orders of magnitude increase in collected data. Thus, novel methods for speeding up simulation procedures (FastSimulation tools) are being developed with the help of Deep Learning. For this task, unsupervised learning is performed based on a given training HEP dataset with generative models employed to render samples from the same distribution.
A novel Deep Learning architecture is proposed in this research based on autoregressive connections to model the simulation output by decomposing the event distribution as a product of conditionals. The aim is for the network to be able to capture nonlinear, long-range correlations and input varying dependencies with tractable, explicit probability densities. The following research report analyses the benefits of employing autoregressive models in comparison with previously proposed models and their ability for generalisation in the attempt of fitting multiple data distributions. The training dataset contains different simplified calorimeters simulations obtained with the Geant4 toolkit (such as: PbWO4, W/Si). Finally, testing procedures and results for network performance are developed and analysed.
The future need of simulated events for the LHC experiments and their High Luminosity upgrades, is expected to increase dramatically. As a consequence, research on new fast simulation solutions, based on Deep Generative Models, is very active and initial results look promising.
We have previously reported on a prototype that we have developed, based on 3 dimensional convolutional Generative Adversarial Network, to simulate particle showers in high-granularity calorimeters.
As an example of future high-granularity geometries, we have chosen the electromagnetic calorimeter design developed in the context of the Linear Collider Detector studies characterised by 25 layers and 5mmx5mm cell granularity
The training dataset is simulated using Geant4 and the DDSim framework.
In this talk we will present improved results on a more realistic simulation of different particles (electrons and pions) characterized by variable energy and incident trajectory. Detailed validation studies, comparing our results to Geant4 Monte Carlo simulation, show very good agreement for high level physics quantities (such as energy shower shapes) and detailed calorimeter response (single cell response) over a large energy range. In particular, we will show how increasing the network representational power, introducing physics-based constraints and a transfer-learning approach to the training process improve the agreement to Geant4 results over a large energy range. Initial studies on a network optimisation based on the implementation of Genetic Algorithms will also be discussed.
LHCb is one of the major experiments operating at the Large Hadron Collider at CERN. The richness of the physics program and the increasing precision of the measurements in LHCb lead to the need of ever larger simulated samples. This need will increase further when the upgraded LHCb detector will start collecting data in the LHC Run 3. Given the computing resources pledged for the production of Monte Carlo simulated events in the next years, the use of fast simulation techniques will be mandatory to cope with the expected dataset size. In LHCb generative models, which are nowadays widely used for computer vision and image processing are being investigated in order to accelerate the generation of showers in the calorimeter and high-level responses of Cherenkov detector. We demonstrate that this approach provides high-fidelity results along with a significant speed increase and discuss possible implication of these results. We also present an implementation of this algorithm into LHCb simulation software and validation tests.
Belle II uses a Geant4-based simulation to determine the detector response to the generated decays of interest. A realistic detector simulation requires the inclusion of noise from beam-induced backgrounds. This is accomplished by overlaying random trigger data to the simulated signal. To have statistically independent Monte-Carlo events a high number of random trigger events are desirable. However, the size of the background events, in particular the part of the pixel vertex detector (PXD), is so large that it is infeasible to record, store, and overlay the same amount as simulated signal events. Our approach to overcome the limitation of the simulation by storage resources is to use a Wasserstein generative adverserial network to generate PXD background data. A challenge is the high resolution of 250x768 pixels of in total 40 sensors with correlations between them. We will present the current status of this approach and assess its quality based on tracking performance studies.
There is a general trend in WLCG towards the federation of resources, aiming for increased simplicity, efficiency, flexibility, and availability. Although general, VO-agnostic federation of resources between two independent and autonomous resource centres may prove arduous, a considerable amount of flexibility in resource sharing can be achieved, in the context of a single WLCG VO, with a relatively simple approach. We have demonstrated this for PIC and CIEMAT, the Spanish Tier-1 and Tier-2 sites for CMS (separated by 600 Kms, ~10 ms latency), by making use of the existing CMS xrootd federation (AAA) infrastructure and profiting from the common CE/batch technology used by the two centres (HTCondor). This work describes how compute slots are shared between the two sites with transparent and efficient access to the input data irrespective of its location. This approach allows to dynamically increase the capacity of a site with idle execution slots from the remote site. Our contribution also includes measurements for diverse CMS workflows comparing performances between local and remote execution. In addition to enabling an increased flexibility in the use of the resources, this lightweight approach can be regarded as a benchmark to explore future potential scenarios, where storage resources would be concentrated in a reduced number of sites.
The Queen Mary University of London WLCG Tier-2 Grid site has been providing GPU resources on the Grid since 2016. GPUs are an important modern tool to assist in data analysis. They have historically been used to accelerate computationally expensive but parallelisable workloads using frameworks such as OpenCL and CUDA. However, more recently their power in accelerating machine learning, using libraries such as TensorFlow and Coffee, has come to the fore and the demand for GPU resources has increased. Significant effort is being spent in high energy physics to investigate and use machine learning to enhance the analysis of data. GPUs may also provide part of the solution to the compute challenge of the High Luminosity LHC. The motivation for providing GPU resources via the Grid is presented. The Installation and configuration of the SLURM batch system together with Compute Elements (Cream and ARC) for use with GPUs is shown. Real world use cases are presented and the success and issues observed will be discussed. Recommendations, informed by our experiences, and our future plans will also be given.
In a HEP Computing Center, at least 1 batch systems are used. As an example, at IHEP, we’ve used 3 batch systems, PBS, HTCondor and Slurm. After running PBS as local batch system for 10 years, we replaced it by HTCondor (for HTC) and Slurm (for HPC). During that period, problems came up on both user and admin sides.
On user side, the new batch systems bring a set of new commands, which users have to learn and remember more. In particular, some users would have to use HTCondor and Slurm in the meantime. Furthermore, HTCondor and Slurm provide more functions, which means more complicated usage mode, compared to the simple PBS commands.
On admin side, HTCondor gives more freedom to users, which becomes a problem to admins. Admins have to find the solutions for many problems: preventing users from requesting the resources they are not allowed to use, checking if the required attributes are correct, deciding which site is requested (Slurm cluster, remote sites, virtual machine sites), etc.
For the above requirements, HepJob was developed. HepJob provides a set of simple commands to users, hep_sub, hep_q, hep_rm, etc. In the submission procedure, HepJob checks all the attributes and ensure all attributes are correct; Assigns the proper resources to users, the user and group info is obtained from the management database; Routes jobs to the targeted site; Goes through the remaining steps.
Users can start with HepJob very easily and admins can take many prevention actions in HepJob.
The Jiangmen Underground Neutrino Observatory (JUNO) is a multipurpose neutrino experiment, which plans to take about 2PB raw data each year starting from 2021. The experiment data plans to be stored in IHEP and have another copy in Europe (CNAF, IN2P3, JINR data centers). MC simulation tasks are expected to be arranged and operated through a distributed computing system to share efforts among data centers. The paper will present the design of the JUNO distributed computing system based on DIRAC to meet the requirements of the JUNO workflow and dataflow among data centers according to the JUNO computing model. The production system to seamlessly manage the JUNO MC simulation workflow and dataflow together is designed within the DIRAC transformation framework, in which data flows among data centers for production groups are managed based on the DIRAC data management infrastructure which uses DFC as File Catalogue, request manager to interface with FTS and transformation system to manage a bundle of files. The muon simulation with optical photon which has huge memory and CPU time problems would be the most challenging part. Therefore, multicore supports and GPU federation are considered in the system to meet this challenge. The function and performance tests to evaluate the prototype system would be also presented in the paper.
Low latency, high throughput data processing in distributed environments is a key requirement of today's experiments. Storage events facilitate synchronisation with external services where the widely adopted request-response pattern does not scale because of polling as a long-running activity. We discuss the use of an event broker and stream processing platform (Apache Kafka) for storage events, with respect to automatised scientific workflows starting from file system events (dCache, GPFS) as triggers for data processing and placement.
In a brokered delivery, the broker provides the infrastructure for routing generated events to consumer services. A client connects to the broker system and subscribes to streams of storage events which consist of data transfer records for files being uploaded, downloaded and deleted. This model is complemented by direct delivery using W3C’s Server-Sent Events (SSE) protocol. We also address the shaping of a security model, where authenticated clients are authorised to read dedicated subsets of events.
On the compute side, the messages feed into event-driven work-flows, either user supplied software stacks or solutions based on open-source platforms like Apache Spark as analytical framework and Apache OpenWhisk for Function-as-a-Service (FaaS) and more general computational microservices. Building on cloud application templates for scalable analysis platforms, desired services can be dynamically provisioned on DESY's on-premise OpenStack cloud as well as in commercial hybrid cloud environments. Moreover, this model supports also the integration of data management tools like Rucio to address data locality e.g. to move files subsequent to processing by event-driven work-flows.
ATLAS event processing requires access to centralized database systems where information about calibrations, detector status and data-taking conditions are stored. This processing is done on more than 150 computing sites on a world-wide computing grid which are able to access the database using the squid-Frontier system. Some processing workflows have been found which overload the Frontier system due to the Conditions data model currently in use, specifically because some of the Conditions data requests have been found to have a low caching efficiency. The underlying cause is that non-identical requests as far as the caching are actually retrieving a much smaller number of unique payloads. While ATLAS is undertaking an adiabatic transition during LS2 and Run-3 from the current COOL Conditions data model to a new data model called CREST for Run 4, it is important to identify the problematic Conditions queries with low caching efficiency and work with the detector subsystems to improve the storage of such data within the current data model. For this purpose ATLAS put together an information aggregation and analytics system. The system is based on aggregated data from the squid-Frontier logs using the Elastic Search technology. This talk describes the components of this analytics system from the server based on Flask/Celery application to the user interface and how we use Spark SQL functionalities to filter data for making plots, storing the caching efficiency results into a PostgreSQL database and finally deploying the package via a Docker container.
With increasing data volume from Nuclear Physics experiments requirements to data
storage and access are changing. To keep up with large data sets new data formats
are needed for efficient processing and analysis of the data. Frequently, in the
experiments data goes through stages from data acquisition to reconstruction and
data analysis and data is converted from one format to another causing waisted CPU
cycles.
In this work we present High Performance Output (HIPO) data format developed
for CLAS12 experiment at Jefferson National Laboratory. It was designed to fit the needs
of data acquisition and high level data analysis, to avoid data format conversions
at different stages of data processing. The new format was designed
to store different event topologies from reconstructed data in tagged form
for efficient access by different analysis groups. In centralized data skimming
applications HIPO data format significantly outperforms standard data formats
used in Nuclear and High Energy Physics (ROOT) and industry standard formats,
such as Apache Avro and Apache Parquet.
For almost 10 years now XRootD has been very successful at facilitating data management of LHC experiments. Being the foundation and main component of numerous solutions employed within the WLCG collaboration (like EOS and DPM), XRootD grew into one of the most important storage technologies in the High Energy Physics (HEP) community. With the latest major release (5.0.0) XRootD framework brought not only architectural improvements and functional enhancements, but also introduced a TLS based, secure version of the xroot/root data access protocol (a prerequisite for supporting access tokens).
In this contribution we explain the xroots/roots protocol mechanics and focus on the implementation of the encryption component engineered to ensure low latencies and high throughput. We also give an overview of other developments finalized in release 5.0.0 (extended attributes support, verified close, etc.), and finally, we discuss what else is on the horizon.
The File Transfer Service developed at CERN and in production since 2014, has become fundamental component for LHC experiments workflows.
Starting from the beginning of 2018 with the participation to the EU project Extreme Data Cloud (XDC) [1] and the activities carried out in the context of the DOMA TPC [2] and QoS [3] working groups, a series of new developments and improvements has been planned and performed taking also into account the requirements from the experiments.
This talk will mainly focus on the support for OpenID Connect and the QoS integration via CDMI as output of the XDC project.
The integration with OpenID Connect is also following the direction of the future Authentication and Authorisation Infrastructure (AAI) for WLCG experiments.
The service scalability enhancements, the support for Xrootd and HTTP TPC and the first 'non-gridftp' transfers experiences via FTS between WLCG production sites will be also described, with an emphasis on performance comparison.
The service enhancements are meeting the requirements for LHC Run-3 and facilitating the adoption for other HEP and non-HEP communities.
[1] http://www.extreme-datacloud.eu/
[2] https://twiki.cern.ch/twiki/bin/view/LCG/ThirdPartyCopy
[3] https://twiki.cern.ch/twiki/bin/view/LCG/QoS
The XRootD software framework is essential for data access at WLCG sites. The WLCG community is exploring and expanding XRootD functionality. This presents a particular challenge at the RAL Tier-1 as the Echo storage service is a Ceph based Erasure Coded object store. External access to Echo uses gateway machines which run GridFTP and XRootD servers. This paper will describe how third party copy, WebDav and additional authentication protocols have been added to these XRootD servers. This allows ALICE to use Echo as well as preparing for the eventual phase out of GridFTP.
Local jobs access Echo via XCaches on every worker node. Remote jobs are increasingly accessing data via XRootD on Echo. For CMS jobs this is via their AAA service. For ATLAS, who are consolidating their storage at fewer sites, jobs are increasingly accessing data remotely. This paper describes the coninuting work to optimise both types of data access by testing different caching methods, including remotely configured XCaches (using SLATE) running on the RAL OpenStack cloud infrastructure.
When the LHC started data taking in 2009 the data rates were unprecedented for the time and forced the WLCG community develop a range of tools for managing their data across many different sites. A decade later other science communities are finding their data requirements have grown far beyond what they can easily manage and are looking for help. The RAL Tier-1’s primary mission has always been to provide resources for the LHC experiments although 10% is set aside for non-LHC experiments. In the last 2 years the Tier-1 has received additional funding to support other scientific communities and now provides over 5 PB disk and 13PB Tape storage to them.
RAL has run an FTS service for the LHC experiments for several years. Smaller scientific communities have used this for moving large volumes of data between sites but have frequently reported difficulties. In the past RAL also provided an LFC service for managing files stored on the Grid. The RAL Tier-1 should provide a complete data management solution for these communities and it was therefore decided to setup a Rucio instance to do this.
In April 2018 a Rucio instance was setup at RAL which has so far been used by the AENEAS project. RAL is providing the development effort to allow a single Rucio instance to support multiple experiments. RAL also runs a DynaFed service which is providing an authentication and authorization layer in front of RAL S3 storage service.
In preparation for Run 3 of the LHC, the ATLAS experiment is modifying its offline software to be fully multithreaded. An important part of this is data structures that can be efficiently and safely concurrently accessed from many threads. A standard way of achieving this is through mutual exclusion; however, the overhead from this can sometimes be excessive. Fully lockless implementations are known for some data structures; however, they are typically complex, and the overhead they require can sometimes be larger than that required for locking implementations. An interesting compromise is to allow lockless access only for reading but not for writing. This often allows the data structures to be much simpler, while still giving good performance for read-mostly access patterns. This talk will show some examples of this strategy in data structures used by the ATLAS offline software. It will also give examples of synchronization strategies inspired by read-copy-update, as well as helpers for memoizing values in a multithreaded environment.
Marlin is the event processing framework of the iLCSoft ecosystem. Originally developed
for the ILC more than 15 years ago, it is now widely used, e.g. by CLICdp, CEPC and
many test beam projects such as Calice, LCTPC and EU-Telescope. While Marlin is
lightweight and flexible it was originally designed for sequential processing only.
With MarlinMT we now evolved Marlin for parallel processing of events on multi-core
architectures based on multi-threading. We report on the necessary developments and
issues encountered, within Marlin as well as with the underlying LCIO EDM. A focus will be
put on the new parallel event processing (PEP) scheduler. We conclude with first
performance estimates, such as application speedup and memory profiling, based on parts
of the ILD reconstruction chain that have been ported to MarlinMT.
The advent of computing resources with co-processors, for example Graphics Processing Units (GPU) or Field-Programmable Gate Arrays (FPGA), for use cases like the CMS High-Level Trigger (HLT) or data processing at leadership-class supercomputers imposes challenges for the current data processing frameworks. These challenges include developing a model for algorithms to offload their computations on the co-processors as well as keeping the traditional CPU busy doing other work. The CMS data processing framework, CMSSW, implements multithreading using the Intel’s Threading Building Blocks (TBB) library, that utilizes tasks as concurrent units of work. In this talk we will discuss a generic mechanism to interact effectively with non-CPU resources that has been implemented in CMSSW. In addition, configuring such a heterogeneous system is challenging. In CMSSW an application is configured with a configuration file written in the Python language. The algorithm types are part of the configuration. The challenge therefore is to unify the CPU and co-processor settings while allowing their implementations to be separate. We will explain how we solved these challenges while minimizing the necessary changes to the CMSSW framework. We will also discuss on a concrete example how algorithms would offload work to NVIDIA GPUs using directly the CUDA API.
As the mobile ecosystem has demonstrated, ARM processors and GPUs promise to deliver higher compute efficiency with a lower power consumption. One interesting platform to experiment with architectures different from a traditional x86 machine is the NVIDIA AGX Xavier SoC, that pairs a 64-bit ARM processor 8 cores with a Volta-class GPU with 512 CUDA cores. The CMS reconstruction software was ported to run on the ARM architecture, and there is an ongoing effort to rewrite some of the most time-consuming algorithms to leverage NVIDIA GPUs. In this presentation we will explore the challenges of running the CMS reconstruction software on a smll embedded device, and compare its compute performance and power consumption with those of a traditional x86 server.
With GPUs and other kinds of accelerators becoming ever more accessible, High Performance Computing Centres all around the world using them ever more, ATLAS has to find the best way of making use of such accelerators in much of its computing.
Tests with GPUs -- mainly with CUDA -- have been performed in the past in the experiment. At that time the conclusion was that it was not advantageous for the ATLAS offline and trigger software to invest time and money into GPUs. However as the usage of accelerators has become cheaper and simpler in recent years, their re-evaluation in ATLAS's offline software is warranted.
We will show code designs and performance results of using OpenCL, OpenACC and CUDA to perform calculations using the ATLAS offline/analysis (xAOD) Event Data Model. We compare the performance and flexibility of these different offload methods, and show how different memory management setups affect our ability to offload different types of calculations to a GPU efficiently. So that an overall throughout increase could be achieved even without highly optimising our reconstruction code specifically for GPUs.
The future upgraded High Luminosity LHC (HL-LHC) is expected to deliver about 5 times higher instantaneous luminosity than the present LHC, producing pile-up up to 200 interactions per bunch crossing. As a part of its phase-II upgrade program, the CMS collaboration is developing a new end-cap calorimeter system, the High Granularity Calorimeter (HGCAL), featuring highly-segmented hexagonal silicon sensors (0.5-1.1 cm2) and scintillators (4-30 cm2), totalling more than 6 million channels in comparison to about 200k channels for the present CMS endcap calorimeters. For each event, the HGCAL clustering algorithm needs to reduce more than 100k hits into ~10k clusters, while keeping the fine shower structure. The same algorithm must reject pileup for further shower reconstruction. Due to the high pileup in the HL-LHC and high granularity, HGCAL clustering is confronted with an unprecedented surge of computation load. This motivates the concept of high-throughput heterogeneous computing in HGCAL offline clustering. Here we introduce a fully-parallelizable density-based clustering algorithm running on GPUs. It uses a tile-based data structure as input for a fast query of neighbouring cells and achieves an O(n) computational complexity. Within the CMS reconstruction framework, clustering on GPUs demonstrates at least a 10x throughout increase compared to current CPU-based clustering.
In this talk I will present an investigation into sizeable interference effects between a {heavy} charged Higgs boson signal produced via $gg\to t\bar b H^-$ (+ c.c.) followed by the decay $H^-\to b\bar t$ (+ c.c.) and the irreducible background given by $gg\to t\bar t b \bar b$ topologies at the Large Hadron Collider (LHC). I will show how such effects could spoil current $H^\pm$ searches where signal and background are normally treated separately. The reason for this is that a heavy charged Higgs boson can have a large total width, in turn enabling such interferences, altogether leading to very significant alterations, both at the inclusive and exclusive level, of the yield induced by the signal alone. This therefore implies that currently established LHC searches for such wide charged Higgs bosons require modifications. This is shown quantitatively using two different benchmark configurations of the minimal realisation of Supersymmetry, wherein such $H^\pm$ states naturally exist.
GAMBIT is a modular and flexible framework for performing global fits to a wide range of theories for new physics. It includes theory and analysis calculations for direct production of new particles at the LHC, flavour physics, dark matter experiments, cosmology and precision tests, as well as an extensive library of advanced parameter-sampling algorithms. I will present the GAMBIT software framework and give a brief overview of the main physics results it has produced so far.
Despite the overwhelming cosmological evidence for the existence of dark matter, and the considerable effort of the scientific community over decades, there is no evidence for dark matter in terrestrial experiments.
The GPS.DM observatory uses the existing GPS constellation as a 50,000 km-aperture sensor array, analysing the satellite and terrestrial atomic clock data for exotic physics signatures. In particular, the collaboration searches for evidence of transient variations of fundamental constants correlated with the Earth’s galactic motion through the dark matter halo.
The initial results of the search lead to an orders-of-magnitude improvement in constraints on certain models of dark matter [1].
I will discuss the initial results and future prospects, including the method used for processing the data, and the “GPS simulator” and dark-matter signal generator we built to test to methods [2].
[1] B. M. Roberts, G. Blewitt, C. Dailey, M. Murphy, M. Pospelov, A. Rollings, J. Sherman, W. Williams, and A. Derevianko, Nat. Commun. 8, 1195 (2017).
[2] B. M. Roberts, G. Blewitt, C. Dailey, and A. Derevianko, Phys. Rev. D 97, 083009 (2018).
In this talk, we discuss the new physics implication in Two Higgs doublet Model (2HDM) under various experimental constraints. As part work of Gambit group, our work is to use the global fit method to constrain the parameter space, find out the hints for new physics and try to make some predictions for further studies.
In our global fit, we include the constraints from LEP, LHC (SM-like Higgs boson search), the theoretical requirements ( Unitarity, Perturbativity, and vacuum stability), various flavour physics constraints ( radiative B Decay $B \to X_s \gamma$, rare fully leptonic B decays $B \to \mu^+\mu^-$ ,etc) and muon g-2 anomaly.
After the 7-parameter global fit, we have a detailed study about the result, analysing individual constraints effects, finding out advantages of every constraint in constraining parameters and discovering new particles. For the Type-II 2HDM, we find that the $\lambda_2$ is sensitive to the LHC SM-lkie Higgs boson search results. Our final results will be displayed in $\tan \beta$ - $\cos(\beta-\alpha)$, $m_A$ - $\tan \beta$, which are usually considered.
At the high luminosity flavor factory experiments such as the Belle II
experiment, it is expected to find the new physics effect and
constrain the new physics models with the high statics and many
observables. In such analysis, the global analysis of the many
observables with the model-independent approach is important. One
difficulty in such global analysis is that the new physics could
affect the numerical results obtained by experiments assuming the
Standard Model, because of the changes of the reconstructed
kinematical distributions used in the event selection and in the
fitting to obtain the number of signal and background events.
Therefore, it is also important to prepare the event generator
including the new physics effects for the Monte Carlo simulation of
the detector response to estimate and consider the effects properly in
the global analysis.
In this work, we present development of the event generator of
B->K(*)ll decays including the new physics effect in the
model-independent way by parametrizing with the Wilson coefficients.
We implement the decay model using the EvtGen
[https://evtgen.hepforge.org/] framework so that it can be applicable
in the analysis software framework of the B physics experiments. For the
theoretical calculation of the new physics effect we consider the EOS
[https://eos.github.io/] library and other possible calculations. We
report the results obtained by the developed event generator and
application in the global analysis.
Searches for beyond-Standard Model physics at the LHC have thus far not uncovered any evidence of new particles, and this is often used to state that new particles with low mass are now excluded. Using the example of the supersymmetric partners of the electroweak sector of the Standard Model, I will present recent results from the GAMBIT collaboration that show that there is plenty of room for low mass solutions based on the LHC data. I will then present a variety of methods for designing new LHC analyses that can successfully target those solutions.
The EGI Cloud Compute service offers a multi-cloud IaaS federation that brings together research clouds as a scalable computing platform for research accessible with OpenID Connect Federated Identity. The federation is not limited to single sign-on, it also introduces features to facilitate the portability of applications across providers: i) a common VM image catalogue VM image replication to ensure these images will be available at providers whenever needed; ii) a GraphQL information discovery API to understand the capacities and capabilities available at each provider; and iii) integration with orchestration tools (such as Infrastructure Manager) to abstract the federation and facilitate using heterogeneous providers. EGI also monitors the correct function of every provider and collects usage information across all the infrastructure.
DODAS (Dynamic On Demand Analysis Service) is an open-source Platform-as-a-Service tool, which allows to deploy software applications over heterogeneous and hybrid clouds. DODAS is one of the so-called Thematic Services of the EOSC-hub project and it instantiates on-demand container-based clusters offering a high level of abstraction to users, allowing to exploit distributed cloud infrastructures with a very limited knowledge of the underlying technologies.
This work presents a comprehensive overview of DODAS integration with EGI Cloud Federation, reporting the experience of the integration with CMS Experiment submission infrastructure system.
The cloudscheduler VM provisioning service has been running production jobs for ATLAS and Belle II for many years using commercial and private clouds in Europe, North America and Australia. Initially released in 2009, version 1 is a single Python 2 module implementing multiple threads to poll resources and jobs, and to create and destroy virtual machine. The code is difficult to scale, maintain or extend and lacks many desirable features, such as status displays, multiple user/project management, robust error analysis and handling, and time series plotting, to name just a few examples. To address these shortcomings, our team has re-engineered the cloudscheduler VM provisioning service from the ground up. The new version, dubbed cloudscheduler version 2 or CSV2, is written in Python 3 runs on any modern Linux distribution, and uses current supporting applications and libraries. The system is composed of multiple, independent Python 3 modules communicating with each other through a central MariaDB (version 10) database. It features both graphical (web browser), and command line user interfaces and supports multiples users/projects with ease. Users have the ability to manage and monitor their own cloud resources without the intervention of a system administrator. The system is scalable, extensible, and maintainable. It is also far easier to use and is more flexible than its predecessor. We present the design, highlight the development process which utilizes unit tests, and show encouraging results from our operational experience with thousands of jobs and workernodes. We also present our experience with containers for running workloads, code development and software distribution.
Cloud Services for Synchronization and Sharing (CS3) have become increasing popular in the European Education and Research landscape in the last
years. Services such as CERNBox, SWITCHdrive, CloudStor and many more have become indispensable in everyday work for scientists, engineers and in administration
CS3 services represent an important part of the EFSS market segment (Enterprise File Sync and Share). According to the report at the last CS3 2019 Rome conference, 25 sites provide a service to the total of 395 thousand researchers and educators around the globe (in Europe and Australia, China, US, Brazil, South Africa and Russia) serving 2.7 billion files (corresponding to 11.5 PB of storage). CS3 provides easily accessible, sync&share services with intuitive and responsive user interfaces.
Although these services are becoming popular because of their intuitive interface for sharing and synchronization of data, availability on all platforms (mobile, desktop and web) and capabilities to adapt to different user scenarios such as offline work, the commercially developed sync&share platforms are not sufficiently integrated with research services, tools and applications. This lack of integration currently forms a major bottleneck for European collaborative research communities. In addition, services as operated by several European providers who are in CS3, are currently too fragmented.
The CS3 APIs is a set of APIs to make research clouds based on sync and share technology interoperable. The APIs are designed to decrease the burden of porting an application developed for one EFSS service to another one and also provide a standard way to connect the sync and share platform with existing and new storage repositories over a well-defined metadata control protocol. These interconnections increase the cohesion between services to create an easily-accessible and integrated science environment that facilitates research activities across institutions without having fragmented silos based on ad-hoc solutions.
We report on our experience designing the protocol and the reference implementation (REVA), and its future evolution to reduce the fragmentation in the pan-European research network.
The use of commercial cloud services has gained popularity in research environments. Not only it is a flexible solution for adapting computing capacity to the researchers' needs, it also provides access to the newest functionalities on the market. In addition, most service providers offer cloud credits, enabling researchers to explore innovative architectures before procuring them at scale. Yet, the economical and contractual aspects linked to the production use of commercial clouds are often overlooked, preventing researchers to reap their full benefits.
CERN, in collaboration with leading European research institutes, has launched several initiatives to bridge this gap. Completed in 2018, the HNSciCloud Pre-Commercial Procurement (PCP) project successfully developed a European hybrid cloud platform to pioneer the convergence and deployment of commercial cloud, high-performance computing and big-data capabilities for scientific research. Leveraging many of the lessons learned from HNSciCloud, the OCRE project - Open Clouds for Research Environments - started in January 2019 in order to accelerate commercial cloud adoption in the European research community. In parallel, the ARCHIVER PCP project - Archiving and Preservation for Research Environments - will develop hybrid and scalable solutions for archiving and long-term preservation of scientific data whilst ensuring that research groups retain stewardship of their datasets.
With a total procurement budget exceeding €18 million, these initiatives are setting best practices for effective and sustainable procurement of commercial cloud services for research activities. These are highly relevant as, in the wider context of the European Open Science Cloud (EOSC), the engagement of commercial providers is considered fundamental to contribute to the creation of a sustainable, technologically advanced environment with open services for data management, analysis and re-use across disciplines, with transparent costing models.
In this contribution, we will detail the outcomes of the HNSciCloud PCP project, expand on the objectives of the subsequent OCRE and ARCHIVER projects and provide a vision for the role of the private sector within the EOSC.
In the last couple of years, we have been actively developing the Dynamic On-Demand Analysis Service (DODAS) as an enabling technology to deploy container-based clusters over any Cloud infrastructure with almost zero effort. The DODAS engine is driven by high-level templates written in the TOSCA language, that allows to abstract the complexity of many configuration details. DODAS is particularly suitable for harvesting opportunistic computing resources; this is why several scientific communities already integrated their computing use cases into DODAS-instantiated clusters automating the instantiation, management and federation of HTCondor batch system.
The increasing demand, availability and utilization of HPC by and for multidisciplinary user community, often mandates the possibility to transparently integrate, manage and mix HTC and HPC resources.
In this paper, we discuss our experience extending and using DODAS to connect HPC and HTC resources in the context of a distributed Italian regional infrastructure involving multiple sites and communities. In this use case, DODAS automatically generates HTCondor-based clusters on-demand, dynamically and transparently federating sites that may also include HPC resources managed by SLURM; DODAS allows user workloads to make opportunistic and automated use of both HPC and HTC resources, thus effectively maximizing and optimizing resource utilization.
We also report on our experience of using and federating HTCondor batch systems exploiting the JSON Web Token capabilities introduced in recent HTCondor versions, replacing the traditional X509 certificates in the whole chain of workload authorization. In this respect we also report on how we integrated HTCondor using OAuth with the INDIGO IAM service.
Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns, too. An important aspect of any production computing campaign is data movement, both incoming and outgoing. And while the performance and cost of VMs is relatively well understood, the network performance and cost is not.
We thus embarked on a network characterization campaign, documenting traceroutes, latency and throughput in various regions of Amazon AWS, Microsoft Azure and Google GCP Clouds, both between Cloud resources and major DTNs in the Pacific Research Platform, including OSG data federation caches in the network backbone, and inside the clouds themselves. We also documented the incurred cost while doing so.
Along the way we discovered that network paths were often not what the major academic network providers thought they were, and we helped them in improving the situation, thus improving peering between academia and commercial cloud.
In this talk we present the observed results, both during the initial test runs and the latest state of the art, as well as explain what it took to get there.
Most of the challenges set by modern physics endeavours are related to the management, processing and analysis of massive amount of data. As stated in a recent Nature editorial (The thing about data, Nature Physics volume 13, page 717, 2017), "the rise of big data represents an opportunity for physicists. To take full advantage, however, they need a subtle but important shift in mindset". All this calls for a substantial change in the way future physicists are taught: statistics and probability, information theory, machine learning as well as scientific computing and hardware setups should be the pillars of the education of a new physics students generation. This is what an innovative master programme launched in fall 2018 by the University of Padua, "Physics of Data", aims at. This contribution summarises its actual implementation, describing the educational methods (all focused on "hands-on" activities and research projects) and reporting on the brilliant results obtained by the first enrolled students.
The number of women in technical computing roles in the HEP community hovers at around 15%. At the same time there is a growing body of research to suggest that diversity, in all its forms, brings positive impact on productivity and wellbeing. These aspects are directly in line with many organisations’ values and missions, including CERN. Although proactive efforts to recruit more women in our organisations and institutes may help, the percentage of female applicants in candidate pools is similarly low and limits the potential for change. Factors influencing the career choice of girls have been identified to start as early as primary school and are closely tied to encouragement and exposure. It is the hope of various groups in the HEP community that, by intervening early, there may be a change in demographics over the years to come. During 2019, the Women in Technology Community at CERN developed two workshops for 6-9 year olds, which make the fundamental concepts of computer science accessible to young people with no prior experience and minimal assumed background knowledge. The immediate objectives were to demystify computer science, and to allow the children to meet a diverse set of role models from technical fields through our volunteer tutors. The workshops have been run for International Women’s Day and Girls in ICT Day, and a variation will be incorporated into the IT contribution to CERN’s Open Day in September 2019 (where both boys and girls will participate). We will present an overview of the statistics behind our motivation, describe the content of the workshops, results and lessons learnt and the future evolution of such activities.
In recent years proficiency in data science and machine learning (ML) became one of the most requested skills for jobs in both industry and academy. Machine learning algorithms typically require large sets of data to train the models and extensive usage of computing resources both for training and inference. Especially for deep learning algorithms, training performances can be dramatically improved by exploiting Graphical Processing Units (GPU). The needed skill set for a data scientist is therefore extremely broad, and ranges from knowledge of ML models to distributed programming on heterogeneous resources. While most of the available training resources focus on ML algorithms and tools such as TensorFlow, we designed a course for doctoral students where model training is tightly coupled with underlying technologies that can be used to dynamically provision resources. Throughout the course, students have access to OCCAM, an HPC facility at the University of Torino, managed using container-based cloud-like technologies, where Computing Applications are run on Virtual Clusters deployed on top of the physical infrastructure.
Task scheduling over OCCAM resources is managed by an orchestration layer (such as Mesos or Kubernetes), leveraging Docker containers to define and isolate the runtime environment. The Virtual Clusters developed to execute ML workflows are accessed through a web interface based on JupyterHub. When a user authenticates on the Hub, a notebook server is created as a containerized application. A set of libraries and helper functions is provided to execute a parallelized ML task by automatically deploying a Spark driver and several Spark execution nodes as Docker containers. This solution automates the delivery of the software stack required by a typical ML workflow and enables scalability by allowing the execution of ML tasks, including training, over commodity (i.e. CPUs) or high-performance (i.e. GPUs) resources distributed over different hosts across a network.
iTHEPHY is an ERASMUS+ project which aims at developing innovative student-centered Deeper Learning Approaches (DPA) and Project-Based teaching and learning methodologies for HE students, contributing to increase the internationalization of physics master courses. In this talk we'll introduce the iTHEPHY project status and main goals attained, with a focus on the web-based virtual environment developed in order to support the groups of students and the teams of teachers during their DPA learning and teaching activities: the iTHEPHY DPA Platform. The iTHEPHY DPA platform will be described in detail, focusing on the methodologies and technologies which enabled us to deliver a modular, user friendly, open-source, scalable and reusable platform that can be adopted by other communities in a straightforward way. The presentation will describe the work carried out in order to integrate some well established tools like Moodle, Redmine, BigBlueButton, Rocketchat, Jitsi, Sharelatex and INDIGO-DataCloud IAM. Some aspects about containerization of services in Cloud will be also covered. Finally, some reflections about sustainability of the software platform delivered will be presented.
The International Particle Physics Outreach Group (IPPOG) is a network of scientists, science educators and communication specialists working across the globe in informal science education and outreach for particle physics. IPPOG’s flagship activity is the International Particle Physics Masterclass programme, which provides secondary students with access to particle physics data using dedicated visualisation and analysis software. Students meet scientists, learn about particle physics, accelerators and detectors, perform physics measurements and search for new phenomena, then compare results in an end-of-day videoconference with other classes. The most recent of these events was held from 7 March to 16 April 2019 with thousands of students participating in 332 classes held in 239 institutes from 54 countries around the world. We report on the evolution of Masterclasses in recent years, in both physics and computing scope, as well as in global reach.
High Performance Computing (HPC) centers are the largest facilities available for science. They are centers of expertise for computing scale and local connectivity and represent unique resources. The efficient usage of HPC facilities is critical to the future success of production processing campaigns of all Large Hadron Collider (LHC) experiments. A substantial amount of R&D investigations are being performed in order to harness the power provided by such machines. HPC facilities are early adopters of heterogenous accelerated computing architectures, which represent a challenge and an opportunity. The adoption of accelerated heterogenous architectures has the potential to dramatically increase the performance of specific workflows and algorithms. In this presentation we will discuss R&D work on using alternative architectures both in collaboration with industry through CERN openlab and with the DEEP-EST project, a European consortium to build a prototype modular HPC infrastructure at the exa-scale. We will present the work on a proof-of-concept container platform and batch integration for workload submissions to access HPC testbed resources for data intensive science applications. As strategic computing resources, HPC centers are often isolated with tight network security, which represents a challenge for data delivery and access. We will close by summarizing the requirements and challenges for data access, through the Data Organization Management and Access (DOMA) project of the WLCG. Facilitating data access is critical to the adoption of HPC centers for data intensive science.
High Energy Physics (HEP) experiments will enter a new era with the start of the HL-LHC program, where computing needs required will surpass by large factors the current capacities. Looking forward to this scenario, funding agencies from participating countries are encouraging the HEP collaborations to consider the rapidly developing High Performance Computing (HPC) international infrastructures as a mean to satisfy at least a fraction of the foreseen HEP processing demands. Moreover, considering that HEP needs have been usually covered by facilities cost-optimized rather than performance-optimized, employing HPC centers would also allow access to more advanced resources. HPC systems are highly non-standard facilities, custom-built for use cases largely different from CMS demands, namely the processing of real and simulated particle collisions which can be analyzed individually without any correlation. The utilization of these systems by HEP experiments would not trivial, as each HPC center is different, increasing the level of complexity from the CMS integration and operations perspectives. Additionally, while CMS data is residing on a distributed highly-interconnected storage infrastructure, HPC systems are in general not meant for accessing large data volumes residing outside the facility. Finally, the allocation policies to these resources is quite different from the current usage of pledged resources deployed at CMS supporting Grid sites. This contribution will report on the CMS strategy developed to make effective use of HPC resources, involving a closer collaboration between CMS and HPC centers in order to further understand and subsequently overcome the present obstacles. Progress in the necessary technical and operational adaptations being made in CMS computing will be described.
The High-Luminosity LHC will provide an unprecedented data volume of complex collision events. The desire to keep as many of the "interesting" events for investigation by analysts implies a major increase in the scale of compute, storage and networking infrastructure required for HL-LHC experiments. An updated computing model is required to facilitate the timely publication of accurate physics results from HL-LHC data samples. This talk discusses the study of the computing requirements for CMS during the era of the HL-LHC. We will discuss how we have included requirements beyond the usual CPU, disk and tape estimates made by LHC experiments during Run 2, such as networking and tape read/write rate requirements. We will show how Run 2 monitoring data has been used to make choices towards a HL-LHC computing model. We will illustrate how changes to the computing infrastructure or analysis approach can impact total resource needs and cost. Finally, we will discuss the approach and status of the CMS process for evolving its HL-LHC computing model based on modeling and other factors.
High Performance Computing (HPC) supercomputers are expected to play an increasingly important role in HEP computing in the coming years. While HPC resources are not necessarily the optimal fit for HEP workflows, computing time at HPC centers on an opportunistic basis has already been available to the LHC experiments for some time, and it is also possible that part of the pledged computing resources will be offered as CPU time allocations at HPC centers in the future. The integration of the experiment workflows to make the most efficient use of HPC resources is therefore essential.
This presentation will describe the work that has been necessary to integrate LHCb workflows at HPC sites. This has required addressing two types of challenges: in the distributed computing area, for efficiently submitting jobs, accessing the software stacks and transferring data files; and in the software area, for optimising software performance on hardware architectures that differ significantly from those traditionally used in HEP. The talk will cover practical experience for the deployment of Monte Carlo generation and simulation workflows at the HPC sites available to LHCb. It will also describe the work achieved on the software side to improve the performance of these applications using parallel multi-process and multi-threaded approaches.
Predictions for requirements for the LHC computing for Run 3 and Run 4 (HL_LHC) over the course of the next 10 years show a considerable gap between required and available resources, assuming budgets will globally remain flat at best. This will require some radical changes to the computing models for the data processing of the LHC experiments. Concentrating computational resources in fewer larger and more efficient centres should increase the cost-efficiency of the operation and, thus, of the data processing. Large scale general purpose HPC centres could play a crucial role in such a model. We report on the technical challenges and solutions adopted to enable the processing of the ATLAS experiment data on the European flagship HPC Piz Daint at CSCS, now acting as a pledged WLCG Tier-2 centre. As the transition of the Tier-2 from classic to HPC resources has been finalised, we also report on performance figures over two years of production running and on efforts for a deeper integration of the HPC resource within the ATLAS computing framework at different tiers.
For many scientific projects, data management is an increasingly complicated challenge. The number of data-intensive instruments generating unprecedented volumes of data is growing and their accompanying workflows are becoming more complex. Their storage and computing resources are heterogeneous and are distributed at numerous geographical locations belonging to different administrative domains and organizations. These locations do not necessarily coincide with the places where data is produced nor where data is stored, analyzed by researchers, or archived for safe long-term storage. To fulfill these needs, the data management system Rucio has been developed to allow the high-energy physics experiment ATLAS to manage its large volumes of data in an efficient and scalable way.
But ATLAS is not alone, and several diverse scientific projects have started evaluating, adopting, and adapting the Rucio system for their own needs. As the Rucio community has grown many improvements have been introduced, customisations have been added, and many bugs have been fixed. Additionally, new dataflows have been investigated and operational experiences have been documented. In this article we collect and compare the common successes, pitfalls, and oddities which arose in the evaluation efforts of multiple diverse experiments, and compare them with the ATLAS experience. This includes the high-energy physics experiments CMS and Belle II, the neutrino experiment DUNE, as well as the LIGO and SKA astronomical observatories.
The ALICE experiment has originally been designed as a relatively low-rate experiment, in particular given the limitations of the Time Projection Chamber (TPC) readout system using MWPCs. This will not be the case anymore for LHC Run 3 scheduled to start in 2021.
After the LS2 upgrades, including a new silicon tracker and a GEM-based readout for the TPC, ALICE will operate at a peak Pb-Pb collision rate of 50 kHz.
To cope with this rate at least the TPC will be operated in continuous mode and all collisions will be read out, compressed and written to permanent storage without any trigger selection.
The First Level Processing (FLP) site will receive continuous raw data at the rate of 3.4 TB/s and send to Event Processing Nodes (EPN) 10-20 ms packets at a rate of about 640 GB/s.
EPNs will perform the data reconstruction and compression in a quasi-streaming and will send them for archival at the rate of 100 GB/s.
Here we present the details of this synchronous stage of ALICE data processing.
The ALICE Experiment at CERN LHC (Large Hadron Collider) is undertaking a major upgrade during LHC Long Shutdown 2 in 2019-2020. The raw data input from the detector will then increase a hundredfold, up to 3.4 TB/s. In order to cope with such a large throughput, a new Online-Offline computing system, called O2, will be deployed.
The FLP servers (First Layer Processor) are the readout nodes hosting the CRU (Common Readout Unit) cards in charge of transferring the data from the detector links to the computer memory. The data then flows through a chain of software components until it is shipped over network to the processing nodes.
In order to select a suitable platform for the FLP, it is essential that the hardware and the software are tested together. Each candidate server is therefore equipped with multiple readout cards (CRU), one InfiniBand 100G Host Channel Adapter, and the O2 readout software suite. A series of tests are then run to ensure the readout system is stable and fulfils the data throughput requirement of 42Gb/s (highest data rate in output of the FLP equipped with 3 CRUs).
This paper presents the software and firmware features developed to evaluate and validate different candidates for the FLP servers. In particular we describe the data flow from the CRU firmware generating data, up to the network card where the buffers are sent over the network using RDMA. We also discuss the testing procedure and the results collected on different servers.
We report on performance measurements and optimizations of the event-builder software for the CMS experiment at the CERN Large Hadron Collider (LHC). The CMS event builder collects event fragments from several hundred sources. It assembles them into complete events that are then handed to the High-Level Trigger (HLT) processes running on O(1000) computers. We use a test system with 16 dual-socket Skylake-based computers interconnected with 100 Gbps Infiniband and Ethernet networks. The main challenge is the demanding message rate and memory performance required of the event-builder node to fully exploit the network capabilities. Each event-builder node has to process several TCP/IP streams from the detector backends at an aggregated bandwidth of 100 Gbps, distribute event fragments to other event-builder nodes at the fist level trigger rate of 100 kHz, verify and build complete events using fragments received from all other nodes, and finally make the complete events available to the HLT processors. The achievable performance on today's hardware and different system architectures is described. Furthermore, we compare native Infiniband with RoCE (RDMA over Converged Ethernet). We discuss the required optimizations and highlight some of the problems encountered. We conclude with an outlook on the baseline CMS event-builder design for the LHC Run 3 starting in 2021.
The Compressed Baryonic Matter (CBM) experiment is currently under construction at the GSI/FAIR accelerator facility in Darmstadt, Germany. In CBM, all event selection is performed in a large online processing system, the “First-level Event Selector” (FLES). The data are received from the self-triggered detectors at an input-stage computer farm designed for a data rate of 1 TByte/s. The distributed input interface will be realized using custom FPGA-based PCIe add-on cards, which preprocess and index the incoming data streams. The data is then transferred to an online processing cluster of several hundred nodes, which will be located in the shared Green-IT data center on campus.
Employing a time-based container data format to decontextualize the time-stamped signal messages from the detectors, data segments of specific time intervals can be distributed on the farm and processed independently. Timeslice building, the continuous process of collecting the data of a time interval simultaneously from all detectors, places a high load on the network and requires careful scheduling and management. Optimizing the design of the online data management includes minimizing copy operations of data in memory, using DMA/RDMA wherever possible, reducing data interdependencies, and employing large memory buffers to limit the critical network transaction rate.
As a demonstrator for the future FLES system, the mini-FLES system has been set up and is currently in operation at the GSI/FAIR facility. Designed as a vertical slice of the full system, it contains a fraction of all foreseen components. It is used to verify the developed hardware and software architecture and includes an initial version of a FLES control system. As part of the mini-CBM experiment of FAIR Phase-0, it is also the central data acquisition and online monitoring system of a multi-detector setup for physics data taking. This presentation will give an overview of the mini-FLES system of the CBM experiment and discuss its performance. The presented material includes latest results from operation in several recent mini-CBM campaigns at the GSI/FAIR SIS18.
The Belle II experiment is a new generation B-factory experiment at KEK in Japan aiming at the search for New Physics in a huge sample of B-meson dacays. The commissioning of accelerator and detector for the first physics run has been started from March this year. The Belle II High Level Trigger (HLT) is fully
working in the beam run. The HLT is now operated with 1600 cores clusterized in 5 units of 16 processing servers, which is the 1/4 of full configuration.
In each unit, the event-by-event basis parallel processing is implemented using the IPC-based ring buffers with the event transport over the network socket connection. The load balancing is automatically ensured by the ring buffers. In each processing server with 20 cores, the parallel processing is implemented utilizing the multi-process approach to run the same code for different events without taking a special care. The copy-on-write fork of processes efficiently reduces the memory consumption.
The event selection is done using the same offline code in two steps. The first is the track finding and the calorimeter clustering, and the rough event selection is performed to discard off-vertex background events efficiently. The full event reconstruction is performed for the selected events and they are classified in multiple categories. Only the events in the categories of interest are finally sent out to the storage. The live data quality monitoring is also performed on HLT.
For the selected events, the reconstructed tracks are extrapolated to the surface of pixel detector(PXD) and lively fed back to the readout electronics for the real time data reduction by sending only the associated hits.
Currently the accelerator study is still on-going to increase the luminosity and the physics data taking is being performed by sharing the time. During the data taking, sometimes the background rate becomes high and the L1 trigger rate reaches close to 10kHz, which is the 1/3 of maximum design rate. In this condition, the performance of Belle II HLT is discussed with the detailed report on various kinds of troubles and their fixes.
ALICE (A Large Ion Collider Experiment), one of the large LHC experiments, is currently undergoing a significant upgrade. Increase in data rates planned for LHC Run3, together with triggerless continuous readout operation, requires a new type of networking and data processing infrastructure.
The new ALICE O2 (online-offline) computing facility consists of two types of nodes: First Level Processors (FLP): containing a custom PCIe cards to receive data from detectors, and Event Processing Nodes (EPN): compute dense nodes equipped with GPGPUs for fast online data compression. FLPs first buffer the detector data for a time interval into SubTimeFrame (STF) objects. A TimeFrame then aggregates all corresponding STFs from each FLP into the TimeFrame (TF) object, located on a designated EPN node where it can be processed. The data distribution network connects FLP and EPN nodes, enabling efficient TimeFrame aggregation and providing a high quality of service.
We present design details of the data distribution network tailored to the requirements of the ALICE O2 facility based on the InfiniBand HDR technology. Further, we will show a scheduling algorithm for TimeFrame distribution from FLP to EPN nodes, which evenly utilizes all available processing capacity and avoids creating long-term network congestion.
The LHCb experiment will be upgraded in 2021 and a new trigger-less readout system will be implemented. In the upgraded system, both event building (EB) and event selection will be performed in software for every collision produced in every bunch-crossing of the LHC. In order to transport the full data rate of 32 Tb/s we will use state of the art off-the-shelf network technologies, e.g. InfiniBand EDR.
The full event building system will require around 500 nodes interconnected together via a non blocking topology, because of the size of the system it very difficult to test at production scale, before the actual procurement. We resort therefore to network simulations as a powerful tool for finding the optimal configuration. We developed an accurate low level description of an InfiniBand based network with event building like traffic.
We will present a full scale simulation of a possible implementation of the LHCb EB network.
The Geant4 electromagnetic (EM) physics sub-packages is an important component of LHC experiment simulations. During long shutdown 2 for LHC these packages are under intensive development and in this work we report a progress for the new Geant4 version 10.6. These developments includes modifications allowing speed-up computations for EM physics, improve EM models, extend set for models, and extend validations of EM physics. Results of EM tests and benchmarks will be discussed in details.
The STAR Heavy Flavor Tracker (HFT) has enabled a rich physics program, providing important insights into heavy quark behavior in heavy ion collisions. Acquiring data during the 2014 through 2016 runs at the Relativistic Heavy Ion Collider (RHIC), the HFT consisted of four layers of precision silicon sensors. Used in concert with the Time Projection Chamber (TPC), the HFT enables the reconstruction and topological identification of tracks arising from charmed hadron decays. The ultimate understanding of the detector efficiency and resolution demands large quantities of high quality simulations, accounting for the precise alignment of sensors, and the detailed response of the detectors and electronics to the incident tracks. The background environment presented additional challenges, as simulating the significant rates from pileup events accumulated during the long integration times of the tracking detectors could have quickly exceeded the available computational resources, and the relative contributions from different sources was unknown. STAR has long addressed these issues by embedding simulations into background events directly sampled during data taking at the experiment. This technique has the advantage of providing a completely realistic picture of the dynamic background environment while introducing minimal additional computational overhead compared to simulation of the primary collision alone, thus scaling to any luminosity. We will discuss how STAR has applied this technique to the simulation of the HFT, and will show how the careful consideration of misalignment of precision detectors and calibration uncertainties results in the detailed reproduction of basic observables, such as track projection to the primary vertex. We will further summarize the experience and lessons learned in applying these techniques to heavy-flavor simulations and discuss recent results.
VecGeom is a geometry modeller library with hit-detection features as needed by particle detector simulation at the LHC and beyond. It was incubated by a Geant-R&D initiative and the motivation to combine the code of Geant4 and ROOT/TGeo into a single, better maintainable piece of software within the EU-AIDA program.
So far, VecGeom is mainly used by LHC experiments as a geometry primitive library called from Geant4, where it has had very positive impact on CPU time due to its faster algorithms for complex primitives.
In this contribution, we turn to a discussion of how VecGeom can be used as the navigating library in Geant4 in order to benefit from both the fast geometry primitives as well as its vectorized navigation module. We investigate whether this integration provides the speed improvements expected, on top of that obtained from geometry primitives. We discuss and benchmark the application of such a VecGeom-navigator plugin to Geant4 for the use-case of ALICE and show paths towards usage for other experiments.
Lastly, an update on the general developments of VecGeom is given.
This includes a review of how developments in VecGeom can further benefit from interfacing with external ray-tracing kernels such as Intel-Embree.
The HL-LHC and the corresponding detector upgrades for the CMS experiment will present extreme challenges for the full simulation. In particular, increased precision in models of physics processes may be required for accurate reproduction of particle shower measurements from the upcoming High Granularity Calorimeter. The CPU performance impacts of several proposed physics models will be discussed. There are several ongoing research and development efforts to make efficient use of new computing architectures and high performance computing systems for simulation. The integration of these new R&D products in the CMS software framework and corresponding CPU performance improvements will be presented.
The JUNO (Jiangmen Underground Neutrino Observatory) experiment is a multi-purpose neutrino experiment designed to determine the neutrino mass hierarchy and precisely measure oscillation parameters. It is composed of a 20kton liquid scintillator central detector equipped with 18000 20’’ PMTs and 25000 3’’ PMTs, a water pool with 2000 20’’ PMTs, and a top tracker. Monte-Carlo simulation is a fundamental tool for optimizing the detector design, tuning reconstruction algorithms, and performing physics study. The status of the JUNO simulation software will be presented, including generator interface, detector geometry, physics processes, MC truth, pull mode electronic simulation and background mixing. This contribution will also present the latest update of JUNO simulation software, including Geant4 upgraded from 9.4 to 10.4, and their performance comparison. Previous electronic simulation algorithm can only work for central detector, a new electronic simulation package is designed to enable joint simulation of all sub-detectors by using the Task/sub-Task/Algorithm/Tool scheme provided by SNiPER framework. The full simulation of optical photons in large liquid scintillator is CPU intensive, especially for cosmic muons, atmospheric neutrinos and proton decay events. For proton decay, users are only interested in the proton decay events with energy deposition between 100MeV and 700MeV, number of Michel electrons larger than 0, and the energy of Michel electron larger than 10MeV. Only 10% of the simulated proton decay events meet these requirements. We made some improvements on simulation procedure to enable doing full optical photon simulation only on some pre-selected events and reduce a lot of computing resources. A pre-simulation without optical photon simulation is carried out firstly, with all the Geant4 steps and other necessary MC truth information saved. Then a pre-selection based on MC truth information can determine which event need to be full simulated with optical photon processes activated and the G4Steps as input. This simulation procedure and relative interfaces can also be used for MPI or GPU based muon events full simulation.
The ALICE experiment at the CERN LHC will feature several upgrades for run 3, one of which is a new inner tracking system (ITS). The ITS upgrade is currently under development and commissioning. The new ITS will be installed during the ongoing long shutdown 2.
The specification for the ITS upgrade calls for event rates of up to 100 kHz for Pb-Pb, and 400 kHz pp, which is two orders of magnitude higher than the existing system. The seven layers of ALPIDE pixel sensor chips significantly improve tracking with a total of 24120 pixel chips. This is a vast improvement over the existing inner tracker with six layers, of which only the two innermost layers were pixel sensors.
A number of factors will have an impact on the performance and readout efficiency of the upgraded ITS in run 3. While these factors are not limited to operating conditions such as run type and event rates, there are also a number of sensor configuration parameters that will have an effect. For instance the strobe length and the choice of sensor operating mode; triggered or continuous.
To that end we have developed a simplified simulation model of the readout hardware in the ALPIDE and ITS, using the SystemC library for system level modeling in C++. This simulation model is three orders of magnitude faster than a normal HDL simulation of the chip, and facilitates simulations of an increased number of events for a large portion of the detector.
In this paper we present simulation results where we have been able to quantify detector performance under different running conditions. The results are used for system configuration as well as ongoing development of the readout electronics.
The unprecedented computing resource needs of the ATLAS experiment have motivated the Collaboration to become a leader in exploiting High Performance Computers (HPCs). To meet the requirements of HPCs, the PanDA system has been equipped with two new components; Pilot 2 and Harvester, that were designed with HPCs in mind. While Harvester is a resource-facing service which provides resource provisioning and workload shaping, Pilot 2 is responsible for payload execution on the resource.
The presentation focuses on Pilot 2, which is a complete rewrite of the original PanDA Pilot used by ATLAS and other experiments for well over a decade. Pilot 2 has a flexible and adaptive design that allows for plugins to be defined with streamlined workflows. In particular, it has plugins for specific hardware infrastructures (HPC/GPU clusters) as well as for dedicated workflows defined by the needs of an experiment.
Examples of dedicated HPC workflows are discussed in which the Pilot either uses an MPI application for processing fine-grained event level service under the control of the Harvester service or acts like an MPI application itself and runs a set of job in an assemble.
In addition to describing the technical details of these workflows, results are shown from its deployment on Cori (NERSC), Theta (ALCF), Titan and Summit (OLCF).
For the past several years, IceCube has embraced a central, global overlay grid of HTCondor glideins to run jobs. With guaranteed network connectivity, the jobs themselves transferred data files, software, logs, and status messages. Then we were given access to a supercomputer, with no worker node internet access. As the push towards HPC increased, we had access to several of these machines, but no easy way to use them. So we went back to the basics of running production jobs, staging data in and out and running offline jobs on the local queue. But we made sure it still integrated directly with our dataset management and file metadata systems, to not lose everything we had gained in recent years.
The SKA will enable the production of full polarisation spectral line cubes at a very high spatial and spectral resolution. Performing a back-of-the-evelope estimate gives you the incredible amount of around 75-100 million tasks to run in parallel to perform a state-of-the-art faceting algorithm (assuming that it would spawn off just one task per facet, which is not the case). This simple estimate formed the basis of the development of a prototype, which had scalability as THE primary requirement. In this talk I will present the current status of the DALiuGE system, including some really exciting computer science research.
ATLAS Computing Management has identified the migration of all resources to Harvester, PanDA’s new workload submission engine, as a critical milestone for Run 3 and 4. This contribution will focus on the Grid migration to Harvester.
We have built a redundant architecture based on CERN IT’s common offerings (e.g. Openstack Virtual Machines and Database on Demand) to run the necessary Harvester and HTCondor services, capable of sustaining the load of O(1M) workers on the grid per day.
We have reviewed the ATLAS Grid region by region and moved as much possible away from blind worker submission, where multiple queues (e.g. single core, multi core, high memory) compete for resources on a site. Instead we have migrated towards more intelligent models that use information and priorities from the central PanDA workload management system and stream the right number of workers of each category to a unified queue while keeping late binding to the jobs.
We will also describe our enhanced monitoring and analytics framework. Worker and job information is synchronized with minimal delays to a CERN IT provided Elastic Search repository, where we can interact with dashboards to follow submission progress, discover site issues (e.g. broken Compute Elements) or spot empty workers.
The result is a much more efficient usage of the Grid resources with smart, built-in monitoring of resources.
The WLCG is today comprised of a range of different types of resources such as cloud centers, large and small HPC centers, volunteer computing as well as the traditional grid resources. The Nordic Tier 1 (NT1) is a WLCG computing infrastructure distributed over the Nordic countries. The NT1 deploys the Nordugrid ARC CE, which is non-intrusive and lightweight, originally developed to cater for HPC centers where no middleware could be installed on the compute nodes. The NT1 runs ARC in the Nordugrid mode which contrary to the Pilot mode leaves jobs data transfers up to ARC. ARCs data transfer capabilities together with the ARC cache are the most important features of ARC.
HPCs are getting increased interest within the WLCG, but so are cloud resources. With the ARC CE as an edge service to the cloud or HPC resource, all data transfers required by a job are downloaded by data transfer nodes on the edge of the cluster before the job starts running on the compute node. This ensures a highly efficient use of the compute nodes CPUs, as the job starts immediately after reaching the compute node compared to the traditional pilot model where the pilot job on the compute node is responsible for fetching the data. In addition, the ARC cache gives a possible several-fold gain if more jobs need the same data. ARCs data handling capabilities ensures very efficient data access to the jobs, and even better for HPC centers with its fast interconnects.
In this presentation we will describe the Nordugrid model with the ARC-CE as an edge service to an HPC or cloud resource and show the gain in efficiency this model provides compared to the pilot model.
Many of the challenges faced by the LHC experiments (aggregation of distributed computing resources, management of data across multiple storage facilities, integration of experiment-specific workflow management tools across multiple grid services) are similarly experienced by "midscale" high energy physics and astrophysics experiments, particularly as their data set volumes are increasing at comparable rates. Often these (international, multi-institution) collaborations have outgrown the computing resources offered by their home laboratories, or the capacities of any single member institution. Unlike the LHC experiments, however, these collaborations often lack the manpower required to build, integrate and operate the systems required to meet their scale. In the Open Science Grid, we have organized a team designed to support collaborative science organizations re-use proven software and patterns in distributed processing and data management, often but not restricted to software developed for the LHC. Examples are re-use of the Rucio and FTS3 software for reliable data transfer and management, XRootD for data access and caching, Ceph for large scale pre-processing storage, and Pegasus for workflow management across heterogeneous resources. We summarize experience with the VERITAS gamma ray observatory, the South Pole Telescope (CMB detector), and the XENON dark matter search experiment.
The “Third Party Copy” (TPC) Working Group in the WLCG’s “Data Organization, Management, and Access” (DOMA) activity was proposed during a CHEP 2018 Birds of a Feather session in order to help organize the work toward developing alternatives to the GridFTP protocol. Alternate protocols enable the community to diversify; explore new approaches such as alternate authorization mechanisms; and reduce the risk due to the retirement of the Globus Toolkit, which provides a commonly used GridFTP protocol implementation.
Two alternatives were proposed to the TPC group for investigation: WebDAV and XRootD. Each approach has multiple implementations, allowing us to demonstrate interoperability between distinct storage systems. As the working group took as a mandate the development of alternatives - and not to select a single protocol - we have put together a program of work allowing both to flourish. This includes community infrastructure such as documentation pointers, email lists, or biweekly meetings, as well as continuous interoperability testing involving production & test endpoints, deployment recipes, scale testing, and debugging assistance.
Each major storage system utilized by WLCG sites now has at least one functional non-GridFTP protocol for performing third-party-copy. The working group is focusing on including a wider set of sites and helping sites deploy more production endpoints. We are interacting with WLCG VOs to perform production data transfers using WebDAV or XRootD at the participating sites with the objective that all sites deploy at least one of these alternative protocols.
Since its earliest days, the Worldwide LHC Computational Grid (WLCG) has relied on GridFTP to transfer data between sites. The announcement that Globus is dropping support of its open source Globus Toolkit (GT), which forms the basis for several FTP client and servers, has created an opportunity to reevaluate the use of FTP. HTTP-TPC, an extension to HTTP compatible with WebDAV, has arisen as a strong contender for an alternative approach.
In this paper, we describe the HTTP-TPC protocol itself, along with the current status of its support in different implementations, and the interoperability testing done within the WLCG DOMA working group’s TPC activity. This protocol also provides the first real use-case for token-based authorisation. We will demonstrate the benefits of such authorisation by showing how it allows HTTP-TPC to support new technologies (such as OAuth, OpenID Connect, Macaroons and SciTokens) without changing the protocol. We will also discuss the next steps for HTTP-TPC, improving documentation and plans to use the protocol for WLCG transfers.
A Third Party Copy (TPC) has existed in the pure XRootD storage environment for many years. However using XRootD TPC in the WLCG environment presents additional challenges due to the diversity of the storage systems involved such as EOS, dCache, DPM and ECHO, requiring that we carefully navigate the unique constraints imposed by these storage systems and their site-specific environments through customized configuration and software development. To support multi-tenant setups seen at many WLCG sites, X509 based authentication and authorization in XRootD was significantly improved to meet both security and functionality requirements. This paper presents architecture of the pull based TPC with optional X509 credential delegation and how it is implemented in native XRootD and dCache. The paper discusses technical requirements, challenges, design choices and implementation details in the WLCG storage systems, as well as in FTS/gfal2. It also outlines XRootD’s plan to support newer TPC and security models such as token based authorization.
The anticipated increase in storage requirements for the forthcoming HL-LHC data rates is not matched by a corresponding increase in budget. This results in a short-fall in available resources if the computing models remain unchanged. Therefore, effort is being invested in looking for new and innovative ways to optimise the current infrastructure, so minimising the impact of this shortfall.
In this paper, we describe an R&D effort targeting "Quality of Service" (QoS), as a working group within the WLCG DOMA activity. The QoS approach aims to reduce the impact of the shortfalls, and involves developing a mechanism that both allows sites to reduce the cost of their storage hardware, with a corresponding increase in storage capacity, while also supporting innovative deployments with radically reduced cost or improved performance.
We describe the strategy this group is developing to support these innovations, along with the current status and plans for the future.
Optimization of computing resources, in particular storage, the costliest one, is a tremendous challenge for the High Luminosity LHC (HL-LHC) program. Several venues are being investigated to address the storage issues foreseen for HL-LHC. Our expectation is that savings can be achieved in two primary areas: optimization of the use of various storage types and reduction of the required manpower to operate the storage.
We will describe our work, done in the context of the WLCG DOMA project, to prototype, deploy and operate an at-scale research storage platform to better understand the opportunities and challenges for the HL-LHG era. Our multi-VO platform includes several storage technologies, from highly performant SSDs to low end disk storage and tape archives, all coordinated by the use of dCache. It is distributed over several major sites in the US (AGLT2, BNL, FNAL & MWT2) which are several tens of msec RTT apart with one extreme leg over the Atlantic in DESY to test extreme latencies. As a common definition of attributes for QoS characterizing storage systems in HEP has not yet been defined, we are using this research platform to experiment on several of them, e.g., number of copies, availability, reliability, throughput, iops and latency.
The platform provides a unique tool to explore the technical boundaries of the ‘data-lake’ concept and its potential savings in storage and operations costs.
We will conclude with a summary of our lessons learned and where we intend to go with our next steps.
HL-LHC will confront the WLCG community with enormous data storage, management and access challenges. These are as much technical as economical. In the WLCG-DOMA Access working group, members of the experiments and site managers have explored different models for data access and storage strategies to reduce cost and complexity, taking into account the boundary conditions given by our community.
Several of these scenarios have been studied quantitatively, such as the datalake model and incremental improvements of the current computing model with respect to resource needs, costs and operational complexity.
To better understand these models in depth, analysis of traces of current data accesses and simulations of the impact of new concepts have been carried out. In parallel, evaluations of the required technologies took place. These were done in testbed and production environments at small and large scale.
We will give an overview of the activities and results of the working group, describe the models and summarise the results of the technology evaluation focusing on the impact of storage consolidation in the form of datalakes, where the use of read-ahead caches (XCache) has emerged as a successful approach to reduce the impact of latency and bandwidth limitation.
We will describe the experience and evaluation of these approaches in different environments and usage scenarios. In addition we will present the results of the analysis and modelling efforts based on data access traces of experiments.
GNA is a high performance fitter, designed to handle large scale models with big number of parameters. Following the data flow paradigm the model in GNA is built as directed acyclic graph. Each node (transformation) of the graph represents a function, that operates on vectorized data. A library of transformations, implementing various functions, is precompiled. The graph itself is assembled at runtime in Python and may be modified without recompilation.
High performance is achieved via several ways. The computational graph is lazily evaluated. Output data of each node is cached and recalculated only in case it is required: when one of the parameters or inputs has been changed. Transformations, subgraphs or the complete graph may be executed on GPU with data transferred lazily between CPU and GPU.
The description of the framework as well as practical examples from Daya Bay and JUNO experiments will be presented.
Modern hardware is trending towards increasingly parallel and heterogeneous architectures. Contemporary machine processors are spread across multiple sockets, where each socket can access some system memory faster than the rest, creating non-uniform memory access (NUMA). Efficiently utilizing these NUMA machines is becoming increasingly important. This paper examines latest Intel Skylake and Xeon Phi NUMA node architectures, indicating possible performance problems for multi-threaded, data processing applications, due to the kernel thread migration (TM) mechanism, that I designed to optimize power consumption. We discuss NUMA aware CLARA workflow management system that defines proper level of vertical scaling and process affinity, associating CLARA worker threads with particular processor cores. By minimizing thread migration and context-switching cost among cores, we were able to improve the data locality and reduce the cache-coherency traffic among the cores, resulting in sizable performance improvements.
The hardware landscape used in HEP and NP is changing from homogeneous multi-core systems towards heterogeneous systems with many different computing units, each with their own characteristics. To achieve data processing maximum performance the main challenge is to place the right computing on the right hardware.
In this paper we discuss CLAS12 charge particle tracking workload partitioning that allowed us to utilize both CPU and GPU to improve the performance. The tracking application algorithm was decomposed into micro-services that are deployed on CPU and GPU processing units, where the best features of both are intelligently combined to achieve maximum performance. In this heterogeneous environment CLARA aims to match the requirements of each micro-service to the strength of a CPU or a GPU architecture. In addition, CLARA performs load balancing to minimize idle time for both processing units. However predefined execution of a micro-service on a CPU or a GPU may not be the most optimal solution due to the streaming data-quantum size and the data-quantum transfer latency between CPU and GPU. So, we trained the CLARA workflow orchestrator to dynamically assign micro-service execution to a CPU or a GPU, based on the benchmark results analyzed for a period of the real-time data-processing.
In particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually which is time-consuming and often leads to undocumented relations between particular workloads.
We present the luigi analysis workflow (law) Python package which is based on the open-source pipelining tool luigi, originally developed by Spotify. It establishes a generic design pattern for analyses of arbitrary scale and complexity, and shifts the focus from executing to defining the analysis logic. Law provides the building blocks to seamlessly integrate with interchangeable remote resources without, however, limiting itself to a specific choice of infrastructure. In particular, it introduces the paradigm of complete separation between analysis algorithms on the one hand, and run locations, storage locations, and software environments on the other hand.
To cope with the sophisticated demands of end-to-end HEP analyses, law supports job execution on WLCG infrastructure (ARC, gLite) as well as on local computing clusters (HTCondor, LSF), remote file access via most common protocols through the Grid File Access Library (GFAL2), and an environment sandboxing mechanism with support for Docker and Singularity containers. Moreover, the novel approach ultimately aims for analysis preservation out-of-the-box.
Law is developed open-source and entirely experiment independent. It is successfully employed in ttH cross section measurements and searches for di-Higgs boson production with the CMS experiment.
The ATLAS experiment has successfully integrated High-Performance Computing (HPC) resources in its production system. Unlike the current generation of HPC systems, and the LHC computing grid, the next generation of supercomputers is expected to be extremely heterogeneous in nature: different systems will have radically different architectures, and most of them will provide partitions optimized for different kinds of workloads. In this work we explore the applicability of concepts and tools realized in Ray (the high-performance distributed execution framework targeting large-scale machine learning applications) to ATLAS event throughput optimization on heterogeneous distributed resources, ranging from traditional grid clusters to Exascale computers.
We present a prototype of Raythena, a Ray-based implementation of the ATLAS Event Service (AES), a fine-grained event processing workflow aimed at improving the efficiency of ATLAS workflows on opportunistic resources, specifically HPCs. The AES is implemented as an event processing task farm that distributes packets of events to several worker processes running on multiple nodes. Each worker in the task farm runs an event-processing application (Athena) as a daemon. In Raythena we replaced the event task farm workers with stateful components of Ray called Actors, which process packets of events and return data processing results. In addition to stateful Actors, Raythena also utilizes stateless Tasks for merging intermediate outputs produced by the Actors. The whole system is orchestrated by Ray, which assigns work to Actors and Tasks in a distributed, possibly heterogeneous, environment.
The second thrust of this study is to use Raythena to schedule Gaudi Algorithms (the primary unit of work of ATLAS' Athena framework) across a set of heterogeneous nodes. For ease of testing, we have used the Gaudi execution flow simulator to run a production ATLAS reconstruction scenario consisting of 309 Algorithms, modeled by synthetic CPU burners constrained by the data dependencies, and run for the time duration of the original Algorithms. The Algorithms are wrapped in Ray Actors or Tasks, and communicate via the Ray Global Control Store. This approach allows the processing of a single event to be distributed across more than one node, a functionality currently not supported by the Athena framework. We will discuss Raythena features and performance as a scheduler for ATLAS workflows, comparing them to those offered by Athena.
For all its flexibility, the AES implementation is currently comprised of multiple separate layers that communicate through ad-hoc command-line and file-based interfaces. The goal of Raythena is to integrate these layers through a feature-rich, efficient application framework. Besides increasing usability and robustness, a vertically integrated scheduler will enable us to explore advanced concepts such as dynamically shaping of workflows to exploit currently available resources, particularly on heterogeneous systems.
High Energy Physics experiments face unique challenges when running their computation on High Performance Computing (HPC) resources. The LZ dark matter detection experiment has two data centers, one each in the US and UK, to perform computations. Its US data center uses the HPC resources at NERSC.
In this talk, I will describe the current computational workflow of the LZ experiment, detailing some of the challenges faced while making the transition from network distributed computing environments like PDSF to newer HPC resources like Cori, at NERSC.
The increase in luminosity by a factor of 100 for the HL-LHC with respect to Run 1 poses a big challenge from the data analysis point of view. It demands a comparable improvement in software and processing infrastructure. The use of GPU enhanced supercomputers will increase the amount of computer power and analysis languages will have to be adapted to integrate them. The particle physics community has traditionally developed their own tools to analyze the data, usually creating dedicated ROOT-based data formats. However, there have been several attempts to explore the inclusion of new tools into the experiments data analysis workflow considering data formats not necessarily based on ROOT. Concepts and techniques include declarative languages to specify hierarchical data selection and transformation, cluster systems to manage processing the data, Machine Learning integration at the most basic levels, statistical analysis techniques, etc. This talk will provide an overview of the current efforts in the field, including efforts in traditional programming languages like C++, Python, and Go, and efforts that have invented their own languages like Root Data Frame, CutLang, ADL, coffea, and functional declarative languages. There is a tremendous amount of activity in this field right now, and this talk will attempt to summarize the current state of the field.
With an increased dataset obtained during CERN LHC Run-2, the even larger forthcoming Run-3 data and more than an order of magnitude expected increase for HL-LHC, the ATLAS experiment is reaching the limits of the current data production model in terms of disk storage resources. The anticipated availability of an improved fast simulation will enable ATLAS to produce significantly larger Monte Carlo samples with the available CPU, which will then be limited by insufficient disk resources.
The ATLAS Analysis Model Study Group for Run-3 was setup at the end of Run-2. Its tasks have been to analyse the efficiency and suitability of the current analysis model and to propose significant improvements to it. The group has considered options allowing ATLAS to save, for the same data/MC sample, at least 30% disk space overall, and has given directions how significantly larger savings could be realised for the HL-LHC. Furthermore, recommendations have been suggested to harmonise the current stage of analysis across the collaboration. The group has now completed its work: key recommendations will be the new small sized analysis formats DAOD_PHYS and DAOD_PHYSLITE and the increased usage of a tape carousel mode in the centralized production of these formats. This talk will review the recommended ATLAS analysis model for Run-3 and its status of the implementation. It will also provide an outlook to the HL-LHC analysis.
For the last 5 years Accelogic pioneered and perfected a radically new theory of numerical computing codenamed “Compressive Computing”, which has an extremely profound impact on real-world computer science. At the core of this new theory is the discovery of one of its fundamental theorems which states that, under very general conditions, the vast majority (typically between 70% and 80%) of the bits used in modern large-scale numerical computations are absolutely irrelevant for the accuracy of the end result. This theory of Compressive Computing provides mechanisms able to identify (with high intelligence and surgical accuracy) the number of bits (i.e., the precision) that can be used to represent numbers without affecting the substance of the end results, as they are computed and vary in real time. The bottom line outcome would be to provide a state-of-the-art compression algorithm that surpasses those currently available in the ROOT framework, with the purpose of enabling substantial economic and operational gains (including speedup) for High Energy and Nuclear Physics data storage/analysis. In our initial studies, a factor of nearly x4 (3.9) compression was achieved with RHIC/STAR data where ROOT compression managed only x1.4.
In this contribution, we will present our concepts of “functionally lossless compression”, have a glance at examples and achievements in other communities, present the results and outcome of our current R&D as well as present a high-level view of our plan to move forward with a ROOT implementation that would deliver a basic solution readily integrated into HENP applications. As a collaboration of experimental scientists, private industry, and the ROOT Team, our aim is to capitalize on the substantial success delivered by the initial effort and produce a robust technology properly packaged as an open-source tool that could be used by virtually every experiment around the world as means for improving data management and accessibility.
ALICE Experiment is currently undergoing a major upgrade program, both in
terms of hardware and software, to prepare for the LHC Run 3. A new Software
Framework is being developed in collaboration with the FAIR experiments at GSI
to cope with the 100 fold increase in collected collisions.
We present our progress to adapt such a framework for the end user physics data
analysis. In particular, we will highlight the design and technology choices.
We will show how we adopt Apache Arrow as a platform for our in memory analysis
data layout. We will illustrate the benefits of this solution, such as:
efficient and parallel data processing, interoperability with a large number of
analysis tools and ecosystems, integration with the modern ROOT declarative
analysis framework RDataFrame.
Scikit-HEP is a community-driven and community-oriented project with the goal of providing an ecosystem for particle physics data analysis in Python. Scikit-HEP is a toolset of approximately twenty packages and a few “affiliated” packages. It expands the typical Python data analysis tools for particle physicists. Each package focuses on a particular topic, and interacts with other packages in the toolset, where appropriate. Most of the packages are easy to install in many environments; much work has been done this year to provide binary “wheels” on PyPI and conda-forge packages. The uproot family provides pure Python ROOT file access and has been a runaway success with over 15000 downloads per month. AwkwardArray provides a natural “Jagged” array structure. The iMinuit package exposes the MINUIT2 C++ package to Python. Histogramming is central in any analysis workflow and has received much attention, including new Python bindings for the performant C++14 Boost::Histogram library. The Particle and DecayLanguage packages were developed to deal with particles and decay chains. Other packages provide the ability to interface between Numpy and popular HEP tools such as Pythia and FastJet. The Scikit-HEP project has been gaining interest and momentum, by building a user and developer community engaging collaboration across experiments. Some of the packages are being used by other communities, including the astroparticle physics community. An overview of the overall project and toolset will be presented, as well as a vision for development and sustainability.
The COFFEA Framework provides a new approach to HEP analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language and commodity big data technologies such as Apache Spark and NoSQL databases. To achieve this suite of improvements across many use cases, COFFEA takes a factorized approach, separating the analysis implementation and data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages which are wrapped to yield user code whose purpose is quickly intuited. Various data delivery schemes are wrapped into a common front-end which accepts user inputs and code, and returns user defined outputs. We will present published results from analysis of CMS data using the COFFEA framework along with a discussion of metrics and the user experience of arriving at those results with columnar analysis.
The traditional HEP analysis model uses successive processing steps to reduce the initial dataset to a size that permits real-time analysis. This iterative approach requires significant CPU time and storage of large intermediate datasets and may take weeks or months to complete. Low-latency, query-based analysis strategies are being developed to enable real-time analysis of primary datasets by replacing conventional nested loops over objects with native operations on hierarchically nested, columnar data. Such queries are well-suited to distributed processing using a strategy called function as a service (FaaS).
In this presentation we introduce funcX---a high-performance FaaS platform that enables intuitive, flexible, efficient, and scalable remote function execution on existing infrastructure including clouds, clusters, and supercomputers. A funcX function explicitly defines a function body and dependencies required to execute the function. FuncX allows users, interacting via a REST API, to register and then execute such functions without regard for the physical resource location or scheduler architecture on which the function is executed---an approach we refer to as ``serverless supercomputing.'' We show how funcX can be used to parallelize a real-world HEP analysis operating on columnar data to aggregate histograms of analysis products of interest in real time. Subtasks representing partial histograms are dispatched as funcX requests with expected runtimes of less than a second. Finally, we demonstrate efficient execution of such analyses on heterogeneous resources, including leadership-class computing facilities.
Driven by the need to carefully plan and optimise the resources for the next data taking periods of Big Science projects, such as CERN’s Large Hadron Collider and others, sites started a common activity, the HEPiX Technology Watch Working Group, tasked with tracking the evolution of technologies and markets of concern to the data centres. The talk will give an overview of general and semiconductor markets, server markets, CPUs and accelerators, memories, storage and networks; it will highlight important areas of uncertainties and risks.
At the SDCC we are deploying a Jupyterhub infrastructure to enable
scientists from multiple disciplines to access our diverse compute and
storage resources. One major design goal was to avoid rolling out yet
another compute backend and leverage our pre-existing resources via our
batch systems (HTCondor and Slurm). Challenges faced include creating a
frontend that allows users to choose what HPC resources they have access
to as well as selecting containers or environments, delegating
authentication to a MFA-enabled proxy, and automating deployment of
multiple hub instances. We will show what we have done, and some
examples of how we have worked with various groups to get their analysis
working with Jupyter notebooks.
The WLCG has over 170 sites and the number is expected to grow in the coming years. In order to support WLCG workloads, each site has to deploy and maintain several middleware packages and grid services. Setting up, maintaining and supporting the grid infrastructure at a site can be a demanding activity and often requires significant assistance from WLCG experts. Modern configuration management (Puppet, Ansible, ...), container orchestration (Docker Swarm, Kubernetes, ...) and containerization technologies (Docker, ...) can effectively make such activities lightweight via packaging sensible configurations of grid services and providing simple mechanisms to distribute and deploy them across the infrastructure available at a site. This article describes the SIMPLE project: a Solution for Installation, Management and Provisioning of Lightweight Elements. The SIMPLE framework leverages modern infrastructure management tools to deploy containerized grid services, such as popular compute elements (HTCondor, ARC, ...), batch systems (HTCondor, Slurm, ...), worker nodes etc. It is built on the principles of software sustainability, modularity and scalability. The article also describes the framework’s architecture, extensibility and the special features that enable lightweight deployments at WLCG sites.
One of the most costly factors in providing a global computing infrastructure such as the WLCG is the human effort in deployment, integration, and operation of the distributed services supporting collaborative computing, data sharing and delivery, and analysis of extreme scale datasets. Furthermore, the time required to roll out global software updates, introduce new service components, or prototype novel systems requiring coordinated deployments across multiple facilities is often increased by communication latencies, staff availability, and in many cases expertise required for operations of bespoke services. While the WLCG computing grid (and distributed systems implemented throughout HEP) is a global service platform, it lacks the capability and flexibility of a modern platform-as-a-service including continuous integration/continuous delivery (CI/CD) methods, development-operations capabilities (DevOps, where developers assume a more direct role in the actual production infrastructure), and automation. Most importantly, tooling which reduces required training, bespoke service expertise, and the operational effort throughout the infrastructure, most notably at the resource endpoints ("sites"), is entirely absent in the current model. In this paper, we explore ideas and questions around potential "NoOps" models in this context: what is realistic given organizational policies and constraints? How should operational responsibility be organized across teams and facilities? What are the technical gaps? What are the social and cybersecurity challenges? Conversely what advantages does a NoOps model deliver for innovation and for accelerating the pace of delivery of new services needed for the HL-LHC era? We will describe initial work along these lines in the context of providing a data delivery network supporting IRIS-HEP DOMA R&D.
We describe the software tool-set being implemented in the contest of the NOTED [1] project to better exploit WAN bandwidth for Rucio and FTS data transfers, how it has been developed and the results obtained.
The first component is a generic data-transfer broker that interfaces with Rucio and FTS. It identifies data transfers for which network reconfiguration is both possible and beneficial, translates the Rucio and FTS information into parameters that can be used by network controllers and makes these available via a public interface.
The second component is a network controller that, based on the parameters provided by the transfer broker, decides which actions to apply to improve the path for a given transfer.
Unlike the transfer-broker, the network controller described here is tailored to the CERN network as it has to choose the appropriate action given the network configuration and protocols used at CERN. However, this network controller can easily be used as a model for site-specific implementations elsewhere.
The paper describes the design and the implementation of the two tools, the tests performed and the results obtained. It also analyses how the tool-set could be used for WLCG in the contest of the DOMA [2] activity.
[1] Network Optimisation for Transport of Experimental Data - CERN project
[2] Data Organisation, Management and Access - WLCG activity
The International Particle Physics Outreach Group (IPPOG) is a network of scientists, science educators and communication specialists working across the globe in informal science education and outreach for particle physics. The primary methodology adopted by IPPOG requires the direct involvement of scientists active in current research with education and communication specialists, in order to effectively develop and share best practices in outreach. IPPOG member activities include the International Particle Physics Masterclass programme, International Day of Women and Girls in Science, Worldwide Data Day, International Muon Week and International Cosmic Day organisation, and participation in activities ranging from public talks, festivals, exhibitions, teacher training, student competitions, and open days at local institutions. These independent activities, often carried out in a variety of languages to public with a variety of backgrounds, all serve to gain the public trust and to improve worldwide understanding and support of science. We present our vision of IPPOG as a strategic pillar of particle physics, fundamental research and evidence-based decision-making around the world.
I describe a novel interactive virtual reality visualization of the Belle II detector at KEK and the animation therein of GEANT4-simulated event histories. Belle2VR runs on Oculus and Vive headsets (as well as in a web browser and on 2D computer screens, in the absence of a headset). A user with some particle-physics knowledge manipulates a gamepad or hand controller(s) to interact with and interrogate the detailed GEANT4 event history over time, to adjust the visibility and transparency of the detector subsystems, to translate freely in 3D, to zoom in or out, and to control the event-history timeline (scrub forward or backward, speed up or slow down). A non-expert uses the app - during public outreach events, for example - to explore the world of subatomic physics via electron-positron collision events in the Belle II experiment at the SuperKEKB colliding-beam facility at KEK in Japan. Multiple simultaneous users, wearing untethered locomotive VR backpacks and headsets, walk about a room containing the virtual model of the Belle II detector and each others' avatars as they observe and control the simulated event history. Developed at Virginia Tech by an interdisciplinary team of researchers in physics, education, and virtual environments, the simulation is intended to be integrated into the undergraduate physics curriculum. I describe the app, including visualization features and design decisions, and illustrate how a user interacts with its features to expose the underlying physics in each electron-positron collision event.
We present an interactive game for up to seven players that demonstrates the challenges of on-line event selection at the Compact Muon Solenoid (CMS) experiment to the public. The game - in the shape of a popular classic pinball machine - was conceived and prototyped by an interdisciplinary team of graphic designers, physicists and engineers at the CMS Create hackathon in 2016. Having won the competition, the prototype was turned into a fully working machine that is now exhibited on the CMS visitor's path. Teams of 2-7 visitors can compete with one another to collect as many interesting events as possible within a simulated LHC fill. In a fun and engaging way, the game conveys concepts such as multi-level triggering, pipelined processing, event building, the importance of purity in event selection and more subtle details such as dead time. The multi-player character of the game corresponds to the distributed nature of the actual trigger and data acquisition system of the experiment. We present the concept of the game, its design and its technical implementation centred around an Arduino micro-controller controlling 700 RGB LEDs and a sound subsystem running on a Mac mini.
The rapid economic growth is building new trends in careers. Almost every domain, including high-energy physics, needs people with strong capabilities in programming. In this evolving environment, it is highly desirable that young people are equipped with computational thinking (CT) skills, such as problem-solving and logical thinking, as well as the ability to develop software applications and write code. These are crucial elements of Science, Technology, Engineering, and Mathematics education (STEM).
This talk will present an outcome from a Proof of Concept study of educational online activity. The project consists of building a first step of an interactive coding tutorial that will aim to introduce young people to computer science and particle physics principles in a fun and engaging way. Successful realization of this online educational asset will equip educators with a new tool to introduce STEM education and digital literacy in the classrooms, to eventually inspire young people to acquire necessary skills to be ready for a digital economic growth and future jobs.
Fluidic Data is a floor-to-ceiling installation spanning the four levels of the CERN Data Centre stairwell. It utilizes the interplay of water and light to visualize the magnitude and flow of information coming from the four major LHC experiments. The installation consists of an array of transparent hoses that house colored fluid, symbolizing the data of each experiment, surrounded by a collection of diffractive "pods" representing the particles pivotal to each experiment. The organic fusion of art and science engenders a meditative environment, allowing the visitor time for reflection and curiosity.
The Fluidic Data installation is a cross department collaboration that incorporates materials and techniques used in the construction of the LHC and its experiments. The project brings together artists, engineers, science communicators and physicists with a common goal of communicating CERN's research and resources. The success of this collaboration exemplifies the effectiveness of working in diverse teams, both intellectually and culturally, to accomplish unique projects.
Public Engagement (PE) with science should be more than “fun” for the staff involved. PE should be a strategic aim of any publically funded science organisation to ensure the public develops an understanding and appreciation of their work, its benefits to everyday life and to ensure the next generation is enthused to take up STEM careers. Most scientific organisations do have aims to do this, but very few have significant budgets to deliver this. In a landscape of ever tightening budgets, how can we develop a sustainable culture of PE within these organisations?
UKRI/STFC’s Scientific Computing Department present how we have worked to embed a culture of PE with the department by developing our early career staff members; highlighting the impact PE makes at the departmental and project level; and linking PE to our competency framework.
We will also discuss how our departmental work interacts with and complements STFC’s organisational-wide PE effort, such as making use of a shared evaluation framework that allows us to evaluate our public engagement activities against their goals and make strategic decisions about the programmes future direction.
MPI-learn and MPI-opt are libraries to perform large-scale training and hyper-parameter optimization for deep neural networks. The two libraries, based on Message Passing Interface, allows to perform these tasks on GPU clusters, through different kinds of parallelism. The main characteristic of these libraries is their flexibility: the user has complete freedom in building her own model, thanks to the multi-backend support. In addition, the library supports several cluster architectures, allowing a deployment on multiple platforms. This generality can make this the basis for a train & optimise service for the HEP community. We present scalability results obtained from two typical HEP use-case: jet identification from raw data and shower generation from a GAN model. Results on GPU clusters were obtained at the ORNL TITAN supercomputer ad other HPC facilities, as well as exploiting commercial cloud resources and OpenStack. A comprehensive comparisons of scalability performance across platforms will be presented, together with a detailed description of the libraries and their functionalities.
CERN IT department has been maintaining different HPC facilities over the past five years, one in Windows and the other one on Linux as the bulk of computing facilities at CERN are running under Linux. The Windows cluster has been dedicated to engineering simulations and analysis problems. This cluster is a High Performance Computing (HPC) cluster thanks to powerful hardware and low-latency interconnects. The Linux cluster resources are accessible through HTCondor, and are used for general purpose parallel but single-node type jobs, providing computing power to the CERN experiments and departments for tasks such as physics event reconstruction, data analysis and simulation. For HPC workloads that require multi-node parallel environments for MPI programs, there is a dedicated HPC service with MPI clusters running under the SLURM batch system and dedicated hardware with fast interconnects.
In the past year, it was decided to consolidate compute intensive jobs in Linux to make a better use of the existing resources. Moreover, this was also in line with CERN IT strategy to reduce its dependencies on Microsoft products. This paper describes the migration of Ansys, COMSOL and CST users who were running on Windows HPC to Linux clusters. Ansys, COMSOL and CST are three engineering applications used at CERN on different domains, like multiphysics simulations or electromagnetic field problems. Users of these applications are sitting in different departments, with different needs and levels of expertise. In most cases the users have no prior knowledge of Linux. The paper will present the technical strategy to allow the engineering users to submit their simulations to the appropriate Linux cluster, depending on their HW needs. It will also describe the technical solution to integrate their Windows installations to submit to Linux clusters. Finally, the challenges and lessons learnt during the migration will be also discussed.
The upcoming generation of exascale HPC machines will all have most of their computing power provided by GPGPU accelerators. In order to be able to take advantage of this class of machines for HEP Monte Carlo simulations, we started to develop a Geant pilot application as a collaboration between HEP and the Exascale Computing Project. We will use this pilot to study and characterize how the machines’ architecture affects performance. The pilot will encapsulate the minimum set of physics and software framework processes necessary to describe a representative HEP simulation problem. The pilot will then be used to exercise communication, computation, and data access patterns. The project’s main objective is to identify re-engineering opportunities that will increase event throughput by improving single node performance and being able to make efficient use of the next generation of accelerators available in Exascale facilities.
Covariance matrices are used for a wide range of applications in particle ohysics, including Kalman filter for tracking purposes, as well as for Primary Component Analysis and other dimensionality reduction techniques. The covariance matrix contains covariance and variance measures between all permutations of data dimensions, leading to high computational cost.
By using a novel decomposition of the covariance matrix and exploiting parallelism on FPGA as well as separability of subtasks to CPU and FPGA, a linear increase of computation time for 156 number of integer dimensions and a constant computation time for 16 integer dimensions is achieved for exact covariance matrix calculation on a hybrid FPGA-CPU system, the Intel HARP 2. This leads up to 100 times faster results than the FPGA baseline and 10 times faster computation time compared to standard CPU covariance matrix calculation.
Detailed simulation is one of the most expensive tasks, in terms of time and computing resources for High Energy Physics experiments. The need for simulated events will dramatically increase for the next generation experiments, like the ones that will run at the High Luminosity LHC. The computing model must evolve and in this context, alternative fast simulation solutions are being studied. 3DGAN represent a successful example across the several R&D activities focusing on the use of deep generative models to particle detector simulation: physics results in terms of agreement to standard Monte Carlo techniques are already very promising. Optimisation of the computing resources needed to train these models, and consequently to deploy them efficiently during the inference phase will be essential to exploit the added-value of their full capabilities.
In this context, CERN openlab has a collaboration with the researchers at SHREC at the University of Florida and with Intel to accelerate the 3DGAN inferencing stage using FPGAs. This contribution will describe the efforts ongoing at the University of Florida to develop an efficient heterogeneous computing (HGC) framework, CPUs integrated with accelerators such as GPUs and FPGAs, in order to accelerate Deep Learning. The HGC framework uses Intel distribution of OpenVINO, running on an Intel Programmable Acceleration Card (PAC) equipped with an Arria 10 GX FPGA.
Integration of the 3DGAN use case in the HGC framework has required development and optimisation of new FPGA primitives using the Intel Deep Learning Acceleration (DLA) development suite.
A number of details of this work and preliminary results will be presented, specifically in terms of speedup, stimulating a discussion for future development.
The CMS experiment will be upgraded for operation at the High-Luminosity LHC to maintain and extend its optimal physics performance under extreme pileup conditions. Upgrades will include an entirely new tracking system, supplemented by a track trigger processor capable of providing tracks at Level-1, as well as a high-granularity calorimeter in the endcap region. New front-end and back-end electronics will also provide the level-1 trigger with high-resolution information from the barrel calorimeter and the muon systems. The upgraded Level-1 processors, based on powerful FPGAs, will be able to carry out sophisticated feature searches with resolutions often similar to the offline ones, while keeping pileup effects under control. In this paper, we discuss the feasibility of a system capturing Level-1 intermediate data at the beam-crossing rate of 40 MHz and carrying out online analyses based on these limited-resolution data. This 40 MHz scouting system would provide fast and virtually unlimited statistics for detector diagnostics, alternative luminosity measurements and, in some cases, calibrations, and it has the potential to enable the study of otherwise inaccessible signatures, either too common to fit in the L1 accept budget, or with requirements which are orthogonal to “mainstream” physics, such as long-lived particles. We discuss the requirements and possible architecture of a Phase-2 40 MHz scouting system, as well as some of the physics potential, and results from a demonstrator operated at the end of Run-2 using the Global Muon Trigger data from CMS. Plans for further demonstrators envisaged for Run 3 are also discussed.
Within the FAIR Phase-0 program the fast algorithms of the FLES (First-Level Event Selection) package developed for the CBM experiment (FAIR/GSI, Germany) are adapted for online and offline processing in the STAR experiment (BNL, USA). Using the same algorithms creates a bridge between online and offline. This makes it possible to combine online and offline resources for data processing.
Thus, on the basis of the STAR HLT farm an express data production chain was created, which extends the functionality of HLT in real time, up to the analysis of physics. The same express data production chain can be used on the RCF farm, which is used for fast offline production with the similar tasks as in the extended HLT. The chain of express analysis does not interfere with the chain of standard analysis.
An important advantage of express analysis is that it allows to start calibration, production and analysis of the data as soon as they are received. Therefore, use of the express analysis can be beneficial for BES-II data production and help accelerate science discovery by helping to obtain results within a year after the end of data acquisition.
The specific features of express data production are given, as well as the result of online QA plots such as the real-time reconstruction of secondary decays in a BES-II environment.
With the unprecedented high luminosity delivered by the LHC, detector readout and data storage limitations severely limit searches for processes with high-rate backgrounds. An example of such searches is those for mediators of the interactions between the Standard Model and dark matter, decaying to hadronic jets. Traditional signatures and data taking techniques limit these searches to masses above the TeV. In order to extend the search range to lower masses on the order of 100 GeV and probe weaker couplings, the ATLAS experiment employs a range of novel trigger and analysis strategies. One of these is the trigger-level analysis (TLA), which records only trigger-level jet objects instead of the full detector information. This strategy of using only partial event information permits the use of lower jet trigger thresholds and increased recording rates with minimal impact on the total output bandwidth. We discuss the implementation of this stream and its planned updates for Run 3 and outline its technical challenges. We also present the results of an analysis using this technique, highlighting the competitiveness and complementarity with traditional data streams.
The Australian Square Kilometre Array Pathfinder (ASKAP) is a
new generation 36-antenna 36-beam interferometer capable of producing
about 2.5 Gb/s of raw data. The data are streamed from the observatory
directly to the dedicated small cluster at the Pawsey HPC centre. The ingest
pipeline is a distributed real time software which runs on this cluster
and prepares the data for further (offline) processing by imaging and
calibration pipelines. In addition to its main functionality, it turned out
to be a valuable tool for various commissioning experiments and allowed us
to run an interim system and achieve the first scientific results much earlier.
I will review the architecture of the ingest pipeline, its role in the
overall ASKAP's design as well as the lessons learned by developing a hard
real-time application in the HPC environment.
The transverse feedback system in LHC provides turn-by-turn, bunch-by-bunch measurements of the beam transverse position with a submicrometer resolution from 16 pickups. This results in a 16 high-bandwidth data-streams (1Gbit/s each), which are sent through a digital signal processing chain to calculate the correction kicks which are then applied to the beam. These data-streams contain valuable information about beam parameters and stability. A system that can extract and analyze these parameters and make them available for the users is extremely valuable for the accelerators physicists, machine operators, or engineers working with LHC. This paper introduces the next generation transverse observation system, which was designed specifically to allow demanding low-latency (few turns) beam parameter analysis such as passive tune extraction or transverse instability detection, while at the same time provide users around CERN with the raw data-streams in form of buffers. A new acquisition card and driver was developed that achieves a latency less than 100$\mu$s from the position being measured by the pickup to data being available for processing on the host. This data is then processed by a multitude of applications that are executed in a real-time environment that was fine-tuned for the driver and the applications. To handle the high throughput required by the analysis applications without saturating the computing resources, a combination of parallel programming techniques are used in combination with GPGPU computing.
Development of the second generation JANA2 multi-threaded event processing framework is ongoing through an LDRD initiative grant at Jefferson Lab. The framework is designed to take full advantage of all cores on modern many-core compute nodes. JANA2 efficiently handles both traditional hardware triggered event data and streaming data in online triggerless environments. Development is being done in conjunction with the Electron Ion Collider development. Anticipated to be the next large scale Nuclear Physics facility constructed. The core framework is written in modern C++ but includes an integrated Python interface. The status of development and summary of the more interesting features will be presented.
The increase in luminosity foreseen in the future years of operation of the Large Hadron Collider (LHC) creates new challenges in computing efficiency for all participating experiment. To cope with these challenges and in preparation for the third running period of the LHC, the LHCb collaboration currently overhauls its software framework to better utilise modern computing architectures. This effort includes the LHCb simulation framework (Gauss).
In this talk, we present Gaussino, an LHCb-independent simulation framework which forms the basis for LHCb's future simulation framework which incorporates the reimplemented or modernised core features of Gauss. It is built on Gaudi's functional framework making use of multiple threads. Event generation is interfaced to external generators with an example implementation of a multi-threaded Pythia8 interface being included. The detector simulation is handled by the multithreaded version of Geant4 with an interface allowing for the parallel execution of multiple events at the same time as well as for parallelism within a single event. Additionally, we present the integration of DD4hep geometry description into Gaussino to handle the detector geometry and conversion.
Software improvements in the ATLAS Geant4-based simulation are critical to keep up with the evolving hardware and increasing luminosity. Geant4 simulation currently accounts for about 50% of CPU consumption in ATLAS and it is expected to remain the leading CPU load during Run 4 (HL-LHC upgrade) with an approximately 25% share in the most optimistic computing model. The ATLAS experiment recently developed two algorithms for optimizing Geant4 performance: Neutron Russian Roulette (NRR) and range cuts for electromagnetic processes. The NRR randomly terminates a fraction of low energy neutrons in the simulation and weights energy deposits of the remaining neutrons to maintain physics performance. Low energy neutrons typically undergo many interactions with the detector material and their path becomes uncorrelated with the point of origin. Therefore, the response of neutrons can be efficiently estimated only with a subset of neutrons. Range cuts for electromagnetic processes exploit a built-in feature of Geant4 and terminate low energy electrons that originate from physics processes including conversions, the photoelectric effect, and Compton scattering. Both algorithms were tuned to maintain physics performance in ATLAS and together they bring about a 20% speedup of the ATLAS Geant4 simulation. Additional ideas for improvements currently under investigation will be also be discussed in the talk. Lastly, this talk presents how the ATLAS experiment utilizes software packages such as Intel's VTune to identify and resolve hot-spots in simulation.
HEP experiments simulate the detector response by accessing all needed data and services within their own software frameworks. However, decoupling the simulation process from the experiment infrastructure can be useful for a number of tasks, amongst them the debugging of new features, or the validation of multithreaded vs sequential simulation code and the optimization of algorithms for HPCs. The relevant features and data must be extracted from the framework to produce a standalone simulation application.
As an example, the simulation of the detector response of the ATLAS experiment at the LHC is based on the Geant4 toolkit and is fully integrated in the experiment's framework "Athena". Recent developments opened the possibility of accessing a full persistent copy of the ATLAS geometry outside of the Athena framework. This is a prerequisite for running ATLAS Geant4 simulation standalone. In this talk we present the status of development of FullSimLight, a full simulation prototype that is being developed with the goal of running ATLAS standalone Geant4 simulation with the actual ATLAS geometry.
The purpose of FullSimLight is to simplify studies of Geant4 tracking and physics processes, including on novel architectures. We will also address the challenges related to the complexity of ATLAS's geometry implementation, which precludes persistifying a complete detector description in a way that can be automatically read by standalone Geant4. This lightweight prototype is meant to ease debugging operations on the Geant4 side and to allow early testing of new Geant4 releases. It will also ease optimization studies and R&D activities related to HPC development: i.e. the possibility to offload partially/totally the simulation to GPUs/Accelerators without having to port the whole experimental infrastructure.
The Heavy Photon Search (HPS) is an experiment at the Thomas Jefferson National Accelerator Facility designed to search for a hidden sector photon (A’) in fixed-target electro-production. It uses a silicon micro-strip tracking and vertexing detector inside a dipole magnet to measure charged particle trajectories and a fast lead-tungstate crystal calorimeter just downstream of the magnet to provide a trigger and to identify electromagnetic showers. The HPS experiment uses both invariant mass and secondary vertex signatures to search for the A’. The overall design of the detector follows from the kinematics of A’ production which typically results in a final state particle within a few degrees of the incoming beam. The occupancies of sensors near the beam plane are high, so high-rate detectors, a fast trigger, and excellent time tagging are required to minimize their impact and detailed simulations of backgrounds are crucial to the success of the experiment. The detector is fully simulated using the flexible and performant Geant4-based program "slic" using the xml-based "lcdd" detector description (described in previous CHEP conferences). Simulation of the readout and the event reconstruction itself are performed with the Java-based software package "hps-java." The simulation of the detector readout includes full charge deposition, drift and diffusion in the silicon wafers, followed by a detailed simulation of the readout chip and associated electronics. Full accounting of the occupancies and trigger was performed by overlaying simulated beam backgrounds. HPS has successfully completed two engineering runs and will complete its first physics run in the summer of 2019. Event reconstruction involving track, cluster and vertex finding and fitting for both simulated and real data will be described. We will begin with an overview of the physics goals of the experiment followed by a short description of the detector design. We will then describe the software tools used to design the detector layout and simulate the expected detector performance. Finally, the event reconstruction chain will be presented and preliminary comparisons of the expected and measured detector performance will be presented.
The large volume of data expected to be produced by the Belle II experiment presents the opportunity for for studies of rare, previously inaccessible processes. To investigate such rare processes in a high data volume environment necessitates a correspondingly high volume of Monte Carlo simulations to prepare analyses and gain a deep understanding of the contributing physics processes to each individual study. This resulting challenge, in terms of computing resource requirements, calls for more intelligent methods of simulation, in particular for background processes with very high rejection rates. This work presents a method of predicting in the early stages of the simulation process the likelihood of relevancy of an individual event to the target study using convolutional neural networks. The results show a robust training that is integrated natively into the existing Belle II analysis software framework, with steps taken to mitigate systematic biases induced by the early selection procedure.
Estimations of the CPU resources that will be needed to produce simulated data for the future runs of the ATLAS experiment at the LHC indicate a compelling need to speed-up the process to reduce the computational time required. While different fast simulation projects are ongoing (FastCaloSim, FastChain, etc.), full Geant4 based simulation will still be heavily used and is expected to consume the biggest portion of the total estimated processing time. In order to run effectively on modern architectures and profit from multi-core designs, a migration of the Athena framework to a multi-threading processing model has been performed in the last years. A multi-threaded simulation based on AthenaMT and Geant4MT enables substantial decreases in the memory footprint of jobs, largely from shared geometry and cross-sections tables. This approach scales better with respect to the multi-processing approach (AthenaMP) especially on the architectures that are foreseen to be used in the next LHC runs. In this paper we will report about the status of the multithreaded simulation in ATLAS, focusing on the different challenges of its validation process. We will demonstrate the different tools and strategies that have been used for debugging multi-threaded runs versus the corresponding sequential ones, in order to have a fully reproducible and consistent simulation result.
In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets require a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals, by allowing timely delivery of scientific results. However, a substantial amount of interventions from software developers, shifters and operational teams is needed to efficiently manage such heterogeneous infrastructures. For instance, every year thousands of tickets are submitted to ATLAS and CMS issue tracking systems, hence further processed by the experiment operators. On the other hand, logging information from computing services and systems is being archived on ElasticSearch, Hadoop, and NoSQL data stores. Such a wealth of information can be exploited to increase the level of automation in computing operations by using adequate techniques, such as machine learning (ML), tailored to solve specific problems. ML models applied to the prediction of intelligent data placements and access patterns can help to increase the efficiency of resource exploitation and the overall throughput of the experiments distributed computing infrastructures. Time-series analyses may allow for the estimation of the time needed to complete certain tasks, such as processing a certain number of events or transferring a certain amount of data. Anomaly detection techniques can be employed to predict system failures, leading for example to network congestion. Recording and analyzing shifter actions can be used to automate tasks such as submitting tickets to support centers, or to suggest possible solutions to repeating issues. The Operational Intelligence project is a joint effort from various WLCG communities aimed at increasing the level of automation in computing operations. We discuss how state-of-the-art technologies can be used to build general solutions to common problems and to reduce the operational cost of the experiment computing infrastructure.
The CMS computing infrastructure is composed by several subsystems that accomplish complex tasks such as workload and data management, transfers, submission of user and centrally managed production requests. Till recently, most subsystems were monitored through custom tools and web applications, and logging information was scattered in several sources and typically accessible only by experts. In the last year CMS computing fostered the adoption of common big data solutions based on open-source, scalable, and no-SQL tools, such as Hadoop, InfluxDB, and ElasticSearch, available through the CERN IT infrastructure. Such system allows for the easy deployment of monitoring and accounting applications using visualisation tools such as Kibana and Graphana. Alarms can be raised when anomalous conditions in the monitoring data are met, and the relevant teams are automatically notified. Data sources from different subsystems are used to build complex workflows and predictive analytics (data popularity, smart caching, transfer latency, …), and for performance studies. We describe the full software architecture and data flow, the CMS computing data sources and monitoring applications, and show how the stored data can be used to gain insights into the various subsystems by exploiting scalable solutions based on Spark.
For the last 10 years, the ATLAS Distributed Computing project has based its monitoring infrastructure on a set of custom designed dashboards provided by CERN-IT. This system functioned very well for LHC Runs 1 and 2, but its maintenance has progressively become more difficult and the conditions for Run 3, starting in 2021, will be even more demanding; hence a more standard code base and more automatic operations are needed. A new infrastructure has been provided by the CERN-IT Monit group, based on InfluxDB as the data store and Grafana as the display environment. ATLAS has adapted and further developed its monitoring tools to use this infrastructure for data and workflow management monitoring and accounting dashboards, expanding the range of previous possibilities with the aim of achieving a single, simpler, environment for all monitoring applications. This presentation will describe the tools used, the data flows for monitoring and accounting, the problems encountered and the solutions found.
The central Monte-Carlo production of the CMS experiment utilizes the WLCG infrastructure and manages daily thousands of tasks, each up to thousands of jobs. The distributed computing system is bound to sustain a certain rate of failures of various types, which are currently handled by computing operators a posteriori. Within the context of computing operations, and operation intelligence, we propose a machine learning technique to learn from the operators with a view to reduce the operational workload and delays. This work is in continuation of CMS work on operation intelligence to try and reach accurate predictions with machine learning. We present an approach to consider the log files of the workflows as regular text to leverage modern techniques from natural language processing (NLP). In general, log files contain a substantial amount of text that is not human language. Therefore, different log parsing approaches are studied in order to map the log files' words to high dimensional vectors. These vectors are then exploited as feature space to train a model that predicts the action that the operator has to take. This approach has the advantage that the information of the log files is extracted automatically and the format of the logs can be arbitrary. In this work the performance of the log file analysis with NLP is presented and compared to previous approaches.
Relational database (RDB) and its management system (RDBMS) offer many advantages to us, such as a rich query language, maintainability gained from a concrete schema, robust and reasonable backup solutions such as differential backup, and so on. Recently, some of RDBMS has supported column-store features that offer data compression with a high level of both data size and query performance. These features are useful for data collection and management. However, it is not easy to leverage such features. First of all, RDBMS gains a reasonable performance only after a proper description of the data schema, which requires expertise.
In this talk, we propose an easy-to-use data schema management scheme for RDBMS that includes the utilization of the column-store features. Our approach mainly focuses on time-series data. First of all, our approach supports appropriate schema generation to leverage an RDBMS that includes automatic creation of sub-tables and indexes. This is good preparation for leveraging column-store features in RDBMS.
Along with the proposal, we implemented a prototype system on PostgreSQL-based RDBMS. Our preliminary experiments show a good performance over other ordinary approaches.
In this work we review existing monitoring outputs and recommend some novel alternative approaches to improve the comprehension of large volumes of operations data that are produced in distributed computing. Current monitoring output is dominated by the pervasive use of time-series histograms showing the evolution of various metrics. These can quickly overwhelm or confuse the viewer due to the large number of similar looking plots. We propose a supplementary approach through the sonification of real-time data streamed directly from a variety of distributed computing services. The real-time nature of this method allows operations staff to quickly detect problems and identify that a problem is still ongoing, avoiding the case of investigating an issue a-priori when it may already have been resolved. In this paper we present details of the system architecture and provide a recipe for deployment suitable for both site and experiment teams.
The envisaged Storage and Compute needs for the HL-LHC will be a factor up to 10 above what can be achieved by the evolution of current technology within a flat budget. The WLCG community is studying possible technical solutions to evolve the current computing in order to cope with the requirements; one of the main focuses is resource optimization, with the ultimate objective of improving performance and efficiency as well as simplifying and reducing operation costs. As of today the storage consolidation based on a Data Lake model is considered a good candidate for addressing HL-LHC data access challenges, allowing global redundancy instead of local redundancy, dynamic adaptation of QoS, intelligent data deployment based on cost driven metrics. A Data Lake model under evaluation can be seen as a logical entity which hosts a distributed working set of analysis data. Compute power can be close to the lake, but also remote and thus completely external. In this context we expect Data caching to play a central role as a technical solution to reduce the impact of latency and reduce network load. A geographically distributed caching layer will be functional to many satellite computing centers might appear and disappear dynamically. In this talk we propose to develop a flexible and automated AI environment for smart management of the content of clustered cache systems, to optimize hardware for the service and operations for maintenance. In this talk we demonstrate a AI-based smart caching system, and discuss the implementation of training and inference facilities along with the XCache integration with the smart decision service. Finally, we evaluate the effect on smart-caches and data placement, and compare data placement algorithm with and without ML model.
Computing needs projections for the HL-LHC era (2026+), following the current computing models, indicate that much larger resource increases would be required than those that technology evolution at a constant budget could bring. Since worldwide budget for computing is not expected to increase, many research activities have emerged to improve the performance of the LHC processing software applications, as well as to propose more efficient deployment scenarios and techniques which might alleviate the increase of expected resources for the HL-LHC. The massively increasing amounts of data to be processed leads to enormous challenges for HEP storage systems, networks and the data distribution to end-users. This is particularly important in scenarios in which the LHC data would be distributed from sufficiently small numbers of centers holding the experiment’s data. Enabling data locality via local caches on sites seems a very promising approach to hide transfer latencies while reducing the deployed storage space and number of replicas elsewhere. However, this highly depends on the workflow I/O characteristics and available network across sites. A crucial assessment is to study how the experiments are accessing and using the storage services deployed in sites in WLCG, to properly evaluate and simulate the benefits for several of the new emerging proposals within WLCG/HSF. In order to evaluate access and usage of storage, this contribution shows data access and popularity studies for the CMS Workflows executed in the Spanish Tier-1 (PIC) and Tier-2 (CIEMAT) sites supporting CMS activities, based on local and experiment monitoring data spanning more than one year. Simulations of data caches for end-user analysis data, as well as potential areas for storage savings will be reviewed.
The University of California system has excellent networking between all of its campuses as well as a number of other Universities in CA, including Caltech, most of them being connected at 100 Gbps. UCSD and Caltech have thus joined their disk systems into a single logical xcache system, with worker nodes from both sites accessing data from disks at either site. This setup has been in place for a couple years now and has shown to work very well. Coherently managing nodes at multiple physical locations has however not been trivial, and we have been looking for ways to improve operations. With the Pacific Research Platform (PRP) now providing a Kubernetes resource pool spanning resources in the science DMZs of all the UC campuses, we have recently migrated the xcache services from being hosted bare-metal into containers. This talk presents our experience in both migrating to and operating in the new environment.
A general problem faced by computing on the grid for opportunistic users is that while delivering opportunistic cycles is simpler compared to delivering opportunistic storage. In this project we show how we integrated Xrootd caches places on the internet backbone to simulate a content delivery network for general science workflows. We will show that for some workflows on LIGO, DUNE, and general gravitational waves data reuse increase cpu efficiency while decreasing network bandwidth reuse.
With the increase of storage needs at the HL-LHC horizon, the data management and access will be very challenging for this critical service. The evaluation of possible solutions within the DOMA, DOMA-FR (IN2P3 project contribution to DOMA) and ESCAPE initiatives is a major activity to select the most optimal ones from the experiment and site point of views. The LAPP and LPSC teams have put their expertise and computing infrastructures in common to build the FR-ALPES federation and setup a DPM federated storage. Based on their experience of their Tier2 WLCG site management, their implication in the ATLAS Grid infrastructure and thanks to the flexibility of ATLAS and Rucio tools, the integration of this federation into the ATLAS grid infrastructure has been straightforward. In addition, the integrated DPM caching mechanism including volatile pools is also implemented. This infrastructure is foreseen to be a test bed for a DPM component within a DataLake. This presentation will describe the test bed (infrastructures separated by few ms in Round Trip Time unit) and its integration into the ATLAS framework. The impact on the sites and ATLAS operations of both the test bed implementation and its use will also be shown, as well as the measured performances on data access speed and reliability.
Data movement between sites, replication and storage are very expensive operations, in terms of time and resources, for the LHC collaborations, and are expected to be even more so in the future. In this work we derived usage patterns based on traces and logs from the data and workflow management systems of CMS and ATLAS, and simulated the impact of different caching and data lifecycle management approaches. Data corresponding to one year of operation and covering all Grid sites have been the basis for the analysis. For selected sites, this data has been augmented by access logs from the local storage system to also include data accesses not managed via the standard experiments workflow management systems. We present the results of the studies, the tools developed and the experiences with the data analysis frameworks used, and assess the validity of both current and alternative approaches to data management from a cost perspective.
High-Energy Physics has evolved a rich set of software packages that need to work harmoniously to carry out the key software tasks needed by experiments. The problem of consistently building and deploying these software packages as a coherent software stack is one that is shared across the HEP community. To that end the HEP Software Foundation Packaging Working Group has worked to identify common solutions that can be used across experiments, with an emphasis on consistent, reproducible builds and easy deployment into CVMFS or containers via CI systems. We based our approach on well identified use cases and requirements from many experiments. In this paper we summarise the work of the group in the last year and how we have explored various approaches based on package managers from industry and the scientific computing community.
We give details about a solution based on the Spack package manager which has been used to build the software required by the SuperNEMO and FCC experiments. We shall discuss changes that needed to be made to Spack to satisfy all our requirements. A layered approach to packaging with Spack, that allows build artefacts to be shared between different experiments, is described. We show how support for a build environment for software developers is provided.
In big physics experiments, as simulation, reconstruction and analysis become more sophisticated, scientific reproducibility is not a trivial task. Software is one of the biggest challenges. Modularity is a common sense of software engineering to facilitate quality and reusability of code. However, that often introduces nested dependencies not obvious for physicists to work with. Package manager is the widely practised solution to organize dependencies systematically.
Portage from Gentoo Linux is both robust and flexible, and is highly regarded by the free operating system community. In form of Gentoo Prefix, portage can be deployed by a normal user into a directory prefix, on a workstation, cloud or supercomputing node. Software is described by its build recipes along with dependency relations. Real world use cases of Gentoo Prefix in neutrino and dark matter experiments will be demonstrated, to show how physicists could benefit from existing tools of proven superiority to guarantee reproducibility in simulation, reconstruction and analysis of big physics experiments.
Development of scientific software has always presented challenges to its practitioners, among other things due to its inherently collaborative nature. Software systems often consistent of up to several dozen closely-related packages developed within a particular experiment or related ecosystem, with up to a couple of hundred externally-sourced dependencies. Making improvements to one such package can require related changes to multiple other packages, and some systemic improvements can require major structural changes across the ecosystem.
There have been several attempts to produce a multi-package development system within HEP in the past, such systems usually being limited to one or a few experiments and requiring a common build system (e.g. Make, CMake). Common features include a central installation of each "release" of the software system to avoid multiple builds of the same package on a system, and integration with version control systems.
SpackDev is based on the powerful Spack build and packaging system in wide use in HPC, utilizing its package recipes and build mangement system to extract build instructions and manage the parallel development, build and test process for multiple packages at a time. Intended to handle packages without restriction to one internal build system, SpackDev is integrated with Spack as a command extension and is generally applicable outside HEP. We describe SpackDev's features and development over the last two years and the medium-term future, and initial experience using the SpackDev in the context of the LArSoft liquid argon detector toolkit.
The conda package manager is widely used in both commercial and academic high-performance computing across a wide range of fields. In 2016 conda-forge was founded as a community-driven package repository which allows packaging efforts to be shared across communities. This is especially important with the challenges faced when packaging modern software with complex dependency chains or specialised hardware such as GPUs. Conda-forge receives support from Anaconda Inc. and became an officially supported PyData project in 2018. Conda is a language independent package manager which can be used for providing native binaries for Linux, macOS and Windows with x86, arm64 and POWER architectures.
The ROOT framework is a fundamental component of many HEP experiments. However, quickly installing ROOT on a new laptop or deploying it in continuous integration systems typically requires a non-negligible amount of domain-specific skills. The ability to install ROOT within conda has been requested for many years and its appeal was proven with it over 18,000 downloads within the first 5 months of it being made available. In addition, it has subsequently been used as a base for distributing other packages such as CMS’s event display package (Fireworks) and the alphatwirl analysis framework.
In this contribution we will discuss the process of adding ROOT releases to conda-forge and how nightly builds of ROOT are being provided to allow end users to provide feedback on new and experimental features such as RDataFrame. We also discuss our experience distributing conda environments using CVMFS for physics analysts to use both interactively and with distributed computing resources.
The number of BYOD continuously grows at CERN. Additionally, it is desirable to move from a centrally managed model to a distributed model where users are responsible for their own devices. Following this strategy, the new tools have to be provided to distribute and - in case of licensed software - also track applications used by CERN users. The available open source and commercial solutions were analyzed and none of them proved to be a good fit for CERN use cases. Therefore, it was decided to develop a system that could integrate various open source solutions and provide desired functionality for multiple platforms, both mobile and desktop. This paper presents the architecture and design decisions made to achieve a platform-independent, modern, maintainable and extensible system for software distribution at CERN.
CERN Windows server infrastructure consists of about 900 servers. The management and maintenance is often a challenging task as the data to be monitored is disparate and has to be collected from various sources. Currently, alarms are collected from the Microsoft System Center Operation Manager (SCOM) and many administrative actions are triggered through e-mails sent by various systems or scripts.
The objective of the Chopin Management System project is to maximize automation and facilitate the management of the infrastructure. The current status of the infrastructure, including essential health checks, is centralized and presented through a dashboard. The system collects information necessary for managing the infrastructure in the real-time, such as hardware configuration or Windows updates, and reacts to any change or failure instantly . As part of the system design, big data streaming technologies are employed in order to assure the scalability and fault-tolerance of the service, should the number of servers drastically grow. Server events are aggregated and processed in real-time through the use of these technologies, ensuring quick response to possible failures. This paper presents details of the architecture and design decisions taken in order to achieve a modern, maintainable and extensible system for Windows Server Infrastructure management at CERN.
GUM is a new feature of the GAMBIT global fitting software framework, which provides a direct interface between Lagrangian level tools and GAMBIT. GUM automatically writes GAMBIT routines to compute observables and likelihoods for physics beyond the Standard Model. I will describe the structure of GUM, the tools (within GAMBIT) it is able to create interfaces to, and the observables it is able to compute.
I will discuss recent advances in lattice QCD from the physics and computational points of view that have enabled basic a number properties and interactions of light nuclei to be determined directly from QCD. These calculations offer the prospect of providing nuclear matrix inputs necessary for a range of intensity frontier experiments (DUNE, mu2e) and dark matter direct-detection experiments along with well-quantified uncertainties.
Background field methods offer an approach through which fundamental non-perturbative hadronic properties can be studied. Lattice QCD is the only ab initio method with which Quantum Chromodynamics can be studied at low energies; it involves numerically calculating expectation values in the path integral formalism. This requires substantial investment in high performance super computing resources. Here the background field method is used with lattice QCD to induce a uniform background magnetic field.
A particular challenge of lattice QCD is isolating the desired state, rather than a superposition of excited states. While extensive work has been performed which allows the ground state to be identified in lattice QCD calculations, this remains a challenging proposition for the ground state in the presence of a background field. Quark level operators are introduced to resolve this challenge and thus allow for extraction of the magnetic polarisability that characterises the response of the nucleon to a magnetic field.
There exists a long standing discrepancy of around 3.5 sigma between experimental measurements and standard model calculations of the magnetic moment of the muon. Current experiments aim to reduce the experimental uncertainty by a factor of 4, and Standard Model calculations must also be improved by a similar order. The largest uncertainty in the Standard Model calculation comes from the QCD contribution, in particular the leading order hadronic vacuum polarisation (HVP). To calculate the HVP contribution, we use lattice gauge theories which allows us to study QCD at low energies. In order to better understand this quantity, we investigate the effect of QED corrections to the leading order HVP term by including QED in our lattice calculations, and investigate flavour breaking effects. This is done using fully dynamical QCD+QED gauge configurations generated by the QCDSF collaboration and a novel method of quark turning.
Computing the gluon component of momentum in the nucleon is a difficult and computationally expensive problem, as the matrix element involves a quark-line-disconnected gluon operator which suffers from ultra-violet fluctuations. But also necessary for a successful determination is the non-perturbative renormalisation of this operator. We investigate this renormalisation here by direct computation in the RI mom scheme. A clear statistical signal is obtained in the direct calculation by an adaption of the Feynman-Hellmann technique. A comparison is conducted in order to verify the energy-momentum sum rule of the nucleon.
The origin of the low-lying nature of the Roper resonance has been the subject of significant interest for many years, including several investigations using lattice QCD. It has been claimed that chiral symmetry plays an important role in our understanding of this resonance. We present results from our systematic examination of the potential role of chiral symmetry in the low-lying nucleon spectrum through the direct comparison of the clover and overlap fermion actions. After a brief summary of the background motivation, we specify the computational details of the study and outline our comparison methodologies. We do not find any strong evidence supporting the claim that chiral symmetry plays a significant role in understanding the Roper resonance on the lattice.
The WLCG Web Proxy Auto Discovery (WPAD) service provides a convenient mechanism for jobs running anywhere on the WLCG to dynamically discover web proxy cache servers that are nearby. The web proxy caches are general purpose for a number of different http applications, but different applications have different usage characteristics and not all proxy caches are engineered to work with the heaviest loads. For this reason, the initial sources of information for WLCG WPAD were the static configurations that ATLAS and CMS maintain for the Conditions data that they read through the Frontier Distributed Database system, which is the most demanding popular WLCG application for web proxy caches. That works well for use at traditional statically defined WLCG sites, but now that usage of commercial clouds is increasing, there is also a need for web proxy caches to dynamically register themselves as they are created. A package called Shoal had already been created to manage dynamically created web proxy caches. This paper describes the integration of the Shoal package into the WLCG WPAD system, such that both staticly and dynamically created web proxy caches can be located from a single source. It also describes other improvements to the WLCG WPAD system since the last CHEP publication.
Within the ATLAS detector, the Trigger and Data Acquisition system is responsible for the online processing of data streamed from the detector during collisions at the Large Hadron Collider (LHC) at CERN. The online farm is composed of ~4000 servers processing the data read out from ~100 million detector channels through multiple trigger levels. The capability to monitor the ongoing data taking and all the involved applications is essential to debug and intervene promptly to ensure efficient data taking. The base of the current web service architecture was designed a few years ago, at the beginning of the ATLAS operation (Run 1). It was intended to serve primarily static content from a Network-attached Storage, and privileging strict security, using separate web servers for internal (ATLAS Technical and Control Network - ATCN) and external (CERN General Purpose Network and public internet) access. During these years, it has become necessary to add to the static content an increasing number of dynamic web-based User Interfaces, as they provided new functionalities and replaced legacy desktop UIs. These are typically served by applications on VMs inside ATCN and made accessible externally via chained reverse HTTP proxies. As the trend towards Web UIs continues, the current design has shown its limits, and its increasing complexity became an issue for maintenance and growth. It is, therefore, necessary to review the overall web services architecture for ATLAS, taking into account the current and future needs of the upcoming LHC Run 3.
In this paper, we present our investigation and roadmap to re-design the web services system to better operate and monitor the ATLAS detector, while maintaining the security of critical services, such as Detector Control System, and maintaining the separation of remote monitoring and on-site control according to ATLAS policies.
Computational science, data management and analysis have been key factors in the success of Brookhaven Lab's scientific programs at the Relativistic Heavy Ion Collider (RHIC), the National Synchrotron Light Source (NSLS-II), the Center for Functional Nanomaterials (CFN), and in biological, atmospheric, and energy systems science, Lattice Quantum Chromodynamics (LQCD) and Materials Science as well as our participation in international research collaborations, such as the ATLAS Experiment at Europe's Large Hadron Collider (LHC) and Belle II Experiment at KEK (Japan). The construction of a new data center is an acknowledgement of the increasing demand for computing and storage services at BNL.
The Computing Facility Revitalization (CFR) project is aimed at repurposing the former National Synchrotron Light Source (NSLS-I) building as the new datacenter for BNL. The new data center is to become available in early 2021 for ATLAS compute, disk storage and tape storage equipment, and later that year - for all other collaborations supported by the RACF/SDCC Facility, including: STAR, PHENIX and sPHENIX experiments at RHIC collider at BNL, Belle II Experiment at KEK (Japan), and BNL CSI HPC clusters. Migration of the majority of IT payload from the existing datacenter to the new datacenter is expected to begin with the central networking systems and first BNL ATLAS Tier-1 Site tape robot in early FY21, and it is expected to continue throughout FY21-23. This presentation will highlight the key MEP facility infrastructure components of the new data center. Also, we will describe our plans to migrate IT equipment between datacenters, the inter-operational period in FY21, gradual IT equipment replacement in FY21-24, and show the expected state of occupancy and infrastructure utilization for both datacenters in FY25.
Since 2013 CERN’s local data centre combined with a colocation infrastructure at the Wigner data centre in Budapest have been hosting the compute and storage capacity for WLCG Tier-0. In this paper we will describe how we try to optimize and improve the operation of our local data centre to meet the anticipated increment of the physics compute and storage requirements for Run3, taking into account two important changes on the way: the end of the colocation contract with Wigner in 2019 and the loan of 2 out of 6 prefabricated compute containers being commissioned by the LHCb experiment for their online computing farm.
The ATLAS Spanish Tier-1 and Tier-2s have more than 15 years of experience in the deployment and development of LHC computing components and their successful operations. The sites are already actively participating in, and even coordinating, emerging R&D computing activities developing the new computing models needed in the LHC Run3 and HL-LHC periods.
In this contribution, we present details on the integration of new components, such as HPC computing resources, to execute ATLAS simulation workflows; the development of new techniques to improve efficiency in a cost-effective way, such as storage and CPU federations; and improvements in Data Organization, Management and Access through storage consolidations ("data-lakes"), the use of data Caches, and improving experiment data catalogues, like Event Index. The design and deployment of novel analysis facilities using GPUs together with CPUs and techniques like Machine Learning will also be presented.
ATLAS Tier-1 and Tier-2 sites in Spain are, and will be, contributing to significant R&D in computing, evaluating different models for improving performance of computing and data storage capacity in the LHC High Luminosity era.
DESY is one of the largest accelerator laboratories in Europe, developing and operating state of the art accelerators, used to perform fundamental science in the areas of high-energy physics photon science and accelerator development.\newline
While for decades high energy physics has been the most prominent user of the DESY compute, storage and network infrastructure, various scientific areas as science with photons and accelerator development have catched up and are now dominating the demands on the DESY infrastructure resources, with significant consequences for the IT resource provisioning. In this contribution, we will present an overview of the computational, storage and network resources covering the various physics communities on site.\newline
Ranging from HTC batch-like offline processing in the Grid and the interactive user analyses resources in the National Analysis Factory for
the HEP community, to the computing needs of accelerator development or of photon sciences such as PETRA III or the European XFEL. Since DESY co-hosts these experiments and their data taking, their requirements include fast low-latency online processing for data taking and calibration as well as offline processing, thus HPC workloads, that are run on the dedicated {\it Maxwell} HPC cluster.\newline
As all communities face in the coming years significant challenges due to changing environments and increasing data rates, we will discuss how this will reflect in necessary changes to the computing and storage
infrastructures.\newline
We will present DESY compute cloud and container orchestration plans as a possible basis for infrastructure and platform services. We will show examples of Jupyter for small scale interactive analysis, as well as its integration into large scale resources such as batch systems or Spark clusters.\newline
To overcome the fragmentation of the various resources for all scientific communities at DESY , we explore how to integrate them into a seamless user experience in an {\it Interdisciplinary Data and Analysis Facility}
In this paper we present the latest CMS open data release published on the CERN Open Data portal. The samples of raw datasets, collision and simulated datasets were released together with the detailed information about the data provenance. The data production chain covers the necessary compute environments, the configuration files and the computational procedures used in each data production step. We describe data curation techniques used to obtain and publish the data provenance information and we study the possibility to reproduce parts of the released data using the publicly available information. The present work demonstrates the usefulness of releasing selected samples of raw and primary data in order to fully ensure the completeness of information about data production chain for the attention of general data scientists and other non-specialists interested in using particle physics data for education or research purposes.
The CMS collaboration at the CERN LHC has made more than one petabyte of open data available to the public, including large parts of the data which formed the basis for the discovery of the Higgs boson in 2012. Apart from their scientific value, these data can be used not only for education and outreach, but also for open benchmarks of analysis software. However, in their original format, the data cannot be accessed easily without experiment-specific knowledge and skills. Work is presented that allows to set up open analyses that are performed close to the published ones, but which meet minimum requirements for experiment-specific knowledge and software. The suitability of this approach for education and outreach is demonstrated with analyses that have been made fully accessible to the public via the CERN open data portal. In the second part of the talk, the value of these data as basis for benchmarks of analysis software under realistic conditions of a high-energy physics experiment is discussed.
Open Data Science Mesh (CS3MESH4EOSC) is a newly funded project to create a new generation, interoperable federation of data and higher-level services to enable friction-free collaboration between European researchers.
This new EU-funded project brings together 12 partners from the CS3 community (Cloud Synchronization and Sharing Services). The consortium partners include CERN, Danish Technical University (DK), SURFSARA (NL), Poznan Supercomputing Centre (PL), CESNET (CZ), AARNET (AUS), SWITCH (CH), University of Munster (DE), Ailleron SA (PL), Cubbit (IT), Joint Research Centre (BE) and Fundacion ESADE (ES). CERN acts as project coordinator.
The consortium already operates services and storage-centric infrastructure for around 300 thousand scientists and researchers across the globe. The project will integrate these local existing sites and services into a seamless mesh infrastructure which is fully interconnected with the EOSC-Hub, as proposed in the European Commission’s Implementation Roadmap for EOSC.
The project will provide a framework for applications in several major areas: Data Science Environments, Open Data Systems, Collaborative Documents, On-demand Large Dataset Transfers and Cross-domain Data Sharing.
The collaboration between the users will be enabled by a simple sharing mechanism: a user will select a file or folder to share with other users at other sites. Such shared links will be established and removed dynamically by the users from a streamline web interface of their local storage systems. The mesh will automatically and contextually enable different research workflow actions based on type of content shared in the folder. One of the excellence areas of CS3 services is access to content from all types of devices: web, desktop applications and mobile devices. The project augments this capability to access content stored on remote sites and will in practice introduce FAIR principles in European Science.
The project with leverage on technologies developed and integrated in the research community, such as ScienceBox (CERNBox, SWAN, EOS), EGI-CheckIn, File Transfer Service (FTS), ARGO, EduGAIN and others. The project will also involve commercial cloud providers, integrating their software and services
The ATLAS Collaboration is releasing a new set of recorded and simulated data samples at a centre-of-mass energy of 13 TeV. This new dataset was designed after an in-depth review of the usage of the previous release of samples at 8 TeV. That review showed that capacity-building is one of the most important and abundant uses of public ATLAS samples. To fulfil the requirements of the community and at the same time attract new users and use cases, we developed real analysis software based on ROOT in two of the most popular programming languages: C++ and Python. These so-called analysis frameworks are complex enough to reproduce with reasonable accuracy the results -figures and final yields- of published ATLAS Collaboration physics papers, but still light enough to be run on commodity computers. Computers that university students and regular classrooms have, allow students to explore LHC data with similar techniques to those used by current ATLAS analysers. We present the development path and the final result of these analysis frameworks, their products and how they are distributed to final users inside and outside the ATLAS community.
Perform data analysis and visualisation on your own computer? Yes, you can! Commodity computers are now very powerful in comparison to only a few years ago. On top of that, the performance of today's software and data development techniques facilitates complex computation with fewer resources. Cloud computing is not always the solution, and reliability or even privacy is regularly a concern. While the Infrastructure as a Service (IaaS) and Software as a Service (SaaS) philosophies are a key part of current scientific endeavours, there is a misleading feeling that we need to have remote computers to do any kind of data analysis. One of the aims of the ATLAS Open Data project is to provide resources — data, software and documents — that can be stored and executed in computers with minimal or non-internet access, and in as many different operating systems as possible. This approach is viewed as complementary to the IaaS/SaaS approach, where local university, students and trainers' resources can be used in an effective and reproducible way — making the HEP and Computer Sciences fields accessible to more people. We present the latest developments in the production and use of local Virtual Machines and Docker Containers for the development of physics data analysis. We also discuss example software and Jupyter notebooks, which are in constant development for use in classrooms, and students’ and teachers' computers around the world.