25th International Conference on Computing in High Energy & Nuclear Physics

Name: 25th International Conference on Computing in High Energy & Nuclear Physics
Start: 2021-05-17T14:50:00+02:00
End: 2021-05-21T18:10:00+02:00
Location: No location set

17 May 2021, 14:50 → 21 May 2021, 18:10 Europe/Paris

Description

25th International Conference on Computing in High-Energy and Nuclear Physics

vCHEP2021

Welcome! The CHEP conference series addresses the computing, networking and software issues for the world’s leading data‐intensive science experiments that currently analyse hundreds of petabytes of data using worldwide computing resources.

vCHEP 2021 was held as a virtual event between Monday-Friday 17th-21st May 2021.

Thank you to everyone who came and contributed to the conference.

Proceedings

The vCHEP 2021 proceedings are now published in EPJ Web of Conferences.

simone.campana@cern.ch

graeme.andrew.stewart@cern.ch

Participants

1144 View full list

Monday 17 May
- Opening Session
  
  Conveners: James Catmore (University of Oslo (NO)), Simone Campana (CERN)
  
  Video recording
  
  Zoom
  - 1
    
    Welcome
    
    Speaker: Joachim Josef Mnich (CERN)
    
    Recording
  - 2
    
    Introduction
    
    Speaker: Dr Graeme A Stewart (CERN)
    
    Recording
    
    vCHEP-Opening.pdf
- 15:15
  Conference Photo
  The group photo of the conference participants will be composed of small but recognizable pictures of people connected to the Zoom meeting with their cameras enabled. The names will be blurred. The final group photo will afterwards be published on the conference website, and possibly in other publications.
  - If you want to appear on the group photo, please enable your camera when we will be taking the photo (technically, screenshots of the Zoom gallery view).
  - If you prefer not to be included in the group photo, please just keep your camera off.
  The screenshots for the group photo will be taken during the dedicated sessions on Monday afternoon (15:15) and on Tuesday morning (10:30). If you participate in one of them, there is no need to attend the other.
- Opening Session
  
  Conveners: James Catmore (University of Oslo (NO)), Simone Campana (CERN)
  
  Video recording
  
  Zoom
  - 3
    
    Keynote Talk: Computing Perspectives
    
    Speakers: Ian BIRD (CNRS), Ian Bird
    
    ComputingPerspectives-vCHEP21.pdf
    
    ComputingPerspectives-vCHEP21.pptx
    
    Recording
  - 4
    
    Keynote Talk: Software Perspectives
    
    Speaker: Heather Gray (UC Berkeley/LBNL)
    
    HeatherCHEP5.pdf
    
    Recording
- 16:20
  
  Break
- Monday PM plenaries: Plenaries
  
  Conveners: Catherine Biscarat (L2I Toulouse, IN2P3/CNRS (FR)), Stefan Roiser (CERN)
  
  Mattermost
  
  Video recording
  
  Zoom
  - 5
    
    Preparing distributed computing operations for the HLLHC era with Operational Intelligence
    
    The Operational Intelligence (OpInt) project is a joint effort from
    various WLCG communities aimed at increasing the level of automation
    in computing operations and reducing human interventions. The currently deployed systems have proven to be mature and capable of meeting the experiments goals, by allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters and operational teams is needed to manage efficiently such heterogeneous infrastructures.
    Under the scope of the OpInt project, experts from most of the relevant areas
    have gathered to propose and work on “smart” solutions. Machine learning,
    data mining, log analysis, and anomaly detection are only some of the tools we
    have evaluated for our use cases . Discussions have led to a number of ideas on
    how to achieve our goals and the development of solutions has started. In this
    contribution, we will report on the development of a suite of OpInt services to
    cover various use cases of: workload management, data management, and site
    operations.
    
    Speaker: Panos Paparrigopoulos (CERN)
    
    Operational_Intelligence.pdf
    
    Recording
  - 6
    
    Implementation of ACTS into sPHENIX Track Reconstruction
    
    sPHENIX is a high energy nuclear physics experiment under construction at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The primary physics goals of sPHENIX are to measure jets, their substructure, and the upsilon resonances in $p$$+$$p$, $p$+Au, and Au+Au collisions. sPHENIX will collect approximately 200 PB of data over three run periods utilizing a finite-sized computing center; thus, performing track reconstruction in a timely manner is a challenge due to the high occupancy of heavy ion collisions. To achieve the goal of reconstructing tracks with high efficiency and within a 5 second per event computational budget, the sPHENIX experiment has recently implemented the A Common Tracking Software (ACTS) track reconstruction toolkit. This paper reports the performance status of ACTS as the default track fitting tool within sPHENIX, including discussion of the first implementation of a TPC geometry within ACTS.
    
    Speaker: Joe Osborn (Oak Ridge National Laboratory)
    
    Recording
    
    sPHENIX_ACTS_CHEP.pdf
  - 17:40
    
    Break
  - 7
    
    The new (and improved!) CERN Single-Sign-On
    
    The new CERN Single-Sign-On (SSO), built around an open sourcestack, has been in production for over a year and many CERN users are alreadyfamiliar with its approach to authentication, either as a developer or as an enduser. What is visible upon logging in, however, is only the tip of the iceberg.Behind the scenes there has been a significant amount of work taking placeto migrate accounts management and to decouple Kerberos [1] authenticationfrom legacy Microsoft components. Along the way the team has been engagingwith the community through multiple fora, to make sure that a solution is pro-vided that not only replaces functionality but also improves the user experiencefor all CERN members. This paper will summarise key evolutions and clarifywhat is to come in the future.
    
    Speaker: Mary Georgiou (CERN)
    
    New CERN SSO, CHEP -2021.pdf
    
    Recording
    
    The new (and improved!) CERN Single-Sign-On
  - 8
    
    Porting HEP Parameterized Calorimeter Simulation Code to GPUs
    
    The High Energy Physics (HEP) experiments, such as those at theLarge Hadron Collider (LHC), traditionally consume large amounts of CPUcycles for detector simulations and data analysis, but rarely use compute accel-erators such as GPUs. As the LHC is upgraded to allow for higher luminosity,resulting in much higher data rates, purely relying on CPUs may not provideenough computing power to support the simulation and data analysis needs. Asa proof of concept, we investigate the feasibility of porting a HEP parameterized calorimeter simulation code to GPUs. We have chosen to use FastCaloSim,the ATLAS fast parametrized calorimeter simulation. While FastCaloSim issufficiently fast such that it does not impose a bottleneck in detector simula-tions overall, significant speed-ups in the processing of large samples can beachieved from GPU parallelization at both the particle (intra-event) and eventlevels; this is especially beneficial in conditions expected at the high-luminosityLHC, where an immense number of per-event particle multiplicities will resultfrom the many simultaneous proton-proton collisions. We report our experi-ence with porting FastCaloSim to NVIDIA GPUs using CUDA. A preliminaryKokkos implementation of FastCaloSim for portability to other parallel archi-tectures is also described
    
    Speaker: Dr Charles Leggett (Lawrence Berkeley National Lab (US))
    
    FastCaloSim_for_vCHEP_2021_f.pdf
    
    Recording
Tuesday 18 May
- Tues AM Plenaries: Plenaries
  
  Conveners: Caterina Doglioni (Lund University (SE)), Maria Girone (CERN)
  
  Mattermost
  
  Video recording
  
  Zoom
  - 9
    
    Towards a realistic track reconstruction algorithm based on graph neural networks for the HL-LHC
    
    The physics reach of the HL-LHC will be limited by how efficiently the experiments can use the available computing resources, i.e. affordable software and computing are essential. The development of novel methods for charged particle reconstruction at the HL-LHC incorporating machine learning techniques or based entirely on machine learning is a vibrant area of research. In the past two years, algorithms for track pattern recognition based on graph neural networks (GNNs) have emerged as a particularly promising approach. Previous work mainly aimed at establishing proof of principle. In the present document we describe new algorithms that can handle complex realistic detectors. The new algorithms are implemented in ACTS, a common framework for tracking software. This work aims at implementing a realistic GNN-based algorithm that can be deployed in an HL-LHC experiment.
    
    Speaker: Charline Rougier (Laboratoire des 2 Infinis - Toulouse, CNRS / Univ. Paul Sabatier (FR))
    
    Recording
    
    vCHEP_CR_GNN.pdf
  - 10
    
    ALICE Central Trigger System for LHC Run 3
    
    A major upgrade of the ALICE experiment is ongoing aiming to a high-rate data taking during LHC Run 3 (2022-2024).
    The LHC interaction rate at Point 2 will be increased to $50\ \mathrm{kHz}$ kHz in Pb-Pb collisions and $1\ \mathrm{MHz}$ in pp collisions. ALICE experiment will be able to readout full interaction rate leading to an increase of the collected luminosity up a factor of about 100 with respect to the LHC Run 1 and 2. To satisfy these requirements a new readout system has been developed for most of the ALICE detectors allowing the full readout of the data at the required interaction rates without the need for a hardware trigger selection. A novel trigger and timing distribution system will be implemented based on Passive Optical Network (PON) and GigaBit Transceiver (GBT) technology. To assure backward compatibility a triggered mode based on RD12 TTC technology as the one used in the previous LHC runs will be maintained and re-implemented under the new Central Trigger System (CTS). A new universal ALICE Trigger Board (ATB) based on the Xilinx Kintex Ultrascale FPGA has been designed to function as a Central Trigger Processor (CTP), Local Trigger Unit (LTU), and monitoring interfaces.
    
    In this paper, this hybrid multilevel system with continuous readout will be described, together with the triggering mechanism and algorithms. An overview of the CTS, the design of the ATB and the different communication protocols will be presented.
    
    Speaker: Jakub Kvapil (University of Birmingham (GB))
    
    ALICE_CTS_JakubKvapil.pdf
    
    Recording
  - 11
    
    Public Engagement in a Global Pandemic
    
    UKRI/STFC’s Scientific Computing Department (SCD) has a long and rich history of delivering face to face public engagement and outreach, both on site and in public places, as part of the wider STFC programme. Due to the global COVID-19 pandemic, SCD was forced to abandon an extensive planned programme of public engagement, alongside altering the day-to-day working methods of the majority of its staff. SCD had to respond rapidly to create a new, remote only, programme for the summer and for the foreseeable future. This was initially an exercise in improvisation, identifying existing activities that could be delivered remotely with minimal changes. As the pandemic went on, SCD also created new resources specifically for a remote audience and adapted existing activities where appropriate, using our evaluation framework to ensure these activities continued to meet the aims of the in-person programme. This paper presents the process through which this was achieved, some of the benefits and challenges of remote engagement and the plans for 2021 and beyond.
    
    Speaker: Mr Greg Corbett (STFC)
    
    20210518 PE Public Engagement in a Global Pandemic.pdf
    
    20210518 PE Public Engagement in a Global Pandemic.pptx
    
    Recording
- 10:30
  Conference Photo
  The group photo of the conference participants will be composed of small but recognizable pictures of people connected to the Zoom meeting with their cameras enabled. The names will be blurred. The final group photo will afterwards be published on the conference website, and possibly in other publications.
  - If you want to appear on the group photo, please enable your camera when we will be taking the photo (technically, screenshots of the Zoom gallery view).
  - If you prefer not to be included in the group photo, please just keep your camera off.
  The screenshots for the group photo will be taken during the dedicated sessions on Monday afternoon (15:15) and on Tuesday morning (10:30). If you participate in one of them, there is no need to attend the other.
- 10:35
  
  Break
- Algorithms: Tue AM
  
  Conveners: David Rohr (CERN), John Derek Chapman (University of Cambridge (GB))
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday morning
  
  Zoom
  - 12
    
    A C++ Cherenkov photons simulation in CORSIKA 8
    
    CORSIKA is a standard software for simulations of air showers induced by cosmic rays. It has been developed in Fortran 77 continuously over the last thirty years. So it becomes very difficult to add new physics features to CORSIKA 7. CORSIKA 8 aims to be the future of the CORSIKA project. It is a framework in C++17 which uses modern concepts in object oriented programming for an efficient modularity and flexibility. The CORSIKA 8 project aims to obtain high performance by exploiting techniques such as vectorization, gpu/cpu parallelization, extended use of static polymorphism and the most precise physical models available.
    In this paper we focus on the Cherenkov photon propagation module of CORSIKA, which is of particular interest for gamma-ray experiments, like the Cherenkov Telescope Array. First, we present the optimizations that we have applied to the Cherenkov module thanks to the results of detailed profiling using performance counters.
    Then, we report our preliminary work to develop the Cherenkov Module in the CORSIKA 8 framework. Finally, we will demonstrate the first performance comparison with the current CORSIKA software as well as the validation of physics results.
    
    Speaker: Mr Matthieu Carrère (CNRS)
    
    A_C++_Cherenkov_photons_simulation_in_CORSIKA_8.pdf
    
    Recording
  - 13
    
    Studies of GEANT4 performance for different ATLAS detector geometries and code compilation methods
    
    Full detector simulation is known to consume a large proportion of computing resources available to the LHC experiments, and reducing time consumed by simulation will allow for more profound physics studies. There are many avenues to exploit, and in this work we investigate those that do not require changes in the GEANT4 simulation suite. In this study, several factors affecting the full GEANT4 simulation execution time are investigated. A broad range of configurations has been tested to ensure consistency of physical results. The effect of a single dynamic library GEANT4 build type has been investigated and the impact of different primary particles at different energies has been evaluated using GDML and GeoModel geometries. Some configurations have an impact on the physics results and are therefore excluded from further analysis. Usage of the single dynamic library is shown to increase execution time and does not represent a viable option for optimizations. Lastly, the static build type is confirmed as the most effective method to reduce the simulation execution time.
    
    Speaker: Mrs Caterina Marcon (Lund University (SE))
    
    ATL-COM-SOFT-2021-034.pdf
    
    Recording
  - 14
    
    CMS Full Simulation for Run 3
    
    We report status of the CMS full simulation for Run-3. During the long shutdown of the LHC a significant update has been introduced to the CMS code for simulation. CMS geometry description is reviewed. Several important modifications were needed. CMS detector description software is migrated to the DD4Hep community developed tool. We will report on our experience obtained during the process of this migration. Geant4 10.7 is the CMS choice for Run-3 simulation productions. We will discuss arguments for this choice, the strategy of adaptation of a new Geant4 version, and will report on physics performance of CMS simulation. A special Geant4 Physics List configuration FTFP_BERT_EMM will be described, which provides a compromise between simulation accuracy and CPU performance. A significant fraction of time for simulation of CMS events is spent on tracking of charge particles in magnetic field. In CMS simulation a dynamic choice of Geant4 parameters for tracking in field is implemented. A new method is introduced into simulation of electromagnetic components of hadronic showers in the electromagnetic calorimeter of CMS. For low-energy electrons and positrons a parametrization of GFlash type is applied. Results of tests of this method will be discussed. In summary, we expect about 25% speedup of CMS simulation production for Run-3 compared to the Run-2 simulations.
    
    Speaker: Prof. Vladimir Ivantchenko (CERN)
    
    CMSFullSim.pdf
    
    Recording
  - 15
    
    Fast simulation of Time-of-Flight detectors at the LHC
    
    The modelling of Cherenkov based detectors is traditionally done using Geant4 toolkit. In this work, we present another method based on Python programming language and Numba high performance compiler to speed up the simulation. As an example we take one of the Forward Proton Detectors at the CERN LHC - ATLAS Forward Proton (AFP) Time-of-Flight, which is used to reduce the background from multiple proton-proton collisions in soft and hard diffractive events. We describe the technical details of the fast Cherenkov model of photon generation and transportation through the optical part of the ToF detector. The fast simulation is revealed to be about 200 times faster than the corresponding Geant4 simulation, and provides similar results concerning length and time distributions of photons. The study is meant as the first step in a construction of a building kit allowing creation of a fast simulation of an arbitrary shaped optical part of detectors.
    
    Speaker: Olivier Rousselle (Laboratoire Kastler Brossel (FR))
    
    Presentation - Olivier Rousselle - CHEP.pdf
    
    Recording
  - 16
    
    Monte Carlo matching in the Belle II software
    
    The Belle II experiment is an upgrade to the Belle experiment, and is located at the SuperKEKB facility in KEK, Tsukuba, Japan. The Belle II software is completely new and is used for everything from triggering data, generation of Monte Carlo events, tracking, clustering, to high-level analysis. One important feature is the matching between the combinations of reconstructed objects which form particle candidates and the underlying simulated particles from the event generators. This is used to study detector effects, analysis backgrounds, and efficiencies. This document describes the algorithm that is used by Belle II.
    
    Speaker: Yo Sato (Tohoku University)
    
    MCmatching_Belle_II_yosato_vCHEP2021.pdf
    
    Recording
- Artificial Intelligence: Tue AM
  
  Conveners: Eduardo Rodrigues (University of Liverpool (GB)), Simone Pigazzini (ETH Zurich (CH))
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 17
    
    C++ Code Generation for Fast Inference of Deep Learning Models in ROOT/TMVA
    
    We report the latest development in ROOT/TMVA, a new system that takes trained ONNX deep learning models and emits C++ code that can be easily included and invoked for fast inference of the model, with minimal dependency. We present an overview of the current solutions for conducting inference in C++ production environment, discuss the technical details and examples of the generated code, and demonstrates its development status with a preliminary benchmark against popular tools.
    
    Speaker: Sitong An (CERN, Carnegie Mellon University (US))
    
    Recording
    
    TMVA CHEP 2021-4.pdf
  - 18
    
    Deep learning based low-dose synchrotron radiation CT reconstruction
    
    Synchrotron radiation sources are widely used in various fields, among which computed tomography (CT) is one of the most important fields. The amount of effort expended by the operator varies depending on the subject. If the number of angles needed to be used can be greatly reduced under the condition of similar imaging effects, the working time and workload of the experimentalists will be greatly reduced. However, decreasing the sampling Angle can produce serious artifacts and blur the details. We try to use the deep learning which can build high quality reconstruction sparse data sampling from the Angle of the image and ResAttUnet are put forward. ResAttUnet is roughly a symmetrical U-shaped network that incorporates similar mechanisms to ResNet and attention. In addition, the hybrid precision training technique is adopted to reduce the demand for video memory of the model.
    
    Speaker: Ling Li (Institute of High Energy Physics, CAS;University of Chinese Academy of Sciences)
    
    Deep learning based low-dose synchrotron radiation CT reconstruction.pdf
    
    Recording
  - 19
    
    Intelligent compression for synchrotron radiation source image
    
    Synchrotron radiation sources (SRS) produce a huge amount of image data. This scientific data, which needs to be stored and transferred losslessly, will bring great pressure on storage and bandwidth. The SRS images have the characteristics of high frame rate and high resolution, and traditional image lossless compression methods can only save up to 30% in size. Focus on this problem, we propose a lossless compression method for SRS images based on deep learning. First, we use the difference algorithm to reduce the linear correlation within the image sequence. Then we propose a reversible truncated mapping method to reduce the range of the pixel value distribution. Thirdly, we train a deep learning model to learn the nonlinear relationship within the image sequence. Finally, we use the probability distribution predicted by the deep leaning model combined with arithmetic coding to fulfil lossless compression. Test result based on SRS images shows that our method can further decrease 20% of the data size compared to PNG, JPEG2000 and FLIF.
    
    Speaker: Shiyuan Fu
    
    Intelligent_Compression_for_Synchrotron_Radiation_Source_Image.pdf
    
    Recording
  - 20
    
    Event Classification with Multi-step Machine Learning
    
    The usefulness and valuableness of Multi-step ML, where a task is organized into connected sub-tasks with known intermediate inference goals, as opposed to a single large model learned end-to-end without intermediate sub-tasks, is presented. Pre-optimized ML models are connected and better performance is obtained by re-optimizing the connected one. The selection of a ML model from several small ML model candidates for each sub-task has been performed by using the idea based on NAS. In this paper, DARTS and SPOS-NAS are tested, where the construction of loss functions is improved to keep all ML models smoothly learning. Using DARTS and SPOS-NAS as an optimization and selection as well as the connecting for multi-step machine learning systems, we find that (1) such system can quickly and successfully select highly performant model combinations, and (2) the selected models are consistent with baseline algorithms such as grid search and their outputs are well controlled.
    
    Speaker: Masahiko Saito (University of Tokyo (JP))
    
    Recording
    
    vCHEP2021.pdf
  - 21
    
    The use of Boosted Decision Trees for Energy Reconstruction in JUNO experiment
    
    The Jiangmen Underground Neutrino Observatory (JUNO) is a neutrino experiment with a broad physical program. The main goals of JUNO are the determination of the neutrino mass ordering and high precision investigation of neutrino oscillation properties. The precise reconstruction of the event energy is crucial for the success of the experiment.
    JUNO is equiped with 17 612 + 25 600 PMT channels of two kind which provide both charge and hit time information. In this work we present a fast Boosted Decision Trees model using small set of aggregated features. The model predicts event energy deposition. We describe the motivation and the details of our feature engineering and feature selection procedures. We demonstrate that the proposed aggregated approach can achieve a reconstruction quality that is competitive with the quality of much more complex models like Convolution Neural Networks (ResNet, VGG and GNN).
    
    Speaker: Mr Arsenii Gavrikov (HSE University)
    
    Gavrikov_Ratnikov_BDT_for_energy_reco_vCHEP2021.pdf
    
    Recording
- Online: Tue AM
  
  Conveners: Dmytro Kresan (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE)), Stewart Martin-Haugh (Science and Technology Facilities Council STFC (GB))
  
  Mattermost
  
  Video recording Tuesday morning
  
  Zoom
  - 22
    
    The Controls and Configuration Software of the ATLAS Data Acquisition System: evolution towards LHC Run 3
    
    The ATLAS experiment at the Large Hadron Collider (LHC) op- erated very successfully in the years 2008 to 2018, in two periods identified as Run 1 and Run 2. ATLAS achieved an overall data-taking efficiency of 94%, largely constrained by the irreducible dead-time introduced to accommodate the limitations of the detector read-out electronics. Out of the 6% dead-time only about 15% could be attributed to the central trigger and DAQ system, and out of these, a negligible fraction was due to the Control and Configuration sub- system. Despite these achievements, and in order to improve even more the already excellent efficiency of the whole DAQ system in the coming Run 3, a new campaign of software updates was launched for the second long LHC shutdown (LS2). This paper presents, using a few selected examples, how the work was approached and which new technologies were introduced into the AT- LAS Control and Configuration software. Despite these being specific to this system, many solutions can be considered and adapted to different distributed DAQ systems.
    
    Speaker: Andrei Kazarov (NRC Kurchatov Institute PNPI (RU))
    
    CCSWEvolutionRun2to3_v3.pdf
    
    Recording
  - 23
    
    Development of the Safety System for the Inner Tracking System of the ALICE Experiment
    
    During the LHC Long Shutdown 2, the ALICE experiment has undergone numerous upgrades to cope with the large amount of data expected in Run3. Among all new elements integrated into ALICE, the experiment counts with a new Inner Tracking System (ITS), equipped with innovative pixel sensors that will substantially improve the performance of the system. The new detector is equipped with a complex Low Voltage (LV) distribution, increasing the power dissipated by the detector and requiring the installation of a large number of temperature measurement points. In 2020, a new safety system has been developed to distribute the ITS LV interlock system and to monitor the new temperature values. The safety system is based on a Siemens S7-1500 PLC device. The control application governing the PLC has been configured through the UNICOS- CPC infrastructure made at CERN for the standardisation of industrial applications. UNICOS-CPC enables both the automatisation of control tasks governing the PLC and the interface to the WinCC OA based SCADA system. This paper provides a complete description of the setup of this safety system.
    
    Speaker: Patricia Mendez Lorenzo (CERN)
    
    chep2021_talk_aliceITS.pdf
    
    Recording
  - 24
    
    Understanding ATLAS infrastructure behaviour with an Expert System
    
    The ATLAS detector requires a huge infrastructure consisting of numerous interconnected systems forming a complex mesh which requires constant maintenance and upgrades. The ATLAS Technical Coordination Expert System provides, by the means of a user interface, a quick and deep understanding of the infrastructure, which helps to plan interventions by foreseeing unexpected consequences, and to understand complex events when time is crucial in the ATLAS control room.
    It is an object-oriented expert system based on the knowledge composed of inference rules and information from diverse domains such as detector control and safety systems, gas, water, cooling, ventilation, cryogenics, and electricity distribution.
    
    This paper discusses the latest developments in the inference engine and the implementation of the most probable cause algorithm based on them. One example from the annual maintenance of the 15$^{\circ}$C water circuit chillers is discussed.
    
    Speaker: Ignacio Asensi Tortajada (Univ. of Valencia and CSIC (ES))
    
    Recording
    
    slides_vchep2021_final.pdf
  - 25
    
    Integration and Commissioning of the Software-based Readout System for ATLAS Level-1 Endcap Muon Trigger in Run 3
    
    The Large Hadron Collider and the ATLAS experiment at CERN will explore new frontiers in physics in Run 3 starting in 2022. In the Run 3 ATLAS Level-1 endcap muon trigger, new detectors called New Small Wheel and additional Resistive Plate Chambers will be installed to improve momentum resolution and to enhance the rejection of fake muons. The Level-1 endcap muon trigger algorithm will be processed by new trigger processor boards with modern FPGAs and high-speed optical serial links. For validation and performance evaluation, the inputs and outputs of their trigger logic will be read out using a newly developed software-based readout system. We have successfully integrated this readout system in the ATLAS online software framework, enabling commissioning in the actual Run 3 environment. Stable trigger readout has been realized for input rates up to 100 kHz with a developed event-building application. We have verified that its performance is sufficient for Run 3 operation in terms of event data size and trigger rate. The paper will present the details of the integration and commissioning of the software-based readout system for ATLAS Level-1 endcap muon trigger in Run 3.
    
    Speaker: Kaito Sugizaki (University of Tokyo (JP))
    
    Recording
    
    sugizaki_vchep_final.pdf
  - 26
    
    A real-time FPGA-based cluster finding algorithm for LHCb silicon pixel detector
    
    Starting from the next LHC run, the upgraded LHCb High Level Trigger will process events at the full LHC collision rate (averaging 30 MHz). This challenging goal, tackled using a large and heterogeneous computing farm, can be eased addressing lowest-level, more repetitive tasks at the earliest stages of the data acquisition chain. FPGA devices are very well-suited to perform with a high degree of parallelism and efficiency certain computations, that would be significantly demanding if performed on general-purpose architectures. A particularly time-demanding task is the cluster-finding process, due to the 2D pixel geometry of the new LHCb pixel detector. We describe here a custom highly parallel FPGA-based clustering algorithm and its firmware implementation. The algorithm implementation has shown excellent reconstruction quality during qualification tests, while requiring a modest amount of hardware resources. Therefore it can run in the LHCb FPGA readout cards in real time, during data taking at 30 MHz, representing a promising alternative solution to more common CPU-based algorithms.
    
    Speaker: Giovanni Bassi (SNS & INFN Pisa (IT))
    
    CHEP_FPGA_clustering_v3.pdf
    
    Recording
- Software: Tue AM
  
  Conveners: Benjamin Krikler (University of Bristol (GB)), David Bouvet (IN2P3/CNRS (FR))
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 27
    
    Daisy: Data analysis integrated software system for X-ray experiments
    
    Daisy (Data Analysis Integrated Software System) has been designed for the analysis and visualization of the X-ray experiments. To address an extensive range of Chinese radiation facilities community’s requirements from purely algorithmic problems to scientific computing infrastructure, Daisy sets up a cloud-native platform to support on-site data analysis services with fast feedback and interaction. The plugs-in based application is convenient to process the expected high throughput data flow in parallel at next-generation facilities such as the High Energy Photon Source (HEPS). The objectives, functionality and architecture of Daisy are described in this article.
    
    Speaker: Haolai Tian (Institute of High Energy Physics)
    
    HEPS-tianhl-20210518.pdf
    
    Recording
  - 28
    
    Readable and efficient HEP data analysis with bamboo
    
    With the LHC continuing to collect more data and experimental analyses becoming increasingly complex, tools to efficiently develop and execute
    these analyses are essential. The bamboo framework defines a domain-specific
    language, embedded in python, that allows to concisely express the analysis
    logic in a functional style. The implementation based on ROOT’s RDataFrame
    and cling C++ JIT compiler approaches the performance of dedicated native
    code. Bamboo is currently being used for several CMS Run 2 analyses that
    rely on the NanoAOD data format, which will become more common in Run
    3 and beyond, and for which many reusable components are included, but it
    provides many possibilities for customisation, which allow for straightforward
    adaptation to other formats and workflows.
    
    Speaker: Pieter David (Universite Catholique de Louvain (UCL) (BE))
    
    bamboo_vCHEP2021.pdf
    
    Recording
  - 29
    
    Recent advances in ADL, CutLang and adl2tnm
    
    This paper presents an overview and features of an Analysis Description Language (ADL) designed for HEP data analysis. ADL is a domain-specific, declarative language that describes the physics content of an analysis in a standard and unambiguous way, independent of any computing frameworks. It also describes infrastructures that render ADL executable, namely CutLang, a direct runtime interpreter (originally also a language), and adl2tnm, a transpiler converting ADL into C++ code. In ADL, analyses are described in human-readable plain text files, clearly separating object, variable and event selection definitions in blocks, with a syntax that includes mathematical and logical operations, comparison and optimisation operators, reducers, four-vector algebra and commonly used functions. Recent studies demonstrate that adapting the ADL approach has numerous benefits for the experimental and phenomenological HEP communities. These include facilitating the abstraction, design, optimization, visualization, validation, combination, reproduction, interpretation and overall communication of the analysis contents and long term preservation of the analyses beyond the lifetimes of experiments. Here we also discuss some of the current ADL applications in physics studies and future prospects based on static analysis and differentiable programming.
    
    Speaker: Gokhan Unel (University of California Irvine (US))
    
    Recording
    
    vCHEP-unel-2021.pdf
  - 30
    
    ALICE Run 3 Analysis Framework
    
    In LHC Run 3 the ALICE Collaboration will have to cope in Run 3 with an increase of lead-lead collision data of two orders of magnitude com- pared to the Run 1 and 2 data-taking periods. The Online-Offline (O$^2$) software framework has been developed to allow for distributed and efficient process- ing of this unprecedented amount of data. Its design, which is based on a message-passing back end, required the development of a dedicated Analysis Framework that uses columnar data format provided by Apache Arrow. The O2 Analysis Framework provides a user-friendly high-level interface and hides the complexity of the underlying distributed framework. It allows the users to access and manipulate the data in the new format both in the traditional "event loop" and a declarative approach using bulk processing operations based on Arrow’s Gandiva sub-project. Building on the well-tested system of analysis trains developed by ALICE in Run 1 and 2, the AliHyperloop infrastructure is being developed. It provides a fast and intuitive user interface for running demand- ing analysis workflows in the GRID environment and on the dedicated Analysis Facility. In this document, we report on the current state and ongoing develop- ments of the Analysis Framework and of AliHyperloop, highlighting the design choices and the benefits of the new system.
    
    Speaker: Anton Alkin (CERN)
    
    Alkin-vCHEP-Run3AnalysisFramework-180521.pdf
    
    Recording
  - 31
    
    Analysis of heavy-flavour particles in ALICE with the O2 analysis framework
    
    Precise measurements of heavy-flavour hadrons down to very low pT represent the core of the physics program of the upgraded ALICE experiment in Run 3.
    These physics probes are characterised by a very small signal-to-background ratio requiring very large statistics of minimum-bias events.
    In Run 3, ALICE is expected to collect up to 13 nb^{-1} of lead–lead collisions, corresponding to about 1e11 minimum-bias events.
    In order to analyse this unprecedented amount of data, which is about 100 times larger than the statistics collected in Run 1 and Run 2, the ALICE collaboration is developing a complex analysis framework that aims at maximising the processing speed and data volume reduction.
    In this paper, the strategy of reconstruction, selection, skimming, and analysis of heavy-flavour events for Run 3 will be presented.
    Some preliminary results on the reconstruction of charm mesons and baryons will be shown and the prospects for future developments and optimisation discussed.
    
    Speaker: Vit Kucera (CERN)
    
    2021-05-18_vCHEP_HFO2.pdf
    
    Recording
  - 32
    
    FuncADL: Functional Analysis Description Language
    
    The traditional approach in HEP analysis software is to loop over every event and every object via the ROOT framework. This method follows an imperative paradigm, in which the code is tied to the storage format and steps of execution. A more desirable strategy would be to implement a declarative language, such that the storage medium and execution are not included in the abstraction model. This will become increasingly important to managing the large dataset collected by the LHC and the HL-LHC. A new analysis description language (ADL) inspired by functional programming, FuncADL, was developed using Python as a host language. The expressiveness of this language was tested by implementing example analysis tasks designed to benchmark the functionality of ADLs. Many simple selections are expressible in a declarative way with FuncADL, which can be used as an interface to retrieve filtered data. Some limitations were identified, but the design of the language allows for future extensions to add missing features. FuncADL is part of a suite of analysis software tools being developed by the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP). These tools will be available to develop highly scalable physics analyses for the LHC.
    
    Speaker: Mason Proffitt (University of Washington (US))
    
    Recording
    
    vCHEP 2021 FuncADL v4 (final).pdf
- Storage: Tue AM
  
  Conveners: Patrick Fuhrmann (Deutsches Elektronen-Synchrotron (DE)), Peter Clarke (The University of Edinburgh (GB))
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 33
    
    Evaluation of a high-performance storage buffer with 3D XPoint devices for the DUNE data acquisition system
    
    The DUNE detector is a neutrino physics experiment that is expected to take data starting from 2028. The data acquisition (DAQ) system of the experiment is designed to sustain several TB/s of incoming data which will be temporarily buffered while being processed by a software based data selection system.
    
    In DUNE, some rare physics processes (e.g. Supernovae Burst events) require storing the full complement of data produced over 1-2 minute window. These are recognised by the data selection system which fires a specific trigger decision. Upon reception of this decision data are moved from the temporary buffers to local, high performance, persistent storage devices. In this paper we characterize the performance of novel 3DXPoint SSD devices under different workloads suitable for high-performance storage applications. We then illustrate how such devices may be applied to the DUNE use-case: to store, upon a specific signal, 100 seconds of incoming data at 1.5 TB/s distributed among 150 identical units each operating at approximately 10 GB/s.
    
    Speaker: Adam Abed Abud (University of Liverpool (GB) and CERN)
    
    Adam_Abed_Abud_vCHEP_Micron.pdf
    
    Recording
  - 34
    
    Design of a Resilient, High-Throughput, Persistent Storage System for the ATLAS Phase-II DAQ System
    
    The ATLAS experiment will undergo a major upgrade to take advantage of the new conditions provided by the upgraded High-Luminosity LHC. The Trigger and Data Acquisition system (TDAQ) will record data at unprecedented rates: the detectors will be read out at 1 MHz generating around 5 TB/s of data. The Dataflow system (DF), component of TDAQ, introduces a novel design: readout data are buffered on persistent storage while the event filtering system analyses them to select 10000 events per second for a total recorded throughput of around 60 GB/s. This approach allows for decoupling the detector activity from the event selection process. New challenges then arise for DF: design and implement a distributed, reliable, persistent storage system supporting several TB/s of aggregated throughput while providing tens of PB of capacity. In this paper we first describe some of the challenges that DF is facing: data safety with persistent storage limitations, indexing of data at high-granularity in a highly-distributed system, and high-performance management of storage capacity. Then the ongoing R&D to address each of them is presented and the performance achieved with a working prototype is shown.
    
    Speaker: Matias Alejandro Bonaventura (CERN)
    
    Recording
    
    vCHEP2021 - ATLAS DAQ PhaseII (5).pdf
  - 35
    
    Enabling interoperable data and application services in a federated ScienceMesh
    
    In recent years, cloud sync & share storage services, provided by academic and research institutions, have become a daily workplace environment for many local user groups in the High Energy Physics (HEP) community. These, however, are primarily disconnected and deployed in isolation from one another, even though new technologies have been developed and integrated to further increase the value of data. The EU-funded CS3MESH4EOSC project is connecting locally and individually provided sync and share services, and scaling them up to the European level and beyond. It aims to deliver the ScienceMesh service, an interoperable platform to easily sync and share data across institutions and extend functionalities by connecting to other research services using streamlined sets of interoperable protocols, APIs and deployment methodologies. This supports multiple distributed application workflows: data science environments, collaborative editing and data transfer services.
    
    In this paper, we present the architecture of ScienceMesh and the technical design of its reference implementation, a platform that allows organizations to join the federated service infrastructure easily and to access application services out-of-the-box. We discuss the challenges faced during the process, which include diversity of sync & share platforms (Nextcloud, Owncloud, Seafile and others), absence of global user identities and user discovery, lack of interoperable protocols and APIs, and access control and protection of data endpoints. We present the rationale for the design decisions adopted to tackle these challenges and describe our deployment architecture based on Kubernetes, which enabled us to utilize monitoring and tracing functionalities. We conclude by reporting on the early user experience with ScienceMesh.
    
    Speaker: Ishank Arora (CERN)
    
    CHEP_2021_ScienceMesh.pdf
    
    Recording
  - 36
    
    Porting the EOS from X86 (Intel) to aarch64 (ARM) architecture
    
    With the advancement of many large HEP experiments, the amount of data that needs to be processed and stored has increased significantly, so we must upgrade computing resources and improve the performance of storage software. This article discusses porting the EOS software from the x86_64 architecture to the aarch64 architecture, with the aim of finding a more cost-effective storage solution. In the process of porting, the biggest challenge is that many dependent packages do not have aarch64 version and need to be compiled by ourselves, and the assembly part of the software code also needs to be adjusted accordingly. Despite these challenges, we have successfully ported the EOS code to the aarch64. This article discusses the current status and plans for the software port as well as performance testing after porting.
    
    Speaker: Yaosong Cheng (IHEp)
    
    Porting the EOS from X86 (Intel) to aarch64 (ARM) architecture.pdf
    
    Recording
  - 37
    
    The first disk-based custodial storage for the ALICE experiment
    
    We proposed a disk-based custodial storage as an alternative to tape for the ALICE experiment at CERN to preserve its raw data.
    The proposed storage system relies on RAIN layout -- the implementation of erasure coding in the EOS storage suite, which is developed by CERN -- for data protection and takes full advantage of high-density JBOD enclosures to maximize storage capacity as well as to achieve cost-effectiveness comparable to tape.
    The system we present provides 18 PB of total raw capacity from the 18 set of high-density JBOD enclosures attached to 9 EOS front-end servers.
    In order to balance between usable space and data protection, the system will stripe a file into 16 chunks on the 4-parity enabled RAIN layout configured on top of 18 containerized EOS FSTs.
    Although the reduction rate of available space increases up to $33.3\%$ with this layout, the estimated annual data loss rate drops down to $8.6 \times 10^{-5}\%$.
    In this paper, we discuss the system architecture of the disk-based custodial storage, 4-parity RAIN layout, deployment automation, and the integration to the ALICE experiment in detail.
    
    Speaker: Sang Un Ahn (Korea Institute of Science & Technology Information (KR))
    
    Recording
    
    vchep2021_storage_asu.pdf
- Accelerators: Tue PM
  
  Conveners: Felice Pantaleo (CERN), Simon George (Royal Holloway, University of London)
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Wednesday afternoon
  
  Zoom
  - 38
    
    A Portable Implementation of RANLUX++
    
    High energy physics has a constant demand for random number generators (RNGs) with high statistical quality. In this paper, we present ROOT's implementation of the RANLUX++ generator. We discuss the choice of relying only on standard C++ for portability reasons. Building on an initial implementation, we describe a set of optimizations to increase generator speed. This allows to reach performance very close to the original assembler version. We test our implementation on an Apple M1 and Nvidia GPUs to demonstrate the advantages of portable code.
    
    Speaker: Jonas Hahnfeld (CERN)
    
    Hahnfeld_RANLUX++.pdf
    
    Recording
  - 39
    
    A Computing and Detector Simulation Framework for the HIBEAM/NNBAR Experimental Program at the ESS
    
    The HIBEAM/NNBAR program is a proposed two-stage experiment for the European Spallation Source focusing on searches for baryon number violation via processes in which neutrons convert to anti-neutrons. This paper outlines the computing and detector simulation framework for the HIBEAM/NNBAR program. The simulation is based on predictions of neutron flux and neutronics together with signal and background generation. A range of diverse simulation packages are incorporated, including Monte Carlo transport codes, neutron ray-trace simulation packages, and detector simulation software. The common simulation package in which these elements are interfaced together is discussed. Data management plans and triggers are also described.
    
    Speaker: Bernhard Meirose (Stockholms Universitet)
    
    Recording
    
    vCHEP2021_v6_Meirose.pdf
  - 40
    
    Performance of CUDA Unified Memory in CMS Heterogeneous Pixel Reconstruction
    
    The management of separate memory spaces of CPUs and GPUs brings an additional burden to the development of software for GPUs. To help with this, CUDA unified memory provides a single address space that can be accessed from both CPU and GPU. The automatic data transfer mechanism is based on page faults generated by the memory accesses. This mechanism has a performance cost, that can be with explicit memory prefetch requests. Various hints on the inteded usage of the memory regions can also be given to further improve the performance. The overall effect of unified memory compared to an explicit memory management can depend heavily on the application. In this paper we evaluate the performance impact of CUDA unified memory using the heterogeneous pixel reconstruction code from the CMS experiment as a realistic use case of a GPU-targeting HEP reconstruction software. We also compare the programming model using CUDA unified memory to the explicit management of separate CPU and GPU memory spaces.
    
    Speaker: Ka Hei Martin Kwok (Fermi National Accelerator Lab. (US))
    
    Recording
    
    vCHEP_patarack_UVM.pdf
  - 41
    
    Porting CMS Heterogeneous Pixel Reconstruction to Kokkos
    
    Programming for a diverse set of compute accelerators in addition to the CPU is a challenge. Maintaining separate source code for each architecture would require lots of effort, and development of new algorithms would be daunting if it had to be repeated many times. Fortunately there are several portability technologies on the market such as Alpaka, Kokkos, and SYCL. These technologies aim to improve the developer productivity by making it possible to use the same source code for many different architectures. In this paper we use heterogeneous pixel reconstruction code from the CMS experiment at the CERNL LHC as a realistic use case of a GPU-targeting HEP reconstruction software, and report experience from prototyping a portable version of it using Kokkos. The development was done in a standalone program that attempts to model many of the complexities of a HEP data processing framework such as CMSSW. We also compare the achieved event processing throughput to the original CUDA code and a CPU version of it.
    
    Speaker: Matti Kortelainen (Fermi National Accelerator Lab. (US))
    
    20210518-vCHEP21_CMSKokkos.pdf
    
    Recording
  - 42
    
    Heterogeneous techniques for rescaling energy deposits in the CMS Phase-2 endcap calorimeter
    
    We present the porting to heterogeneous architectures of the algorithm used for applying linear transformations of raw energy deposits in the CMS High Granularity Calorimeter (HGCAL). This is the first heterogeneous algorithm to be fully integrated with HGCAL’s reconstruction chain. After introducing the latter and giving a brief description of the structural components of HGCAL relevant for this work, the role of the linear transformations in the calibration is reviewed. We discuss how this work facilitates the porting of other algorithms in the existing reconstruction process, as well as integrating algorithms previously ported (but not yet integrated). The many ways in which parallelization is achieved are described, and the successful validation of the heterogeneous algorithm is covered. Detailed performance measurements are presented, showing the wall time of both CPU and GPU algorithms, and therefore establishing the corresponding speedup.
    
    Speaker: Bruno Alves (ADI Agencia de Inovacao (PT))
    
    HGCAL_CMS_GPU_Rescaling.pdf
    
    paper.pdf
    
    Recording
  - 43
    
    Usage of GPUs in ALICE Online and Offline processing during LHC Run 3
    
    ALICE will significantly increase its Pb--Pb data taking rate from the 1\,kHz of triggered readout in Run 2 to 50 kHz of continuous readout for LHC Run 3.
    Updated tracking detectors are installed for Run 3 and a new two-phase computing strategy is employed.
    In the first synchronous phase during the data taking, the raw data is compressed for storage to an on-site disk buffer and the required data for the detector calibration is collected.
    In the second asynchronous phase the compressed raw data is reprocessed using the final calibration to produce the final reconstruction output.
    Traditional CPUs are unable to cope with the huge data rate and processing demands of the synchronous phase, therefore ALICE employs GPUs to speed up the processing.
    Since the online computing farm performs a part of the asynchronous processing when there is no beam in the LHC, ALICE plans to use the GPUs also for this second phase.
    This paper gives an overview of the GPU processing in the synchronous phase, the full system test to validate the reference GPU architecture, and the prospects for the GPU usage in the asynchronous phase.
    
    Speaker: David Rohr (CERN)
    
    2021-05-18 CHEP2021.pdf
    
    Recording
- Algorithms: Tue PM
  
  Conveners: Dorothea Vom Bruch (Aix Marseille Univ, CNRS/IN2P3, CPPM, Marseille, France), Gordon Watts (University of Washington (US))
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday morning
  
  Zoom
  - 44
    
    Optimization of Geant4 for the Belle II software library
    
    The SuperKEKB/Belle II experiment expects to collect 50 $\mathrm{ab}^{-1}$ of collision data during the next decade. Study of this data requires monumental computing resources to process and to generate the required simulation events necessary for physics analysis. At the core of the Belle II simulation library is the Geant4 toolkit. To use the available computing resources more efficiently, the physics list for Geant4 has been optimized for the Belle II environment, and various other strategies were applied to improve the performance of the Geant4 toolkit in the Belle II software library. Following the inclusion of this newly optimized physics list in an updated version of Geant4 toolkit, we obtain much better CPU usage during event simulation and reduce the computing resource usage by $\sim$ 44 %.
    
    Speaker: Swagato Banerjee (University of Louisville (US))
    
    Belle2_G4Val_CHEP21.pdf
    
    Recording
  - 45
    
    Validation of Physics Models of Geant4 Versions 10.4.p03, 10.6.p02 and 10.7.p01 using Data from the CMS Experiment
    
    CMS tuned its simulation program and chose a specific physics model of Geant4 by comparing the simulation results with dedicated test beam experiments. Test beam data provide measurements of energy response of the calorimeter as well as resolution for well identified charged hadrons over a large energy region. CMS continues to validate the physics models using the test beam data as well as collision data from the Large Hadron Collider. Isolated charged particles are measured simultaneously in the tracker as well as in the calorimeters. These events are selected using dedicated triggers and are used to measure the response in the calorimeter. Different versions of Geant4 (10.2.p02, 10.4.p03, 10.6.p02) have been used by CMS for its Monte Carlo production and a new version (10.7) is now chosen for future productions. A suitable physics list (collection of physics models) is chosen by optimizing performance against accuracy. A detailed comparison between data and Geant4 predictions is presented in this paper.
    
    Speaker: Sunanda Banerjee (Fermi National Accelerator Lab. (US))
    
    Recording
    
    Sim-Talk125.pdf
    
    ValidationCMS_v1.pdf
  - 46
    
    The Fast Simulation Chain in the ATLAS experiment
    
    The ATLAS experiment relies heavily on simulated data, requiring the production on the order of billions of Monte Carlo-based proton-proton collisions every run period. As such, the simulation of collisions (events) is the single biggest CPU resource consumer. ATLAS's finite computing resources are at odds with the expected conditions during the High Luminosity LHC era, where the increase in proton-proton centre-of-mass energy and instantaneous luminosity will result in higher particle multiplicities and roughly fivefold additional interactions per bunch-crossing with respect to LHC Run-2. Therefore, significant effort within the collaboration is being focused on increasing the rate at which MC events can be produced by designing and developing fast alternatives to the algorithms used in the standard Monte Carlo production chain.
    
    Speaker: Martina Javurkova (University of Massachusetts (US))
    
    FastChain_vCHEP21 Javurkova_18May21.pdf
    
    Recording
  - 47
    
    An automated tool to facilitate consistent test-driven development of trigger selections for LHCb’s Run 3
    
    Upon its restart in 2022, the LHCb experiment at the LHC will run at higher instantaneous luminosity and utilize an unprecedented full-software trigger, promising greater physics reach and efficiency. On the flip side, conforming to offline data storage constraints becomes far more challenging. Both of these considerations necessitate a set of highly optimised trigger selections. We therefore present HltEfficiencyChecker: an automated extension to the LHCb trigger application, facilitating trigger development before data-taking driven by trigger rates and efficiencies. Since the default in 2022 will be to persist only the event's signal candidate to disk, discarding the rest of the event, we also compute efficiencies where the decision was due to the true MC signal, evaluated by matching it to the trigger candidate hit-by-hit. This matching procedure – which we validate here – demonstrates that the distinction between a “trigger” and a “trigger-on-signal” is crucial in characterising the performance of a trigger selection.
    
    Speaker: Ross John Hunter (University of Warwick (GB))
    
    Recording
    
    vCHEP_RossHunter_HltEfficiencyChecker.pdf
  - 48
    
    Determination of inter-system timing for Mini-CBM in 2020
    
    Future operation of the CBM detector requires ultra-fast analysis of the continuous stream of data from all subdetector systems. Determining the inter-system time shifts among individual detector systems in the existing prototype experiment Mini-CBM is an essential step for data processing and in particular for stable data taking. Based on the input of raw measurements from all detector systems, the corresponding time correlations can be obtained at digital level by evaluating the differences in time stamps. If the relevant systems are stable during data taking and sufficient digital measurements are available, the distribution of time differences should display a clear peak. Up to now, the outcome of the processed time differences is stored in histograms and the maximum peak is considered, after the evaluation of all timeslices of a run leading to significant run times. The results presented here demonstrate the stability of the synchronicity of Mini-CBM systems. Furthermore it is illustrated that relatively small amounts of raw measurements are sufficient to evaluate corresponding time correlations among individual Mini-CBM detectors, thus enabling fast online monitoring of them in future online data processing.
    
    Speaker: Dr Andreas Ralph Redelbach (Goethe University Frankfurt (DE))
    
    Recording
    
    Redelbach_CHEP_mCBM_slides.pdf
  - 49
    
    Apprentice for Event Generator Tuning
    
    Apprentice is a tool developed for event generator tuning. It contains a range of conceptual improvements and extensions over the tuning tool Professor. Its core functionality remains the construction of a multivariate analytic surrogate model to computationally expensive Monte Carlo event generator predictions. The surrogate model is used for numerical optimization in chi-square minimization and likelihood evaluation. Apprentice also introduces algorithms to automate the selection of observable weights to minimize the effect of mismodeling in the event generators. We illustrate our improvements for the task of MC-generator tuning and limit setting.
    
    Speaker: Mohan Krishnamoorthy (Argonne National Laboratory)
    
    Recording
    
    vCHEP2021_PPT.pdf
    
    vCHEP2021_PPT.pdf
    
    vCHEP2021_PPT.pdf
- Artificial Intelligence: Tue PM
  
  Conveners: Patrick Fuhrmann (Deutsches Elektronen-Synchrotron (DE)), Sofia Vallecorsa (CERN)
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 50
    
    A Deep Learning approach to LHCb Calorimeter reconstruction using a Cellular Automaton
    
    The optimization of reconstruction algorithms has become a key aspect in LHCb as it is currently undergoing a major upgrade that will considerably increase the data processing rate. Aiming to accelerate the second most time consuming reconstruction process of the trigger, we propose an alternative reconstruction algorithm for the Electromagnetic Calorimeter of LHCb. Together with the use of deep learning techniques and the understanding of the current algorithm, our proposal decomposes the reconstruction process into small parts that benefit the generalized learning of small neural network architectures and simplifies the training dataset. This approach takes as input the full simulation data of the calorimeter and outputs a list of reconstructed clusters in a nearly constant time without any dependency in the event complexity.
    
    Speaker: Nuria Valls Canudas (La Salle, Ramon Llull University (ES))
    
    DLCaloReco_vCHEP21_v2.pdf
    
    Recording
  - 51
    
    Fast simulation of the electromagnetic calorimeter response using Self-Attention Generative Adversarial Networks
    
    Simulation is one of the key components in high energy physics. Historically it relies on the Monte Carlo methods which require a tremendous amount of computation resources. These methods may have difficulties with the expected High Luminosity Large Hadron Collider need, so the experiment is in urgent need of new fast simulation techniques. The application of Generative Adversarial Networks is a promising solution to speed up the simulation while providing the necessary physics performance. In this paper we propose the Self-Attention Generative Adversarial Network as a possible improvement of the network architecture. The application is demonstrated on the performance of generating responses of the LHCb type of the electromagnetic calorimeter.
    
    Speaker: Alexander Rogachev (Yandex School of Data Analysis (RU))
    
    Fast simulation of the electromagnetic calorimeter response using Self-Attention Generative Adversarial Networks vCHEP2021.pdf
    
    Recording
  - 52
    
    Graph Variational Autoencoder for Detector Reconstruction and Fast Simulation in High-Energy Physics
    
    Accurate and fast simulation of particle physics processes is crucial for the high-energy physics community. Simulating particle interactions with the detector is both time consuming and computationally expensive. With its proton-proton collision energy of 13 TeV, the Large Hadron Collider is uniquely positioned to detect and measure the rare phenomena that can shape our knowledge of new interactions. The High-Luminosity Large Hadron Collider (HL-LHC) upgrade will put a significant strain on the computing infrastructure and budget due to increased event rate and levels of pile-up. Simulation of high-energy physics collisions needs to be significantly faster without sacrificing the physics accuracy. Machine learning approaches can offer faster solutions, while maintaining a high level of fidelity. We introduce a graph generative model that provides effective reconstruction of LHC events on the level of calorimeter deposits and tracks, paving the way for full detector level fast simulation.
    
    Speaker: Ali Hariri (American University of Beirut (LB))
    
    Graph Variational Autoencoder for Detector Reconstruction and Fast Simulation in High-Energy Physics.pdf
    
    Recording
  - 53
    
    Particle identification with an electromagnetic calorimeter using a Convolutional Neural Network
    
    Based on the fact that showers in calorimeters depend on the type of particle, this note attempts to perform a particle classifier for electromagnetic and hadronic particles on an electromagnetic calorimeter, based on the energy deposit of individual cells. Using data from a Geant4 simulation of a proposal of a Crystal Fiber Calorimeter (SPACAL), foreseen for a future upgrade of the LHCb detector, a classifier is built using Convolutional Neural Networks. Results obtained demonstrate that the higher resolution of this ECAL allows to attain over 95% precision in some classifications such as photons against neutrons.
    
    Speaker: Mr Alex Rua Herrera (DS4DS, La Salle, Universitat Ramon Llull)
    
    Recording
    
    vCHEP2021 - Particle identification with an electromagnetic calorimeter using a Convolutional Neural Network.pdf
  - 54
    
    Conditional Wasserstein Generative Adversarial Networks for Fast Detector Simulation
    
    Detector simulation in high energy physics experiments is a key yet computationally expensive step in the event simulation process. There has been much recent interest in using deep generative models as a faster alternative to the full Monte Carlo simulation process in situations in which the utmost accuracy is not necessary. In this work we investigate the use of conditional Wasserstein Generative Adversarial Networks to simulate both hadronization and the detector response to jets. Our model takes the $4$-momenta of jets formed from partons post-showering and pre-hadronization as inputs and predicts the $4$-momenta of the corresponding reconstructed jet. Our model is trained on fully simulated $t\overline{t}$ events using the publicly available GEANT-based simulation of the CMS Collaboration. We demonstrate that the model produces accurate conditional reconstructed jet transverse momentum ($p_T$) distributions over a wide range of $p_T$ for the input parton jet. Our model takes only a fraction of the time necessary for conventional detector simulation methods, running on a CPU in less than a millisecond per event.
    
    Speaker: John Blue (Davidson College)
    
    CHEPPresentation.pdf
    
    Recording
- Facilities and Networks: Tue PM
  
  Conveners: David Bouvet (IN2P3/CNRS (FR)), Dr Shawn McKee (University of Michigan (US))
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Tuesday afternoon
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 55
    
    Ethernet evaluation in data distribution traffic for the LHCb filtering farm at CERN
    
    This paper evaluates the real-time distribution of data over Ethernet for the upgraded LHCb data acquisition cluster at CERN. The total estimated throughput of the system is 32 Terabits per second. After the events are assembled, they must be distributed for further data selection to the filtering farm of the online trigger. High-throughput and very low overhead transmissions will be an essential feature of such a system. In this work RoCE high-throughput Ethernet protocol and Ethernet flow control algorithms have been used to implement lossless events distribution. To generate LHCb-like traffic, a custom benchmark has been implemented. It was used to stress-test the selected Ethernet networks and to check resilience to uneven workload distribution. Performance tests were made with selected evaluation clusters. 100 Gb/s and 25 Gb/s links were used. Performance results and overall evaluation of this Ethernet-based approach are discussed.
    
    Speaker: Rafal Dominik Krawczyk (CERN)
    
    CHEP_21_RKRAWCZY.pdf
    
    Recording
  - 56
    
    Systematic benchmarking of HTTPS third party copy on 100Gbps links using XRootD
    
    The High Luminosity Large Hadron Collider provides a data challenge. The amount of data recorded from the experiments and transported to hundreds of sites will see a thirty fold increase in annual data volume. A systematic approach to contrast the performance of different Third Party Copy (TPC) transfer protocols arises. Two contenders, XRootD-HTTPS and the GridFTP are evaluated in their performance for transferring files from one server to another over 100Gbps interfaces. The benchmarking is done by scheduling pods on the Pacific Research Platform Kubernetes cluster to ensure reproducible and repeatable results. This opens a future pathway for network testing of any TPC transfer protocol.
    
    Speaker: Aashay Arora (University of California San Diego)
    
    Recording
    
    Systematic benchmarking of HTTPS third party copy on 100Gbps links using XRootD
    
    TPC CHEP.pdf
  - 57
    
    NOTED: a framework to optimise network traffic via the analysis of data from File Transfer Services
    
    Network traffic optimisation is difficult as the load is by nature dynamic and random. However, the increased usage of file transfer services may help the detection of future loads and the prediction of their expected duration. The NOTED project seeks to do exactly this and to dynamically adapt network topology to deliver improved bandwidth for users of such services. This article introduces, and explains the features of, the two main components of NOTED, the Transfer Broker and the Network Intelligence component.
    The Transfer Broker analyses all queued and on-going FTS transfers, producing a traffic report which can be used by network controllers. Based on this report and its knowledge of the network topology and routing, the Network Intelligence (NI) component makes decisions as to when a network reconfiguration could be beneficial. Any Software Defined Network controller can then apply these decision to the network, so optimising transfer execution time and reducing operating costs.
    
    Speaker: Edoardo Martelli (CERN)
    
    CHEP21-NOTED_framework.odp
    
    CHEP21-NOTED_framework.pdf
    
    Recording
  - 58
    
    Benchmarking NetBASILISK: a Network Security Project for Science
    
    Infrastructures supporting distributed scientific collaborations must address competing goals in both providing high-performance access to resources while simultaneously securing the infrastructure against security threats. The NetBASILISK project is attempting to improve the security of such infrastructures while not adversely impacting their performance. This paper will present our work to create a benchmark and monitoring infrastructure that allows us to test for any degradation in transferring data into a NetBASILISK protected site.
    
    Speaker: Jem Aizen Mendiola Guhit (University of Michigan (US))
    
    Recording
    
    vCHEP_Guhit.pdf
  - 59
    
    Proximeter CERN's detecting device for personnel
    
    The SARS COV 2 virus, the cause of the better known COVID-19 disease, has greatly altered our personal and professional lives. Many people are now expected to work from home but this is not always possible and, in such cases, it is the responsibility of the employer to implement protective measures. One simple such measure is to require that people maintain a distance of 2 metres but this places responsibility on employees and leads to two problems. Firstly, the likelihood that safety distances are not maintained and secondly that someone who becomes infected does not remember with whom they may have been in contact. To address both problems, CERN has developed the “proximeter”, a device that, when worn by employees, detects when they are in close proximity to others. Information about any such close contacts is sent securely over a Low Power Wide Area Network (LPWAN) and stored in a manner that respects confidentiality and privacy requirements. In the event that an employee becomes infected with COVID-19 CERN can thus identify all the possible contacts and so prevent the spread of the virus. We describe here the details of the proximeter device, the LPWAN infrastructure deployed at CERN, the communication mechanisms and the protocols used to respect the confidentiality of personal data.
    
    Speaker: Christoph Merscher (CERN)
    
    Proximeter_CERN_detecting_device_for_personnel.pdf
    
    Recording
- Software: Tue PM
  
  Conveners: Enrico Guiraud (EP-SFT, CERN), Teng Jian Khoo (Humboldt University of Berlin (DE))
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 60
    
    The GeoModel tool suite for detector description
    
    The GeoModel class library for detector description has recently been released as an open-source package and extended with a set of tools to allow much of the detector modeling to be carried out in a lightweight development environment, outside of large and complex software frameworks. These tools include the mechanisms for creating persistent representation of the geometry, an interactive 3D visualization tool, various command-line tools, a plugin system, and XML and JSON parsers. The overall goal of the tool suite is a fast geometry development cycle with quick visual feedback. The tool suite can be built on both Linux and Macintosh systems with minimal external dependencies. It includes useful command-line utilities: gmclash which runs clash detection, gmgeantino which generates geantino maps, and fullSimLight which runs GEANT4 simulation on geometry imported from GeoModel description. The GeoModel tool suite is presently in use in both the ATLAS and FASER experiments. In ATLAS it will be the basis of the LHC Run 4 geometry description.
    
    Speaker: Vakho Tsulaia (Lawrence Berkeley National Lab. (US))
    
    ATLAS_GeoModel_vCHEP.pdf
    
    Recording
  - 61
    
    Counter-based pseudorandom number generators for CORSIKA 8: A multi-thread friendly approach
    
    This document is devoted to the description of advances in the generation of high-quality random numbers for CORSIKA 8, which is being developed in modern C++17 and is designed to run on modern multi-thread processors and accelerators. CORSIKA 8 is a Monte Carlo simulation framework to model ultra-high energy secondary particle cascades in astroparticle physics. The aspects associated with the generation of high-quality random numbers on massively parallel platforms, like multi-core CPUs and GPUs, are reviewed and the deployment of counter-based engines using an innovative and multi-thread friendly API are described. The API is based on iterators providing a very well known access mechanism in C++, and also supports lazy evaluation. Moreover,an upgraded version of the Squares algorithm with highly efficient internal 128 as well as 256 bit counters is presented in this context. Performance measurements are provided, as well as comparisons with conventional designs are given. Finally, the integration into CORSIKA 8 is commented.
    
    Speaker: Dr Antonio Augusto Alves Junior (Institute for Astroparticle Physics of Karlsruhe Institute of Technology)
    
    Recording
    
    vCHEP2021.pdf
  - 62
    
    CAD support and new developments in DD4hep
    
    Consistent detector description is an integral part of all modern experiments and also the main motivation behind the creation of DD4hep, which tries to address detector description in a broad sense including: geometry and the materials used in the device, additional parameters describing e.g. the detection techniques, constants required for alignment and calibration, description of the readout structures and conditions data. A central component of DD4hep is DDG4 which is a mechanism that converts arbitrary DD4hep detector geometries to Geant4 and provides access to all Geant4 action stages. In addition to that DDG4 also offers a comprehensive plugins suite that includes handling of different IO formats, Monte Carlo truth linking and a large set of segmentation and sensitive detector classes, allowing the simulation of a wide variety of detector technologies. One of the last remaining open issues of detector description was support for drawings from civil engineers for passive detector components. In this proceedings we highlight recent developments in DD4hep/DDG4 that enable support for CAD drawings and generic tessellated shapes and through the help of the library assimp enable the import of a wide variety of CAD formats, thus eliminating the need for writing complex re-implementations of CAD drawings in source code. In addition, we present other developments such as support for a new output format called EDM4hep and developments for a more unified and easier handling of units.
    
    Speaker: Markus Frank (CERN)
    
    2021-05-18-CHEP21-DD4hep-Developments.pdf
    
    Recording
  - 63
    
    Key4hep: Status and Plans
    
    Detector optimisation and physics performance studies are an
    integral part for the development of future collider
    experiments. The Key4hep project aims to design a common set of
    software tools for future, or even present, High Energy Physics
    projects. These proceedings describe the main components that are
    developed as part of Key4hep: the event data model EDM4hep,
    simulation interfaces to Delphes and Geant4, the k4MarlinWrapper
    to integrate iLCSoft components, and build and validation tools
    to ensure functionality and compatibility among the
    components. They also include the different adaptation processes
    by the CEPC, CLIC, FCC, and ILC communities towards this project,
    which show that Key4hep is a viable long term solution as
    baseline software for high energy experiments.
    
    Speaker: Andre Sailer (CERN)
    
    210518_sailer_key4hep.pdf
    
    Recording
  - 64
    
    Preservation through modernisation: The software of the H1 experiment at HERA
    
    The lepton–proton collisions produced at the HERA collider represent a unique high energy physics data set. A number of years after the end of collisions, the data collected by the H1 experiment, as well as the simulated events and all software needed for reconstruction, simulation and data analysis were migrated into a preserved operational mode at DESY. A recent modernisation of the H1 software architecture has been performed, which will not only facilitate on going and future data analysis efforts with the new inclusion of modern analysis tools, but also ensure the long-term availability of the H1 data and associated software. The present status of the H1 software stack, the data, simulations and the currently supported computing platforms for data analysis activities are discussed.
    
    Speaker: Daniel Britzger (Max-Planck-Institut für Physik München)
    
    210518-britzger-CHEP-H1software.pdf
    
    Recording
- Storage: Tue PM
  
  Conveners: Cedric Serfon (Brookhaven National Laboratory (US)), Peter Clarke (The University of Edinburgh (GB))
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 65
    
    An intelligent Data Delivery Service for and beyond the ATLAS experiment
    
    The intelligent Data Delivery Service (iDDS) has been developed to cope with the huge increase of computing and storage resource usage in the coming LHC data taking. iDDS has been designed to intelligently orchestrate workflow and data management systems, decoupling data pre-processing, delivery, and main processing in various workflows. It is an experiment-agnostic service around a workflow- oriented structure to work with existing and emerging use cases in ATLAS and other experiments. Here we will present the motivation for iDDS, its design schema and architecture, use cases and current status, and plans for the future.
    
    Speaker: Wen Guan (University of Wisconsin (US))
    
    iDDS_vchep2021-9.pdf
    
    Recording
  - 66
    
    The ATLAS Data Carousel Project Status
    
    The High Luminosity upgrade to the LHC, which aims for a ten-fold increase in the luminosity of proton-proton collisions at an energy of 14 TeV, is expected to start operation in 2028/29, and will deliver an unprecedented volume of scientific data at the multi-exabyte scale. This amount of data has to be stored and the corresponding storage system must ensure fast and reliable data delivery for processing by scientific groups distributed all over the world. The present LHC computing and data management model will not be able to provide the required infrastructure growth even taking into account the expected hardware technology evolution. To address this challenge, the Data Carousel R&D project was launched by the ATLAS experiment in the fall of 2018. State-of-the-art data and workflow management technologies are under active development, and their current status is presented here.
    
    Speaker: Alexei Klimentov (Brookhaven National Laboratory (US))
    
    DataCarousel_vCHEP2021.pdf
    
    Recording
  - 67
    
    dCache: Inter-disciplinary storage system
    
    The dCache project provides open-source software deployed internationally to satisfy ever more demanding storage requirements. Its multifaceted approach provides an integrated way of supporting different use-cases with the same storage, from high throughput data ingest, data sharing over wide area networks, efficient access from HPC clusters and long term data persistence on a tertiary storage. Though it was originally developed for the HEP experiments, today it is used by various scientific communities, including astrophysics, biomed, life science, which have their specific requirements. In this paper we describe some of the new requirements as well as demonstrate how dCache developers are addressing them.
    
    Speaker: Mr Tigran Mkrtchyan (DESY)
    
    dcache-ids_v3.pdf
    
    Recording
  - 68
    
    The GridKa tape storage: latest improvements and current production setup
    
    Tape storage remains the most cost-effective system for safe long-term storage of petabytes of data and reliably accessing it on demand. It has long been widely used by Tier-1 centers in WLCG. GridKa uses tape storage systems for LHC and non-LHC HEP experiments. The performance requirements on the tape storage systems are increasing every year, creating an increasing number of challenges in providing a scalable and reliable system. Therefore, providing high-performance, scalable and reliable tape storage systems is a top priority for Tier-1 centers in WLCG.
    
    At GridKa, various performance tests were recently done to investigate the existence of bottlenecks in the tape storage setup. As a result, several bottlenecks were identified and resolved, leading to a significant improvement in the overall tape storage performance. These results were achieved in a test environment and introduction of these achievements in to the production environment required a great effort, among many other things, a new software had to be developed to interact with the tape management software.
    
    This contribution provides detailed information on the latest improvements and changes on the GridKa tape storage setup.
    
    Speaker: Haykuhi Musheghyan (Georg August Universitaet Goettingen (DE))
    
    KIT_tape_vCHEP_2021_f.pdf
    
    Recording
  - 69
    
    Improving Performance of Tape Restore Request Scheduling in the Storage System dCache
    
    Given the anticipated increase in the amount of scientific data, it is widely accepted that primarily disk based storage will become prohibitively expensive. Tape based storage, on the other hand, provides a viable and affordable solution for the ever increasing demand for storage space. Coupled with a disk caching layer that temporarily holds a small fraction of the total data volume to allow for low latency access, it turns tape based systems into active archival storage (write once, read many) that imposes additional demands on data flow optimization compared to traditional backup setups (write once, read never). In order to preserve the lifetime of tapes and minimize the inherently higher access latency, different tape usage strategies are being evaluated. As an important disk storage system for scientific data that transparently handles tape access, dCache is making efforts to evaluate its recall optimization potential and is introducing a proof-of-concept, high-level stage request scheduling component within its SRM implementation.
    
    Speaker: Lea Morschel (Deutsches Elektronen-Synchrotron DESY)
    
    2021-CHEP-Bring-Online-Scheduling.pdf
    
    Recording
  - 70
    
    dCache: from Resilience to Quality of Service
    
    A major goal of future dCache development will be to allow users to define file Quality of Service (QoS) in a more flexible way than currently available. This will mean implementing what might be called a QoS rule engine responsible for registering and managing time-bound QoS transitions for files or storage units. In anticipation of this extension to existing dCache capabilities, the Resilience service, which maintains on-disk replica state, needs to undergo both structural modification and generalization. This paper describes ongoing work to transform Resilience into the new architecture which will eventually support a more broadly defined file QoS.
    
    Speaker: ALBERT ROSSI (Fermi National Accelerator Laboratory)
    
    CHEP-dcache-qos-2021.pdf
    
    CHEP-dcache-qos-2021.pptx
    
    Recording
- 16:20
  
  Break
- Tues PM Plenaries: Plenaries
  
  Conveners: Elizabeth Sexton-Kennedy (Fermi National Accelerator Lab. (US)), Richard Philip Mount (SLAC National Accelerator Laboratory (US))
  
  Mattermost
  
  Video recording
  
  Zoom
  - 71
    
    Deep Learning strategies for ProtoDUNE raw data denoising
    
    In this work we investigate different machine learning based strategies for
    denoising raw simulation data from ProtoDUNE experiment. ProtoDUNE detector
    is hosted by CERN and it aims to test and calibrate the technologies for DUNE, a
    forthcoming experiment in neutrino physics. Our models leverage deep learning
    algorithms to make the first step in the reconstruction workchain, which
    consists in converting digital detector signals into physical high level
    quantities. We benchmark this approach against traditional algorithms
    implemented by the DUNE collaboration. We test the capabilities of graph
    neural networks, while exploiting multi-GPU setups to accelerate training and
    inference processes.
    
    Speaker: Marco Rossi (CERN)
    
    pDUNEreco.pdf
    
    Recording
  - 72
    
    Artificial Neural Networks on FPGAs for Real-Time Energy Reconstruction of the ATLAS LAr Calorimeters
    
    Within the Phase-II upgrade of the LHC, the readout electronics of the ATLAS LAr Calorimeters is prepared for high luminosity operation expecting a pile-up of up to 200 simultaneous pp interactions. Moreover, the calorimeter signals of up to 25 subsequent collisions are overlapping, which increases the difficulty of energy reconstruction. Real-time processing of digitized pulses sampled at 40 MHz is thus performed using FPGAs.
    
    To cope with the signal pile-up, new machine learning approaches are explored: convolutional and recurrent neural networks outperform the optimal signal filter currently used, both in assignment of the reconstructed energy to the correct bunch crossing and in energy resolution.
    
    Very good agreement between neural network implementations in FPGA and software based calculations is observed. The FPGA resource usage, the latency and the operation frequency are analysed. Latest performance results and experience with prototype implementations will be reported.
    
    Speaker: Thomas Calvet (CPPM, Aix-Marseille Université, CNRS/IN2P3 (FR))
    
    calvet_LAr-ANN_210518.pdf
    
    Recording
  - 17:40
    
    Break
  - 73
    
    Quantum Support Vector Machines for Continuum Suppression in B Meson Decays
    
    Quantum computers have the potential for significant speed-ups of certain computational tasks. A possibility this opens up within the field of machine learning is the use of quantum features that would be inefficient to calculate classically. Machine learning algorithms are ubiquitous in particle physics and as advances are made in quantum machine learning technology, there may be a similar adoption of these quantum techniques.
    In this work a quantum support vector machine (QSVM) is implemented for signal-background classification. We investigate the effect of different quantum encoding circuits, the process that transforms classical data into a quantum state, on the final classification performance. We show an encoding approach that achieves an Area Under Receiver Operating Characteristic Curve (AUC) of 0.877 determined using quantum circuit simulations. For this same dataset the best classical method, a classical Support Vector Machine (SVM) using the Radial Basis Function (RBF) Kernel achieved an AUC of 0.865. Using a reduced dataset we then ran the algorithm on the IBM Quantum ibmq_casablanca device achieving an average AUC of 0.703. As further improvements to the error rates and availability of quantum computers materialise, they could form a new approach for data analysis in high energy physics.
    
    Speaker: Jamie Heredge (The University of Melbourne)
    
    Quantum Support Vector Machines for B Meson Continuum Supression (3).pdf
    
    Recording
  - 74
    
    EDM4hep and podio - The event data model of the Key4hep project and its implementation
    
    The EDM4hep project aims to design the common event data model for the Key4hep project and is generated via the podio toolkit. We present the first version of EDM4hep and discuss some of its use cases in the Key4hep project. Additionally, we discuss recent developments in podio, like the updates of the automatic code generation and also the addition of a second I/O backend based on SIO. We compare the available backends using benchmarks based on physics use cases, before we conclude with a discussion of currently ongoing work and future developments.
    
    Speaker: Thomas Madlener (Deutsches Elektronen-Synchrotron (DESY))
    
    edm4hep_podio_vchep2021.pdf
    
    Recording
Wednesday 19 May
- Weds AM Plenaries: Plenaries
  
  Conveners: Catherine Biscarat (L2I Toulouse, IN2P3/CNRS (FR)), Tommaso Boccali (INFN Sezione di Pisa, Universita' e Scuola Normale Superiore, P)
  
  Mattermost
  
  Video recording
  
  Zoom
  - 75
    
    Full detector simulation with unprecedented background occupancy at a Muon Collider
    
    In recent years a Muon Collider has attracted a lot of interest in the High-Energy Physics community thanks to its ability of achieving clean inter- action signatures at multi-TeV collision energies in the most cost-effective way. Estimation of the physics potential of such an experiment must take into account the impact of beam-induced background on the detector performance, which has to be carefully evaluated using full detector simulation. Tracing of all the back- ground particles entering the detector region in a single bunch crossing is out of reach for any realistic computing facility due to the unprecedented number of such particles. In order to make it feasible a number of optimisations have been applied to the detector simulation workflow.
    
    This contribution presents an overview of the main characteristics of the beam-induced background at a Muon Collider, the detector technologies considered for the experiment and how they are taken into account to strongly reduce the number of irrelevant computations performed during the detector simulation. Special attention is dedicated to the optimisation of track reconstruction with the Conformal Tracking algorithm in this high-occupancy environment, which is the most computationally demand- ing part of event reconstruction.
    
    Speaker: Nazar Bartosik (Universita e INFN Torino (IT))
    
    2021_05_17_bartosik_v1.pdf
    
    Recording
  - 76
    
    HEPiX benchmarking solution for WLCG computing resources
    
    The HEPiX Benchmarking Working Group has been developing a benchmark based on actual software workloads of the High Energy Physics community. This approach, based on container technologies, is designed to provide a benchmark that is better correlated with the actual throughput of the experiment production workloads. It also offers the possibility to separately explore and describe the independent architectural features of different computing resource types. This is very important in view of the growing heterogeneity of the HEP computing landscape, where the role of non-traditional computing resources such as HPCs and GPUs is expected to increase significantly.
    
    Speaker: Miguel Fontes Medeiros (CERN)
    
    HEPiX_Benchmarking.pdf
    
    Recording
  - 77
    
    Integration of Rucio in Belle II
    
    Dirac and Rucio are two standard pieces of software widely used in the HEP domain. Dirac provides Workload and Data Management functionalities, among other things, while Rucio is a dedicated, advanced Distributed Data Management system. Many communities that already use Dirac express their interest in using Dirac for workload management in combination with Rucio for the Data management part. In this paper, we describe the integration of the Rucio File Catalog into Dirac that was initially developed for the Belle II collaboration.
    
    Speaker: Cedric Serfon (Brookhaven National Laboratory (US))
    
    2021-05-19 - Integration of Rucio into Belle II.pdf
    
    Recording
- 10:30
  
  Break
- Algorithms: Wed AM
  
  Conveners: David Rohr (CERN), Felice Pantaleo (CERN)
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday morning
  
  Zoom
  - 78
    
    Application of the missing mass method in the fixed-target program of the STAR experiment
    
    As part of the FAIR Phase-0 program, the fast FLES (First-Level Event Selection) package algorithms developed for the CBM experiment (FAIR/GSI, Germany) has been adapted for online and offline processing in the STAR experiment (BNL, USA). Using the same algorithms creates a bridge between online and offline modes. This allows combining online and offline resources for data processing.
    
    Thus, an express data production chain was created based on the STAR HLT farm, which extends the real-time functionality of HLT all the way down to physics analysis. The same express data production chain can be used on the RCF farm, which is used for fast offline production with the same tasks as the extended HLT. The express analysis chain does not interfere with the standard analysis chain.
    
    An important advantage of express analysis is that it allows you to start calibration, production, and analysis of the data as soon as it is available. Therefore, the use of express analysis can be useful for BES-II data production and help accelerate scientific discovery by helping to get results within a year after data collection is complete.
    
    Here we describe and discuss in detail the missing mass method that has been implemented as part of the KF Particle Finder package for searching and analyzing short-lived particles. Features of the application of the method within the framework of express real-time data processing are given, as well as the results of real-time reconstruction of short-lived particle decays in the BES-II environment.
    
    Speaker: Mr Pavel Kisel (Uni-Frankfurt, JINR)
    
    Kisel-Pavel-vCHEP-2021.pdf
    
    Recording
  - 79
    
    Track Finding for the PANDA Detector Based on Hough Transformations
    
    The PANDA experiment at FAIR (Facility for Antiproton and Ion
    Research) in Darmstadt is currently under construction. In order to reduce the
    amount of data collected during operation, it is essential to find all true tracks
    and to be able to distinguish them from false tracks. Part of the preparation
    for the experiment is therefore the development of a fast online track finder.
    This work presents an online track finding algorithm based on Hough transfor-
    mations, which is comparable in quality and performance to the currently best
    offline track finder in PANDA. In contrast to most track finders the algorithm
    can handle the challenge of extended hits delivered by PANDA’s central Straw
    Tube Tracker and thus benefit from its precise spatial resolution. Furthermore,
    optimization methods are presented that improved the ghost ratio as well as the
    speed of the algorithm by 70 %. Due to further development potential in terms
    of displaced vertex finding and speed optimization on GPUs, this algorithm
    promises to exceed the quality and speed of other track finders developed for
    PANDA.
    
    Speaker: Anna Alicke (Forschungszentrum Jülich)
    
    Alicke_HoughTrackFinder.pdf
    
    CHEP2021_05_19.pdf
    
    Recording
  - 80
    
    A novel reconstruction framework for an imaging calorimeter for HL-LHC
    
    To sustain the harsher conditions of the high-luminosity LHC, the CMS collaboration is designing a novel endcap calorimeter system. The new calorimeter will predominantly use silicon sensors to achieve sufficient radiation tolerance and will maintain highly-granular information in the readout to help mitigate the effects of pileup. In regions characterised by lower radiation levels, small scintillator tiles with individual on-tile SiPM readout are employed.
    A unique reconstruction framework (TICL: The Iterative CLustering) is being developed to fully exploit the granularity and other significant detector features, such as particle identification and precision timing, with a view to mitigate pileup in the very dense environment of HL-LHC. The inputs to the framework are clusters of energy deposited in individual calorimeter layers. Clusters are formed by a density-based algorithm. Recent developments and tunes of the clustering algorithm will be presented. To help reduce the expected pressure on the computing resources in the HL-LHC era, the algorithms and their data structures are designed to be executed on GPUs. Preliminary results will be presented on decreases in clustering time when using GPUs versus CPUs.
    Ideas for machine-learning techniques to further improve the speed and accuracy of reconstruction algorithms will be presented.
    
    Speaker: Dr Leonardo Cristella (CERN)
    
    HGCAL.pdf
    
    Recording
  - 81
    
    Simultaneous Global and Local Alignment of the Belle II Tracking Detectors
    
    The alignment of the Belle II tracking system composed of a pixel and strip vertex detectors and central drift chamber is described by approximately 60,000 parameters. These include internal local alignment: positions, orientations and surface deformations of silicon sensors and positions of drift chamber wires as well as global alignment: relative positions of the sub-detectors and larger structures.
    
    In the next data reprocessing, scheduled since Spring 2021, we aim to determine all parameters in a simultaneous fit by Millepede II, where recent developments allow to achieve a direct solution of the full problem in about one hour and make it practically feasible for regular detector alignment.
    
    The tracking detectors and the alignment technique are described and the alignment strategy is discussed in the context of studies on simulations and experience obtained from recorded data. Preliminary results and further refinements based on studies of real Belle II data are presented.
    
    Speaker: Tadeas Bilka (Charles University)
    
    Belle2AlignmentCHEP2021.pdf
    
    Recording
  - 82
    
    Improvements to ATLAS Inner Detector Track reconstruction for LHC Run-3
    
    This talk summarises the main changes to the ATLAS experiment’s Inner Detector Track reconstruction software chain in preparation of LHC Run 3 (2022-2024). The work was carried out to ensure that the expected high-activity collisions with on average 50 simultaneous proton-proton interactions per bunch crossing (pile-up) can be reconstructed promptly using the available computing resources. Performance figures in terms of CPU consumption for the key components of the reconstruction algorithm chain and their dependence on the pile-up are shown. For the design pile-up value of 60 the updated track reconstruction is a factor of 2 faster than the previous version.
    
    Speaker: Zachary Michael Schillaci (Brandeis University (US))
    
    Recording
    
    zschillaciCHEP2021.pdf
  - 83
    
    Basket Classifier: Fast and Optimal Restructuring of the Classifier for Differing Train and Target Samples
    
    The common approach for constructing a classifier for particle selection assumes reasonable consistency between train data samples and the target data sample used for the particular analysis. However, train and target data may have very different properties, like energy spectra for signal and background contributions. We suggest using ensemble of pre-trained classifiers, each of which is trained on exclusive subset of the total dataset, data baskets. Appropriate separate adjustment of separation thresholds for every basket classifier allows to dynamically adjust combined classifier and make optimal prediction for data with differing properties without re-training of the classifier. The approach is illustrated with a toy example. Quality dependency on the number of used data baskets is also presented
    
    Speaker: Mr Anton Philippov (HSE)
    
    Basket Classifier.pdf
    
    Recording
- Artificial Intelligence: Wed AM
  
  Conveners: Agnieszka Dziurda (Polish Academy of Sciences (PL)), Joosep Pata (National Institute of Chemical Physics and Biophysics (EE))
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 84
    
    Pixel Detector Background Generation using Generative Adversarial Networks at Belle II
    
    The pixel vertex detector (PXD) is an essential part of the Belle II detector recording particle positions. Data from the PXD and other sensors allow us to reconstruct particle tracks and decay vertices. The eﬀect of background hits on track reconstruction is simulated by adding measured or simulated background hit patterns to the hits produced by simulated signal particles. This model requires a large set of statistically independent PXD background noise samples to avoid a systematic bias of reconstructed tracks. However, data from the ﬁne-grained PXD requires a substantial amount of storage. As an eﬃcient way of producing background noise, we explore the idea of an on-demand PXD background generator using conditional Generative Adversarial Networks (GANs), adapted by the number of PXD sensors in order to both increase the image ﬁdelity and produce sensor-dependent PXD hitmaps.
    
    Speaker: Mr Hosein Hashemi (LMU)
    
    Recording
    
    vCHEP_PXDGAN.pdf
  - 85
    
    Machine learning for surface prediction in ACTS
    
    We present an ongoing R&D activity for machine-learning-assisted navigation through detectors to be used for track reconstruction. We investigate different approaches of training neural networks for surface prediction and compare their results. This work is carried out in the context of the ACTS tracking toolkit.
    
    Speaker: Mr Benjamin Huth (Universität Regensburg)
    
    Recording
    
    vchep21_huth_benjamin_ml_navigation.pdf
  - 86
    
    Deep neural network techniques in the calibration of space-charge distortion fluctuations for the ALICE TPC
    
    The Time Projection Chamber (TPC) of the ALICE experiment at the CERN LHC was upgraded for Run 3 and Run 4. Readout chambers based on Gas Electron Multiplier (GEM) technology and a new readout scheme allow continuous data taking at the highest interaction rates expected in Pb-Pb collisions. Due to the absence of a gating grid system, a significant amount of ions created in the multiplication region is expected to enter the TPC drift volume and distort the uniform electric field that guides the electrons to the readout pads. Analytical calculations were considered to correct for space-charge distortion fluctuations but they proved to be too slow for the calibration and reconstruction workflow in Run 3. In this paper, we discuss a novel strategy developed by the ALICE Collaboration to perform distortion-fluctuation corrections with machine learning and convolutional neural network techniques. The results of preliminary studies are shown and the prospects for further development and optimization are also discussed.
    
    Speaker: Ernst Hellbar (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
    
    20210519_DNNforSCfluctuationCalibrationALICETPC.pdf
    
    Recording
  - 87
    
    Accelerating End-to-End Deep Learning for Particle Reconstruction using CMS open data
    
    Machine learning algorithms are gaining ground in high energy physics for applications in particle and event identification, physics analysis, detector reconstruction, simulation and trigger. Currently, most data-analysis tasks at LHC experiments benefit from the use of machine learning. Incorporating these computational tools in the experimental framework presents new challenges.
    This paper reports on the implementation of the end-to-end deep learning with the CMS software framework and the scaling of the end-to-end deep learning with multiple GPUs.
    The end-to-end deep learning technique combines deep learning algorithms and low-level detector representation for particle and event identification. We demonstrate the end-to-end implementation on a top quark benchmark and perform studies with various hardware architectures including single and multiple GPUs and Google TPU.
    
    Speaker: Davide Di Croce (University of Alabama (US))
    
    AcceleratingDeepLearningReconstruction.pdf
    
    bburkle-vCHEP-2021-final.pdf
    
    Recording
  - 88
    
    Development of FPGA-based neural network regression models for the ATLAS Phase-II barrel muon trigger upgrade
    
    Effective selection of muon candidates is the cornerstone of the LHC physics programme. The ATLAS experiment uses the two-level trigger system for real-time selections of interesting events. The first-level hardware trigger system uses the Resistive Plate Chamber detector (RPC) for selecting muon candidates in the central (barrel) region of the detector. With the planned upgrades, the entirely new FPGA-based muon trigger system will be installed in 2025-2026. In this paper, neural network regression models are studied for potential applications in the new RPC trigger system. A simple simulation model of the current detector is developed for training and testing neural network regression models. Effects from additional cluster hits and noise hits are evaluated. Efficiency of selecting muon candidates is estimated as a function of the transverse muon momentum. Several models are evaluated and their performance is compared to that of the current detector, showing promising potential to improve on current algorithms for the ATLAS Phase-II barrel muon trigger upgrade.
    
    Speaker: Rustem Ospanov (University of Science and Technology of China)
    
    Recording
    
    rospanov_rpc_trigger_study.pdf
- Facilities and Networks: Wed AM
  
  Conveners: Daniela Bauer (Imperial College (GB)), David Bouvet (IN2P3/CNRS (FR))
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Tuesday afternoon
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 89
    
    Deploying a new realtime XRootD-v5 based monitoring framework for GridPP
    
    To optimise the performance of distributed compute, smaller lightweight storage caches are needed which integrate with existing grid computing workflows. A good solution to provide lightweight storage caches is to use an XRootD-proxy cache. To support distributed lightweight XRootD proxy services across GridPP we have developed a centralised monitoring framework.
    
    With the v5 release of XRootD it is possible to build a monitoring framework which collects distributed caching metadata broadcast from multiple sites. To provide the best support for these distributed caches we have built a centralised monitoring service for XRootD storage instances within GridPP. This monitoring solution is built upon experiences presented by CMS in setting up a similar service as part of their AAA system. This new framework is designed to provide remote monitoring of the behaviour, performance, and reliability of distributed XRootD services across the UK. Effort has been made to simplify ease of deployment by remote site administrators.
    
    The result of this work is an interactive dashboard system which enables administrators to access real-time metrics on the performance of their lightweight storage systems. This monitoring framework is intended to supplement existing functionality and availability testing metrics by providing detailed information and logging from a site perspective.
    
    Speaker: Dr Robert Andrew Currie (The University of Edinburgh (GB))
    
    Recording
    
    RemoteXrootDMonitoring_rcurrie.pdf
    
    RemoteXrootDMonitoring_rcurrie.pptx
  - 90
    
    Towards Real-World Applications of ServiceX, an Analysis Data Transformation System
    
    One of the biggest challenges in the High-Luminosity LHC (HL- LHC) era will be the significantly increased data size to be recorded and an- alyzed from the collisions at the ATLAS and CMS experiments. ServiceX is a software R&D project in the area of Data Organization, Management and Access of the IRIS- HEP to investigate new computational models for the HL- LHC era. ServiceX is an experiment-agnostic service to enable on-demand data delivery specifically tailored for nearly-interactive vectorized analyses. It is capable of retrieving data from grid sites, on-the-fly data transformation, and delivering user-selected data in a variety of different formats. New features will be presented that make the service ready for public use. An ongoing effort to integrate ServiceX with a popular statistical analysis framework in ATLAS will be described with an emphasis of a practical implementation of ServiceX into the physics analysis pipeline.
    
    Speaker: Kyungeon Choi (University of Texas at Austin (US))
    
    KyungEon_vCHEP2021.pdf
    
    Recording
  - 91
    
    Anomaly detection in the CERN cloud infrastructure
    
    Anomaly detection in the CERN OpenStack cloud is a challenging task due to the large scale of the computing infrastructure and, consequently, the large volume of monitoring data to analyse. The current solution to spot anomalous servers in the cloud infrastructure relies on a threshold-based alarming system carefully set by the system managers on the performance metrics of each infrastructure’s component. This contribution explores fully automated, unsupervised machine learning solutions in the anomaly detection field for time series metrics, by adapting both traditional and deep learning approaches. The paper describes a novel end-to-end data analytics pipeline implemented to digest the large amount of monitoring data and to expose anomalies to the system managers. The pipeline relies solely on open-source tools and frameworks, such as Spark, Apache Airflow, Kubernetes, Grafana, Elasticsearch. In addition, an approach to build annotated datasets from the CERN cloud monitoring data is reported. Finally, a preliminary performance of a number of anomaly detection algorithms is evaluated by using the aforementioned annotated datasets.
    
    Speaker: Stiven Metaj (Politecnico di Milano (IT))
    
    Recording
    
    vCHEP 19 MAY 2021 - Stiven Metaj.pdf
  - 92
    
    Reaching new peaks for the future of the CMS HTCondor Global Pool
    
    The CMS experiment at CERN employs a distributed computing infrastructure to satisfy its data processing and simulation needs. The CMS Submission Infrastructure team manages a dynamic HTCondor pool, aggregating mainly Grid clusters worldwide, but also HPC, Cloud and opportunistic resources. This CMS Global Pool, which currently involves over 70 computing sites worldwide and peaks at 300k CPU cores, is capable of successfully handling the simultaneous execution of up to 150k tasks. While the present infrastructure is sufficient to harness the current computing power scales, CMS latest estimates predict that at least a four-time increase in the total amount of CPU will be required in order to cope with the massive data increase of the High-Luminosity LHC (HL-LHC) era, planned to start in 2027. This contribution presents the latest results of the CMS Submission Infrastructure team in exploring the scalability reach of our Global Pool, in order to preventively detect and overcome any barriers in relation to the HL-LHC goals, while maintaining high efficiency in our workload scheduling and resource utilization.
    
    Speaker: Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéti cas Medioambientales y Tecno)
    
    20210519_SI_vCHEP.pdf
    
    Recording
  - 93
    
    Research and Evaluation of RoCE in IHEP Data Center
    
    With more and more large-scale scientific facilities are built, more and more HPC requirements are needed in IHEP. RDMA is a technology that allows servers in a network to exchange data in main memory without involving the processor, cache or operating system of either server, which can provide high bandwidth and low latency. There are two RDMA technologies which were InfiniBand and a relative new comer called RoCE – RDMA over Converged Ethernet. This paper introduces the RoCE technology, we research and compare the performance of both IB and RoCE in IHEP data center, and we also evaluate the application scenarios of RoCE which can support our future technology selection in HEPS. In the end, we present our future plan.
    
    Speaker: Dr Shan Zeng (IHEP)
    
    Recording
    
    Research and Evaluation of RoCE in IHEP Data Center@CHEP 2021.pdf
- Software: Wed AM
  
  Conveners: Enrico Guiraud (EP-SFT, CERN), Stefan Roiser (CERN)
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 94
    
    BAT. jl — A Julia-based tool for Bayesian inference
    
    We present BAT.jl 2.0, the next generation of the Bayesian Analysis Toolkit. BAT.jl is a highly efficient and easy to use software package for Bayesian Inference. It's predecessor, BAT 1.0 in C++, has been very successful over the years with a large number of citations. Our new incarnation of BAT was rewritten from scratch in Julia and we recently released the long-term stable version 2.0.
    
    Solving inference problems in the natural sciences, in particular High Energy Physics, often requires flexibility in using multiple programming languages, differentiable programming, and parallel execution on both CPU and GPU architectures. BAT.jl enables this by drawing on the unique capabilities of the Julia Programing Language. It provides efficient Metropolis-Hastings sampling, Hamiltonian Monte Carlo with automatic differentiation and nested sampling. We also provide algorithms to estimate the evidence (integral of the posterior), necessary to compute Bayesian factors, from posterior samples. BAT.jl uses a minimal set of dependencies and new algorithms can be easily added due to the toolbox structure of the package.
    
    BAT.jl continues to evolve, one of its new experimental features is a sampling algorithm with space partitioning. This algorithm can efficiently utilize distributed computing resources and sample posteriors with reduced burn-in overhead while dealing with multi-modal densities. We also provide the user with a set of plotting recipes to quickly visualize results.
    
    Speaker: Vasyl Hafych (Max-Planck-Institut fur Physik (DE))
    
    BAT.jl: A Julia-Based Tool for Bayesian Inference
    
    BAT-presentation.pdf
    
    Recording
  - 95
    
    ATLAS in-file metadata and multi-threaded processing
    
    Processing and scientific analysis of the data taken by the ATLAS experiment requires reliable information describing the event data recorded by the detector or generated in software. ATLAS event processing applications store such descriptive metadata information in the output data files along with the event information.
    
    To better leverage the available computing resources during LHC Run3 the ATLAS experiment has migrated its data processing and analysis software to a multi-threaded framework: AthenaMT. Therefore in-file metadata must support concurrent event processing, especially around input file boundaries. The in-file metadata handling software was originally designed for serial event processing. It grew into a rather complex system over the many years of ATLAS operation. To migrate this system to the multi-threaded environment it was necessary to adopt several pragmatic solutions, mainly because of the shortage of available person-power to work on this project in early phases of the AthenaMT development.
    
    In order to simplify the migration, first the redundant parts of the code were cleaned up wherever possible. Next the infrastructure was improved by removing reliance on constructs that are problematic during multi-threaded processing. Finally, the remaining software infrastructure was redesigned for thread safety.
    
    Speaker: Frank Berghaus (Argonne National Laboratory (US))
    
    Recording
    
    vCHEP_2021_ATLAS_in_file_Metadata__Presentation_.pdf
  - 96
    
    Software framework for the Super Charm-Tau factory detector project
    
    The project of Super Charm-Tau (SCT) factory --- a high-luminosity
    electron-positron collider for studying charmed hadrons and tau lepton
    --- is proposed by Budker INP. The project implies single collision point
    equipped with a universal particle detector. The Aurora software
    framework has been developed for the SCT detector. It is based on
    trusted and widely used in high energy physics software packages, such
    as Gaudi, Geant4, and ROOT. At the same time, new ideas and
    developments are employed, in particular the Aurora project benefits a
    lot from the turnkey software for future colliders (Key4HEP)
    initiative. This paper describes the first release of the Aurora
    framework, summarizes its core technologies, structure and roadmap for
    the near future.
    
    Speaker: Anastasiia Zhadan (BINP)
    
    Recording
    
    sctau_software.pdf
  - 97
    
    Exploring the virtues of XRootD5: Declarative API
    
    Across the years, being the backbone of numerous data management solutions used within the WLCG collaboration, the XRootD framework and protocol became one of the most important building blocks for storage solutions in the High Energy Physics (HEP) community. The latest big milestone for the project, release 5, introduced multitude of architectural improvements and functional enhancements, including the new client side declarative API, which is the main focus of this study. In this contribution we give an overview of the new client API and we discuss its motivation and its positive impact on overall software quality (coupling, cohesion), readability and composability.
    
    Speaker: Michal Kamil Simon (CERN)
    
    Recording
    
    vchep2021.pdf
    
    vchep2021.pptx
  - 98
    
    Building and steering binned template fits with cabinetry
    
    The cabinetry library provides a Python-based solution for building and steering binned template fits. It tightly integrates with the pythonic High Energy Physics ecosystem, and in particular with pyhf for statistical inference. cabinetry uses a declarative approach for building statistical models, with a JSON schema describing possible configuration choices. Model building instructions can additionally be provided via custom code, which is automatically executed when applicable at key steps of the workflow. The library implements interfaces for performing maximum likelihood fitting, upper parameter limit determination, and discovery significance calculation. cabinetry also provides a range of utilities to study and disseminate fit results. These include visualizations of the fit model and data, visualizations of template histograms and fit results, ranking of nuisance parameters by their impact, a goodness-of-fit calculation, and likelihood scans. The library takes a modular approach, allowing users to include some or all of its functionality in their workflow.
    
    Speaker: Alexander Held (New York University (US))
    
    20210519_cabinetry_vCHEP2021.pdf
    
    Building and steering template fits with cabinetry
    
    Recording
  - 99
    
    CORSIKA 8 -- A novel high-performance computing tool for particle cascade Monte Carlo simulations
    
    The CORSIKA 8 project is an international collaboration of scientists working together to deliver the most modern, flexible, robust and efficient framework for the simulation of ultra-high energy secondary particle cascades in matter. The main application is for cosmic ray air shower simulations, but is not limited to that. Besides a comprehensive collection of physics models and algorithms relevant for the field, also all possible interfaces to hardware acceleration (e.g.\ GPU) and parallelization (vectorization, multi-threading, multi-core) will be provided. We present the status and roadmap of this project. This code will soon be available for novel explorative studies and phenomonological research, and at the same time for massive productions runs for experiments.
    
    Speaker: Ralf Ulrich (KIT - Karlsruhe Institute of Technology (DE))
    
    CHEP2021_CORSIKA8.pdf
    
    CORSIKA main webpage
    
    Recording
- Storage: Wed AM
  
  Conveners: Cedric Serfon (Brookhaven National Laboratory (US)), Edoardo Martelli (CERN)
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 100
    
    Prototype of the Russian Scientific Data Lake
    
    The High Luminosity phase of the LHC, which aims for a ten-fold increase in the luminosity of proton-proton collisions is expected to start operation in eight years. An unprecedented scientific data volume at the multi-exabyte scale will be delivered to particle physics experiments at CERN. This amount of data has to be stored and the corresponding technology must ensure fast and reliable data delivery for processing by the scientific community allover the world. The present LHC computing model will not be able to provide the required infrastructure growth even taking into account the expected hard-ware evolution. To address this challenge the Data Lake R&D project has been launched by the DOMA community in the fall of 2019. State-of-the-art data handling technologies are under active development, and their current status for the Russian Scientific Data Lake prototype is presented here.
    
    Speaker: Mr Andrey Kirianov (NRC Kurchatov Institute PNPI (RU))
    
    Kiryanov - RU Data Lake.pdf
    
    Recording
  - 101
    
    ESCAPE Data Lake: Next-generation management of cross-discipline Exabyte-scale scientific data
    
    The European-funded ESCAPE project (Horizon 2020) aims to address computing challenges in the context of the European Open Science Cloud. The project targets Particle Physics and Astronomy facilities and research infrastructures, focusing on the development of solutions to handle Exabyte-scale datasets. The science projects in ESCAPE are in different phases of evolution and count a variety of specific use cases and challenges to be addressed. This contribution describes the shared-ecosystem architecture of services, the Data Lake, fulfilling the needs in terms of data organisation, management, and access of the ESCAPE community. The Pilot Data Lake consists of several storage services operated by the partner institutes and connected through reliable networks, and it adopts Rucio to orchestrate data management and organisation. The results of a 24-hour Full Dress Rehearsal are also presented, highlighting the achievements of the Data Lake model and of the ESCAPE sciences.
    
    Speaker: Dr Riccardo Di Maria (CERN)
    
    ESCAPE Data Lake - Next Generation Management of Exabytes of Cross-Discipline Scientific Data.pdf
    
    Recording
  - 102
    
    LHC Data Storage: Preparing for the Challenges of Run-3
    
    The CERN IT Storage Group ensures the symbiotic development
    and operations of storage and data transfer services for all CERN physics data,
    in particular the data generated by the four LHC experiments (ALICE, ATLAS,
    CMS and LHCb).
    In order to accomplish the objectives of the next run of the LHC (Run-3), the
    Storage Group has undertaken a thorough analysis of the experiments’ requirements,
    matching them to the appropriate storage and data transfer solutions, and
    undergoing a rigorous programme of testing to identify and solve any issues before
    the start of Run-3.
    In this paper, we present the main challenges presented by each of the four LHC
    experiments. We describe their workflows, in particular how they communicate
    with and use the key components provided by the Storage Group: the EOS
    disk storage system; its archival back-end, the CERN Tape Archive (CTA); and
    the File Transfer Service (FTS). We also describe the validation and commissioning
    tests that have been undertaken and challenges overcome: the ATLAS
    stress tests to push their DAQ system to its limits; the CMS migration from
    PhEDEx to Rucio, followed by large-scale tests between EOS and CTA with
    the new FTS “archive monitoring” feature; the LHCb Tier-0 to Tier-1 staging
    tests and XRootD Third Party Copy (TPC) validation; and the erasure coding
    performance in ALICE.
    
    Speaker: Dr Maria Arsuaga Rios (CERN)
    
    Recording
    
    RUN3 presentation_CHEP.pdf
  - 103
    
    CERN Tape Archive: a distributed, reliable and scalable scheduling system
    
    The CERN Tape Archive (CTA) provides a tape backend to disk systems and, in conjunction with EOS, is managing the data of the LHC experiments at CERN.
    
    Magnetic tape storage offer the lowest cost per unit volume today, followed by hard disks and flash. In addition, current tape drives deliver a solid bandwidth (typically 360MB/s per device), but at the cost of high latencies, both for mounting a tape in the drive and for positioning when accessing non-adjacent files. As a consequence, the transfer scheduler should queue transfer requests before the volume warranting a tape mount is reached. In spite of these transfer latencies, user-interactive operations should have a low latency.
    
    The scheduling system for CTA was built from the experience gained with CASTOR. Its implementation ensures reliability and predictable performance, while simplifying development and deployment. As CTA is expected to be used for a long time, lock-in to vendors or technologies was minimized.
    
    Finally quality assurance systems were put in place to validate reliability and performance while allowing fast and safe development turnaround.
    
    Speaker: Eric Cano (CERN)
    
    CHEP2021_CTA_slides.pdf
    
    Recording
  - 104
    
    Preparing for HL-LHC: Increasing the LHCb software publication rate to CVMFS by an order of magnitude
    
    In the HEP community, software plays a central role in the operation of experiments’ facilities and for reconstruction jobs, with CVMFS being the service enabling the distribution of software at scale. In view of High Luminosity LHC, CVMFS developers investigated how to improve the publication workflow to support the most demanding use cases. This paper reports about recent CVMFS developments and infrastructural updates that enable faster publication into existing repositories. A new CVMFS component, the CVMFS Gateway, allows for concurrent transactions and the use of multiple publishers, increasing the overall publication rate on a single repository. Also, the repository data has been migrated to Ceph-based S3 object storage, which brings a relevant performance enhancement over the previously-used Cinder volumes. We demonstrate how recent improvements allow for faster publication of software releases in CVMFS repositories by focusing on the LHCb nightly builds use case, which is currently by far the most demanding one for the CVMFS infrastructure at CERN. The publication of nightly builds is characterized by a high churn rate, needs regular garbage collection, and requires the ability to ingest a huge amount of software files over a limited period of time.
    
    Speaker: Enrico Bocchi (CERN)
    
    Increasing the LHCb software publication rate to CVMFS
    
    Recording
    
    vCHEP21, Increasing the LHCb software publication rate to CVMFS.pdf
  - 105
    
    Addressing a billion-entries multi-petabyte distributed filesystem backup problem with cback: from files to objects
    
    CERNBox is the cloud collaboration hub at CERN. The service has more than 37,000 user accounts. The backup of user and project data is critical for the service. The underlying storage system hosts over a billion files which amount to 12PB of storage distributed over several hundred disks with a two-replica RAIN layout. Performing a backup operation over this vast amount of data is a non-trivial task.
    
    The original CERNBox backup system (an in-house event-driven file-level system) has been reconsidered and replaced by a new distributed and scalable backup infrastructure based on the open source tool restic. The new system, codenamed cback, provides features needed in the HEP community to guarantee data safety and smooth operation from the system administrators. Daily snapshot-based backups of all our user and project areas along with automatic verification and restores are possible with this the new development.
    
    The backup data is also de-duplicated in blocks and stored as objects in a disk-based S3 cluster in another geographical location on the CERN campus, reducing storage costs and protecting critical data from major catastrophic events. We report on the design and operational experience of running the system and future improvement possibilities.
    
    Speaker: Roberto Valverde Cameselle (CERN)
    
    Recording
    
    vCHEP_cback.pdf
- Weds PM Plenaries: Plenaries
  
  Conveners: James Catmore (University of Oslo (NO)), Oxana Smirnova (Lund University (SE))
  
  Mattermost
  
  Video recording
  
  Zoom
  - 106
    
    An Error Analysis Toolkit for Binned Counting Experiments
    
    We introduce the MINERvA Analysis Toolkit (MAT), a utility for centralizing the handling of systematic uncertainties in HEP analyses. The fundamental utilities of the toolkit are the MnvHnD, a powerful histogram container class, and the systematic Universe classes, which provide a modular implementation of the many universe error analysis approach. These products can be used stand-alone or as part of a complete error analysis prescription. They support the propagation of systematic uncertainty through all stages of analysis, and provide flexibility for an arbitrary level of user customization. This extensible solution to error analysis enables the standardization of systematic uncertainty definitions across an experiment and a transparent user interface to lower the barrier to entry for new analyzers.
    
    Speaker: Dr Ben Messerly (University of Minnesota)
    
    2021-05-19_vCHEP_MAT.pdf
    
    Recording
  - 107
    
    Convolutional LSTM models to estimate network traffic
    
    Network utilisation efficiency can, at least in principle, often be improved by dynamically re-configuring routing policies to better distribute on-going large data transfers. Unfortunately, the information necessary to decide on an appropriate reconfiguration---details of on-going and upcoming data transfers such as their source and destination and, most importantly, their volume and duration---is usually lacking. Fortunately, the increased use of scheduled transfer services, such as FTS, makes it possible to collect the necessary information. However, the mere detection and characterisation of larger transfers is not sufficient to predict with confidence the likelihood a network link will become overloaded. In this paper we present the use of LSTM-based models (CNN-LSTM and Conv-LSTM) to effectively estimate future network traffic and so provide a solid basis for formulating a sensible network configuration plan.
    
    Speaker: Joanna Waczynska (Wroclaw University of Science and Technology (PL))
    
    2021-05-19_vCHEP_Waczynska.pdf
    
    Recording
  - 16:00
    
    Break
  - 108
    
    Design and engineering of a simplified workflow execution for the MG5aMC event generator on GPUs and vector CPUs
    
    Physics event generators are essential components of the data analysis software chain of high energy physics experiments, and important consumers of their CPU resources. Improving the software performance of these packages on modern hardware architectures, such as those deployed at HPC centers, is essential in view of the upcoming HL-LHC physics programme. In this contribution, we describe an ongoing activity to reengineer the Madgraph5_aMC@NLO physics event generator, primarily to port it and allow its efficient execution on GPUs, but also to modernize it and optimize its performance on traditional CPUs. In our presentation at the conference, we will describe the motivation, engineering process and software architecture design of our developments, as well as some of the challenges and future directions for this project. We also plan to present the status and results of our developments at the time of the presentation, including detailed software performance metrics.
    
    Speaker: Andrea Valassi (CERN)
    
    20210519-MGonGPU-vCHEP-AV-010.pdf
    
    20210519-MGonGPU-vCHEP-AV-010.pptx
    
    Recording
  - 109
    
    Accelerating IceCube's Photon Propagation Code with CUDA
    
    The IceCube Neutrino Observatory is a cubic kilometer neutrino detector located at the geographic South Pole designed to detect high-energy astrophysical neutrinos. To thoroughly understand the detected neutrinos and their properties, the detector response to signal and background has to be modeled using Monte Carlo techniques. An integral part of these studies are the optical properties of the ice the observatory is built into. The simulated propagation of individual photons from particles produced by neutrino interactions in the ice can be greatly accelerated using graphics processing units (GPUs). In this paper, we (a collaboration between NVIDIA and IceCube) reduced the propagation time per photon by a factor of 3. We achieved this by porting the OpenCL parts of the program to CUDA and optimizing the performance. This involved careful analysis and multiple changes to the algorithm. We also ported the code to NVIDIA OptiX to handle the collision detection. The hand-tuned CUDA algorithm turned out to be faster than OptiX. It exploits detector geometry and only a small fraction of photons ever travel close to one of the detectors.
    
    Speaker: Benedikt Riedel (University of Wisconsin-Madison)
    
    IceCube_CUDA_CHEP21.pdf
    
    Recording
- 17:20
  
  Break
- Accelerators: Wed PM
  
  Conveners: Dorothea Vom Bruch (Aix Marseille Univ, CNRS/IN2P3, CPPM, Marseille, France), Stewart Martin-Haugh (Science and Technology Facilities Council STFC (GB))
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Wednesday afternoon
  
  Zoom
  - 110
    
    Integration of JUNO simulation framework with Opticks: GPU accelerated optical propagation via NVIDIA OptiX
    
    Opticks is an open source project that accelerates optical photon simulation by integrating NVIDIA GPU ray tracing, accessed via NVIDIA OptiX, with
    Geant4 toolkit based simulations. A single NVIDIA Turing architecture GPU has been measured to provide optical photon simulation speedup factors exceeding 1500 times single threaded Geant4 with a full JUNO analytic GPU geometry automatically translated from the Geant4 geometry.
    Optical physics processes of scattering, absorption, scintillator reemission and
    boundary processes are implemented within CUDA OptiX programs based on the Geant4
    implementations. Wavelength-dependent material and surface properties as well as
    inverse cumulative distribution functions for reemission are interleaved into
    GPU textures providing fast interpolated property lookup or wavelength generation. Major recent developments are the integration of Opticks with the JUNO simulation framework using the minimal G4Opticks interface class and implementation of collection efficiency hit culling on GPU that enables only collected hits to be copied to CPU, substantially reducing both the CPU memory needed for photon hits and copying overheads. Also progress with the migration of Opticks to the all new NVIDIA OptiX 7 API is described.
    
    Speaker: simon blyth (IHEP, CAS)
    
    html version with javascript arrow key navigation
    
    opticks_vchep_2021_may19_v2c.pdf
    
    Recording
  - 111
    
    GPU simulation with Opticks: The future of optical simulations for LZ
    
    The LZ collaboration aims to directly detect dark matter by using a liquid xenon Time Projection Chamber (TPC). In order to probe the dark matter signal, observed signals are compared with simulations that model the detector response. The most computationally expensive aspect of these simulations is the propagation of photons in the detector’s sensitive volume. For this reason, we propose to offload photon propagation modelling to the Graphics Processing Unit (GPU), by integrating Opticks into the LZ simulations workflow. Opticks is a system which maps Geant4 geometry and photon generation steps to NVIDIA's OptiX GPU raytracing framework. This paradigm shift could simultaneously achieve a massive speedup and an increase in accuracy for LZ simulations. By using the technique of containerization through Shifter, we will produce a portable system to harness the NERSC supercomputing facilities, including the forthcoming Perlmutter supercomputer, and enable the GPU processing to handle different detector configurations. Prior experience with using Opticks to simulate JUNO indicates the potential for speedup factors over 1000$\times$ for LZ, and by extension other experiments requiring photon propagation simulations.
    
    Speaker: Oisin Creaner (Lawrence Berkeley National Laboratory)
    
    GPU simulation with Opticks: The future of optical simulations for LZ
    
    Recording
  - 112
    
    MadFlow: towards the automation of Monte Carlo simulation on GPU for particle physics processes
    
    In this proceedings we present MadFlow, a new framework for the automation of Monte Carlo (MC) simulation on graphics processing units (GPU) for particle physics processes. In order to automate MC simulation for a generic number of processes, we design a program which provides to the user the possibility to simulate custom processes through the MG5_aMC@NLO framework. The pipeline includes a first stage where the analytic expressions for matrix elements and phase space are generated and exported in a GPU-like format. The simulation is then performed using the VegasFlow and PDFFlow libraries which deploy automatically the full simulation on systems with different hardware acceleration capabilities, such as multi-threading CPU, single-GPU and multi-GPU setups. We show some preliminary results for leading-order simulations on different hardware configurations.
    
    Speaker: Dr Juan M. Cruz Martínez (University of Milan)
    
    juanCM.pdf
    
    Recording
  - 113
    
    Novel features and GPU performance analysis for EM particle transport in the Celeritas code
    
    Celeritas is a new computational transport code designed for high-performance
    simulation of high-energy physics detectors. This work describes some of its
    current capabilities and the design choices that enable the rapid development
    of efficient on-device physics. The abstractions that underpin the code design
    facilitate low-level performance tweaks that require no changes to the
    higher-level physics code. We evaluate a set of independent changes that
    together yield an almost 40\% speedup over the original GPU code for a net
    performance increase of $220\times$ for a single GPU over a single CPU running
    8.4M tracks on a small demonstration physics app.
    
    Speaker: Seth Johnson (Oak Ridge National Laboratory)
    
    Recording
    
    vchep-johnson-celeritas.pdf
  - 114
    
    Towards a cross-platform performance portability math kernel library in SYCL
    
    The increasing number of high-performance computing centers around the globe is providing physicists and other researchers access to heterogeneous systems -- comprising multiple central processing units and graphics processing units per node -- with various platforms. However, it is more often than not the case that domain scientists have limited resources such that writing multiple implementations of their codes to target the different platforms is unfeasible. To help address this, a number of portability layers are being developed that aim to allow programmers to achieve performant, portable codes; for example, Intel(R) oneAPI, which is based on the SYCL programming model. Nevertheless, portable application programming interfaces often lack some features and tools that are manifest in a platform-specific API. High-energy physicists in particular rely heavily on large sets of random numbers in nearly their entire workflow, from event generation to analysis. In this paper, we detail the implementation of a cuRAND backend into Intel's oneMKL, permitting random number generation within oneAPI applications on NVIDIA hardware using libraries optimised for these devices. By utilizing existing optimisations, we demonstrate the ability to achieve nearly native performance in cross-platform applications.
    
    Speaker: Vincent Pascuzzi (Lawrence Berkeley National Lab. (US))
    
    Google Slides
    
    Recording
  - 115
    
    PandAna: A Python Analysis Framework for Scalable High Performance Computing in High Energy Physics
    
    Modern experiments in high energy physics analyze millions of events recorded in particle detectors to select the events of interest and make measurements of physics parameters. These data can often be stored as tabular data in files with detector information and reconstructed quantities. Current techniques for event selection in these files lack the scalability needed for high performance computing environments. We describe our work to develop a high energy physics analysis framework suitable for high performance computing. This new framework utilizes modern tools for reading files and implicit data parallelism. Framework users analyze tabular data using standard, easy-to-use data analysis techniques in Python while the framework handles the file manipulations and parallelism without the user needing advanced experience in parallel programming. In future versions, we hope to provide a framework that can be utilized on a personal computer or a high performance computing cluster with little change to the user code.
    
    Speaker: Micah Groh (Fermi National Accelerator Laboratory)
    
    2021_05_17_pandana.pdf
    
    Recording
- Artificial Intelligence: Wed PM
  
  Conveners: Agnieszka Dziurda (Polish Academy of Sciences (PL)), Joosep Pata (National Institute of Chemical Physics and Biophysics (EE))
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 116
    
    Progress in developing a hybrid deep learning algorithm for identifying and locating primary vertices
    
    The locations of proton-proton collision points in LHC experiments
    are called primary vertices (PVs). Preliminary results of a hybrid deep learning
    algorithm for identifying and locating these, targeting the Run 3 incarnation
    of LHCb, have been described at conferences in 2019 and 2020. In the past
    year we have made significant progress in a variety of related areas. Using
    two newer Kernel Density Estimators (KDEs) as input feature sets improves the
    fidelity of the models, as does using full LHCb simulation rather than the “toy
    Monte Carlo" originally (and still) used to develop models. We have also built a
    deep learning model to calculate the KDEs from track information. Connecting
    a tracks-to-KDE model to a KDE-to-hists model used to find PVs provides
    a proof-of-concept that a single deep learning model can use track information
    to find PVs with high efficiency and high fidelity. We have studied a variety of
    models systematically to understand how variations in their architectures affect
    performance. While the studies reported here are specific to the LHCb geometry
    and operating conditions, the results suggest that the same approach could be
    used by the ATLAS and CMS experiments.
    
    Speaker: Simon Akar (University of Cincinnati (US))
    
    Recording
    
    vCHEP_PV-FINDER_20210519.pdf
  - 117
    
    Graph Neural Network for Object Reconstruction in Liquid Argon Time Projection Chambers
    
    This paper presents a graph neural network (GNN) technique for low-level reconstruction of neutrino interactions in a Liquid Argon Time Projection Chamber (LArTPC). GNNs are still a relatively novel technique, and have shown great promise for similar reconstruction tasks in the LHC. In this paper, a multihead attention message passing network is used to classify the relationship between detector hits by labelling graph edges, determining whether hits were produced by the same underlying particle, and if so, the particle type.The trained model is 84% accurate overall, and performs best on the EM shower and muon track classes. The model’s strengths and weaknesses are discussed, and plans for developing this technique further are summarised.
    
    Speaker: Jeremy Edmund Hewes (University of Cincinnati (US))
    
    2021-05-19 vCHEP slides.pdf
    
    Recording
  - 118
    
    Event vertex reconstruction with deep neural networks for the DarkSide-20k experiment
    
    While deep learning techniques are becoming increasingly more popular in high-energy and, since recently, neutrino experiments, they are less confidently used in direct dark matter searches based on dual-phase noble gas TPCs optimized for low-energy signals from particle interactions.
    In the present study, application of modern deep learning methods for event ver- tex reconstruction is demonstrated with an example of the 50-tonne liquid argon DarkSide-20k TPC with almost 10 thousand photosensors.
    The developed methods successfully reconstruct event’s position withing sub- cm precision and are applicable to any dual-phase argon or xenon TPC of arbi- trary size with any sensor shape and array pattern.
    
    Speaker: Victor Goicoechea Casanueva (University of Hawai'i at Manoa (US))
    
    CHEP2021.pdf
    
    Recording
  - 119
    
    Evolutionary Algorithms for Tracking Algorithm Parameter Optimization
    
    The reconstruction of charged particle trajectories, known as tracking, is one of the most complex and CPU consuming parts of event processing in high energy particle physics experiments. The most widely used and best performing tracking algorithms require significant geometry-specific tuning of the algorithm parameters to achieve best results. In this paper, we demonstrate the usage of machine learning techniques, particularly evolutionary algorithms, to find high performing configurations for the first step of tracking, called track seeding. We use a track seeding algorithm from the software framework A Common Tracking Software (ACTS). ACTS aims to provide an experiment- independent and framework-independent tracking software designed for mod- ern computing architectures. We show that our optimization algorithms find highly performing configurations in ACTS without hand-tuning. These tech- niques can be applied to other reconstruction tasks, improving performance and reducing the need for laborious hand-tuning of parameters.
    
    Speaker: Peter Chatain (Stanford)
    
    Recording
    
    Using Evolutionary Algorithms to Optimize Parameters for Track Reconstruction.pdf
  - 120
    
    AI Enabled Data Quality Monitoring with Hydra
    
    Data quality monitoring is critical to all experiments impacting the quality of any physics results. Traditionally, this is done through an alarm system, which detects low level faults, leaving higher level monitoring to human crews. Artificial Intelligence is beginning to find its way into scientific applications, but comes with difficulties, relying on the acquisition of new skill sets, either through education or acquisition, in data science. This paper will discuss the development and deployment of the Hydra monitoring system in production at Gluex. It will show how "off-the-shelf" technologies can be rapidly developed, as well as discuss what sociological hurdles must be overcome to successfully deploy such a system. Early results from production running of Hydra will also be shared as well as a future outlook for development of Hydra.
    
    Speaker: Thomas Britton (JLab)
    
    Britton_vCHEP_2021.pdf
    
    Recording
- Facilities and Networks: Wed PM
  
  Conveners: Alessandra Forti (University of Manchester (GB)), Dr David Crooks (UKRI STFC)
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Tuesday afternoon
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 121
    
    Updates on usage of the Czech national HPC center
    
    The distributed computing of the ATLAS experiment at LHC is using computing resources of the Czech national HPC center IT4Innovations for several years. The submission system is based on ARC-CEs installed at the Czech LHC Tier2 site (praguelcg2). Recent improvements of this system will be discussed here. First, there was migration of the ARC-CE from version 5 to 6 which improves the reliability and scalability. The sshfs connection between praguelcg2 and IT4Innovations was a bottleneck of this system but this improved with new a version and setting. Containerisation using Singularity allows for customisation of environment without the need of requesting exceptions to HPC management as well as reduced amount of data on the shared storage. The system will need further modifications to improve CPU efficiency when running on worker nodes with very high number of cores. IT4Innovations HPCs provide significant contribution to computing done in Czech republic for the ATLAS experiment.
    
    Speaker: Michal Svatos (Czech Academy of Sciences (CZ))
    
    ATL-SOFT-SLIDE-2021-118.pdf
    
    Recording
  - 122
    
    Exploitation of the MareNostrum 4 HPC using ARC-CE
    
    HPC resources will help meet the future challenges of HL-LHC in terms of CPU requirements. The Spanish HPC centers have been used recently by implementing all the necessary edge services to integrate the resources into the LHC experiments workflow management system. Since it not always possible to install the edge services on HPC premises, we opted to set up a dedicated ARC-CE and interact with the HPC login and transfer nodes using ssh commands. In the ATLAS experiment, the repository that includes a partial copy of the experiment software in CVMFS is packaged into a container singularity image to overcome network isolation for HPC worker nodes and reduce software requirements. This article shows the Spanish contribution to the simulation of experiments after the agreement between the Spanish Ministry of Science and the Barcelona Supercomputing Center (BSC), the center that operates MareNostrum 4. Finally, we discuss some challenges to take advantage of HPC machines' next generation with heterogeneous architecture combining CPU and GPU.
    
    Speaker: Andreu Pacheco Pages (Institut de Física d'Altes Energies - Barcelona (ES))
    
    210511-LCGES-MN4-vCHEP21-v09.pdf
    
    Recording
  - 123
    
    Exploitation of HPC Resources for data intensive sciences
    
    The Large Hadron Collider (LHC) will enter a new phase begin- ning in 2027 with the upgrade to the High Luminosity LHC (HL-LHC). The increase in the number of simultaneous collisions coupled with a more complex structure of a single event will result in each LHC experiment collecting, stor- ing, and processing exabytes of data per year. The amount of generated and/or collected data greatly outweighs the expected available computing resources. In this paper, we discuss efficient usage of HPC resources as a prerequisite for data-intensive science at exascale. We discuss the work performed within the contexts of three EU-funded projects, DEEP-EST, EGI-ACE and CoE RAISE, with primary focus on three topics that emphasize the areas of work required to run production LHC workloads at the scale of HPC facilities. First, we dis- cuss the experience of porting CMS Hadron and Electromagnetic calorimeters to utilize Nvidia GPUs; second, we look at the tools and their adoption in order to perform benchmarking of a variety of resources available at HPC centers. Finally, we touch on one of the most important aspects of the future of HEP - how to handle the flow of PBs of data to and from computing facilities, be it clouds or HPCs, for exascale data processing in a flexible, scalable and per- formant manner. These investigations are a key contribution to technical work within the HPC collaboration among CERN, SKA, GEANT and PRACE.
    
    Speaker: David Southwick (CERN)
    
    Recording
    
    vCHEP21-DSouthwick.pdf
  - 124
    
    Finalizing Construction of a New Data Center at BNL
    
    Computational science, data management and analysis have been key factors in the success of Brookhaven National Laboratory's scientific programs at the Relativistic Heavy Ion Collider (RHIC), the National Synchrotron Light Source (NSLS-II), the Center for Functional Nanomaterials (CFN), and in biological, atmospheric, and energy systems science, Lattice Quantum Chromodynamics (LQCD) and Materials Science, as well as our participation in international research collaborations, such as the ATLAS Experiment at Europe's Large Hadron Collider (LHC) at CERN (Switzerland) and the Belle II Experiment at KEK (Japan). The construction of a new data center is an acknowledgement of the increasing demand for computing and storage services at BNL in the near term and enable the Lab to address the needs of the future experiments at the High-Luminosity LHC at CERN and the Electron-Ion Collider (EIC) at BNL in the long term. The Computing Facility Revitalization (CFR) project is aimed at repurposing the former National Synchrotron Light Source (NSLS-I) building as the new data center for BNL. The new data center is to become available in early 2021 for ATLAS compute, disk storage and tape storage equipment, and later that year - for all other collaborations supported by the Scientific Data and Computing Center (SDCC), including: STAR, PHENIX and sPHENIX experiments at RHIC collider at BNL, the Belle II Experiment at KEK (Japan), and the Computational Science Initiative at BNL. Migration of the majority of IT load and services from the existing data center to the new data center is expected to begin with the central networking systems and the first BNL ATLAS Tier-1 Site tape robot in 2021Q3, and it is expected to continue throughout FY2021-2024. This presentation will highlight the key mechanical, electrical, and plumbing (MEP) components of the new data center. Also, we will describe plans to migrate a subset of IT equipment between the old and the new data centers in CY2021, the period of operations with both data centers starting from 2021Q3, plans to perform the gradual IT equipment replacement in CY2021-2024, and show the expected state of occupancy and infrastructure utilization for both data centers up to FY2026.
    
    Speaker: Mr Alexandr Zaytsev (Brookhaven National Laboratory (US))
    
    azaytsev_BNL_data_center_19052021_v4.pdf
    
    Recording
  - 125
    
    Designing the RAL Tier-1 Network for HL-LHC and Future data lakes
    
    The Rutherford Appleton Laboratory (RAL) runs the UK Tier-1 which supports all four LHC experiments, as well as a growing number of others in HEP, Astronomy and Space Science. In September 2020, RAL was provided with funds to upgrade its network. The Tier-1 not only wants to meet the demands of LHC Run 3, it also wants to ensure that it can take an active role in data lake development and the network data challenges in the preparation for HL-LHC. It was therefore decided to completely rebuild the Tier-1 network with a Spine / Leaf architecture. This paper describes the network requirements and design decision that went into building the new Tier-1 network. It also includes a cost analysis, to understand if the ever increasing network requirements are deliverable in a continued flat cash environment and what limitations or opportunities this may place on future data lakes.
    
    Speaker: Alastair Dewhurst (Science and Technology Facilities Council STFC (GB))
    
    Recording
    
    Tier1NetworkvCHEP20210519.pdf
- Software: Wed PM
  
  Conveners: Luisa Arrabito (LUPM IN2P3/CNRS), Teng Jian Khoo (Humboldt University of Berlin (DE))
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 126
    
    Grid-based minimization at scale: Feldman-Cousins corrections for light sterile neutrino search
    
    High Energy Physics (HEP) experiments generally employ sophisticated statistical methods to present results in searches of new physics. In the problem of searching for sterile neutrinos, likelihood ratio tests are applied to short-baseline neutrino oscillation experiments to construct confidence intervals for the parameters of interest. The test statistics of the form $\Delta \chi^2$ is often used to form the confidence intervals, however, this approach can lead to statistical inaccuracies due to the small signal rate in the region-of-interest. In this paper, we present a computational model for the computationally expensive Feldman-Cousins corrections to construct a statistically accurate confidence interval for neutrino oscillation analysis. The program performs a grid-based minimization over oscillation parameters and is written in C++. Our algorithms make use of vectorization through Eigen3, yielding a single-core speed-up of 350 compared to the original implementation, and achieve MPI data parallelism by employing DIY. We demonstrate the strong scaling of the application at High-Performance Computing (HPC) sites. We utilize HDF5 along with HighFive to write the results of the calculation to file.
    
    Speaker: Marianette Wospakrik (Fermi National Accelerator Laboratory)
    
    Recording
    
    vCHEP_FC_At_Scale_wospakrik.pdf
  - 127
    
    Laurelin: Java-native ROOT I/O for Apache Spark
    
    Apache Spark is one of the predominant frameworks in the big data space, providing a fully-functional query processing engine, vendor support for hardware accelerators, and performant integrations with scientific computing libraries. One difficulty in adopting conventional big data frameworks to HEP workflows is the lack of support for the ROOT file format in these frameworks. Laurelin implements ROOT I/O with a pure Java library, with no bindings to the C++ ROOT implementation, and is readily installable via standard Java packaging tools. It provides a performant interface enabling Spark to read (and soon write) ROOT TTrees, enabling users to process these data without a pre-processing phase converting to an intermediate format.
    
    Speaker: Andrew Malone Melo (Vanderbilt University (US))
    
    Recording
    
    vchep_laurelin.key
    
    vchep_laurelin.pdf
  - 128
    
    Fine-grained data caching approaches to speedup a distributed RDataFrame analysis
    
    Thanks to its RDataFrame interface, ROOT now supports the execution of the same physics analysis code both on a single machine and on a cluster of distributed resources. In the latter scenario, it is common to read the input ROOT datasets over the network from remote storage systems, which often increases the time it takes for physicists to obtain their results. Storing the remote files much closer to where the computations will run can bring latency and execution time down. Such a solution can be improved further by caching only the actual portion of the dataset that will be processed on each machine in the cluster, reusing it in subsequent executions on the same input data. This paper shows the benefits of applying different means of caching input data in a distributed ROOT RDataFrame analysis. Two such mechanisms will be applied to this kind of workflow with different configurations, namely caching on the same nodes that process data or caching on a separate server.
    
    Speaker: Mr Vincenzo Eduardo Padulano (Valencia Polytechnic University (ES))
    
    chep_2021_data_caching_distributed_rdataframe.pdf
    
    Recording
  - 129
    
    Columnar data analysis with ATLAS analysis formats
    
    Future analysis of ATLAS data will involve new small-sized analysis
    formats to cope with the increased storage needs. The smallest of
    these, named DAOD_PHYSLITE, has calibrations already applied
    to allow fast downstream analysis and avoid the need for further
    analysis-specific intermediate formats. This allows for application
    of the "columnar analysis" paradigm where operations are applied
    on a per-array instead of a per-event basis. We will present methods
    to read the data into memory, using Uproot, and also discuss I/O
    aspects of columnar data and alternatives to the ROOT data format.
    Furthermore, we will show a representation of the event data model
    using the Awkward Array package and present proof of concept for a
    simple analysis application.
    
    Speaker: Nikolai Hartmann (Ludwig Maximilians Universitat (DE))
    
    columnar_data_analysis_atlas_vchep2021_updated.pdf
    
    Recording
  - 130
    
    AwkwardForth: accelerating Uproot with an internal DSL
    
    File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but its code depends on the type of the data structure, not known at compile-time. Just-in-time compilation can satisfy both constraints, but we propose a more portable solution: specialized virtual machines. AwkwardForth is a Forth-driven virtual machine for deserializing data into Awkward Arrays. As a language, it is not intended for humans to write, but it loosens the coupling between Uproot and Awkward Array. AwkwardForth programs for deserializing record-oriented formats (ROOT and Avro) are about as fast as C++ ROOT and 10‒80× faster than fastavro. Columnar formats (simple TTrees, RNTuple, and Parquet) only require specialization to interpret metadata and are therefore faster with precompiled code.
    
    Speaker: Jim Pivarski (Princeton University)
    
    pivarski-awkwardforth-2.pdf
    
    Recording
  - 131
    
    hep_tables: Heterogeneous Array Programming for HEP
    
    Array operations are one of the most concise ways of expressing common filtering and simple aggregation operations that is the hallmark of the first step of a particle physics analysis: selection, filtering, basic vector operations, and filling histograms. The High Luminosity run of the Large Hadron Collider (HL-LHC), scheduled to start in 2026, will require physicists to regularly skim datasets that are over a PB in size, and repeatedly run over datasets that are 100's of TB's – too big to fit in memory. Declarative programming techniques are a way of separating the intent of the physicist from the mechanics of finding the data, processing the data, and using distributed computing to process it efficiently that is required to extract the plot or data desired in a timely fashion. This paper describes a prototype library that provides a framework for different sub-systems to cooperate in producing this data, using an array-programming declarative interface. This prototype has a servicex data-delivery sub-system and an \awkward array sub-system cooperating to generate requested data. The ServiceX system runs against ATLAS xAOD data.
    
    Speaker: Gordon Watts (University of Washington (US))
    
    2021-05-20 - hep_tables.pdf
    
    2021-05-20 - hep_tables.pptx
    
    Recording
- Storage: Wed PM
  
  Conveners: Christophe Haen (CERN), Xavier Espinal (CERN)
  
  Mattermost
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 132
    
    CERN AFS phaseout: status & plans
    
    In 2016, CERN decided to phase out the legacy OpenAFS storage service due to concerns for the upstream project's longevity, and the potential impact of disorderly service stop on CERN's computing services. Early 2019, the OpenAFS risks of the project collapsing have been reassessed and several early concerns have been allayed. In this paper we recap the work done so far, highlight some of the issues encountered, and present current state and planning.
    
    Speaker: Jan Iven (CERN)
    
    CHEP2021_AFSstatus.pdf
    
    Recording
  - 133
    
    CernVM-FS powered container hub
    
    Containers became the de-facto standard to package and distribute modern applications and their dependencies. The HEP community demonstrates an increasing interest in such technology, with scientists encapsulating their analysis workflow and code inside a container image. The analysis is first validated on a small dataset and minimal hardware resources to then run at scale on the massive computing capacity provided by the grid. The typical approach for distributing containers consists of pulling their image from a remote registry and extracting it on the node where the container runtime (e.g., Docker, Singularity) runs. This approach, however, does not easily scale to large images and thousands of nodes. CVMFS has long been used for the efficient distribution of software directory trees at global scale. In order to extend its optimized caching and network utilization to the distribution of containers, CVMFS recently implemented a dedicated container image ingestion service together with container runtime integrations. CVMFS ingestion is based on per-file deduplication, instead of the per-layer deduplication adopted by traditional container registries. On the client-side, CVMFS implements on-demand fetching of the chunks required for the execution of the container instead of the whole image.
    
    Speaker: Enrico Bocchi (CERN)
    
    CernVM-FS powered container hub
    
    Recording
    
    vCHEP21, CernVM-FS powered container hub.pdf
  - 134
    
    Samba and CERNBox: Providing online access to Windows-based users at CERN
    
    This paper presents the experience in providing CERN users with
    direct online access to their EOS/CERNBox-powered user storage from Win-
    dows. In production for about 15 months, a High-Available Samba cluster is
    regularly used by a signiﬁcant fraction of the CERN user base, following the
    migration of their central home folders from Microsoft DFS in the context of
    CERN’s strategy to move to open source solutions.
    We describe the conﬁguration of the cluster, which is based on standard compo-
    nents: the EOS-backed CERNBox storage is mounted via FUSE, and an addi-
    tional mount provided by CephFS is used to share the cluster’s state. Further, we
    describe some typical shortcomings of such a setup and how they were tackled.
    Finally, we show how such an additional access method ﬁts in the bigger
    picture, where the storage is seamlessly accessed by user jobs, sync clients,
    FUSE/Samba mounts as well as the web UI, whilst aiming at a consistent view
    and user experience.
    
    Speaker: Giuseppe Lo Presti (CERN)
    
    LoPresti_Brosa_Bukowiec_Samba_for_CERNBox.pdf
    
    Recording
  - 135
    
    MetaCat - metadata catalog for data management systems
    
    Metadata management is one of three major areas and parts of functionality of scientific data management along with replica management and workflow management. Metadata is the information describing the data stored in a data item, a file or an object. It includes the data item provenance, recording conditions, format and other attributes. MetaCat is a metadata management database designed and developed for High Energy Physics experiments. As a component of a data management system, it’s main objectives are to provide efficient metadata storage and management and fast data items selection functionality. MetaCat is supposed to work on the scale of 100 million files (or objects) and beyond. The article will discuss the functionality of MetaCat and technological solutions used to implement the product.
    
    Speaker: Igor Mandrichenko (Fermi National Accelerator Lab. (US))
    
    MetaCat CHEP 2021 paper v2.pdf
    
    MetaCat - metadata catalog for data management systems
    
    Recording
  - 136
    
    ARCHIVER - Data archiving and preservation for research environments
    
    Over the last decades, several data preservation efforts have been undertaken by the HEP community, as experiments are not repeatable and consequently their data considered unique. ARCHIVER is a European Commission (EC) co-funded Horizon 2020 pre-commercial procurement project procuring R&D combining multiple ICT technologies including data-intensive scalability, network, service interoperability and business models, in a hybrid cloud environment. The results will provide the European Open Science Cloud (EOSC) with archival and preservation services covering the full research lifecycle. The services are co-designed in partnership with four research organisations (CERN, DESY, EMBL-EBI and PIC/IFAE) deploying use cases from Astrophysics, HEP, Life Sciences and Photon-Neutron Sciences creating an innovation ecosystem for specialist data archiving and preservation companies willing to introduce new services capable of supporting the expanding needs of research. The HEP use cases being deployed include the CERN Opendata portal, preserving a second copy of the completed BaBar experiment and the CERN Digital Memory digitising CERN’s multimedia archive of the 20th century. In parallel, ARCHIVER has established an Early Adopter programme whereby additional use cases can be incorporated at each of the project phases thereby expanding services to multiple research domains and countries.
    
    Speaker: Ignacio Peluaga Lozada (CERN)
    
    ARCHIVER.pdf
    
    Recording
  - 137
    
    Exploring Object Stores for High-Energy Physics Data Storage
    
    Over the last two decades, ROOT TTree has been used for storing over one exabyte of High-Energy Physics (HEP) events. The TTree columnar on-disk layout has been proved to be ideal for analyses of HEP data that typically require access to many events, but only a subset of the information stored for each of them. Future accelerators, and particularly HL-LHC, will bring an increase of at least one order of magnitude in the volume of generated data. To this end, RNTuple has been designed to overcome TTree's limitations, providing improved efficiency and taking advantage of modern storage systems, e.g. low-latency high-bandwidth NVMe devices and object stores. In this paper, we extend RNTuple with a backend that leverages Intel DAOS as the underlying storage, proving that RNTuple's architecture can accommodate such changes. From the RNTuple user's perspective, this data can be accessed with minimal changes to the user code, i.e. replacing a filesystem path by a DAOS URI. Our performance evaluation shows that the contributed backend can be used for realistic analyses, while outperforming the compatibility solution provided by the DAOS project.
    
    Speaker: Javier Lopez Gomez (CERN)
    
    Recording
    
    vCHEP2021_Exploring_Object_Stores_for_High-Energy_Physics_Data_Storage.pdf
- Streaming: Wed PM
  
  Conveners: Simon George (Royal Holloway, University of London), Vardan Gyurjyan (Jefferson Lab)
  
  Mattermost
  
  Video recording Wednesday afternoon
  
  Zoom
  - 138
    
    HOSS!
    
    The Hall-D Online Skim System (HOSS) was developed to simultaneously solve two issues for the high intensity GlueX experiment. One was to parallelize the writing of raw data files to disk in order to improve bandwidth. The other was to distribute the raw data across multiple compute nodes in order to produce calibration \textit{skims} of the data online. The highly configurable system employs RDMA, RAM disks, and zeroMQ driven by Python to simultaneously store and process the full high intensity GlueX data stream.
    
    Speaker: David Lawrence (Jefferson Lab)
    
    2021.05.19.vCHEP21_HOSS.pdf
    
    HOSS!
    
    Recording
  - 139
    
    Streaming Readout of the CLAS12 Forward Tagger Using TriDAS and JANA2
    
    An effort is underway to develop streaming readout data acquisition system for the CLAS12 detector in Jefferson Lab's experimental Hall-B. Successful beam tests were performed in the spring and summer of 2020 using a 10GeV electron beam from Jefferson Lab's CEBAF accelerator. The prototype system combined elements of the TriDAS and CODA data acquisition systems with the JANA2 analysis/reconstruction framework. This successfully merged components that included an FPGA stream source, a distributed hit processing system, and software plugins that allowed offline analysis written in C++ to be used for online event filtering. Details of the system design and performance are presented.
    
    Speaker: Tommaso Chiarusi (INFN - Sezione di Bologna)
    
    Recording
    
    SRO_Clas12_TriDAS_JANA_20210519_chiarusi.pdf
  - 140
    
    Simple and Scalable Streaming: The GRETA Data Pipeline
    
    The Gamma Ray Energy Tracking Array (GRETA) is a state of the art gamma-ray spectrometer being built at Lawrence Berkeley National Laboratory to be first sited at the Facility for Rare Isotope Beams (FRIB) at Michigan State University. A key design requirement for the spectrometer is to perform gamma-ray tracking in near real time. To meet this requirement we have used an inline, streaming approach to signal processing in the GRETA data acquisition system, using a GPU-equipped computing cluster. The data stream will reach 480 thousand events per second at an aggregate data rate of 4 gigabytes per second at full design capacity. We have been able to simplify the architecture of the streaming system greatly by interfacing the FPGA-based detector electronics with the computing cluster using standard network technology. A set of high-performance software components to implement queuing, flow control, event processing and event building have been developed, all in a streaming environment which matches detector performance. Prototypes of all high-performance components have been completed and meet design specifications.
    
    Speaker: Mario Cromaz (Lawrence Berkeley National Laboratory )
    
    greta-vchep-v3.pdf
    
    Recording
  - 141
    
    Free-running data acquisition system for the AMBER experiment
    
    Triggered data acquisition systems provide only limited possibilities of triggering methods. In our paper, we propose a novel approach that completely removes the hardware trigger and its logic. It introduces an innovative free-running mode instead, which provides unprecedented possibilities to physics experiments. We would like to present such system, which is being developed for the AMBER experiment at CERN. It is based on an intelligent data acquisition framework including FPGAs modules and advanced software processing. The system provides the triggerless mode that allows to gain more time for the data filtration and implement more complex algorithms. Moreover, it utilises a custom data protocol optimized for needs of the free-running system. The filtration procedure takes place in a server farm playing the role of the high-level trigger. For this purpose, we introduce a high-performance filtration framework providing optimized algorithms and load balancing to cope with excessive data rates. Furthermore, this paper also describes the filtration pipeline as well as the simulation chain that is being used for production of artificial data, for testing, and validation.
    
    Speaker: Martin Zemko (Czech Technical University in Prague (CZ))
    
    Recording
    
    Zemko - Free-running DAQ.pdf
  - 142
    
    FELIX: the Detector Interface for the ATLAS Experiment at CERN
    
    The Front-End Link eXchange (FELIX) system is an interface between the trigger and detector electronics and commodity switched networks for the ATLAS experiment at CERN. In preparation for the LHC Run 3, to start in 2022, the system is being installed to read out the new electromagnetic calorimeter, calorimeter trigger, and muon components being installed as part of the ongoing ATLAS upgrade programme. The detector and trigger electronic systems are largely custom and fully synchronous with respect to the 40.08 MHz clock of the Large Hadron Collider (LHC). The FELIX system uses FPGAs on server-hosted PCIe boards to pass data between custom data links connected to the detector and trigger electronics and host system memory over a PCIe interface then route data to network clients, such as the Software Readout Drivers (SW ROD), via a dedicated software platform running on these machines. The SW RODs build event fragments, buffer data, perform detector-specific processing and provide data for the ATLAS High Level Trigger. The FELIX approach takes advantage of modern FPGAs and commodity computing to reduce the system complexity and effort needed to support data acquisition systems in comparison to previous designs. Future upgrades of the experiment will introduce FELIX to read out all other detector components.
    
    Speaker: Alexander Paramonov (Argonne National Laboratory (US))
    
    2021_05_06_CHEP21_FELIX.pdf
    
    Recording
Thursday 20 May
- Thurs AM Plenaries: Plenaries
  
  Conveners: Benedikt Hegner (CERN), Patrick Fuhrmann (Deutsches Elektronen-Synchrotron (DE))
  
  Mattermost
  
  Video recording
  
  Zoom
  - 143
    
    Coffea-casa: an analysis facility prototype
    
    Data analysis in HEP has often relied on batch systems and event loops; users are given a non-interactive interface to computing resources and consider data event-by-event. The "Coffea-casa" prototype analysis facility is an effort to provide users with alternate mechanisms to access computing resources and enable new programming paradigms. Instead of the command-line interface and asynchronous batch access, a notebook-based web interface and interactive computing is provided. Instead of writing event loops, the column-based Coffea library is used.
    
    In this paper, we describe the architectural components of the facility, the services offered to end users, and how it integrates into a larger ecosystem for data access and authentication.
    
    Speaker: Oksana Shadura (University of Nebraska Lincoln (US))
    
    Coffea-casa: an analysis facility prototype
    
    Recording
    
    vchep2021(1).pdf
  - 144
    
    Evaluating CephFS Performance vs. Cost on High-Density Commodity Disk Servers
    
    CephFS is a network filesystem built upon the Reliable Autonomic Distributed Object Store (RADOS). At CERN we have demonstrated its reliability and elasticity while operating several 100-to-1000TB clusters which provide NFS-like storage to infrastructure applications and services. At the same time, our lab developed EOS to offer high performance 100PB-scale storage for the LHC at extremely low costs, while also supporting the complete set of security and functional APIs required by the particle-physics user community. This work seeks to evaluate the performance of CephFS on this cost-optimized hardware when it is combined with EOS to support the missing functionalities. To this end, we have setup a proof-of-concept Ceph Octopus cluster on high-density JBOD servers (840 TB each) with 100Gig-E networking. The system uses EOS to provide an overlayed namespace and protocol gateways for HTTP(S) and XROOTD, and uses CephFS as an erasure-coded object storage backend. The solution also enables operators to aggregate several CephFS instances and adds features such as third-party-copy, SciTokens, and high-level user and quota management. Using simple benchmarks we measure the cost/performance tradeoffs of different erasure-coding layouts, as well as the network overheads of these coding schemes. We demonstrate some relevant limitations of the CephFS metadata server and offer improved tunings which can be generally applicable. To conclude, we reflect on the advantages and drawbacks related to this architecture, such as RADOS-level free space requirements and double-network penalties, and offer ideas for improvements in the future.
    
    Speaker: Dan van der Ster (CERN)
    
    CHEP2021_ CephFS + EOS (3).pdf
    
    Recording
  - 145
    
    Fast and Accurate Electromagnetic and Hadronic Showers from Generative Models
    
    Generative machine learning models offer a promising way to efficiently amplify classical Monte Carlo generators' statistics for event simulation and generation in particle physics. Given the already high computational cost of simulation and the expected increase in data in the high-precision era of the LHC and at future colliders, such fast surrogate simulators are urgently needed.
    
    This contribution presents a status update on simulating particle showers in high granularity calorimeters for future colliders. Building on prior work using Generative Adversarial Networks (GANs), Wasserstein-GANs, and the information-theoretically motivated Bounded Information Bottleneck Autoencoder (BIB-AE), we further improve the fidelity of generated photon showers. The key to this improvement is a detailed understanding and optimisation of the latent space. The richer structure of hadronic showers compared to electromagnetic ones makes their precise modelling an important yet challenging problem.
    We present initial progress towards accurately simulating the core of hadronic showers in a highly granular scintillator calorimeter.
    
    Speaker: Sascha Daniel Diefenbacher (Hamburg University (DE))
    
    Recording
    
    vChep_2021_Fast_and_Accurate_Electromagnetic_and_Hadronic_Showers_from_Generative_Models.pdf
- 10:30
  
  Break
- Artificial Intelligence: Thu AM
  
  Conveners: Gian Michele Innocenti (CERN), Jason Webb (Brookhaven National Lab)
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 146
    
    Decoding Photons: Physics in the Latent Space of a BIB-AE Generative Network
    
    Given the increasing data collection capabilities and limited computing resources of future collider experiments, interest in using generative neural networks for the fast simulation of collider events is growing. In our previous study, the Bounded Information Bottleneck Autoencoder (BIB-AE) architecture for generating photon showers in a high-granularity calorimeter showed a high accuracy modeling of various global differential shower distributions. In this work, we investigate how the BIB-AE encodes this physics information in its latent space. Our understanding of this encoding allows us to propose methods to optimize the generation performance further, for example, by altering latent space sampling or by suggesting specific changes to hyperparameters. In particular, we improve the modeling of the shower shape along the particle incident axis.
    
    Speaker: Erik Buhmann (Hamburg University (DE))
    
    Recording
    
    vCHEP2021_Buhmann.pdf
  - 147
    
    Distributed training and scalability for the particle clustering method UCluster
    
    In recent years, machine learning methods have become increasingly important for the experiments of the Large Hadron Collider (LHC). They are utilized in everything from trigger systems to reconstruction to data analysis. The recent UCluster method is a general model providing unsupervised clustering of particle physics data, that can be easily modified for a variety of different tasks. In the current paper, we improve on the UCluster method by adding the option of training the model in a scalable and distributed fashion, which extends its usefulness even further. UCluster combines the graph-based neural network ABCnet with a clustering step, using a combined loss function to train. It was written in TensorFlow v1.14 and has previously been trained on a single GPU. It shows a clustering accuracy of 81% when applied to the problem of multiclass classification of simulated jet events. Our implementation adds the distributed training functionality by utilizing the Horovod distributed training framework, which necessitated a migration of the code to TensorFlow v2. Together with using parquet files for splitting data up between different nodes, the distributed training makes the model scalable to any amount of input data, something that will be essential for use with real LHC datasets. We find that the model is well suited for distributed training, with the training time decreasing in direct relation to the number of GPU's used.
    
    Speaker: Olga Sunneborn Gudnadottir (Uppsala University (SE))
    
    Recording
    
    Updated-DistributedUClusterCHEP_OSunnebornGudnadottir.pdf
  - 148
    
    Training and Serving ML workloads with Kubeflow at CERN
    
    Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, covering fast simulation, tracking, anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow and managing the full ML lifecycle: data preparation and interactive analysis, large scale distributed model training and model serving. We cover specific features available for hyper-parameter tuning and model metadata management, as well as infrastructure details to integrate accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case using public cloud resources, achieving close to linear scaling when using a large number of GPUs.
    
    Speaker: Dejan Golubovic (CERN)
    
    2021-05-20-Kubeflow-vCHEP.pdf
    
    Recording
  - 149
    
    Accelerating GAN training using highly parallel hardware on public cloud
    
    With the increasing number of Machine and Deep Learning applications in High Energy Physics, easy access to dedicated infrastructure represents a requirement for fast and efficient R&D. This work explores different types of cloud services to train a Generative Adversarial Network (GAN) in a parallel
    environment, using Tensorflow data parallel strategy. More specifically, we parallelize the training process on multiple GPUs and Google Tensor Processing Units (TPU) and we compare two algorithms: the TensorFlow built-in logic and a custom loop, optimised to have higher control of the elements assigned to each GPU worker or TPU core. The quality of the generated data is compared to Monte Carlo simulation. Linear speed-up of the training process is obtained, while retaining most of the performance in terms of physics results. Additionally, we benchmark the aforementioned approaches, at scale, over multiple GPU nodes, deploying the training process on different public cloud providers, seeking for overall efficiency and cost-effectiveness. The combination of data science, cloud deployment options and associated economics
    allows to burst out heterogeneously, exploring the full potential of cloud-based services.
    
    Speaker: Renato Paulo Da Costa Cardoso (Universidade de Lisboa (PT))
    
    Accelerating GAN training using highly parallel hardware on public cloud.pdf
    
    Recording
  - 150
    
    Multi-particle reconstruction in the High Granularity Calorimeter using object condensation and graph neural networks
    
    The high-luminosity upgrade of the LHC will come with unprecedented physics and computing challenges. One of these challenges is the accurate reconstruction of particles in events with up to 200 simultaneous proton-proton interactions. The planned CMS High Granularity Calorimeter offers fine spatial resolution for this purpose, with more than 6 million channels, but also poses unique challenges to reconstruction algorithms aiming to reconstruct individual particle showers. In this contribution, we propose an end-to-end machine-learning method that performs clustering, classification, and energy and position regression in one step while staying within memory and computational constraints. We employ GravNet, a graph neural network, and an object condensation loss function to achieve this task. Additionally, we propose a method to relate truth showers to reconstructed showers by maximising the energy weighted intersection over union using maximal weight matching. Our results show the efficiency of our method and highlight a promising research direction to be investigated further.
    
    Speaker: Shah Rukh Qasim (Manchester Metropolitan University (GB))
    
    Recording
    
    vChep.pdf
- Education, Training, Outreach: Thu AM
  
  Conveners: Clara Nellist (Radboud University Nijmegen and NIKHEF (NL)), Marzena Lapka (CERN)
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Zoom
  - 151
    
    EsbRootView
    
    EsbRootView is an event display for the detectors of ESSnuSB able to exploit natively all the nice devices that we have in hands today; desktop, laptops but also smartphones and tablets.
    
    Speaker: Guy Barrand (Université Paris-Saclay (FR))
    
    G_Barrand_vCHEP_2021_EsbRootView.pdf
    
    Recording
  - 152
    
    Browser-based visualization framework Tracer for Outreach & Education
    
    Education & outreach is an important part of HEP experiments. With outreach & education, experiments can have an impact on the public, students and their teachers, as well as policymakers and the media. The tools and methods for visualization enable to represent the detectors' facilities, explaining their purpose, functionalities, development histories, and participant institutes. In addition, they make it possible to visualize different physical events together with important parameters and plots for physics analyses. 3D visualization and advanced VR (Virtual Reality), AR (Augmented Reality) and MR (Mixed Reality) extensions are the keys for successful outreach & education. This paper describes requirements and methods for the creation of browser-based visualization applications for outreach & education. The visualization framework TRACER is considered as a case study.
    
    Speaker: Alexander Sharmazanashvili (Georgian Technical University (GE))
    
    05-18-2021-vCHEP21-Sharmazanashvili.pdf
    
    Recording
  - 153
    
    The Phoenix event display framework
    
    Visualising HEP experiment event data and geometry is vital for physicists trying to debug their reconstruction software, their detector geometry or their physics analysis, and also for outreach and publicity purposes. Traditionally experiments used in-house applications that required installation (often as part of a much larger experiment specific framework). In recent years, web-based event/geometry displays have started to appear, dramatically lowering the entry barrier to use, but which typically are still per-experiment. The Phoenix framework is an extensible, experiment-agnostic framework for event and geometry visualisation.
    
    Speaker: Edward Moyse (University of Massachusetts (US))
    
    Phoenix vCHEP2021.key
    
    Phoenix vCHEP2021.pdf
    
    Recording
  - 154
    
    The fight against COVID-19: Running Folding@Home simulations on ATLAS resources
    
    Following the outbreak of the COVID-19 pandemic, the ATLAS experiment considered how it could most efficiently contribute using its distributed computing resources. After considering many suggestions, examining several potential projects and following the advice of the CERN COVID Task Force, it was decided to engage in the Folding@Home initiative, which provides payloads that perform protein folding simulations. This paper describes how ATLAS made a significant contribution to this project over the summer of 2020.
    
    Speaker: David Michael South (Deutsches Elektronen-Synchrotron (DE))
    
    210517_vCHEP21Covid.pdf
    
    Recording
- Monitoring: Thu AM
  
  Conveners: Julia Andreeva (CERN), Sang Un Ahn (Korea Institute of Science & Technology Information (KR))
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Zoom
  - 155
    
    The ESCAPE Data Lake: The machinery behind testing, monitoring and supporting a unified federated storage infrastructure of the exabyte-scale
    
    The EU-funded ESCAPE project aims at enabling a prototype federated storage infrastructure, a Data Lake, that would handle data on the exabyte-scale, address the FAIR data management principles and provide science projects a unified scalable data management solution for accessing and analyzing large volumes of scientific data. In this respect, data transfer and management technologies such as Rucio, FTS and GFAL are employed along with monitoring enabling solutions such as Grafana, Elasticsearch and perfSONAR. This paper presents and describes the technical details behind the machinery of testing and monitoring of the Data Lake – this includes continuous automated functional testing, network monitoring and development of insightful visualizations that reflect the current state of the system. Topics that are also addressed include the integration with the CRIC information system as well as the initial support for token based authentication / authorization by using OpenID Connect. The current architecture of these components is provided and future enhancements are discussed.
    
    Speaker: Rizart Dona (CERN)
    
    ESCAPE_Data_Lake_testing_monitoring.pdf
    
    Recording
  - 156
    
    The ATLAS Tile Calorimeter Tools for Data Quality Assessment
    
    The ATLAS Tile Calorimeter (TileCal) is the central part of the hadronic calorimeter of the ATLAS experiment and provides important information for reconstruction of hadrons, jets, hadronic decays of tau leptons and missing transverse energy. The readout is segmented into nearly 10000 channels that are calibrated by means of Cesium source, laser, charge injection, and integrator-based systems.
    The data quality (DQ) relies on extensive monitoring of both collision and calibration data. Automated checks are performed on a set of predefined histograms and results are summarized in dedicated web pages. A set of tools is then used by the operators for further inspection of the acquired data with the goal of spotting the origins of problems or other irregularities. Consequently,the TileCal conditions data (calibration constants, channel statuses etc) are updated in databases that are used for the data-reprocessing, or serve as an important input for the maintenance works during the shutdown periods. This talk reviews the software tools used for the DQ monitoring with emphasis on recent developments aiming to integrate all tools into a single platform.
    
    Speaker: Daniel Scheirich (Charles University (CZ))
    
    2021-05-07-scheirich-vCHEP.pdf
    
    Recording
  - 157
    
    Improving the automated calibration at Belle II
    
    The Belle II detector began collecting data from $e^+e^-$ collisions at the SuperKEKB electron-positron collider in March 2019 and has already exceeded the Belle instantaneous luminosity. The result is an unprecedented amount of incoming raw data that must be calibrated promptly prior to data reconstruction. To fully automate the calibration process a Python plugin package, b2cal, had been developed based on the open-source Apache Airflow package using Directed Acyclic Graphs (DAGs) to describe the ordering of processes and Flask to provide administration and job submission web pages. This system was used in 2019 to help automate calibrations at the KEK Computing Center and has been upgraded to be capable of running at multiple calibration centers with successful operations at Brookhaven National Laboratory (BNL) and Deutsches Elektronen-Synchrotron Laboratory (DESY). The webserver hosting b2cal has been migrated from Melbourne to DESY where authentication using the internal DESY LDAP server has been added. DAGs have been updated to further automate the calibration as the job submission and validation of payloads occurs as soon as possible and without human intervention. The application has been dockerised so that it can be efficiently deployed on any machine.
    
    Speaker: Francis Pham (The University of Melbourne)
    
    BelleII_AutomatedCalibration_CHEP2021.pdf
    
    Recording
  - 158
    
    Monitoring reconstruction software in LHCb
    
    The LHCb detector at the LHC is currently undergoing a major upgrade to increase full detector read-out rate to 30 MHz. In addition to the detector hardware modernisation, the new trigger system will be software-only. The code base of the new trigger system must be thoroughly tested for data flow, functionality and physics performance. Currently, the testing procedure is based on a system of nightly builds and continuous integration tests of each new code development. The continuous integration tests are now extended to test and evaluate high-level quantities related to LHCb’s physics program, such as track reconstruction and particle identification, which is described in this paper. Before each merge request, the differences after the change in code are shown and automatically compared using an interactive visualisation tool, allowing easy verification of all relevant quantities. This approach gives an extensive control over the physics performance of the new code resulting into better preparation for data taking with the upgraded LHCb detector at Run 3.
    
    Speaker: Yingrui Hou (University of Chinese Academy of Sciences (CN))
    
    Monitoring reconstruction software in LHCb_v2.pdf
    
    Recording
  - 159
    
    Software migration of the CMS ECAL Detector Control System during the CERN Large Hadron Collider Long Shutdown II
    
    During the second long shutdown (LS2) of the CERN Large Hadron Collider (LHC), the Detector Control System (DCS) of the Compact Muon Solenoid (CMS) Electromagnetic Calorimeter (ECAL) is undergoing a large software upgrade at various levels. The ECAL DCS supervisory system has been reviewed and extended to migrate the underlying software toolkits and platform technologies to the latest versions. The resulting software will run on top of a new computing infrastructure, using the WinCC Open Architecture (OA) version 3.16 and newly developed communication drivers for some of the hardware. The ECAL DCS has been configured and managed from a different control version system and stored with more modern encoding and file formats. A new set of development guidelines has been prepared for this purpose, including conventions and recommendations from the CMS Central DCS and CERN Joint Controls Project (JCOP) framework groups. The large list of modifications also motivated the revision and reorganization of the software architecture, which is needed to resolve and satisfy additional software dependencies. Many modifications also aimed to improve the installation process, anticipating in some cases works for the next long shutdown upgrade.
    
    Speaker: Raul Jimenez Estupinan (ETH Zurich (CH))
    
    Recording
    
    Software migration of the CMS ECAL DCS - PP.pdf
- Quantum Computing: Thu AM
  
  Conveners: Andrea Sartirana (Centre National de la Recherche Scientifique (FR)), Sofia Vallecorsa (CERN)
  
  Mattermost
  
  Video recording Thursday morning
  
  Zoom
  - 160
    
    Quantum Gate Pattern Recognition and Circuit Optimization for Scientific Applications
    
    There is no unique way to encode a quantum algorithm into a quantum circuit. With limited qubit counts, connectivities, and coherence times, circuit optimization is essential to make the best use of near-term quantum devices. We introduce two separate ideas for circuit optimization and combine them in a multi-tiered quantum circuit optimization protocol called AQCEL. The first ingredient is a technique to recognize repeated patterns of quantum gates, opening up the possibility of future hardware co-optimization. The second ingredient is an approach to reduce circuit complexity by identifying zero- or low-amplitude computational basis states and redundant gates. As a demonstration, AQCEL is deployed on an iterative and efficient quantum algorithm designed to model final state radiation in high energy physics. For this algorithm, our optimization scheme brings a significant reduction in the gate count without losing any accuracy compared to the original circuit. Additionally, we have investigated whether this can be demonstrated on a quantum computer using polynomial resources. Our technique is generic and can be useful for a wide variety of quantum algorithms.
    
    Speaker: Koji Terashi (University of Tokyo (JP))
    
    CirqOpt_vCHEP_200521.pdf
    
    Recording
  - 161
    
    Dual-Parameterized Quantum Circuit GAN Model in High Energy Physics
    
    Generative Models, and Generative Adversarial Networks (GAN) in particular, are being studied as possible alternatives to Monte Carlo. Meanwhile, it has also been proposed that, in certain circumstances, simulation using GANs can itself be sped-up by using quantum GANs (qGANs).
    
    Our work presents an advanced prototype of qGAN, that we call the dual-Parameterized Quantum Circuit (PQC) GAN, with a classical discriminator and two quantum generators which take the form of PQCs. The first PQC learns the probability distribution over the images of $N$ pixels, while the second generates normalized pixel intensities of an individual image for each PQC input. The performance of the dual-PQC architecture has been evaluated through the application in HEP to imitate calorimeter outputs, translated into pixelated images. The results demonstrate that the model can reproduce a fixed number of images with a reduced size as well as their probability distribution and we anticipate it should allow us to scale up to real calorimeter outputs.
    
    Speaker: Su Yeon Chang (EPFL - Ecole Polytechnique Federale Lausanne (CH))
    
    Dual_PQC_GAN.pdf
    
    Recording
  - 162
    
    Embedding of particle tracking data using hybrid quantum classical neural networks
    
    The High Luminosity Large Hadron Collider (HL-LHC) at CERN will involve a significant increase in complexity and sheer size of data with respect to the current LHC experimental complex. Hence, the task of reconstructing the particle trajectories will become more complex due to the number of simultaneous collisions and the resulting increased detector occupancy. Aiming to identify the particle paths, machine learning techniques such as graph neural networks are being explored in the HEP TrkX project and its successor, the Exa TrkX project. Both show promising results and reduce the combinatorial nature of the problem. Previous results of our team have demonstrated the successful attempt of applying quantum graph neural networks to reconstruct the particle track based on the hits of the detector. A higher overall occuracy is gained by representing the training data in a meaningful way within an embedded space. That has been included in the Exa TrkX project by applying a classical MLP. Consequently, pairs of hits belonging to different trajectories are pushed apart while those belonging to the same ones stay close together. We explore the applicability of quantum circuits within the task of embedding and show preliminary results.
    
    Speaker: Carla Sophie Rieger
    
    Embedding of particle tracking data using hybrid quantum classical neural networks.pdf
    
    Recording
  - 163
    
    Higgs analysis with quantum classifiers
    
    We have developed two quantum classifier models for the $t\bar{t}H$ classification problem, both of which fall into the category of hybrid quantum-classical algorithms for Noisy Intermediate Scale Quantum devices (NISQ). Our results, along with other studies, serve as a proof of concept that Quantum Machine Learning (QML) methods can have similar or better performance, in specific cases of low number of training samples, with respect to conventional ML methods even with a limited number of qubits available in current hardware. To utilise algorithms with a low number of qubits -to accommodate for limitations in both simulation hardware and real quantum hardware- we investigated different feature reduction methods. Their impact on the performance of both the classical and quantum models was assessed. We addressed different implementations of two QML models, representative of the two main approaches to supervised quantum machine learning today: a Quantum Support Vector Machine (QSVM), a kernel-based method, and a Variational Quantum Circuit (VQC), a variational approach.
    
    Speaker: Vasileios Belis (ETH Zurich (CH))
    
    quantum_classifiers_for_higgs_analysis.pdf
    
    Recording
- Virtualisation: Thu AM
  
  Conveners: Alessandra Forti (University of Manchester (GB)), Daniele Spiga (Universita e INFN, Perugia (IT))
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Zoom
  - 164
    
    Seamless integration of commercial Clouds with ATLAS Distributed Computing
    
    The CERN ATLAS Experiment successfully uses a worldwide distributed computing Grid infrastructure to support its physics programme at the Large Hadron Collider (LHC). The Grid workflow system PanDA routinely manages up to 700'000 concurrently running production and analysis jobs to process simulation and detector data. In total more than 500 PB of data is distributed over more than 150 sites in the WLCG and handled by the ATLAS data management system Rucio. To prepare for the ever growing data rate in future LHC runs new developments are underway to embrace industry accepted protocols and technologies, and utilize opportunistic resources in a standard way. This paper reviews how the Google and Amazon Cloud computing services have been seamlessly integrated as a Grid site within PanDA and Rucio. Performance and brief cost evaluations will be discussed. Such setups could offer advanced Cloud tool-sets and provide added value for analysis facilities that are under discussions for LHC Run-4.
    
    Speaker: Johannes Elmsheuser (Brookhaven National Laboratory (US))
    
    c_200521.pdf
    
    Recording
  - 165
    
    Accounting in the CloudVeneto private cloud
    
    CloudVeneto is a private cloud implemented as the result of merging two existing cloud infrastructures: the INFN Cloud Area Padovana, and a private cloud owned by 10 departments of University of Padova.
    This infrastructure is a full production facility, in continuous growth, both in terms of users, and in terms of computing and storage resources.
    Even if the usage of CloudVeneto is not regulated by a strict pay-per-use model,
    the availability of accounting information for such infrastructure is a requirement, to detect if the resources allocated to the user communities are efficiently used, and to perform an effective capacity planning.
    
    We present in this paper how the accounting system used in CloudVeneto evolved over time, focusing on the accounting framework being used now, implemented by integrating existing components.
    
    Speaker: Massimo Sgaravatto (Universita e INFN, Padova (IT))
    
    2021-May-CHEP2021_1_1.pdf
    
    Recording
  - 166
    
    CloudBank for Europe
    
    Abstract. The vast amounts of data generated by scientific research pose enormous challenges for capturing, managing and processing this data. Many trials have been made in different projects (such as HNSciCloud and OCRE), but today, commercial cloud services do not yet play a major role in the production computing environments of the publicly funded research sector in Europe. Funded by the Next Generation Internet programme (NGI-Atlantic) from the EC, in partnership with the University California San Diego (UCSD), CERN is piloting the use of CloudBank in Europe. CloudBank has been developed by the UCSD, University of Washington and University of California, Berkeley with NSF grant support, to provide a set of managed services simplifying access to public cloud for research and education, via a cloud procurement partnership with Strategic Blue, a financial broker SME, specialised in cost management and optimisation. The European NGI experiment is provisioning cloud services from multiple vendors and deploying a series of use-cases in the domain of Machine Learning and HPCaaS, contributing to the scientific programme of the Large Hadron Collider. The main objective is to address technical, financial and legal challenges to determine whether CloudBank can be successfully used by Europe’s research community as part of its global research activity.
    
    Speaker: Apostolos Theodoridis (CERN)
    
    CloudBankEU-vCHEP2021.pptx.pdf
    
    Recording
  - 167
    
    Transparent Integration of Opportunistic Resources into the WLCG Compute Infrastructure
    
    The inclusion of opportunistic resources, for example from High Performance Computing (HPC) centers or cloud providers, is an important contribution to bridging the gap between existing resources and future needs by the LHC collaborations, especially for the HL-LHC era. However, the integration of these resources poses new challenges and often needs to happen in a highly dynamic manner. To enable an effective and lightweight integration of these resources, the tools COBalD and TARDIS are developed at KIT.
    
    In this contribution we report on the infrastructure we use to dynamically offer opportunistic resources to collaborations in the World Wide LHC Computing Grid (WLCG). The core components are COBalD/TARDIS, HTCondor, CVMFS and modern virtualization technology. The challenging task of managing the opportunistic resources is performed by COBalD/TARDIS. We showcase the challenges, employed solutions and experiences gained with the provisioning of opportunistic resources from several resource provides like university clusters, HPC centers and cloud setups in a multi VO environment. This work can serve as a blueprint for approaching the provisioning of resources from other resource providers.
    
    Speaker: Rene Caspart (KIT - Karlsruhe Institute of Technology (DE))
    
    CHEP21_opportunistic.pdf
    
    Recording
  - 168
    
    Opportunistic transparent extension of a WLCG Tier 2 center using HPC resources
    
    Computing resource needs are expected to increase drastically in the future. The HEP experiments ATLAS and CMS foresee an increase of a factor of 5-10 in the volume of recorded data in the upcoming years. The current infrastructure, namely the WLCG, is not sufficient to meet the demands in terms of computing and storage resources.
    
    The usage of non HEP specific resources is one way to reduce this shortage. However, using them comes at a cost: First, with multiple of such resources at hand, it gets more and more difficult for the single user, as each resource normally requires its own authentication and has its own way of accessing it. Second, as they are not specifically designed for HEP workflows, they might lack dedicated software or other necessary services.
    
    Allocating the resources at the different providers can be done by COBalD/TARDIS, developed at KIT. The resource manager integrates resources on demand into one overlay batch system, providing the user with a single point of entry. The software and services, needed for the communities workflows, are transparently served through containers.
    
    With this, an HPC cluster at RWTH Aachen University is dynamically and transparently integrated into a tier 2 WLCG resource, virtually doubling its computing capacities.
    
    Speaker: Ralf Florian Von Cube (KIT - Karlsruhe Institute of Technology (DE))
    
    Recording
    
    vCHEP21.pdf
  - 169
    
    Evolution of the HEPS Jupyter-based remote data analysis System
    
    High Energy Photon Source (HEPS) has the characteristic of large amount of data, high timeliness, and diverse requirements for scientific data analysis. Generally, researchers need to spend a lot of time in the configuration of the experimental environment. In response to the above problems, we introduce a remote data analysis system for HEPS. The platform provides users a web-based interactive interface with Jupyter, which makes scientists are able to process data analysis anytime and anywhere. Particularly, we discuss the system architecture as well as the key points of this system. A solution of managing and scheduling heterogeneous computing resources (CPU and GPU) is proposed, which adopts Kubernetes to achieve centralized heterogeneous resources management and resource expansion on demand. An improved Kubernetes resource scheduler is discussed. The schedular dispatches resources to upper applications in combination with the cluster status, which can transparently and quickly deployment the data analysis environment for users in seconds and reach the maximum resource utilization. We also introduce an automated deployment solution to improve the work efficiency of developers and help deploy multidisciplinary applications faster and better in the production environment. A unified certification is illustrated to make sure the security of remote data access and data analysis. Finally, we will show the running status of the system.
    
    Speaker: Zhibin Liu (Institute of High Energy Physis, CAS; University of Chinese Academy of Sciences)
    
    Evolution of the HEPS Jupyter-based remote data analysis System.pdf
    
    Recording
- Artificial Intelligence: Thu PM
  
  Conveners: Graeme A Stewart (CERN), Jason Webb (Brookhaven National Lab)
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Video recording Tuesday afternoon
  
  Video recording Tuesday morning
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 170
    
    Physics Validation of Novel Convolutional 2D Architectures for Speeding Up High Energy Physics Simulations
    
    The precise simulation of particle transport through detectors is a key element for the successful interpretation of high energy physics results.
    However, Monte Carlo based simulation is extremely demanding in terms of computing resources. This challenge motivates investigations of faster, alternative approaches for replacing the standard Monte Carlo approach.
    
    We apply Generative Adversarial Networks, a deep learning technique, to replace the calorimeter detector simulations and speeding up the simulation time by orders of magnitude. We follow a previous approach which used three-dimensional convolutional neural networks and develop new two-dimensional convolutional networks to solve the same image generation problem faster. Additionally, we increased the number of parameters, and the neural networks representational power, obtaining a higher accuracy. We compare our best convolutional 2D neural network architecture and evaluate it versus the previous 3D architecture and Geant4 data. Our results demonstrate a high physics accuracy and further consolidate the use of generative adversarial networks for fast detector simulations.
    
    Speaker: Florian Rehm (RWTH Aachen University (DE))
    
    CHEP v4.pdf
    
    Recording
  - 171
    
    Reframing Jet Physics with New Computational Methods
    
    We reframe common tasks in jet physics in probabilistic terms, including jet reconstruction, Monte Carlo tuning, matrix element – parton shower matching for large jet multiplicity, and efficient event generation of jets in complex, signal-like regions of phase space. We also introduce Ginkgo, a simplified, generative model for jets, that facilitates research into these tasks with techniques from statistics, machine learning, and combinatorial optimization. We also review some of the recent research in this direction that has been enabled with Ginkgo. We show how probabilistic programming can be used to efficiently sample the showering process, how a novel trellis algorithm can be used to efficiently marginalize over the enormous number of clustering histories for the same observed particles, and how the dynamic programming and reinforcement learning can be used to find the maximum likelihood clusterinng in this enor- mous search space. This work builds bridges with work in hierarchical clustering, statistics, combinatorial optmization, and reinforcement learning.
    
    Speaker: Sebastian Macaluso (New York University)
    
    Recording
    
    vCHEP_2021_Reframing_Jet_Physics_v2.pdf
  - 172
    
    Artificial Proto-Modelling: Building Precursors of a Next Standard Model from Simplified Model Results
    
    We present a novel algorithm to identify potential dispersed signals of new physics in the slew of published LHC results. It employs a random walk algorithm to introduce sets of new particles, dubbed “proto-models”, which are tested against simplified-model results from ATLAS and CMS (exploiting the SModelS software framework). A combinatorial algorithm identifies the set of analyses and/or signal regions that maximally violates the SM hypothesis, while remaining compatible with the entirety of LHC constraints in our database.
    Demonstrating our method by running over the experimental results in the SModelS database, we find as currently best-performing proto-model a top partner, a light-flavor quark partner, and a lightest neutral new particle with masses of the order of 1.2 TeV, 700 GeV and 160 GeV, respectively.
    The corresponding global p-value for the SM hypothesis is approximately 0.19; by construction no look-elsewhere effect applies.
    
    Speaker: Wolfgang Waltenberger (Austrian Academy of Sciences (AT))
    
    protomodels_chep.pdf
    
    Recording
  - 173
    
    Jet Single Shot Detection
    
    In this paper, we apply object detection techniques based on convolutional neural networks to jet images, where the input data corresponds to the calorimeter energy deposits. In particular, we focus on the CaloJet reconstruction and tagging as a detection task with a Single Shot Detection network, called Jet-SSD. The model performs simultaneous localization and classification and additional mass regression task. The algorithm will operate in a hardware restricted environment and we report on necessary changes to VGG-16 network architecture, which is the base for the detection model. Finally, as aggressive quantization of weights in the network can be a handle for speeding up inference to match latency constraints of the trigger selection system, we further investigate Ternary Weight Networks with weights constrained to {-1, 0, 1} with per-layer and per-channel scaling factors. We show that the quantized version of the network closely matches the performance of the full precision equivalent.
    
    Speaker: Adrian Alan Pol (CERN)
    
    JetSSD-vCHEP2021.pdf
    
    Recording
  - 174
    
    End-to-End Jet Classification of Boosted Top Quarks with CMS Open Data
    
    We describe a novel application of the end-to-end deep learning technique to the task of discriminating top quark-initiated jets from those originating from the hadronization of a light quark or a gluon. The end-to-end deep learning technique combines deep learning algorithms and low-level detector representation of the high-energy collision event. In this study, we use low-level detector information from the simulated CMS Open Data samples to construct the top jet classifiers.
    To optimize classifier performance we progressively add low-level information from the CMS tracking detector, including pixel detector reconstructed hits and impact parameters, and demonstrate the value of additional tracking information even when no new spatial structures are added.
    Relying only on calorimeter energy deposits and reconstructed pixel detector hits, the end-to-end classifier achieves an AUC score of 0.975$\pm$0.002 for the task of classifying boosted top quark jets.
    After adding derived track quantities, the classifier AUC score increases to 0.9824$\pm$0.0013, serving as the first performance benchmark for these CMS Open Data samples.
    
    Speaker: Bjorn Burkle (Brown University (US))
    
    bburkle-vCHEP-2021-final.pdf
    
    Recording
- Distributed Computing: Thu PM
  
  Conveners: Daniela Bauer (Imperial College (GB)), Luisa Arrabito (LUPM IN2P3/CNRS)
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Zoom
  - 175
    
    Building a Distributed Computing System for LDMX
    
    Particle physics experiments rely extensively on computing and data services, making e-infrastructure an integral part of the research collaboration. Constructing and operating distributed computing can however be challenging for a smaller-scale collaboration.
    
    The Light Dark Matter eXperiment (LDMX) is a planned small-scale accelerator-based experiment to search for dark matter in the sub-GeV mass region. Finalizing the design of the detector relies on Monte-Carlo simulation of expected physics processes. A distributed computing pilot project was proposed to better utilize available resources at the collaborating institutes, and to improve scalability and reproducibility.
    
    This paper outlines the chosen lightweight distributed solution, presenting requirements, the component integration steps, and the experiences using a pilot system for tests with large-scale simulations. The system leverages existing technologies wherever possible, minimizing the need for software development, and deploys only non-intrusive components at the participating sites. The pilot proved that integrating existing components can dramatically reduce the effort needed to build and operate a distributed e-infrastructure, making it attainable even for smaller research collaborations.
    
    Speaker: Lene Kristian Bryngemark (Stanford University (US))
    
    LDCS_LKBryngemark_83.pdf
    
    paper_2105.02977.pdf
    
    Recording
  - 176
    
    The Rucio File Catalog in DIRAC implemented for Belle II
    
    Dirac and Rucio are two standard pieces of software widely used in the HEP domain. Dirac provides Workload and Data Management functionalities, among other things, while Rucio is a dedicated, advanced Distributed Data Management system. Many communities that already use Dirac express their interest in using Dirac for workload management in combination with Rucio for the Data management part. In this paper, we describe the integration of the Rucio File Catalog into Dirac that was initially developed for the Belle II collaboration.
    
    Speaker: Ruslan Mashinistov (Brookhaven National Laboratory (US))
    
    Recording
    
    The Rucio File Catalog in Dirac implemented for Belle II-2.pdf
  - 177
    
    Experience with Rucio in the wider HEP community
    
    Managing the data of scientific projects is an increasingly complicated challenge, which was usually met by developing experiment-specific solutions. However, the ever-growing data rates and requirements of even small experiments make this approach very difficult, if not prohibitive. In recent years the scientific data management system Rucio has evolved into a successful open-source project, now being used by many scientific communities and organisations. Rucio is incorporating the contributions and expertise of many scientific projects, offering common features useful to a diverse research community. This article describes the recent experiences in operating Rucio as well as contributions to the project by ATLAS, Belle II, CMS, ESCAPE, IGWN, LDMX, Folding@Home, and the UK's Science and Technology Facilities Council (STFC).
    
    Speaker: Martin Barisits (CERN)
    
    Recording
    
    Rucio CHEP 2021.pdf
  - 178
    
    The Cherenkov Telescope Array production system prototype for large-scale data processing and simulations
    
    The Cherenkov Telescope Array (CTA) is the next-generation instrument in the very-high energy gamma ray astronomy domain. It will consist of tens of Cherenkov telescopes deployed in 2 arrays at La Palma (Spain) and Paranal (ESO, Chile) respectively. Currently under construction, CTA will start operations around 2023 for a duration of about 30 years. During operations CTA is expected to produce about 2 PB of raw data per year plus 5-20 PB of Monte Carlo data. The global data volume to be managed by the CTA archive, including all versions and copies, is of the order of 100 PB with a smooth growing profile. The associated processing needs are also very high, of the order of hundreds of millions of CPU HS06 hours per year. In order to optimize the instrument design and study its performances, during the preparatory phase (2010-2017) and the current construction phase, the CTA consortium has run massive Monte Carlo productions on the EGI grid infrastructure. In order to handle these productions and the future data processing, we have developed a production system based on the DIRAC framework. The current system is the result of several years of hardware infrastructure upgrades, software development and integration of different services like CVMFS and FTS. In this paper we present the current status of the CTA production system and its exploitation during the latest large-scale Monte Carlo campaigns.
    
    Speaker: Johan BREGEON (CNRS)
    
    chep_2021_ctadirac_v2.pdf
    
    Recording
  - 179
    
    Evolution of ATLAS analysis workflows and tools for the HL-LHC era
    
    The High Luminosity LHC project at CERN, which is expected to deliver a ten-fold increase in the luminosity of proton-proton collisions over LHC, will start operation towards the end of this decade and will deliver an unprecedented scientific data volume of multi-exabyte scale. This vast amount of data has to be processed and analyzed, and the corresponding computing facilities must ensure fast and reliable data processing for physics analyses by scientific groups distributed all over the world. The present LHC computing model will not be able to provide the required infrastructure growth, even taking into account the expected evolution in hardware technology. To address this challenge, several novel methods of how end-users analysis will be conducted are under evaluation by the ATLAS Collaboration. State-of-the-art workflow management technologies and tools to handle these methods within the existing distributed computing system are now being evaluated and developed. In addition the evolution of computing facilities and how this impacts ATLAS analysis workflows is being closely followed.
    
    Speaker: Alessandra Forti (University of Manchester (GB))
    
    20210511_vCHEP_AF.pdf
    
    Recording
- Education, Training, Outreach: Thu PM
  
  Conveners: Clara Nellist (Radboud University Nijmegen and NIKHEF (NL)), Marzena Lapka (CERN)
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Zoom
  - 180
    
    The Challenges of Open Source Software Alternatives
    
    Developing an Open Source Software application is a challenge. Mainly because there are commercial alternatives that have an army of expert developers behind them, experienced supporters and well-established business processes in their development and promotion.
    
    Nevertheless, web-based applications, that securely handle the users' personal data are an area of freedom and ease of use, features that make such applications very attractive. The "ease-of-use" part is very hard to achieve, for the developers and the end-users.
    Dependencies change often in OSS packages, so the fear that something breaks is always around the corner.
    If the application looks attractive, additional user requirements fall like rain. This poses a problem of continuity, maintenance and operational quality of the packages.
    
    In this paper and presentation we shall share our experience in building such a tool, using https://cern.ch/slides, as a showcase and a learning exercise. We shall describe what was available, what was missing, how it was put together, how much effort it took, and what was achieved.
    
    Speaker: Aristofanis Chionis (National and Kapodistrian University of Athens (GR))
    
    BSc thesis from this work
    
    Recording
    
    The paper
    
    The paper published in EPJ
    
    vCHEP-2021-OSS-Challenges-Slides.v.1.pdf
    
    vCHEP2021.pdf
    
    vCHEP2021.slides
  - 181
    
    LHC Computing – the First 3 Decades
    
    Computing for the Large Hadron Collider (LHC) at CERN arguably started shortly after the commencement of data taking at the previous machine – LEP – some would argue it was even before. Without specifying an exact date, it was certainly prior to when today’s large(st) collaborations, namely ATLAS and CMS, had formed and been approved and before the LHC itself was given the official go-ahead at the 100th meeting of the CERN Council in 1995. Approximately the first decade was spent doing research and development; the second – from the beginning of the new millennium – on grid exploration and hardening; and the third providing support to LHC data taking, production, analysis and most importantly obtaining results.
    
    Speaker: Jamie Shiers (CERN)
    
    LEP to an fcc-‐ee – A Bridge Too Far?
    
    LHC Computing - the first 3 decades (paper).pdf
    
    LHC Computing - the first 3 decades (talk).pdf
    
    LHC Computing - the first 3 decades (talk).pptx
    
    Recording
    
    S/W PRESERVATION AND LEGACY ISSUES AT LEP
    
    The IEEE Mass Storage System Reference Model
  - 182
    
    A proposal for Open Access data and tools multi-user deployment using ATLAS Open Data for Education
    
    The deployment of analysis pipelines has been tightly related and conditioned to the scientific facility’s computer infrastructure or academic institution where it is carried on. Nowadays, Software as a Service (SaaS) and Infrastructure as a Service (IaaS) have reshaped the industry of data handling, analysis, storage, and sharing. The sector of science does not escape those changes. This situation is particularly true in multinational collaborations, where distributed resources allow researchers to deploy data analysis in diverse computational ecosystems. This project explores how the current multi-cloud (e.g., SaaS + IaaS) approach can be adapted to modest scenarios where analysis pipelines can be deployed using Virtual Machines and Containers containing analysis tools and protocols. This approach aims to replicate sophisticated computer facilities in places with fewer resources like small universities, start-ups, and even individuals who want to learn and contribute to this and other sciences and its replicability. It is desired to explore the development of multi-cloud-compatible tools in physics analysis and operations monitoring using ATLAS experimental and simulated data, adding the Big Data component that the High Energy Physics field has by nature.
    
    Speaker: Arturo Sanchez Pineda (Centre National de la Recherche Scientifique (FR))
    
    ATL-SOFT-PROC-2021-008.pdf
    
    Recording
    
    vCHEP-2021-ArturoS-V4.pdf
    
    vCHEP_2021_ArturoS-v8-final.pdf
  - 183
    
    Using CMS Open Data in research -- challenges and directions
    
    The CMS experiment at CERN has released research-quality data from particle collisions at the LHC since 2014. Almost all data from the first LHC run in 2010--2012 with the corresponding simulated samples are now in the public domain, and several scientific studies have been performed using these data. This paper summarizes the available data and tools, reviews the challenges in using them in research, and discusses measures to improve their usability.
    
    Speaker: Edgar Fernando Carrera Jarrin (Universidad San Francisco de Quito (EC))
    
    CMSOpenDataInResearch_ECarrera_vCHEP2021.pdf
    
    Recording
- Facilities and Networks: Thu PM
  
  Conveners: David Crooks (UKRI STFC), Edoardo Martelli (CERN)
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Tuesday afternoon
  
  Video recording Wednesday afternoon
  
  Video recording Wednesday morning
  
  Zoom
  - 184
    
    Harnessing HPC resources for CMS jobs using a Virtual Private Network
    
    The processing needs for the High Luminosity (HL) upgrade for the LHC require the CMS collaboration to harness the computational power available on non-CMS resources, such as High-Performance Computing centers (HPCs). These sites often limit the external network connectivity of their computational nodes. In this paper we describe a strategy in which all network connections of CMS jobs inside a facility are routed to a single point of external network connectivity using a Virtual Private Network (VPN) server by creating virtual network interfaces in the computational nodes. We show that when the computational nodes and the host running the VPN server have the namespaces capability enabled, the setup can run entirely on user space with no other root permissions required. The VPN server host may be a privileged node inside the facility configured for outside network access, or an external service that the nodes are allowed to contact. When namespaces are not enabled at the client side, then the setup falls back to using a SOCKS server instead of virtual network interfaces. We demonstrate the strategy by executing CMS Monte Carlo production requests on opportunistic non-CMS resources at the University of Notre Dame. For these jobs, cvmfs support is tested via fusermount (cvmfsexec), and the native fuse module.
    
    Speaker: Benjamin Tovar Lopez (University of Notre Dame)
    
    Recording
    
    vpn-ns-chep2021.pdf
  - 185
    
    Exploitation of network-segregated CPU resources in CMS
    
    CMS is tackling the exploitation of CPU resources at HPC centers where compute nodes do not have network connectivity to the Internet. Pilot agents and payload jobs need to interact with external services from the compute nodes: access to the application software (cmvfs) and conditions data (Frontier), management of input and output data files (data management services), and job management (HTCondor). Finding an alternative route to these services is challenging. Seamless integration in the CMS production system without causing any operational overhead is a key goal.
    
    The case of the Barcelona Supercomputing Center (BSC), in Spain, is particularly challenging, due to its especially restrictive network setup. We describe in this paper the solutions developed within CMS to overcome these restrictions, and integrate this resource in production. Singularity containers with application software releases are built and pre-placed in the HPC facility shared file system, together with conditions data files. HTCondor has been extended to relay communications between running pilot jobs and HTCondor daemons through the HPC shared file system. This operation mode also allows piping input and output data files through the HPC file system.
    
    Results, issues encountered during the integration process, and remaining concerns are discussed.
    
    Speaker: Antonio Delgado Peris (Centro de Investigaciones Energéti cas Medioambientales y Tecno)
    
    BSC_exploitation_vCHEP2021_slides.pdf
    
    Recording
  - 186
    
    WLCG Token Usage and Discovery
    
    Since 2017, the Worldwide LHC Computing Grid (WLCG) has been working towards enabling token based authentication and authorisation throughout its entire middleware stack. Following the publication of the WLCG v1.0 Token Schema in 2019, middleware developers have been able to enhance their services to consume and validate OAuth2.0 tokens and process the authorization information they convey. Complex scenarios, involving mul- tiple delegation steps and command line flows, are a key challenge to be ad- dressed in order for the system to be fully operational. This paper expands on the anticipated token based workflows, with a particular focus on local storage of tokens and their discovery by services. The authors include a walk-through of this token flow in the RUCIO managed data-transfer scenario, including delega- tion to FTS and authorised access to storage elements. Next steps are presented, including the current target of submitting production jobs authorised by Tokens within 2021.
    
    Speaker: Tom Dack (Science and Technology Facilities Council STFC (GB))
    
    2021CHEP_WLCGTokens.pdf
    
    2021CHEP_WLCGTokens.pptx
    
    Recording
  - 187
    
    Secure Command Line Solution for Token-based Authentication
    
    The WLCG is modernizing its security infrastructure, replacing X.509 client authentication with the newer industry standard of JSON Web Tokens (JWTs) obtained through the Open ID Connect (OIDC) protocol. There is a wide variety of software available using the standards, but most of it is for Web browser-based applications and doesn’t adapt well to the command line-based software used heavily in High Throughput Computing (HTC). OIDC command line client software did exist, but it did not meet our requirements for security and convenience. This paper discusses a command line solution we have made based on the popular existing secrets management software from Hashicorp called vault. We made a package called htvault-config to easily configure a vault service and another called htgettoken to be the vault client. In addition, we have integrated use of the tools into the HTCondor workload management system, although they also work well independent of HTCondor. All of the software is open source, under active development, and ready for use.
    
    Speaker: Dave Dykstra (Fermi National Accelerator Lab. (US))
    
    CHEP21_Talk_Htgettoken.pdf
    
    Recording
  - 188
    
    A Unified approach towards Multi-factor Authentication(MFA)
    
    With more applications and services deployed in BNL SDCC that rely on authentication services, adoption of Multi-factor Authentication (MFA) became inevitable. While web applications can be protected by Keycloak (a open source Single sign-on solution directed by Red Hat) with its MFA feature, other service components within the facility rely on FreeIPA (an open source identity management software directed by Red Hat) for MFA authentication. While this satisfies cyber security requirements, it creates a situation where users need to manage multiple tokens and differentiation of them depends upon what they access. Not only this is a major irritation for users, it also adds a burden for staff members who manage user tokens. To tackle the challenges, a solution needs to be found to provide a unified way for token management. In the paper, we elaborate a solution that was explored and implemented at the SDCC, and also plan to extend it's capabilities and flexibility's for future application integration's.
    
    Speaker: Masood Zaran (Brookhaven National Labratory)
    
    Recording
    
    unified_mfa_vchep2021.pdf
    
    unified_mfa_vchep2021.pptx
- Monitoring: Thu PM
  
  Conveners: Julia Andreeva (CERN), Sang Un Ahn (Korea Institute of Science & Technology Information (KR))
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Zoom
  - 189
    
    The evolution of the CMS monitoring infrastructure
    
    The CMS experiment at the CERN LHC (Large Hadron Collider) relies on a distributed computing infrastructure to process the multi-petabyte datasets where the collision and simulated data are stored. A scalable and reliable monitoring system is required to ensure efficient operation of the distributed computing services, and to provide a comprehensive set of measurements of the system performances. In this paper we present the full stack of CMS monitoring applications, partly based on the MONIT infrastructure, a suite of monitoring services provided by the CERN IT department. These are complemented by a set of applications developed over the last few years by CMS, leveraging open-source technologies that are industry-standards in the IT world, such as Kubernetes and Prometheus. We discuss how this choice helped the adoption of common monitoring solutions within the experiment, and increased the level of automation in the operation and deployment of our services.
    
    Speaker: Valentin Y Kuznetsov (Cornell University (US))
    
    CMSMonitoringEvolution.pdf
    
    Recording
  - 190
    
    Exploring the self-service model to visualize the results of the ATLAS Machine Learning analysis jobs in BigPanDA with Openshift OKD3
    
    A large scientific computing infrastructure must offer versatility to host any kind of experiment that can lead to innovative ideas. The ATLAS experiment offers wide access possibilities to perform intelligent algorithms and analyze the massive amount of data produced in the Large Hadron Collider at CERN. The BigPanDA monitoring is a component of the PanDA (Production ANd Distributed Analysis) system and its main role is to monitor the entire lifecycle of a job/task running in the ATLAS Distributed Computing infrastructure. Because many scientific experiments now rely upon Machine Learning algorithms, the BigPanDA community desires to expand the platform’s capabilities and fill the gap between Machine Learning processing and data visualization. In this regard, BigPanDA partially adopts the cloud-native paradigm and entrusts the data presentation to MLFlow services running on Openshift OKD. Thus, BigPanDA interacts with the OKD API and instructs the containers orchestrator how to locate and expose the results of the Machine Learning analysis. The proposed architecture also introduces various DevOps-specific patterns, including continuous integration for MLFlow middleware configuration and continuous deployment pipelines that implement rolling upgrades. The Machine Learning data visualization services operate on demand and run for a limited time, thus optimizing the resource consumption.
    
    Speaker: Ioan-Mihail Stan (University Politehnica of Bucharest (RO))
    
    ATL-SOFT-SLIDE-2021-121.pdf
    
    Recording
  - 191
    
    Archival, anonymization and presentation of HTCondor logs with GlideinMonitor
    
    GlideinWMS is a pilot framework to provide uniform and reliable HTCondor clusters using heterogeneous and unreliable resources. The Glideins are pilot jobs that are sent to the selected nodes, test them, set them up as desired by the user jobs, and ultimately start an HTCondor schedd to join an elastic pool. These Glideins collect information that is very useful to evaluate the health and efficiency of the worker nodes and invaluable to troubleshoot when something goes wrong. This data, including local stats, the results of all the tests, and the HTCondor log files, is packed and sent to the GlideinWMS Factory. To access this information, developers and troubleshooters must exchange emails with Factory operators and dig manually into files. Furthermore, these files contain also information like email and IP addresses, and user IDs, that we want to protect and limit access to. GlideinMonitor is a Web application to make these logs more accessible and useful: it organizes the logs in an efficient compressed archive; it allows to search, unpack, and inspect them, all in a convenient and secure Web interface; via plugins like the log anonymizer, it can redact protected information preserving the parts useful for troubleshooting.
    
    Speaker: Marco Mambelli (Fermilab (US))
    
    GlideinMonitor-vCHEP2021.pdf
    
    mambelli-glideinmonitor-chep2021_v2.pdf
    
    Recording
  - 192
    
    Methods of Data Popularity Evaluation in the ATLAS Experiment at the LHC
    
    The ATLAS Experiment at the LHC generates petabytes of data that is distributed among 160 computing sites all over the world and is processed continuously by various central production and user analysis tasks. The popularity of data is typically measured as the number of accesses and plays an important role in resolving data management issues: deleting, replicating, moving between tapes, disks and caches. These data management procedures were still carried out in a semi-manual mode and now we have focused our efforts on automating it, making use of the historical knowledge about existing data management strategies. In this study we describe sources of information about data popularity and demonstrate their consistency. Based on the calculated popularity measurements, various distributions were obtained. Auxiliary information about replication and task processing allowed us to evaluate the correspondence between the number of tasks with popular data executed per site and the number of replicas per site. We also examine the popularity of user analysis data that is much less predictable than in the central production and requires more indicators than just the number of accesses.
    
    Speaker: Maria Grigoryeva (M.V. Lomonosov Moscow State University (RU))
    
    CHEP 2021 Slides.pdf
    
    Recording
  - 193
    
    Analysis of data integrity and storage quality of a distributed storage system
    
    CERN uses the world's largest scientific computing grid, WLCG, for distributed data storage and processing. Monitoring of the CPU and storage resources is an important and essential element to detect operational issues in its systems, for example in the storage elements, and to ensure their proper and efficient function. The processing of experiment data depends strongly on the data access quality, as well as its integrity and both of these key parameters must be assured for the data lifetime. Given the substantial amount of data, O(200PB), already collected by ALICE and kept at various storage elements around the globe, scanning every single data chunk would be a very expensive process, both in terms of computing resources usage and in terms of execution time. In this paper, we describe a distributed file crawler that addresses these natural limits by periodically extracting and analyzing statistically significant samples of files from storage elements, evaluates the results and is integrated with the existing monitoring solution, MonALISA.
    
    Speaker: Adrian-Eduard Negru (University Politehnica of Bucharest (RO))
    
    Recording
    
    vCHEP_monitoring_track_data_analysis.pdf
  - 194
    
    Recent Improvements to the ATLAS Offline Data Quality Monitoring System
    
    Recent changes to the ATLAS offline data quality monitoring system are described. These include multithreaded histogram filling and subsequent postprocessing, improvements in the responsiveness and resource use of the automatic check system, and changes to the user interface to improve the user experience.
    
    Speaker: Peter Onyisi (University of Texas at Austin (US))
    
    20210520_chep_monitoring.pdf
    
    Recording
- Virtualisation: Thu PM
  
  Conveners: Gordon Watts (University of Washington (US)), Niko Neufeld (CERN)
  
  Mattermost
  
  Video recording Thursday afternoon
  
  Video recording Thursday morning
  
  Zoom
  - 195
    
    Containerization in ATLAS Software Development and Data Production
    
    The ATLAS experiment’s software production and distribution on the grid benefits from a semi-automated infrastructure that provides up-to-date information about software usability and availability through the CVMFS distribution service for all relevant systems. The software development process uses a Continuous Integration pipeline involving testing, validation, packaging and installation steps. For opportunistic sites that can not access CVMFS, containerized releases are needed. These standalone containers are currently created manually to support Monte-Carlo data production at such sites. In this paper we will describe an automated procedure for the containerization of ATLAS software releases in the existing software development infrastructure, its motivation, integration and testing in the distributed computing system.
    
    Speaker: Nurcan Ozturk (University of Texas at Arlington (US))
    
    ATL-SOFT-SLIDE-2021-141.pdf
    
    Recording
  - 196
    
    Distributed statistical inference with pyhf enabled through funcX
    
    In High Energy Physics facilities that provide High Performance Computing environments provide an opportunity to efficiently perform the statistical inference required for analysis of data from the Large Hadron Collider, but can pose problems with orchestration and efficient scheduling. The compute architectures at these facilities do not easily support the Python compute model, and the configuration scheduling of batch jobs for physics often requires expertise in multiple job scheduling services. The combination of the pure-Python libraries pyhf and funcX reduces the common problem in HEP analyses of performing statistical inference with binned models, that would traditionally take multiple hours and bespoke scheduling, to an on-demand (fitting) ``function as a service'' that can scalably execute across workers in just a few minutes, offering reduced time to insight and inference. We demonstrate execution of a scalable workflow using funcX to simultaneously fit 125 signal hypotheses from a published ATLAS search for new physics using pyhf with a wall time of under 3 minutes. We additionally show performance comparisons for other physics analyses with openly published probability models and argue for a blueprint of fitting as a service systems at HPC centers.
    
    Speaker: Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
    
    Distributed statistical inference with pyhf enabled through funcX
    
    Feickert_2021-05-20.pdf
    
    Project GitHub repository
    
    Recording
  - 197
    
    First experiences with a portable analysis infrastructure for LHC at INFN
    
    The challenges proposed by the HL-LHC era are not limited to the sheer amount of data to be processed: the capability of optimizing the analyser's experience will also bring important benefits for the LHC communities, in terms of total resource needs, user satisfaction and in the reduction of end time to publication. At the Italian National Institute for Nuclear Physics (INFN) a portable software stack for analysis has been proposed, based on cloud-native tools and capable of providing users with a fully integrated analysis environment for the CMS experiment. The main characterizing traits of the solution consist in the user-driven design and the portability to any cloud resource provider. All this is made possible via an evolution towards a “python-based” framework, that enables the usage of a set of open-source technologies largely adopted in both cloud-native and data-science environments. In addition, a “single sign on”-like experience is available thanks to the standards-based integration of INDIGO-IAM with all the tools. The integration of compute resources is done through the customization of a JupyterHUB solution, able to spawn identity-aware user instances ready to access data with no further setup actions. The integration with GPU resources is also available, designed to sustain more and more widespread ML based workflow. Seamless connections between the user UI and batch/big data processing framework (Spark, HTCondor) are possible. Eventually, the experiment data access latency is reduced thanks to the integrated deployment of a scalable set of caches, as developed in the context of ESCAPE project, and as such compatible with the future scenarios where a data-lake will be available for the research community.
    The outcome of the evaluation of such a solution in action is presented, showing how a real CMS analysis workflow can make use of the infrastructure to achieve its results.
    
    Speaker: Diego Ciangottini (INFN, Perugia (IT))
    
    Recording
    
    vchep-final (2).pdf
  - 198
    
    Building a Kubernetes infrastructure for CERN’s Content Management Systems
    
    The infrastructure behind home.cern and 1000 other Drupal websites serves more than 15,000 unique visitors daily. To best serve the site owners, a small engineering team needs development speed to adapt to their evolving needs and operational velocity to troubleshoot emerging problems rapidly. We designed a new Web Frameworks platform by extending Kubernetes to replace the ageing physical infrastructure and reduce the dependency on homebrew components.
    
    The new platform is modular, built around standard components and thus less complex to operate. Some requirements are covered solely by upstream open source projects, whereas others by components shared across CERN's web hosting platforms. We leverage the Operator framework and the Kubernetes API to get observability, policy enforcement, access control and auditing, and high availability for free. Thanks to containers and namespaces, websites are isolated. This isolation clarifies security boundaries and minimizes attack surface, while empowering site owners.
    
    In this work we present the open-source design of the new system and contrast it with the one it replaces, demonstrating how we drastically reduced our technical debt.
    
    Speaker: Konstantinos Samaras-Tsakiris (CERN)
    
    Recording
    
    vCHEP21 - Kubernetes infrastructure for CERN’s Content Management Systems.pdf
  - 199
    
    Building HEP Software with Spack: Experiences from Pilot Builds for Key4hep and Outlook for LCG Releases
    
    Consistent, efficient software builds and deployments are a common concern for all HEP experiments. These proceedings describe the evolution of the usage of the Spack package manager in HEP in the context of the LCG stacks and the current Spack-based management of Key4hep software. Whereas previously Key4hep software used spack only for a thin layer of FCC experiment software on top of the LCG releases, it is now possible to build the
    complete stack, from system libraries to FCC-, iLCSoft- and CEPC software
    packages with Spack. This pilot build doubles as a prototype for a Spack-based
    LCG release. The workflows and mechanisms that can be used for this purpose,
    potential for improvement as well as the roadmap towards a complete LCG release in spack are discussed.
    
    Speaker: Valentin Volkl (University of Innsbruck (AT))
    
    2021-05-20-vCHEP-Spack.pdf
    
    Recording
  - 200
    
    FTS3: Data Movement Service in containers deployed in OKD
    
    The File Transfer Service (FTS3) is a data movement service developed at CERN which is used to distribute the majority of the Large Hadron Collider's data across the Worldwide LHC Computing Grid (WLCG) infrastructure. At Fermilab, we have deployed FTS3 instances for Intensity Frontier experiments (e.g. DUNE) to transfer data in America and Europe, using a container-based strategy. In this article we summarize our experience building docker images based on work from the SLATE project (slateci.io) and deployed in OKD, the community distribution of Red Hat OpenShift. Additionally, we discuss our method of certificate management and maintenance utilizing Kubernetes CronJobs. Finally, we also report on the two different configurations currently running at Fermilab, comparing and contrasting a Docker-based OKD deployment against a traditional RPM-based deployment.
    
    Speaker: Lorena Lobato Pardavila (Fermi National Accelerator Lab. (US))
    
    FTS3containersOKDCHEP.pdf
    
    Recording
- 16:20
  
  Break
- Thurs PM Plenaries: Plenaries
  
  Conveners: Heather Gray (UC Berkeley/LBNL), Reda Tafirout (TRIUMF (CA))
  
  Mattermost
  
  Video recording
  
  Zoom
  - 201
    
    Evaluation of Portable Acceleration Solutions for LArTPC Simulation Using Wire-Cell Toolkit
    
    The Liquid Argon Time Projection Chamber (LArTPC) technology plays an essential role in many current and future neutrino experiments. Accurate and fast simulation is critical to developing efficient analysis algorithms and precise physics model projections. The speed of simulation becomes more important as Deep Learning algorithms are getting more widely used in LArTPC analysis and their training requires a large simulated dataset. Heterogeneous computing is an efficient way to delegate computing-heavy tasks to specialized hardware. However, as the landscape of the compute accelerators is evolving fast, it becomes more and more difficult to manually adapt the code constantly to the latest hardware or software environments. A solution which is portable to multiple hardware architectures while not substantially compromising performance would be very beneficial, especially for long-term projects such as the LArTPC simulations. In search of a portable, scalable and maintainable software solution for LArTPC simulations, we have started to explore high-level portable programming frameworks that support several hardware backends. In this paper, we will present our experience porting the LArTPC simulation code in the Wire-Cell toolkit to NVIDIA GPUs, first with the CUDA programming model and then with a portable library called Kokkos. Preliminary performance results on NVIDIA V100 GPUs and multi-core CPUs will be presented, followed by a discussion of the factors affecting the performance and plans for future improvements.
    
    Speaker: Haiwang Yu (Brookhaven National Laboratory)
    
    Recording
    
    Wire-Cell Kokkos vCHEP.pdf
  - 202
    
    Physics and Computing Performance of the Exa.TrkX TrackML Pipeline
    
    The Exa.TrkX project has applied geometric learning concepts such as metric learning and graph neural networks to HEP particle tracking. The Exa.TrkX tracking pipeline clusters detector measurements to form track candidates and selects track candidates with competitive efficiency and purity. The pipeline, originally developed using the TrackML dataset (a simulation of an LHC-like tracking detector), has been demonstrated on various detectors, including the DUNE LArTPC and the CMS High-Granularity Calorimeter. This paper documents new developments which were needed to study the physics and computing performance of the Exa.TrkX pipeline on the full TrackML dataset, a first step towards validating the pipeline using ATLAS and CMS data. The pipeline achieves tracking efficiency and purity similar to production tracking algorithms. Crucially for HL-LHC and future collider applications, the pipeline benefits significantly from GPU acceleration, and its computational requirements scale close to linearly with the number of particles in the event.
    
    Speaker: Daniel Thomas Murnane (Lawrence Berkeley National Lab. (US))
    
    Physics and Computing Performance of the ExaTrkX TrackML Pipeline.pdf
    
    Recording
  - 17:40
    
    Break
  - 203
    
    A hybrid system for monitoring and automated recovery at the Glasgow Tier-2 cluster
    
    We have deployed a central monitoring and logging system based on Prometheus, Loki and Grafana that collects, aggregates and displays metrics and logs from the Tier-2 ScotGrid cluster at Glasgow. Bespoke dashboards built on Prometheus metrics give a quick overview of cluster performance and make it easy to identify issues. Logs from all nodes and services are collected to a central Loki server and retained over time. This integrated system provides a full overview of the cluster's health and has become an essential tool for daily maintenance and in the investigation of any issue.
    The system includes an automated alerting application that parses metrics and logs and can send notifications when specified conditions are met, and as a further step toward automation, can also perform simple recovery actions based on well known issues and their encoded solutions. The general purpose is to create a more resilient Tier-2 cluster where human intervention is kept to a minimum. Given the funding constraints experienced by many academic research institutions, this promises to free staff from routine tasks, allowing them to address their expertise to more interesting problems.
    In this paper, we describe the tools and set-up of the existing monitoring system, the automated recovery methods implemented so far, and the plan for further automation.
    
    Speaker: Emanuele Simili (University of Glasgow)
    
    MonitoringAndAutomation_vCHEP2021.pdf
    
    Recording
  - 18:30
    
    Final Conference Photo Opportunity
  - 204
    
    DUNE Software and Computing Challenges
    
    The DUNE experiment will begin running in the late 2020’s. The goals of the experiment include 1) studying neutrino oscillations using a beam of neutrinos from Fermilab in Illinois to the Sanford Underground Research Facility, 2) studying astrophysical neutrino sources and rare processes and 3) understanding the physics of neutrino interactions in matter. The DUNE Far Detector, consisting of four 17 kt LArTPC modules, will produce "events" ranging in size from 6 GBs to more than 100 TBs, posing unique challenges for DUNE software and computing. The data processing algorithms, particularly for raw data, drive the requirements for the future DUNE software framework. This paper provides an overview of the DUNE experiment, details early stages of DUNE raw data processing together with information on “event” rates and sizes, and summarises the current understanding of the DUNE software framework requirements. Finally, the timeline for DUNE computing together with early design concepts being tested with ProtoDUNE data are provided.
    
    Speaker: Paul James Laycock (Brookhaven National Laboratory (US))
    
    Recording
    
    vCHEP_DUNE_Plenary_Slides.pdf
Friday 21 May
- Fri AM Plenaries: Plenaries
  
  Conveners: Chiara Ilaria Rovelli (Sapienza Universita e INFN, Roma I (IT)), Stefano Piano (INFN (IT))
  
  Mattermost
  
  Video recording
  
  Zoom
  - 205
    
    AtlFast3: Next Generation of Fast Simulation in ATLAS
    
    ATLAS is one of the largest experiments at the Large Hadron Collider. Its broad physics program ranges from precision measurements to the discovery of new interactions, requiring gargantuan amount of simulated Monte Carlo events. However, a detailed detector simulation with Geant4 is often too slow and requires too many CPU resources. For more than 10 years, ATLAS has developed and utilized tools that replace the slowest component - the calorimeter shower simulation - by faster alternatives. AtlFast3 is the next generation of high precision fast simulation in ATLAS. AtlFast3 is a combination of a parametrization-based Fast Calorimeter Simulation and a new machine learning-based Fast Calorimeter Simulation, and is deployed to meet the computing challenges and Monte Carlo needs now and in the future of ATLAS. With unprecedented precision and the ability to model jet sub-structure, AtlFast3 can be used for the simulation of almost any physics processes.
    
    Speaker: Hasib Ahmed (The University of Edinburgh (GB))
    
    Recording
    
    vCHEP2021_AF3.pdf
  - 206
    
    The Phase-2 Upgrade of the CMS Data Acquisition
    
    The High Luminosity LHC (HL-LHC), will start operating in 2027 after the third Long Shutdown (LS3), and is designed to provide an ultimate instantaneous luminosity of $7.5\times10^{34}$ cm$^{-2}$ s$^{-1}$, at the price of extreme pileup of up to 200 interactions per crossing. The number of overlapping interactions in HL-LHC collisions, their density, and the resulting intense radiation environment, warrant an almost complete upgrade of the CMS detector.
    The upgraded CMS detector will be read out by approximately fifty thousand high-speed front-end optical links at an unprecedented data rate of up to 80~Tb/s, for an average expected total event size of approximately $7-10$ MB.
    Following the present established design, the CMS trigger and data acquisition system will continue to feature two trigger levels, with only one synchronous hardware-based Level-1 Trigger (L1), consisting of custom electronic boards and operating on dedicated data streams, and a second level, the High Level Trigger (HLT), using software algorithms running asynchronously on standard processors and making use of the full detector data to select events for offline storage and analysis.
    The upgraded CMS data acquisition system will collect data fragments for Level-1 accepted events from the detector back-end modules at a rate up to 750 kHz, aggregate fragments corresponding to individual Level-1 accepts into events, and distribute them to the HLT processors where they will be filtered further. Events accepted by the HLT will be stored permanently at a rate of up to 7.5 kHz.
    This paper describes the baseline design of the DAQ and HLT systems for the Phase-2 operation of CMS.
    
    Speaker: Dr Emilio Meschi (CERN)
    
    Recording
    
    vCHEP21-CMS-DAQ-Phase2-EM_v2.pdf
- Fri PM Plenaries: Plenaries
  
  Conveners: Concezio Bozzi (INFN Ferrara), Graeme A Stewart (CERN)
  
  Mattermost
  
  Video recording (last plenary + closing session)
  
  Zoom
  - 207
    
    Software Training in HEP
    
    Long term sustainability of the high energy physics (HEP) research software ecosystem is essential for the field. With upgrades and new facilities coming online throughout the 2020s this will only become increasingly relevant throughout this decade. Meeting this sustainability challenge requires a workforce with a combination of HEP domain knowledge and advanced software skills. The required software skills fall into three broad groups. The first is fundamental and generic software engineering (e.g. Unix, version control, C++, continuous integration). The second is knowledge of domain-specific HEP packages and practices (e.g., the ROOT data format and analysis framework). The third is more advanced knowledge involving more specialized techniques. These include parallel programming, machine learning and data science tools, and techniques to preserve software projects at all scales. This paper discusses the collective software training program in HEP and its activities led by the HEP Software Foundation (HSF) and the Institute for Research and Innovation in Software in HEP (IRIS-HEP). The program equips participants with an array of software skills that serve as ingredients from which solutions to the computing challenges of HEP can be formed. Beyond serving the community by ensuring that members are able to pursue research goals, this program serves individuals by providing intellectual capital and transferable skills that are becoming increasingly important to careers in the realm of software and computing, whether inside or outside HEP.
    
    Speaker: Sudhir Malik (University of Puerto Rico (PR))
    
    Recording
    
    vCHEP_May2021_Malik
    
    vCHEP_May2021_Malik.pdf
  - 208
    
    Evolution of the energy efficiency of LHCb's real-time processing
    
    The upgraded LHCb detector, due to start datataking in 2022, will have to process an average data rate of 4~TB/s in real time. Because LHCb's physics objectives require that the full detector information for every LHC bunch crossing is read out and made available for real-time processing, this challenge mirrors that of the ATLAS and CMS HL-LHC software triggers, but deliverable five years earlier. Over the past six years, the LHCb collaboration has undertaken a bottom-up rewrite of its software infrastructure, pattern recognition, and selection algorithms to make them better able to efficiently exploit modern highly parallel computing architectures. We review the impact of this reoptimization on the energy efficiency of the real-time processing software and hardware which will be used for the upgrade of the LHCb detector. We also review the impact of LHCb's decision to adopt a hybrid computing architecture consisting of GPUs and CPUs for the real-time part of its upgrade data processing. We discuss the implications of these results on how LHCb's real-time power requirements may evolve in the future, particularly in the context of a planned second upgrade of the detector.
    
    Speaker: Rainer Schwemmer (CERN)
    
    Recording
    
    VCHEP2021LHCb_HLTEnergyEfficiency_v2.pdf
  - 209
    
    Charged particle tracking via edge-classifying interaction networks
    
    Recent work has demonstrated that geometric deep learning methods such as graph neural networks (GNNs) are well-suited to address a variety of recon- struction problems in HEP. In particular, tracker events are naturally repre- sented as graphs by identifying hits as nodes and track segments as edges; given a set of hypothesized edges, edge-classifying GNNs predict which rep- resent real track segments. In this work, we adapt the physics-motivated Inter- action Network (IN) GNN to the problem of charged-particle tracking in the high-pileup conditions expected at the HL-LHC. We demonstrate the IN’s ex- cellent edge-classification accuracy and tracking efficiency through a suite of measurements at each stage of GNN-based tracking: graph construction, edge- classification, and track building. Notably, the proposed IN architecture is sub- stantially smaller than previously studied GNN tracking architectures; this type of reduction in size critical for enabling GNN-based tracking in constrained computing environments. Furthermore, the IN is easily expressed as a set of matrix operations, making it a promising candidate for acceleration via hetero- geneous computing resources.
    
    Speaker: Gage DeZoort (Princeton University (US))
    
    Recording
    
    vCHEP_2021_Edge-Classifying_INs.pdf
- Closing Session
  
  Conveners: Concezio Bozzi (INFN Ferrara), Graeme A Stewart (CERN)
  
  Video recording (last plenary + closing session)
  
  Zoom
  - 210
    
    CHEP 2022
    
    Speakers: Amber Boehnlein (Jefferson Lab), Graham Heyes (Jefferson Lab)
    
    JLAB-CHEP 2022.pdf
    
    JLAB-CHEP 2022.pptx
    
    Recording
  - 211
    
    Closeout
    
    Speaker: Simone Campana (CERN)
    
    Recording
    
    vCHEP-Closing.pdf
    
    vCHEP-Closing.pptx

Choose timezone

25th International Conference on Computing in High Energy & Nuclear Physics

25th International Conference on Computing in High-Energy and Nuclear Physics

vCHEP2021

Proceedings