ACAT 2022

Villa Romanazzi Carducci, Bari, Italy

Via Giuseppe Capruzzi, 326, 70124 Bari BA
Lucia Silvestris (Universita e INFN, Bari (IT))
Description

21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research

The 21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2022) will take place from Monday 24th to Friday 28th October 2022 at the Villa Romanazzi Carducci in Bari, Italy.

The 21st edition of ACAT will — once again — bring together computational experts from a wide range of disciplines, including particle-, nuclear-, astro-, and accelerator-physics as well as high performance computing. Through this unique forum, we will explore the areas where these disciplines overlap with computer science, fostering the exchange of ideas related to cutting-edge computing, data-analysis, and theoretical-calculation technologies.

News on abstract selections

Information will be provided before August 15th.

AI meets Reality:

The theme of ACAT 2022 will reflect the increasing adoption of AI and ML techniques as standard tools in science and beyond. This use in real production workflows shows both successes and new challenges.

More Info

You can sign up for email notifications on acat-info@cern.ch by sending email to acat-loc2022@cern.ch. This list is low-traffic and will only send you ACAT conference announcements and general information (for this and future conferences in the ACAT series).

Many people are working together to bring you this conference! The organization page has some details. David Britton is the chair of the International Advisory Committee and Axel Naumann is the chair of the Scientific Program Committee. Lucia Silvestris is the chair of the Local Organizing Committee.

 

Banner, backgrounds and poster photos by Francesco Pepe ©. 

Participants
  • Abhijith Gandrakota
  • Alessandra Carlotta Re
  • Alessandro Lonardo
  • Alexander Held
  • Alexey Rybalchenko
  • Alexis Pompili
  • Ali Marafi
  • Andrea Bocci
  • Andrea Pasquale
  • Andrea Valassi
  • Andrea Wulzer
  • Andrew Schick
  • Andrius Vaitkus
  • Anja Butter
  • Ankur Singha
  • Anna Scaife
  • Antonio Perez-Calero Yzquierdo
  • Antonio Vagnerini
  • Arthur Hennequin
  • Aryan Roy
  • Aurora Perego
  • Axel Naumann
  • Baidyanath Kundu
  • Barry Dillon
  • Benno Kach
  • Beojan Stanislaus
  • Bernhard Manfred Gruber
  • Boyang Yu
  • Carlos Perez Dengra
  • Ceyhun Uzunoglu
  • Christian Gutschow
  • Claudio Caputo
  • Corentin Allaire
  • Daniel Maitre
  • Daniela Mascione
  • Daniele Cesini
  • Daniele Spiga
  • David Britton
  • David Lange
  • David Lawrence
  • Diana McSpadden
  • Diego Ciangottini
  • Dmitry Popov
  • Domenico Colella
  • Egor Danilov
  • Elias Leutgeb
  • Elise de Doncker
  • Elizabeth Sexton-Kennedy
  • Elliott Kauffman
  • Elton Shumka
  • Emanuele Simili
  • Emmanouil Vourliotis
  • Enrico Bothmann
  • Enrico Guiraud
  • Eric Cano
  • Eric Wulff
  • Evangelos Kourlitis
  • Fabio Bisi
  • Fabrizio Alfonsi
  • Farouk Mokhtar
  • Federico Scutti
  • Felix Wagner
  • Florian Reiss
  • FNU Mohammad Atif
  • Fons Rademakers
  • Fukuko Yuasa
  • Garima Singh
  • Giovanna Lazzari Miotto
  • Giulia Lavizzari
  • Giulia Tuci
  • Giuseppe De Laurentis
  • Giuseppina Salente
  • Gloria Corti
  • Gábor Bíró
  • Haiwang Yu
  • Henri Hugo Sieber
  • Henry Truong
  • Hongyue Duyang
  • Hosein Karimi Khozani
  • Humberto Reyes-González
  • Ian Fisk
  • Irina Espejo Morales
  • Jack Y. Araz
  • Jacopo Cerasoli
  • Jan Stephan
  • Javier Lopez Gomez
  • Jennifer Ngadiuba
  • Joana Niermann
  • Johann Usovitsch
  • John Lawrence
  • Jonas Rembser
  • Juan Carlos Criado
  • Ka Hei Martin Kwok
  • Kaixuan Huang
  • Laurits Tani
  • Lia Lavezzi
  • Lukas Breitwieser
  • Manasvi Goyal
  • Mantas Stankevicius
  • Marc Huwiler
  • Marcel Hohmann
  • Marco Barbone
  • Marco Lorusso
  • Marica Antonacci
  • Marta Bertran Ferrer
  • Mate Zoltan Farkas
  • Matteo Barbetti
  • Max Fischer
  • Max Knobbe
  • Maximilian Magnus Julien Mucha
  • Meifeng Lin
  • Meinrad Moritz Schefer
  • Michael Boehler
  • Michael Goodrich
  • Moonzarin Reza
  • Moritz Bauer
  • Moritz Scham
  • Namitha Chithirasreemadam
  • Nathan Brei
  • Nick Smith
  • Nicola De Filippis
  • Nicola Mori
  • Nicole Schulte
  • Nilotpal Kakati
  • Noam Mouelle
  • Oksana Shadura
  • Oriel Orphee Moira Kiss
  • Oscar Roberto Chaparro Amaro
  • Ouail Kitouni
  • Patrick Rieck
  • Philipp Zehetner
  • R. Florian von Cube
  • Raja Appuswamy
  • Roberto Giacomelli
  • Rosamaria Venditti
  • Rui Zhang
  • Ryan Moodie
  • Sabina Tangaro
  • Sascha Diefenbacher
  • Simon Akar
  • Simon David Badger
  • Simon Schnake
  • Simone Pigazzini
  • Sophie Berkman
  • Spandan Mondal
  • Stefano Bagnasco
  • Stefano Dal Pra
  • Stephen Nicholas Swatman
  • Su Yeon Chang
  • Sven Krippendorf
  • Svenja Diekmann
  • Tao Lin
  • Taylor Childers
  • Theo Heimel
  • Thomas Owen James
  • Tianle Wang
  • Tim Schwägerl
  • Tim Voigtlaender
  • Timo Janssen
  • Tomas Raila
  • Ulrich Schwickerath
  • Umit Sozbilir
  • Vardan Gyurjyan
  • Vasilis Belis
  • Vincenzo Eduardo Padulano
  • Vito Conforti
  • Wahid Redjeb
  • Wenxing Fang
  • Xiaoqian Jia
  • Xiaoshuai Qin
  • Yao Zhang
  • Yasumichi Aoki
  • Ying Chen
  • Zeno Capatti
  • +75
    • 5:30 PM
      Registration
    • 8:00 AM
      Registration: Registration desk opens
    • Plenary: I Sala Europa (Villa Romanazzi Carducci)

      Sala Europa

      Villa Romanazzi Carducci

      Conveners: Lucia Silvestris (Universita e INFN, Bari (IT)), Dr Maria Girone (CERN)
      • 1
        Welcome to ACAT 2022 in Bari
        Speaker: Lucia Silvestris (Universita e INFN, Bari (IT))
      • 2
        Welcome address by INFN-Bari Director
        Speaker: Vito Manzari (INFN - Bari)
      • 3
        Welcome address by Dipartimento interateneo di Fisica Director (UniBa)
        Speakers: Dr Antonio Marrone (Univ. of Bari), Roberto Bellotti (Dipartimento di Fisica, Università degli Studi di Bari)
      • 4
        ACAT 2022 in Bari Logistics information
        Speaker: Lucia Silvestris (Universita e INFN, Bari (IT))
      • 5
        The European Processor Initiative (EPI), a status update

        The European Processor Initiative (EPI) is an EU-funded project that aims to develop and implement a new family of European processors for high performance computing, artificial intelligence, and a range of emerging application domains. A variety of processor technologies are being implemented as part of EPI. They are divided into two main development lines: the General Purpose Processor (GPP) and the European Processor Accelerator (EPAC).
        The first CPU from the GPP line, Rhea1, a multi-core processor using the Arm Neoverse V1 architecture, will be commercialised by SiPEARL SAS. The Rhea1 architectural specifications have been determined via co-design using typical HPC applications and benchmarks. Rhea1 will integrate core technologies from several EPI partners and offers unique features in terms of memory architecture, memory bandwidth optimisation, security and power management. Among other features, it includes High Bandwidth Memory (HBM2) and a scalable network-on-chip (NoC) that enables high-frequency, high-bandwidth data transfers between cores, accelerators, input/output (IO) and shared memory resources.
        The EPI accelerator line uses the open-source RISC-V Instruction Set Architecture (ISA) to deliver energy-efficient acceleration for HPC and AI workloads. The EPAC v1.0 test chip is the first proof-of-concept of the EPI accelerator stream, which has fully embraced the open-source philosophy by contributing to the expansion of the RISC-V ecosystem, extending the LLVM compiler codebase and providing new patches, drivers and features for the Linux operating system, OpenMP and MPI. In addition, parts of the accelerator hardware such as the STX (Stencil/Tensor accelerator) have been developed using an open source approach with free licensing on the PULP platform.
        The GPP and EPAC streams are complemented by a number of joint activities, including a co-design process to design the EPI processors. Simulations and models of varying levels of detail and precision have been produced to determine the impact of design decisions on the performance of future applications. A benchmark suite containing over 40 applications is used in support of co-design and subsequent evaluation of the EPI processors. The applications are also prepared for use on future EPI systems by adapting and testing them on comparable hardware platforms and emulators.
        This talk will describe the main developments of the EPI project and present their current status and roadmap.

        Speaker: Estela Suarez
      • 6
        Quantum computing: a grand era for simulating fluids

        Transport phenomena remain among the most challenging unsolved problems in computational physics due to the inherent nature of the Navier-Stokes equations. As a revolutionary technology, quantum computing opens a grand new perspective for numerical simulation, for instance in computational fluid dynamics (CFD). This plenary talk starts with an overview of quantum computing, including basic concepts such as qubits, quantum gates and circuits, and then focuses on how to translate algorithms from classical to quantum computation. Possible quantum algorithms for fluid dynamics (e.g. partial differential equation solvers, eigenvalue solvers) are reviewed. Two concrete examples are presented in detail: the first based on the lattice Boltzmann method, the second on a quantum Navier-Stokes algorithm. For the latter, the key step of reducing partial differential equations to ordinary differential equations is explained. Finally, the advantages of quantum computing are compared with classical computation, indicating that a large application area for simulating fluids on quantum systems is yet to come.

        Speaker: Prof. Rui Li (Deggendorf Institute of Technology)
      • 7
        Quantum Technologies: areas of improvement or how not to slide into quantum winter

        The talk provides a short overview of QT history leading up to the present. We then take a hard look at where we stand in terms of QT and what major pitfalls to expect. The presentation will focus particularly on the issue of the growing talent gap.

        Speaker: Helena Liebelt (Deggendorf Institute of Technology)
    • Poster session with coffee break Area Poster (Floor -1) (Villa Romanazzi)

      Area Poster (Floor -1)

      Villa Romanazzi

      • 8
        A comparison of HEPSPEC benchmark performance on ATLAS Grid-Sites versus ideal conditions

        The goal of this study is to understand the observed differences in ATLAS software performance, when comparing results measured under ideal laboratory conditions with those from ATLAS computing resources on the Worldwide LHC Computing Grid (WLCG). The laboratory results are based on the full simulation of a single ttbar event and use dedicated, local hardware. In order to have a common and reproducible base to which to compare, thousands of identical ttbar full simulation benchmark jobs were submitted to hundreds of Grid sites using the HammerCloud infrastructure. The impact of the heterogeneous hardware of the Grid sites and the performance difference of different hardware generations is analysed in detail, and a direct, in depth comparison of jobs performed on identical CPU types is also done. The choice of the physics sample used in the benchmark is validated by comparing the performance on each Grid site measured with HammerCloud, weighted by its contribution to the total ATLAS full simulation production output.
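
The validation step described above weights each site's HammerCloud-measured performance by its share of the total production output. A minimal sketch of such a weighted average; the helper name and the toy site numbers are invented for illustration, not taken from the study:

```python
# Weighted-average site performance: each site contributes its measured
# throughput weighted by its fraction of total production output.
# Numbers below are invented toy values, not ATLAS measurements.

def weighted_performance(sites):
    """sites: list of (events_per_second, fraction_of_production) tuples."""
    norm = sum(w for _, w in sites)
    return sum(p * w for p, w in sites) / norm

sites = [(1.2, 0.5), (0.8, 0.3), (1.0, 0.2)]
print(weighted_performance(sites))  # ~1.04: dominated by the largest producer
```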

        Speaker: Michael Boehler (Albert Ludwigs Universitaet Freiburg (DE))
      • 9
        A Deep Learning based algorithm for PID study with cluster counting

        Ionization of matter by charged particles is the main mechanism for particle identification in gaseous detectors. Traditionally, the ionization is measured by the total energy loss (dE/dx). The concept of cluster counting, which measures the number of clusters per unit track length (dN/dx), was proposed in the 1970s. The dN/dx measurement avoids many of the sources of fluctuation that affect the dE/dx measurement, and can ultimately deliver a resolution up to two times better than dE/dx.

        The dN/dx measurement requires a highly efficient reconstruction algorithm: one needs to determine the number of peaks associated with the primary electrons in the induced-current waveform of a single detection unit. The main challenges are handling heavily piled-up single peaks and discriminating the primary peaks from secondary electrons and noise. A machine-learning-based algorithm has been developed for the cluster counting problem. It consists of a peak-finding algorithm based on a Recurrent Neural Network (RNN), which aims to find all peaks in the waveform, and a clustering algorithm based on a Convolutional Neural Network (CNN), which determines the number of primary peaks.

        In the talk, the basic idea of cluster counting and the reconstruction algorithm based on machine learning will be presented.
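
As a rough illustration of the counting task the networks address, here is a naive threshold-based peak counter. This is a classical baseline sketch, not the RNN/CNN algorithm of the abstract; the toy waveform and threshold are invented:

```python
# Toy illustration of the dN/dx idea: count ionization-cluster peaks in a
# waveform. A simple local-maximum-above-threshold counter stands in for
# the RNN peak finder; real waveforms have pile-up and noise it would miss.

def count_peaks(waveform, threshold=0.5):
    """Count local maxima above `threshold` (a stand-in for primary-peak finding)."""
    n = 0
    for i in range(1, len(waveform) - 1):
        if (waveform[i] > threshold
                and waveform[i] > waveform[i - 1]
                and waveform[i] >= waveform[i + 1]):
            n += 1
    return n

# A toy waveform with three well-separated peaks on a flat baseline.
wave = [0.0, 0.1, 0.9, 0.2, 0.1, 0.8, 0.1, 0.0, 0.7, 0.1, 0.0]
print(count_peaks(wave))  # 3 peaks -> dN; dividing by track length gives dN/dx
```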

        Speaker: Dr Guang Zhao (Institute of High Energy Physics)
      • 10
        A distributed infrastructure for interactive analysis: the experience at INFN

        The challenges expected for the HL-LHC era, both in terms of storage and computing resources, give the LHC experiments a strong motivation to re-think their computing models at many levels. Indeed, a large part of the CMS experiment's R&D effort has focused on optimizing computing and storage resource utilization for data analysis, and Run 3 provides a perfect benchmark for studying new solutions in a realistic scenario. The work shown here focuses on the integration and validation of an interactive environment for data analysis whose distinguishing feature is seamless scaling over Grid resources at the Italian Tier-2 sites, and possibly over opportunistic providers such as HPC centres. In this approach the integration of new resources has proven exceptionally easy in terms of requirements, so computing power can be added dynamically in a very effective way. The presentation will first give an overview of the architectural pillars and the integration challenges. Then the results of a first set of performance measurements will be presented, based on a real CMS user analysis built on top of the ROOT RDataFrame ecosystem that was successfully executed on this infrastructure.

        Speaker: Diego Ciangottini (INFN, Perugia (IT))
      • 11
        A FPGA Implementation of the Hough Transform tracking algorithm for the Phase-II upgrade of ATLAS

        The High Energy Physics community will face challenging trigger requirements in the next decade. In particular, the luminosity increase to 5-7.5 x 10^34 cm^-2 s^-1 at the LHC will push major experiments such as ATLAS to exploit online tracking in their inner detectors to reduce 1 MHz of Calorimeter and Muon Spectrometer triggers to 10 kHz of events. The project described here proposes a tuned Hough Transform algorithm implemented on high-end FPGA technology, versatile enough to adapt to different tracking scenarios. The platform developed makes it possible to study different datasets with software that emulates the firmware, and hence the hardware performance, and to generate input datasets from ATLAS simulation. Xilinx FPGAs have been chosen for this implementation, so far exploiting the VC709 commercial board and its PCI Express Generation 3 technology. The system is designed to process a 200-pile-up ATLAS Run 4 event in about 10 µs on average, with the possibility of processing two events at a time. The best simulated efficiency exceeds 95% for single-muon tracking. The project is planned to be proposed for the ATLAS Phase-II Event Filter TDAQ Upgrade.
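
The core of a Hough-transform track finder can be sketched in a few lines of software, in the spirit of the firmware-emulating software mentioned above. The straight-line model, binning, and toy hits below are illustrative choices, not the tuned ATLAS implementation:

```python
# Minimal Hough-transform accumulator: each hit votes for every candidate
# line passing through it; collinear hits pile up in one accumulator bin.
# Binning and hit coordinates are invented toy values.
import math

N_THETA, N_RHO, RHO_MAX = 64, 64, 20.0

def hough_vote(hits):
    """Fill an accumulator: hit (x, y) votes for all (theta, rho) with
    rho = x*cos(theta) + y*sin(theta)."""
    acc = [[0] * N_RHO for _ in range(N_THETA)]
    for x, y in hits:
        for it in range(N_THETA):
            theta = math.pi * it / N_THETA
            rho = x * math.cos(theta) + y * math.sin(theta)
            ir = int((rho + RHO_MAX) / (2 * RHO_MAX) * N_RHO)
            if 0 <= ir < N_RHO:
                acc[it][ir] += 1
    return acc

# Three hits on the line y = x, plus one noise hit: the accumulator
# maximum collects the votes of the three collinear hits.
hits = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (-4.0, 1.5)]
best = max(max(row) for row in hough_vote(hits))
print(best)  # 3 votes from the three collinear hits
```

A firmware version replaces the trigonometry with precomputed lookup tables and processes hits in parallel, which is what makes the approach attractive on FPGAs.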

        Speaker: Fabrizio Alfonsi (Universita e INFN, Bologna (IT))
      • 12
        AI Data Quality Monitoring with Hydra

        Hydra is an AI system employing off-the-shelf computer vision technologies aimed at autonomously monitoring data quality. Data quality monitoring is an essential step in modern experimentation, and Nuclear Physics is no exception. Certain failures can be identified through alarms (e.g. electrical heartbeats) while others are more subtle and often require expert knowledge to identify and diagnose. In the GlueX experiment at Jefferson Laboratory, data quality monitoring is a multistep, human-in-the-loop process that begins with shift crews looking at a litany of plots (e.g. occupancy plots) which indicate the performance of detector subsystems. Given the sheer complexity of the systems and the number of plots needing to be monitored, subtle issues can be, and are, missed. During its time in production (over 2 years), Hydra has lightened the load on GlueX shift takers by autonomously monitoring detector systems. This talk will describe the construction, training, and operation of the Hydra system in GlueX as well as the ongoing work to develop and deploy the system with other experiments at Jefferson Laboratory and beyond.

        Speaker: Thomas Britton
      • 13
        Applications of supercomputer Tianhe-II in BESIII

        High energy physics experiments are pushing forward precision measurements and searches for new physics beyond the Standard Model, and urgently need to simulate and generate large amounts of data to meet the physics requirements. Making good use of the existing power of supercomputers is one of the most popular approaches to high energy physics computing. Taking the BESIII experiment as an illustration, we deployed the offline software BOSS on the top-tier supercomputer Tianhe-II with the help of Singularity. With very limited internet connection bandwidth and without root privileges, we successfully synchronize and keep the simulation software up to date through CVMFS, and an acceleration is realized for the same large-scale task when comparing HPC with HTC. Two ideas are worth sharing with the community. On the one hand, users constantly meet problems with real-time internet connections and conflicts when loading locks; we solve these by deploying a squid server and using an in-memory fuse mount on each computing node. On the other hand, we provide an MPI Python interface for high-throughput parallel computation on Tianhe-II, and the data-output program is specially arranged so that there is no queueing issue in the I/O task. The acceleration rate in simulation reaches 80% so far, with simulation tests run at up to 15 K parallel processes.

        Speaker: Biying Hu (Sun Yat-sen University)
      • 14
        AtlFast3: Fast Simulation in ATLAS for Run 3 and beyond

        AtlFast3 is the next generation of high precision fast simulation in ATLAS that is being deployed by the collaboration and was successfully used for the simulation of 7 billion events in Run 2 data taking conditions. AtlFast3 combines a parametrization-based approach known as FastCaloSimV2 and a machine-learning based tool that exploits Generative Adversarial Networks (FastCaloGAN) for the simulation of hadrons.

        For the purpose of Run 3, the parametrization of AtlFast3 was fully reworked and many active developments are ongoing to further enhance the quality of fast simulation in ATLAS. This talk will give a brief overview of AtlFast3 with focus on FastCaloSimV2 and outline several improvements with respect to the previous simulator tool AFII. Furthermore, recent advancements in the parametrised simulation, such as the development of a dedicated tune of electromagnetic shower shapes to data are presented.

        Speaker: Rui Zhang (University of Wisconsin Madison (US))
      • 15
        CMS Tracker Alignment: Legacy results from LHC Run 2 and first results from Run 3

        The inner tracking system of the CMS experiment, consisting of the silicon pixel and strip detectors, is designed to provide a precise measurement of the momentum of charged particles and to perform the primary and secondary vertex reconstruction. The movements of the individual substructures of the tracker detectors are driven by the change in the operating conditions during data taking. Frequent updates in the detector geometry are therefore needed to describe accurately the position, orientation, and curvature of the tracker modules.

        The procedure in which new parameters of the tracker geometry are determined is referred to as the alignment of the tracker. The latter is performed regularly during data taking using reconstructed tracks from both collisions and cosmic rays data, and it is further refined after the end of data-taking. The tracker alignment performance corresponding to the ultimate accuracy of the alignment calibration for the legacy reprocessing of the CMS Run 2 data will be presented. The data-driven methods used to derive the alignment parameters and the set of validations that monitor the performance of the physics observables will be reviewed. The first results obtained with the data taken during the year 2021 and the most recent set of results from LHC Run 3 will be presented.

        Speaker: Antonio Vagnerini (Università di Torino)
      • 16
        CMS tracking performance in Run 2 and early Run 3 data using the tag-and-probe technique

        Accurate reconstruction of charged particle trajectories and measurement of their parameters (tracking) is one of the major challenges of the CMS experiment. Precise and efficient tracking is a critical component of the CMS physics program, as it impacts the ability to reconstruct the physics objects needed to understand proton-proton collisions at the LHC. In this work, we present the tracking performance measured in data with the tag-and-probe technique applied to $Z\longrightarrow \mu^{+}\mu^{-}$ di-muon resonances, for all reconstructed muon trajectories and for the subset of trajectories in which the CMS Tracker is used to seed the measurement. The performance is assessed using LHC Run 2 data at $\sqrt{s}$ = 13 TeV and early LHC Run 3 data at $\sqrt{s}$ = 13.6 TeV.
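
The counting logic behind tag-and-probe can be sketched as follows: a well-identified "tag" muon plus a loosely selected "probe" form a candidate if their invariant mass is near the Z peak, and the efficiency is the fraction of in-window probes that pass the reconstruction requirement. The mass window and toy candidate pairs below are invented for illustration:

```python
# Tag-and-probe efficiency sketch: select candidates near the Z mass,
# then count what fraction of probes pass. Toy values, not CMS data.
Z_MASS, WINDOW = 91.19, 10.0  # GeV; illustrative window choice

def tnp_efficiency(pairs):
    """pairs: list of (invariant_mass, probe_passes) tuples."""
    total = passing = 0
    for mass, probe_passes in pairs:
        if abs(mass - Z_MASS) < WINDOW:  # keep only Z -> mu mu candidates
            total += 1
            passing += probe_passes
    return passing / total if total else 0.0

toy_pairs = [(90.5, True), (92.1, True), (88.0, False), (45.0, True)]
print(tnp_efficiency(toy_pairs))  # 2 of 3 in-window probes pass -> 0.666...
```

In a real measurement the in-window counts are extracted by fitting signal plus background mass shapes rather than simple counting, but the efficiency definition is the same.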

        Speakers: Brunella D'Anzi (Universita e INFN, Bari (IT)), CMS Collaboration
      • 17
        Commissioning CMS online reconstruction with GPUs

        Building on top of the multithreading functionality that was introduced in Run-2, the CMS software framework (CMSSW) has been extended in Run-3 to offload part of the physics reconstruction to NVIDIA GPUs. The first application of this new feature is the High Level Trigger (HLT): the new computing farm installed at the beginning of Run-3 is composed of 200 nodes, and for the first time each one is equipped with two AMD Milan CPUs and two NVIDIA T4 GPUs. In order to guarantee that the HLT can run on machines without any GPU accelerators - for example as part of the large scale Monte Carlo production running on the grid - the HLT reconstruction has been implemented both for NVIDIA GPUs and for traditional CPUs.

        CMS has undertaken a comprehensive validation and commissioning activity to ensure the successful operation of the new HLT farm and the reproducibility of the physics results with either of the two implementations: some of it has taken place offline, on dedicated Tier-2 centres equipped with NVIDIA GPUs; other activities ran online during the LHC commissioning period, after installing GPUs on a few of the nodes from the Run-2 HLT farm. The final step was the optimisation of the HLT configuration, after the installation of the new HLT farm.

        This contribution will describe the steps taken to validate the GPU-based reconstruction and commission the new HLT farm, leading to the successful data taking activities after the LHC Run-3 start up.

        Speakers: CMS collaboration, Marc Huwiler (University of Zurich (CH))
      • 18
        Custom event sample augmentations for ATLAS analysis data

        High Energy Physics (HEP) has been using column-wise data stored in synchronized containers, such as most prominently ROOT’s TTree, for decades. These containers have proven to be very powerful as they combine row-wise association capabilities needed by most HEP event processing frameworks (e.g. Athena) with column-wise storage, which typically results in better compression and more efficient support for many analysis use-cases. The downside, however, is that all events (rows) need to contain the same attributes and therefore extending the list of items to be stored, even if needed only for a subsample of events, can be costly in storage and lead to data duplication.
        The ATLAS experiment has developed navigational infrastructure that allows storing custom data extensions for subsamples of events in separate, but synchronized, containers. These extensions can easily be added to ATLAS standard data products (such as DAOD-PHYS or PHYSLITE), limiting their size increase while avoiding duplication of those core data products. As a proof of principle, a prototype based on the Long-Lived Particle search has been implemented. Preliminary results concerning the event size as well as the reading/writing performance implications associated with this prototype will be presented.
        Augmented data as described above are stored within the same file as the core data. Storing them in dedicated files will be investigated in future, as this could provide more flexibility to store augmentations separate from core data, e.g. certain sites may only want a subset of several augmentations or augmentations can be archived to disk once their analysis is complete.

        Speaker: Lukas Alexander Heinrich (CERN)
      • 19
        Data Calibration and Processing at Belle II

        The Belle II experiment has been collecting data since 2019 at the second-generation e+/e- B-factory SuperKEKB in Tsukuba, Japan. The goal of the experiment is to explore new physics via high-precision measurements in flavor physics. This is achieved by collecting a large amount of data that needs to be calibrated promptly for fast reconstruction and recalibrated thoroughly for the final reprocessing. To fully automate the calibration process, a Python plugin package, b2cal, has been developed based on the open-source Apache Airflow package, using Directed Acyclic Graphs (DAGs) to describe the ordering of processes and Flask to provide administration and job-submission web pages. Prompt processing and reprocessing are performed at different calibration centers (BNL and DESY, respectively). After calibration, the raw data are reconstructed on the GRID to an analysis-oriented format (mDST), also stored on the GRID, and delivered to the collaboration. This talk will describe the whole procedure, from raw data calibration to mDST production.

        Speaker: Stefano Lacaprara (INFN sezione di Padova)
      • 20
        Design and implementation of computational storage system based on EOS for HEP data processing

        Computing in high energy physics is a typical data-intensive application; some data analyses in particular require access to a large amount of data. The traditional computing system adopts a "computing-storage" separation model, which leads to large data movements during processing and increases transmission latency and network load. Pushing some data-intensive tasks down from the computing node to the storage node can effectively alleviate this situation: the philosophy is to bring computing as close to the source of data as possible in order to reduce latency and bandwidth use. Storage nodes generally have computing resources such as CPUs, necessary for deploying a distributed file system, but this computing power is often left unused. This paper designs and implements a computational storage system based on CERN Open Storage (EOS). The system exposes the computational storage functions transparently through the standard POSIX file system interface (open, read, write). A plugin implemented in the EOS storage node (FST) executes the specified algorithm or program when it finds special arguments in the filename, for example "&CSS=decode". The plugin can read and write files locally on the FST, then register newly generated files with the EOS name node (MGM). The paper finally gives test results showing that the computational storage model is faster and supports more parallel computing tasks than the traditional model in applications such as raw-data decoding for the LHAASO experiment: it reduces computation time by 37% for a single task and by 72% with 40 tasks in parallel.

        Speakers: Xiaoyu Liu (Central China Normal University CCNU (CN)), Xiaoyu Liu (Institute of High Energy Physics, CAS)
      • 21
        Enabling continuous speedup of CMS Event Reconstruction through continuous benchmarking

        The outstanding performance obtained by the CMS experiment during Run 1 and Run 2 represents a great achievement of seamless hardware and software integration. Among the different software components, the CMS offline reconstruction software is essential for translating the data acquired by the detectors into concrete objects that can easily be handled by analyzers, and it needs to be both reliable and fast. The Long Shutdown 2 (LS2) between LHC Run 2 and Run 3 has been instrumental in optimizing the CMS offline reconstruction software and in introducing new algorithms, achieving a continuous CPU speedup. To reach these goals, a continuous benchmarking pipeline has been implemented: CPU timing and memory profiling, using the igprof tool, are performed on a regular basis to monitor the footprint of new developments and to identify possible areas of performance improvement. The current status and achievements of the continuous benchmarking of the CMS offline reconstruction software are described here.

        Speaker: Claudio Caputo (Universite Catholique de Louvain (UCL) (BE))
      • 22
        Evolution of the CMS Submission Infrastructure to support heterogeneous resources in the LHC Run 3

        The landscape of computing power available for the CMS experiment is rapidly evolving, from a scenario dominated by x86 processors deployed at WLCG sites, towards a more diverse mixture of Grid, HPC, and Cloud facilities incorporating a higher fraction of non-CPU components, such as GPUs. Using these facilities’ heterogeneous resources efficiently to process the vast amounts of data to be collected in the LHC Run3 and beyond, in the HL-LHC era, is key to CMS’s achieving its scientific goals.

        The CMS Submission Infrastructure is the main computing resource provisioning system for CMS workflows, including data processing, simulation and analysis. It currently aggregates nearly 400k CPU cores distributed worldwide from Grid, HPC and cloud providers. The Submission Infrastructure, together with other elements in the CMS workload management, has been modified in its strategies and enlarged in its scope to make use of these new resources.

        In this evolution, key questions such as the optimal level of granularity in the description of the resources, or how to prioritize workflows in this new resource mix must be taken into consideration. In addition, access to many of these resources is considered opportunistic by CMS, thus each resource provider may also play a key role in defining particular allocation policies, diverse from the up-to-now dominant system of pledges. All these matters must be addressed in order to ensure the efficient allocation of resources and matchmaking to tasks to maximize their use by CMS.

        This contribution will describe the evolution of the CMS Submission Infrastructure towards a full integration and support of heterogeneous resources according to CMS needs. In addition, a study of the pool of GPUs already available to CMS Offline Computing will be presented, including a survey of their diversity in relation to CMS workloads, and the scalability reach of the infrastructure to support them.

        Speaker: Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
      • 23
        Fast track seed selection for track following in the Inner Detector Trigger track reconstruction

        During ATLAS Run 2, a large proportion of the CPU time of the Inner Detector (ID) online track reconstruction was spent in fast track finding. With the proposed HL-LHC upgrade, where the event pile-up is predicted to reach <μ>=200, track finding will see a further large increase in CPU usage. Moreover, only a small subset of Pixel-only seeds is accepted after the fast track finding procedure, essentially wasting the CPU time spent on rejected seeds. Therefore, a computationally cheap track-candidate seed pre-selection procedure based on approximate track following was designed, which is described in this report. The algorithm uses a parabolic track approximation in the plane perpendicular to the beamline and a combinatorial Kalman filter, simplified by a reference-related coordinate system, to find the best track candidates. For such candidates, a set of numerical features is created to classify seeds using machine learning techniques, such as Support Vector Machines (SVM) or kernel-based methods. The algorithm was tuned for high identification and rejection of bad seeds, while ensuring no significant loss of track finding efficiency. Current studies focus on implementing the algorithm in the Athena framework for online seed pre-selection, which could be used during Run 3 or potentially be adapted to the ITk geometry for Run 4 of the HL-LHC.
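        As a simplified illustration of the idea above (not the ATLAS implementation, whose track following and classifier are far more elaborate), a parabolic fit in the transverse plane can already serve as a cheap seed-quality feature; the chi-square cut below is a hypothetical stand-in for the trained SVM:

```python
import numpy as np

def seed_quality(xs, ys, sigma=0.1):
    """Fit a parabola y = a*x^2 + b*x + c to transverse-plane hit
    positions and return the chi-square of the fit as a seed-quality
    feature (small for hits compatible with a curved track)."""
    coeffs = np.polyfit(xs, ys, 2)
    residuals = ys - np.polyval(coeffs, xs)
    return float(np.sum((residuals / sigma) ** 2))

def accept_seed(xs, ys, cut=10.0):
    """Hypothetical stand-in for the SVM classifier: accept seeds
    whose parabolic-fit chi-square is below a tuned threshold."""
    return seed_quality(xs, ys) < cut
```

A seed whose hits lie on a parabola passes the cut, while hits scattered around the trajectory are rejected before any expensive track following is run.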

        Speaker: Andrius Vaitkus (University of London (GB))
      • 24
        Faster simulated track reconstruction in the ATLAS Fast Chain

        The production of simulated datasets for use by physics analyses consumes a large fraction of ATLAS computing resources, a problem that will only get worse as increases in the instantaneous luminosity provided by the LHC lead to more collisions per bunch crossing (pile-up). One of the more resource-intensive steps in the Monte Carlo production is reconstructing the tracks in the ATLAS Inner Detector (ID), which takes up about 60% of the total detector reconstruction time [1]. This talk discusses a novel technique called track overlay, which substantially speeds up the ID reconstruction. In track overlay the pile-up ID tracks are reconstructed ahead of time and overlaid onto the ID tracks from the simulated hard-scatter event. We present our implementation of this track overlay approach as part of the ATLAS Fast Chain simulation, as well as a method for deciding in which cases it is possible to use track overlay in the reconstruction of simulated data without performance degradation.

        [1] ATL-PHYS-PUB-2021-012 (60% refers to Run3, mu=50, including large-radius tracking, p11)

        Speaker: William Axel Leight (University of Massachusetts Amherst)
      • 25
        HDTFS: Cost-effective Hadoop Distributed & Tiered File System for High Energy Physics

        As the scale and complexity of High Energy Physics (HEP) experiments increase, researchers face the challenge of large-scale data processing. On the storage side, the Hadoop Distributed File System (HDFS), a distributed file system supporting the "data-centric" processing model, has been widely adopted in academia and industry. Since HDFS supports Spark and other data-local distributed computation, studying its application to HEP is a prerequisite for running such upper-layer computing frameworks in this field. However, HDFS expands cluster capacity only by adding nodes, which cannot cost-effectively meet the requirements for persisting and backing up massive HEP experimental data. In response to these problems, we developed the Hadoop Distributed & Tiered File System (HDTFS), which supports combined disk-tape storage: it exploits the fast access speed of disk together with the large capacity, low price, and long retention period of tape, avoiding the high cost of purely horizontal HDFS cluster expansion. The system presents users with a single global namespace and avoids depending on an external metadata server to access data stored on tape. In addition, tape-layer resources are managed internally, so users do not have to deal with the complexity of tape storage. Experimental results show that this approach can effectively address massive data storage in HEP Hadoop clusters.
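        As a toy sketch of the tiering idea (not the actual HDTFS code), a disk-to-tape migration policy driven by last-access time could look like the following; the file map layout and the 30-day threshold are hypothetical:

```python
DISK, TAPE = "disk", "tape"

def migration_plan(files, now, cold_after=30 * 24 * 3600):
    """Select disk-resident files whose last access is older than
    `cold_after` seconds for migration to the tape tier.
    `files` maps file name -> (tier, last_access_timestamp)."""
    return [name for name, (tier, atime) in files.items()
            if tier == DISK and now - atime > cold_after]
```

Keeping the decision internal to the file system is what lets users see a single namespace regardless of whether a file currently resides on disk or tape.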

        Speaker: Xiaoyu Liu (IHEP)
      • 26
        Improved Selective Background Monte Carlo Simulation at Belle II with Graph Attention Networks and Weighted Events

        When measuring rare processes at Belle II, a huge luminosity is required, which means a large number of simulations is necessary to determine signal efficiencies and background contributions. This demands high computational cost, while most of the simulated data, in particular in the case of background, is discarded by the event selection. Filters based on graph neural networks are therefore introduced at an early stage, saving the resources spent on detector simulation and reconstruction of events later discarded at analysis level. In this work, we improved the performance of these filters using graph attention and investigated statistical methods, including sampling and reweighting, to deal with the biases introduced by the filtering.
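        The graph-attention aggregation at the heart of such filters can be sketched in a few lines of numpy; this is the generic single-head formulation of Velickovic et al., not the Belle II model itself, and the weight shapes are illustrative:

```python
import numpy as np

def gat_layer(H, adj, W, a):
    """Single-head graph attention aggregation:
    H (n, f_in) node features, adj (n, n) 0/1 adjacency including
    self-loops, W (f_in, f_out) projection, a (2*f_out,) attention
    vector. Returns attention-weighted aggregated features."""
    Z = H @ W                                   # projected node features
    n = Z.shape[0]
    # unnormalised scores e_ij = LeakyReLU(a . [z_i || z_j])
    scores = np.array([[np.concatenate([Z[i], Z[j]]) @ a for j in range(n)]
                       for i in range(n)])
    scores = np.where(scores > 0, scores, 0.2 * scores)  # LeakyReLU
    scores = np.where(adj > 0, scores, -np.inf)          # mask non-edges
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)            # per-node softmax
    return alpha @ Z                                     # weighted aggregation
```

The learned attention coefficients let each node weight its neighbours' contributions, which is the mechanism that improved the filter performance over plain graph convolutions.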

        Speaker: Boyang Yu
      • 27
        Machine Learning Techniques for selecting Forward Electrons $(2.5<\eta<3.2)$ with the ATLAS High Level Trigger

        The ATLAS detector at CERN measures proton-proton collisions at the Large Hadron Collider (LHC), allowing us to test the limits of the Standard Model (SM) of particle physics. Forward-moving electrons produced in these collisions are promising candidates for finding physics beyond the SM. However, the ATLAS detector is not designed to measure forward leptons with pseudorapidity $\eta$ above 2.5 with high precision. ATLAS performance for forward leptons can be improved by enhancing the trigger system, which selects events of interest so as not to overwhelm data storage with the information of around 1.7 billion collisions per second. First studies using the Neural Ringer algorithm for selecting forward electrons with $2.5<\eta<3.2$ show promising results. We present the Neural Ringer, which uses machine learning to analyse detector information and distinguish electromagnetic from hadronic signatures, together with its performance on simulated ATLAS Monte Carlo samples in improving the high-level trigger for forward electrons.
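        The "ring" input pattern used by the Neural Ringer can be illustrated as concentric sums of cell energies in $\Delta R$ around a seed; the flat cell layout and the ring granularity below are hypothetical simplifications (the real algorithm builds ring patterns per calorimeter layer):

```python
import numpy as np

def ring_sums(cells, seed_eta, seed_phi, n_rings=8, ring_width=0.025):
    """Sum cell energies in concentric Delta-R rings around the seed.
    `cells` is an array of (eta, phi, energy) rows; cells beyond the
    outermost ring are lumped into it (a simplification). The resulting
    ring pattern is the kind of compact feature fed to the network."""
    eta, phi, e = cells[:, 0], cells[:, 1], cells[:, 2]
    dphi = np.mod(phi - seed_phi + np.pi, 2 * np.pi) - np.pi  # wrap phi
    dr = np.hypot(eta - seed_eta, dphi)
    idx = np.minimum((dr / ring_width).astype(int), n_rings - 1)
    rings = np.zeros(n_rings)
    np.add.at(rings, idx, e)
    return rings
```

Electromagnetic showers concentrate energy in the innermost rings, while hadronic signatures leak into the outer ones, which is what makes the ring pattern discriminating.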

        Speaker: Meinrad Moritz Schefer (Universitaet Bern (CH))
      • 28
        Monitoring CMS experiment data and infrastructure for next generation of LHC run

        As CMS starts the Run 3 data taking, the experiment’s data management software tools, along with the monitoring infrastructure, have undergone significant upgrades to cope with the conditions expected in the coming years. The challenges of efficient, real-time monitoring of the performance of the computing infrastructure and of data distribution are being met using state-of-the-art technologies that are continuously evolving. In this talk, we describe how we set up monitoring pipelines based on a combination of technologies such as Kubernetes, Spark/Hadoop and other open-source software stacks. We show how the choice of these components is critical for this new generation of services and infrastructure for CMS data management and monitoring. We also discuss how some of the developed monitoring services, such as data management monitoring, CPU efficiency monitoring, and dataset access and transfer metrics, have been instrumental in taking strategic decisions and increasing the physics harvest through maximal utilization of the computing resources available to us.

        Speaker: Ceyhun Uzunoglu (CERN)
      • 29
        Parametrized simulation of the micro-RWELL response with PARSIFAL software

        PARSIFAL (PARametrized SImulation) is a software tool originally implemented to reproduce the complete response of a triple-GEM detector to the passage of a charged particle, taking into account the relevant physical processes through a simple parametrization, and thus in a very fast way.
        Robust and reliable software such as GARFIELD++ is widely used to simulate the transport of electrons and ions in the gas and all their interactions step by step, but it is CPU-time-consuming. The implementation of the PARSIFAL code was driven by the need to reduce the processing time while maintaining the precision of a full simulation.
        The software must be initialized with parameters that are extracted from the GARFIELD++ simulation, which needs to be run only once. PARSIFAL can then run independently and provide a reliable simulation of the full chain (ionization, diffusion, multiplication, signal induction and electronics), simply by sampling from a set of functions that describe the physical effects and depend on the input parameters.
        The code has been thoroughly tested on triple-GEM detectors and the simulation was finely tuned to experimental data collected at testbeam.
        Recently, PARSIFAL has been extended to another detector in the MPGD family, the micro-RWELL, thanks to the modular structure of the code. The main difference in the treatment of the physical processes is the introduction of the resistive plane and its effect on the formation of the signal. For this purpose, the charge spread on the resistive layer has been described following the work of M. S. Dixit and A. Rankin (NIM A518 (2004) 721-727, NIM A566 (2006) 281-285) and the electronics readout (APV-25) was added to the description.
        A fine tuning of the simulation is ongoing to reproduce the experimental data collected during testbeams. A similar strategy already validated for the triple-GEM case is used: the variables of interest for the comparison of the experimental data with simulated results are the cluster charge, cluster size and the position resolution obtained by charge centroid and micro-TPC reconstruction algorithms. In this case, special attention must be paid to the tuning of the resistivity of the resistive layer.
        This contribution illustrates the general code, with a focus on this latest extension and on the first comparison with experimental data from testbeams.
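        The parametrized-sampling strategy described above can be sketched as follows; the distributions (Poisson cluster count, exponential gain as a rough stand-in for a Polya, Gaussian diffusion) and all parameter values are hypothetical placeholders for the ones PARSIFAL extracts once from GARFIELD++:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_track(n_mean=30, gain_mean=2000.0, diff_sigma=0.12, gap=0.5):
    """Fast parametrized response of one gas gap: sample the number of
    ionisation electrons (Poisson), the avalanche gain of each one
    (exponential approximation of a Polya) and its transverse diffusion
    (Gaussian shift of the arrival position). Returns arrival positions
    on the anode and the per-electron gains."""
    n_electrons = rng.poisson(n_mean)
    x0 = rng.uniform(0.0, gap, n_electrons)          # positions along track
    gains = rng.exponential(gain_mean, n_electrons)  # avalanche multiplication
    x_anode = x0 + rng.normal(0.0, diff_sigma, n_electrons)
    return x_anode, gains
```

Because every physical step is a draw from a pre-tuned distribution rather than a microscopic transport calculation, events are generated orders of magnitude faster than a full GARFIELD++ run.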

        Speaker: Lia Lavezzi (Universita e INFN Torino (IT))
      • 30
        Progress towards an improved particle flow algorithm at CMS with machine learning

        The particle-flow (PF) algorithm is of central importance to event reconstruction at the CMS detector, and has been a focus of developments in light of planned Phase-2 running conditions with an increased pileup and detector granularity. Current rule-based implementations rely on extrapolating tracks to the calorimeters, correlating them with calorimeter clusters, subtracting charged energy and creating neutral particles from significant energy deposits. Such rule-based algorithms can be difficult to extend and may be computationally inefficient under high detector occupancy, while also being challenging to port to heterogeneous architectures in full detail.

        In recent years, end-to-end machine learning approaches for event reconstruction have been proposed, including for PF at CMS, with the possible advantage of directly optimising for the physical quantities of interest, being highly reconfigurable to new conditions, while also being a natural fit for deployment on heterogeneous accelerators.

        One of the proposed approaches for machine-learned particle-flow (MLPF) reconstruction relies on graph neural networks to infer the full particle content of an event from the tracks and calorimeter clusters, based on training on simulated samples. It has recently been implemented in CMS as a possible future reconstruction R&D direction, in order to fully map out the characteristics of such an approach in a realistic setting.

        We discuss progress in CMS towards an improved implementation of the MLPF reconstruction, now optimised on generator-level particle information for the first time to our knowledge, thus paving the way to potentially improving the detector response in terms of physical quantities of interest. We show detailed physics validation with respect to the current PF algorithm in terms of high-level physical quantities such as jet and MET resolution. Furthermore, we discuss progress towards deploying the MLPF algorithm in the CMS software framework on heterogeneous platforms, performing large-scale hyperparameter optimization using HPC systems, as well as the possibilities of making use of explainable artificial intelligence (XAI) to interpret the output.

        Speaker: Farouk Mokhtar (Univ. of California San Diego (US))
      • 31
        Secrets Management for CMSWEB

        Secrets management is the process of handling secrets, such as certificates, database credentials, tokens, and API keys, in a secure and centralized way. In the present CMSWEB (the portfolio of CMS internal IT services) infrastructure, only the operators maintain all service and cluster secrets in a secure place. However, if everyone holding the secrets is unavailable, there is no way to retrieve them in case of an emergency.

        In order to overcome this issue, we performed an R&D study on secrets management and explored various strategies, such as HashiCorp Vault, the GitHub credential manager, and SOPS/age. In this talk, we discuss the process by which CMS investigated these strategies and performed a feasibility analysis of them. We also underline why CMS chose SOPS as the solution, reviewing how the features of SOPS with age satisfy our needs, and discuss how other experiments could adopt it.

        Speaker: Muhammad Imran (National Centre for Physics (PK))
      • 32
        Stability of the CMS Submission Infrastructure for the LHC Run 3

        The CMS Submission Infrastructure is the main computing resource provisioning system for CMS workflows, including data processing, simulation and analysis. It currently aggregates nearly 400k CPU cores distributed worldwide from Grid, HPC and cloud providers. CMS Tier-0 tasks, such as data repacking and prompt reconstruction, critical for data-taking operations, are executed on a collection of computing resources at CERN, also managed by the CMS Submission Infrastructure.

        All this computing power is harnessed via a number of federated resource pools, supervised by HTCondor and GlideinWMS services. Elements such as pilot factories, job schedulers and connection brokers are deployed in HA mode across several “availability zones”, providing stability to our services via hardware redundancy and numerous failover mechanisms.

        Given the upcoming start of the LHC Run 3, the Submission Infrastructure stability has recently been tested in a series of controlled exercises, performed without interruption of our services. These tests have demonstrated the resilience of our systems and additionally provided useful information to further refine our monitoring and alerting systems.

        This contribution will describe the main elements in the CMS Submission Infrastructure design and deployment, along with the performed failover exercises, proving that our systems are ready to serve their critical role in support of CMS activities.

        Speaker: Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
      • 33
        The adaptation of a deep learning model to locating primary vertices in the ATLAS experiment

        Over the past several years, a deep learning model based on convolutional neural networks has been developed to find proton-proton collision points (also known as primary vertices, or PVs) in Run 3 LHCb data. By converting the three-dimensional space of particle hits and tracks into a one-dimensional kernel density estimator (KDE) along the direction of the beamline and using the KDE as an input feature into a neural network, the model has achieved an efficiency of 98% with a low false positive rate. The success of this method motivates its extension to other experiments, including ATLAS. Although LHCb is a forward spectrometer and ATLAS is a central detector, ATLAS has the necessary characteristics to compute KDEs analogous to the LHCb detector. While the ATLAS detector will benefit from higher precision, the expected number of visible PVs per event will be approximately 10 times that for LHCb, resulting in only slightly altered KDEs. The KDE and a few related input features are fed into the same neural network architectures used to achieve the results for LHCb. We present the development of the input feature and initial results across different network architectures. The results serve as a proof-of-principle that a deep neural network can achieve high efficiency and low false positive rates for finding vertices in ATLAS data.
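        The one-dimensional KDE input feature described above can be sketched with plain numpy; the toy peak finder below stands in for the neural network, and all parameter values (bandwidth, grid, threshold) are illustrative:

```python
import numpy as np

def beamline_kde(z0, grid, bandwidth=0.5):
    """One-dimensional Gaussian kernel density estimate of track
    z-positions along the beamline, evaluated on `grid` -- the kind of
    input feature fed to the PV-finding network."""
    d = (grid[:, None] - z0[None, :]) / bandwidth
    return np.exp(-0.5 * d ** 2).sum(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

def pv_candidates(z0, grid, threshold=2.0, bandwidth=0.5):
    """Toy peak finder standing in for the neural network: return grid
    points that are local maxima of the KDE above a threshold."""
    kde = beamline_kde(z0, grid, bandwidth)
    peaks = (kde[1:-1] > kde[:-2]) & (kde[1:-1] > kde[2:]) & (kde[1:-1] > threshold)
    return grid[1:-1][peaks]
```

Collapsing the three-dimensional hit and track information into this one-dimensional density is what makes the same network architecture transferable between LHCb and ATLAS.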

        Speaker: Elliott Kauffman (Duke University (US))
      • 34
        Transparent expansion of a WLCG compute site using HPC resources

        With the LHC restarting after more than three years of shutdown, unprecedented amounts of data are expected to be recorded. Even though the WLCG provides a tremendous amount of compute resources to process this data, local resources will have to be used for additional compute power. This, however, makes the landscape in which computing takes place more heterogeneous.

        In this contribution, we present a solution for dynamically integrating non-HEP resources into existing infrastructures using the COBalD/TARDIS resource manager. By providing all resources through conventional CEs as a single point of entry, the use of these external resources becomes completely transparent for experiments and users.

        In addition, we will discuss experiences with an existing setup, in production for more than a year, which extends the German WLCG Tier-2 site operated at RWTH Aachen University with a local HPC cluster.

        Speaker: Ralf Florian Von Cube (KIT - Karlsruhe Institute of Technology (DE))
      • 35
        Transparent extension of INFN-T1 with heterogeneous computing architectures

        The INFN-CNAF Tier-1 has been engaged for years in a continuous effort to integrate its computing centre with more types of computing resources. In particular, the challenge of providing opportunistic access to non-standard CPU architectures, such as PowerPC, and to hardware accelerators (GPUs) has been actively explored. In this work, we describe a solution to transparently integrate access to ppc64 CPUs as well as GPUs. This solution has been tested to transparently extend the INFN-T1 Grid computing centre with Power9-based machines and V100 GPUs from the Marconi 100 HPC cluster managed by CINECA. We also discuss possible further improvements and how this will meet the requirements and future plans for the new Tecnopolo centre, where the CNAF Tier-1 will soon be hosted.

        Speaker: Stefano Dal Pra (Universita e INFN, Bologna (IT))
    • Plenary: II Sala Europa (Villa Romanazzi Carducci)

      Sala Europa

      Villa Romanazzi Carducci

      Conveners: Dr Jerome LAURET (Brookhaven National Laboratory), Dr Maria Girone (CERN)
      • 36
        Generative Models for Fast (Calorimeter) Simulation

        Simulation in High Energy Physics (HEP) places a heavy burden on the available computing resources and is expected to become a major bottleneck for the upcoming high-luminosity phase of the LHC and for future Higgs factories, motivating a concerted effort to develop computationally efficient solutions. Methods based on generative machine learning hold promise to alleviate the computational strain produced by simulation while providing the physical accuracy required of a surrogate simulator.

        In this contribution, an overview of a growing body of work focused on simulating showers in highly granular calorimeters will be reported, which is making significant steps towards realistic fast simulation tools based on deep generative models. Progress on the simulation of both electromagnetic and hadronic showers will be presented, with a focus on the high degree of physical fidelity and computational performance achieved. Additional steps taken to address the challenges faced when broadening the scope of these simulators, such as those posed by multi-parameter conditioning, will also be discussed.

        Speaker: Sascha Daniel Diefenbacher (Hamburg University (DE))
      • 37
        Scientific Software and Computing in the HL-LHC, EIC, and Future Collider Era

        A bright future awaits particle physics. LHC Run 3 has just started, characterised by the most energetic beams ever created by humankind and the most sophisticated detectors. In the next few years we will carry out the most precise measurements to challenge our present understanding of nature, which will, potentially, lead us to prestigious discoveries. However, Run 3 is just the beginning. A rich programme is ahead of us at the HL-LHC, the EIC, and at future colliders such as the FCC. These programmes imply a large effort and substantial funding, for example to develop future detector and accelerator technologies, to construct new experiments and facilities, or to expand the scope of existing ones. This contribution is about the software and computing that will lead us to the full exploitation of such infrastructure, the software and computing that will empower us to make important strides in humanity's understanding of the universe. The HL-LHC, EIC and FCC eras will be taken into consideration. We will discuss the role of education, innovation and technology in our preparation for the future. We will also review the current state of the art, discuss ongoing technology evolutions, for instance in hardware and programming languages, and extrapolate the most relevant trends into the next decades. Moreover, we will identify the areas where our efforts could be focused to boost the progress of particle physics software and computing, as well as the steps we can take to take advantage of veritable technological revolutions.

        Speaker: Danilo Piparo (CERN)
      • 38
        Towards extreme-scale agent-based simulation with BioDynaMo

        Agent-based modeling is a versatile methodology to model complex systems and gain insights into fields as diverse as biology, sociology, economics, finance, and more. However, existing simulation platforms do not always take full advantage of modern hardware and therefore limit the size and complexity of the models that can be simulated.
        This talk presents the BioDynaMo platform designed to alleviate these issues, enable large-scale agent-based simulations, and reduce time-to-insight. We will examine BioDynaMo's modular software design and underlying performance optimizations that enable simulations with billions of agents in various research fields.
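        The agent-based paradigm can be illustrated with a minimal loop in which every agent executes its behaviours once per timestep; this is a conceptual sketch in Python, not the BioDynaMo C++ API, and all parameter values are invented for illustration:

```python
import numpy as np

def simulate(n_agents=1000, steps=10, growth=0.02, seed=1):
    """Minimal agent-based loop: each agent carries a diameter and two
    behaviours, grow and divide-above-threshold, executed once per
    timestep -- the scheduling pattern an agent-based platform applies
    at scale. Returns the final population's diameters."""
    rng = np.random.default_rng(seed)
    diameters = rng.uniform(5.0, 9.0, n_agents)
    for _ in range(steps):
        diameters = diameters + growth * diameters   # growth behaviour
        dividing = diameters > 10.0                  # division behaviour
        daughters = diameters[dividing] / 2.0
        diameters[dividing] = daughters              # mother shrinks
        diameters = np.concatenate([diameters, daughters])  # daughter added
    return diameters
```

The engineering challenge BioDynaMo addresses is running exactly this kind of per-agent update for billions of agents by exploiting parallel hardware, rather than the Python loop shown here.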

        Speaker: Lukas Breitwieser (CERN, ETH Zurich)
    • 1:00 PM
      Lunch break Sala Scuderia (Villa Romanazzi)

      Sala Scuderia

      Villa Romanazzi

    • Track 1: Computing Technology for Physics Research Sala Federico II (Villa Romanazzi)

      Sala Federico II

      Villa Romanazzi

      Conveners: Baidyanath Kundu (Princeton University (US)), Diego Ciangottini (INFN, Perugia (IT))
      • 39
        Optimizing the ATLAS Geant4 detector simulation software

        The ATLAS experiment at the LHC relies critically on simulated event samples produced by the full Geant4 detector simulation software (FullSim). FullSim was the major CPU consumer during the last data-taking year in 2018 and it is expected to be still significant in the HL-LHC era [1, 2]. In September 2020 ATLAS formed a Geant4 Optimization Task Force to optimize the computational performance of FullSim for the Run 3 Monte Carlo campaign. This contribution summarizes the already implemented and upcoming improvements. These include improved features from the core Geant4 software, optimal options in the simulation configuration, simplifications in geometry and magnetic field description and technical improvements in the way ATLAS simulation code interfaces with Geant4. Overall, more than 50% higher throughput is achieved, compared to the baseline simulation configuration used during Run 2.

        [1]: ATLAS Collaboration, “ATLAS HL-LHC Computing Conceptual Design Report”, CERN-LHCC-2020-015.
        [2]: ATLAS Collaboration, “ATLAS Software and Computing HL-LHC Roadmap”, CERN-LHCC-2022-005.

        Speaker: Evangelos Kourlitis (Argonne National Laboratory (US))
      • 40
        The Software Quality Assurance programme of the ASTRI Mini-Array project

        The ASTRI Mini-Array is a gamma-ray experiment led by the Istituto Nazionale di Astrofisica in partnership with the Instituto de Astrofisica de Canarias, Fundacion Galileo Galilei, Universidade de Sao Paulo (Brazil) and North-West University (South Africa). The ASTRI Mini-Array will consist of nine innovative Imaging Atmospheric Cherenkov Telescopes being installed at the Teide Astronomical Observatory (~2400 m a.s.l.) in Tenerife (Canary Islands, Spain). The ASTRI Mini-Array software will cover the entire life cycle of the experiment, including scheduling, operations and data dissemination. The on-site control software will allow the operator to communicate remotely with the array (including automated reaction to critical environmental conditions). Thanks to the high-speed (10 Gbit/s) network connection available between the Canary Islands and Italy, all data will be delivered every night to the dedicated ASTRI Data Center in Rome for processing and dissemination. The ASTRI team gained experience with ASTRI-Horn, the first Italian dual-mirror Cherenkov telescope and prototype of the ASTRI Mini-Array telescopes. Exploiting lessons learned from ASTRI-Horn, we decided to adopt an iterative, incremental model for the software in order to provide multiple software releases according to the project schedule. Given this development model, we have implemented a Quality Assurance (QA) programme specific to the software, which defines the strategy and the organization for the management of quality control. In this contribution we present the layout and contents of the ASTRI Mini-Array software QA programme, describing the organization adopted for its management and reporting some examples of how it has been applied so far.

        Speaker: Vito Conforti
      • 41
        Next generation task scheduler for ATLAS software framework

        Experiments at the CERN High-Luminosity Large Hadron Collider (HL-LHC) will produce hundreds of Petabytes of data per year. Efficient processing of this dataset represents a significant human resource and technical challenge. Today, ATLAS data processing applications run in multi-threaded mode, using Intel TBB for thread management, which allows efficient utilization of all available CPU cores on the computing resources. However, modern HPC systems and high-end computing clusters are increasingly based on heterogeneous architectures, usually a combination of CPU and accelerators (e.g., GPU, FPGA). To run ATLAS software on these machines efficiently, we started developing a distributed, fine-grained, vertically integrated task scheduling software system. A first simplified implementation of such a system called Raythena was developed in late 2019. It is based on Ray - a high-performance distributed execution platform developed by Riselab at UC Berkeley. Raythena leverages the ATLAS event-service architecture for efficient utilization of CPU resources on HPC systems by dynamically assigning fine-grained workloads (individual events or event ranges) to ATLAS data-processing applications running simultaneously on multiple HPC compute nodes.

        The main purpose of the Raythena project was to gain the experience of developing real-life applications with the Ray platform. However, in order to achieve our main objective, we need to design a new system capable of utilizing heterogeneous computing resources in a distributed environment. To accomplish this, we have started to evaluate HPX as an alternative to TBB/Ray. HPX is a C++ library for concurrency and parallelism developed by the Stellar group, which exposes a uniform, standards-oriented API for programming parallel, distributed, and heterogeneous applications.

        This presentation will describe the preliminary results of the evaluation of HPX for implementation of the task scheduler for ATLAS data-processing applications aimed to enable cross-node scheduling in heterogeneous systems that offer a mixture of CPU and GPU architectures. We present the prototype applications implemented using HPX and the preliminary results of performance studies of these applications.
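        The fine-grained, dynamic assignment of event ranges to workers that Raythena performs across HPC nodes can be illustrated, at a much smaller scale, with a thread pool; the payload function is a dummy stand-in for an ATLAS data-processing application, and all names here are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def process_range(event_range):
    """Stand-in for a data-processing payload: 'reconstruct' each
    event in the assigned range (here just a dummy transformation)."""
    return [evt * 2 for evt in event_range]

def schedule(events, workers=4, chunk=5):
    """Split the event list into fine-grained ranges, hand them out
    dynamically to a pool of workers, and gather the results in order
    -- the event-service pattern applied across nodes at scale."""
    ranges = [events[i:i + chunk] for i in range(0, len(events), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_range, ranges)
    return [r for chunk_result in results for r in chunk_result]
```

The same decomposition into small, independently schedulable work items is what allows slow nodes to receive fewer ranges without idling the fast ones.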

        Speaker: Beojan Stanislaus (Lawrence Berkeley National Lab. (US))
      • 42
        GPU acceleration of Monte Carlo simulations: particle physics methods applied to medicine

        GPU acceleration has been successfully utilised in particle physics for real-time analysis and simulation. In this study, we investigate the potential benefits for medical-physics applications by analysing performance, development effort, and availability. We selected a software developer with no high-performance-computing experience to parallelise and accelerate a stand-alone Monte Carlo simulation of electron single Coulomb scattering. Such simulations contribute to real-time dose estimation for real-time adaptive radiotherapy, a new and emerging cancer treatment that relies heavily on high-performance computing. As a proof of principle, we implement a single scattering process of electrons in a homogeneous material with a pencil beam at constant initial energy. We compared the performance gain offered by GPU acceleration against an optimised CPU implementation, evaluating it by computing 100M histories of a 128 keV electron interacting in water; we also evaluated 1B histories to measure scalability. Compared with the multi-core CPU implementation running on 24 cores, the GPU achieves a speedup of 808x (100M) and 1727x (1B), corresponding to cost-equivalent speedups of 320x and 648x. The results on both architectures were statistically equivalent. The successful implementation and the measured acceleration, combined with the low level of expertise needed to obtain such a speedup, are a promising first step for the use of GPU acceleration in contexts such as real-time adaptive radiotherapy, with its strict performance and time requirements.
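        The relation between raw and cost-equivalent speedup quoted above amounts to scaling by the relative hardware cost; the sketch below assumes simple linear scaling, and the cost figures are hypothetical (the abstract's own numbers imply a GPU node costing roughly 2.5 times the 24-core CPU node):

```python
def cost_equivalent_speedup(raw_speedup, cpu_cost, gpu_cost):
    """Scale a raw GPU-vs-CPU speedup by the relative hardware cost,
    giving how much faster the GPU is per unit of money spent.
    The cost arguments are hypothetical placeholders."""
    return raw_speedup * cpu_cost / gpu_cost
```

For example, a raw 808x speedup with a GPU node costing 2.525 times the CPU node yields a 320x cost-equivalent speedup, matching the 100M-history figure quoted above.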

        Speaker: Marco Barbone
      • 43
        The LHCb simulation software: Gauss and its Gaussino core framework

        The LHCb experiment underwent a major upgrade for data taking at higher luminosity in Run 3 of the LHC. New software that exploits modern technologies in the underlying LHCb core software framework is part of this upgrade. The LHCb simulation framework, Gauss, is being adapted accordingly to cope with the increase in the amount of simulated data required for Run 3 analyses. An additional constraint arises from the fact that Gauss also relies on external simulation libraries.
        The new version of Gauss, based on a newly-developed, experiment-agnostic core framework where the generic simulation components have been encapsulated, is called Gaussino. This simulation framework allows easier prototyping and testing of new technologies where only the core elements are affected. Gaussino provides a plug&play mechanism for modelling collisions and interfacing generators like Pythia and EvtGen. It relies on Gaudi for general functionalities and the Geant4 toolkit for particle transport, combining their specific multi-threaded approaches. A fast simulation interface to replace the Geant4 physics processes with a palette of fast simulation models for a given sub-detector, including new deep learning based options, is the most recent addition. Geometry layouts can be provided through DD4Hep or experiment-specific software. A new, built-in mechanism to define simple volumes at configuration time can ease the development cycle.
        In this contribution, we will describe the structure and functionality of Gaussino, as well as its more recent developments and performance. We will also show how the new version of Gauss exploits the Gaussino infrastructure to match the requirements of the simulations of the LHCb experiment.
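        The plug&play mechanism described above can be sketched as a simple registry (all names here are invented for illustration; this is not Gaussino's actual API):

```python
GENERATORS = {}  # registry: configuration key -> generator class

def register(name):
    """Class decorator: make a generator available under a configuration key."""
    def wrap(cls):
        GENERATORS[name] = cls
        return cls
    return wrap

@register("pythia")
class PythiaLike:
    def generate(self):
        return {"generator": "pythia", "particles": 12}

@register("evtgen")
class EvtGenLike:
    def generate(self):
        return {"generator": "evtgen", "particles": 4}

def make_generator(config):
    """Instantiate whichever generator the configuration names."""
    return GENERATORS[config["generator"]]()

event = make_generator({"generator": "pythia"}).generate()
```

        Selecting a generator then becomes a pure configuration change, which is the "plug&play" property the abstract refers to.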

        Speaker: Gloria Corti (CERN)
    • Track 2: Data Analysis - Algorithms and Tools Sala Europa (Villa Romanazzi)

      Sala Europa

      Villa Romanazzi

      Conveners: Adriano Di Florio (Politecnico e INFN, Bari), Sophie Berkman
      • 44
        Long Short-Term Memory Networks and Bayesian Inference for Time-evolving Systems: an Industrial Case

        For the last decade, the so-called Fourth Industrial Revolution has been under way: a profound transformation of industry in which new technologies such as smart automation, large-scale machine-to-machine communication, and the Internet of Things are reshaping traditional manufacturing and industrial practices. The analysis of the huge amounts of data collected in all modern industrial plants has not only benefited greatly from modern tools of artificial intelligence, but has also spurred the development of new ones. In this context, we present a new approach, based on the combined use of a Long Short-Term Memory (LSTM) neural network and Bayesian inference, for the predictive maintenance of an industrial plant. SPE and Hotelling metrics, which assess the degree of compatibility between the time-evolving industrial data and the output of the LSTM trained on a reference period of good working condition, are used to update the Bayesian probability of a failure of the plant. This method has been successfully applied to a real industrial case, and the results are presented and discussed. Finally, it is important to highlight that, although developed to tackle a specific industrial need, the presented approach is general and can be applied to a plethora of other scenarios.
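        The Bayesian update step described above can be sketched as follows (a toy model: the scalar anomaly score and Gaussian likelihoods stand in for the SPE/Hotelling metrics and are assumptions, not the authors' actual choices):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_update(prior_fail, score, lik_fail, lik_ok):
    """One Bayes step: posterior failure probability given an anomaly score."""
    p_fail = lik_fail(score) * prior_fail
    p_ok = lik_ok(score) * (1.0 - prior_fail)
    return p_fail / (p_fail + p_ok)

# Assumed toy likelihoods: healthy scores cluster near 0, faulty scores near 3.
lik_ok = lambda s: gauss_pdf(s, 0.0, 1.0)
lik_fail = lambda s: gauss_pdf(s, 3.0, 1.0)

p = 0.01  # prior probability of a failure
for score in [0.2, 0.5, 2.8, 3.1, 3.4]:  # anomaly scores drifting upward
    p = bayes_update(p, score, lik_fail, lik_ok)
# p now reflects the accumulated evidence of a developing failure
```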

        Speaker: Prof. Davide Pagano (Universita di Brescia (IT))
      • 45
        Affine Parametric Neural Networks for High-Energy Physics

        Signal-background classification is a central problem in High-Energy Physics (HEP) that plays a major role in the discovery of new fundamental particles. The recent Parametric Neural Network (pNN) leverages multiple signal mass hypotheses as an additional input feature to effectively replace a whole set of individual neural classifiers, each providing (in principle) the best response for the corresponding mass hypothesis. In this work we aim at deepening the understanding of pNNs in light of real-world usage. We discovered several peculiarities of parametric networks, providing intuition, metrics, and guidelines for their use. We further propose the affine parametrization scheme, resulting in a new parameterized architecture, the affine parametric neural network (AffinePNN), along with many other generally applicable improvements, like the balanced training procedure and the background's mass distribution. Finally, we extensively and empirically evaluate our models on the HEPMASS dataset, along with its imbalanced version (HEPMASS-IMB) provided by us, to further validate our approach. Results are presented in terms of the impact of the proposed design decisions, classification performance, and interpolation capability.
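        The affine parametrization idea can be illustrated with a toy feature-wise conditioning layer, where the scale and bias applied to the features depend on the mass hypothesis (a sketch under assumed conventions, not the AffinePNN implementation):

```python
import random

class AffineConditioning:
    """Mass-conditioned affine layer: h' = scale(m) * h + bias(m), with
    scale and bias linear in the mass hypothesis m (toy parameters)."""

    def __init__(self, n_features, rng):
        self.w_scale = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        self.b_scale = [1.0] * n_features  # start close to the identity map
        self.w_bias = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        self.b_bias = [0.0] * n_features

    def __call__(self, h, mass):
        scale = [w * mass + b for w, b in zip(self.w_scale, self.b_scale)]
        bias = [w * mass + b for w, b in zip(self.w_bias, self.b_bias)]
        return [s * x + t for s, x, t in zip(scale, h, bias)]

rng = random.Random(0)
layer = AffineConditioning(4, rng)
features = [0.5, -1.2, 0.3, 2.0]
out_low = layer(features, mass=500.0)   # same features, two mass hypotheses
out_high = layer(features, mass=1000.0)
```

        A single network conditioned this way can respond differently to each mass hypothesis, which is what lets one pNN replace a set of per-mass classifiers.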

        Speaker: Luca Anzalone (Universita e INFN, Bologna (IT))
      • 46
        Learning full likelihoods of LHC results with Normalizing Flows

        The publication of full likelihood functions (LFs) of LHC results is vital for a long-lasting and profitable legacy of the LHC. Although major steps have been put forward in this direction, the systematic publication of LFs remains a big challenge in High Energy Physics (HEP) as such distributions are usually quite complex and high-dimensional. Thus, we propose to describe LFs with Normalizing Flows (NFs); a powerful class of expressive generative networks that provide density estimation by construction. In this talk, we show that NFs are able to accurately model the complex high-dimensional LFs found in HEP, in some cases even with relatively small training samples. This approach opens the possibility of compact and efficient characterisations of the LFs derived from LHC searches, SM measurements, phenomenological studies, etc.
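        The density-estimation-by-construction property of normalizing flows can be shown in one dimension (a minimal illustration of the change-of-variables mechanism, not the models used in the talk):

```python
import math

def base_logpdf(z):
    """Log-density of the standard normal base distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def flow_logpdf(x, scale, shift):
    """Exact log p(x) for the flow x = scale * z + shift via change of
    variables: log p(x) = log p_base(z) + log|dz/dx|, z = (x - shift)/scale."""
    z = (x - shift) / scale
    return base_logpdf(z) - math.log(abs(scale))

# With scale=2, shift=1 this flow is exactly N(1, 4), so the density it
# assigns can be checked against the closed-form normal log-density.
lp = flow_logpdf(3.0, scale=2.0, shift=1.0)
closed = -0.5 * ((3.0 - 1.0) / 2.0) ** 2 - math.log(2.0 * math.sqrt(2 * math.pi))
```

        Real flows stack many such invertible maps with learned, nonlinear parameters, but the exact log-density bookkeeping is the same.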

        Speaker: Humberto Reyes-González (University of Genoa)
      • 47
        Hunting for signals using Gaussian Process regression

        We present a novel computational approach for extracting weak signals, whose exact location and width may be unknown, from complex background distributions with an arbitrary functional form. We focus on datasets that can be naturally presented as binned integer counts, demonstrating our approach on the datasets from the Large Hadron Collider. Our approach is based on Gaussian Process (GP) regression - a powerful and flexible machine learning technique that allowed us to model the background without specifying its functional form explicitly, and to separate the background and signal contributions in a robust and reproducible manner. Unlike functional fits, our GP-regression-based approach does not need to be constantly updated as more data becomes available. We discuss how to select the GP kernel type, considering trade-offs between kernel complexity and its ability to capture the features of the background distribution. We show that our GP framework can be used to detect the Higgs boson resonance in the data with more statistical significance than a polynomial fit specifically tailored to the dataset. Finally, we use Markov Chain Monte Carlo (MCMC) sampling to confirm the statistical significance of the extracted Higgs signature.
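        A minimal GP-regression posterior mean, the building block of the approach above, can be sketched as follows (toy data and kernel settings are assumptions; the talk's framework adds kernel selection and MCMC on top):

```python
import numpy as np

def rbf(a, b, amp=1.0, length=1.0):
    """Radial-basis-function (squared-exponential) kernel matrix."""
    d = a[:, None] - b[None, :]
    return amp**2 * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=0.05):
    """Posterior mean of a zero-mean GP with RBF kernel and Gaussian noise."""
    K = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))
    return rbf(x_test, x_train) @ np.linalg.solve(K, y_train)

# Toy "spectrum": a smooth shape the GP models without an explicit functional
# form; a narrow resonance would appear as an excess over this prediction.
x = np.linspace(0.0, 4.0, 9)
y = np.sin(x)
pred = gp_posterior_mean(x, y, np.array([2.0]))
```

        The kernel choice (here an RBF with fixed length scale) encodes the smoothness assumption that separates background from a narrow signal.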

        Speaker: Abhijith Gandrakota (Fermi National Accelerator Lab. (US))
      • 48
        End-to-end multi-particle reconstruction in high occupancy imaging calorimeters with graph neural networks

        We present an end-to-end reconstruction algorithm to build particle candidates from detector hits in next-generation granular calorimeters, similar to that foreseen for the high-luminosity upgrade of the CMS detector. The algorithm exploits a distance-weighted graph neural network, trained with object condensation, a graph segmentation technique. Through a single-shot approach, the reconstruction task is paired with energy regression. We describe the reconstruction performance in terms of efficiency as well as energy resolution. In addition, we show the jet reconstruction performance of our method and discuss its inference computational cost. To our knowledge, this work is the first-ever example of single-shot calorimetric reconstruction of O(1000) particles in high-luminosity conditions with 200 pileup.

        Speaker: Philipp Zehetner (Ludwig Maximilians Universitat (DE))
    • Track 3: Computations in Theoretical Physics: Techniques and Methods Sala A+A1 (Villa Romanazzi)

      Sala A+A1

      Villa Romanazzi

      Conveners: Dr Barry Dillon (University of Heidelberg), Domenico Pomarico (INFN Sezione di Bari)
      • 49
        Speeding up Madgraph5_aMC@NLO through CPU vectorization and GPU offloading: towards a first alpha release

        The matrix element (ME) calculation in any Monte Carlo physics event generator is an ideal fit for implementing data parallelism with lockstep processing on GPUs and on CPU vector registers. For complex physics processes where the ME calculation is the computational bottleneck of event generation workflows, this can lead to very large overall speedups by efficiently exploiting these hardware architectures, which are now largely underutilized in HEP. In this contribution, we will present the latest status of our work on the reengineering of the Madgraph5_aMC@NLO event generator for these architectures. The new implementations of the ME calculation in vectorized C++, in CUDA and in the ALPAKA, KOKKOS and SYCL portability frameworks will be described in detail, as well as their integration into the existing MadEvent framework to keep the same overall look-and-feel of the user interface. Performance numbers will be reported both for the ME calculation alone and for the overall production workflow for unweighted event generation. First experience with an alpha release of the software supporting LHC LO processes, which is expected by the time of the ACAT2022 conference, will also be discussed.
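        The lockstep data parallelism described above can be illustrated with a toy "matrix element" evaluated over a batch of events (the function below is a stand-in, not a real ME):

```python
import numpy as np

def me_scalar(p):
    """Stand-in for evaluating a matrix element on one event's 4-momentum."""
    return (p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2) ** 2

def me_batch(momenta):
    """Same computation over a whole batch in one expression.

    momenta has shape (n_events, 4); every event follows identical control
    flow, the property that lets SIMD lanes or GPU threads run in lockstep.
    """
    s = momenta[:, 0] ** 2 - np.sum(momenta[:, 1:] ** 2, axis=1)
    return s ** 2

rng = np.random.default_rng(0)
momenta = rng.normal(size=(1000, 4))
batched = me_batch(momenta)
looped = np.array([me_scalar(p) for p in momenta])  # same numbers, one by one
```

        The branch-free, event-independent structure is what the vectorized C++, CUDA, and portability-framework backends all exploit.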

        Speaker: Andrea Valassi (CERN)
      • 50
        Developments in Performance and Portability of BlockGen

        For more than a decade, Monte Carlo (MC) event generators with the current matrix-element algorithms have been used to generate hard-scattering events on CPU platforms, with excellent flexibility and good efficiency.
        As the HL-LHC approaches and precision requirements become more demanding, many studies have aimed at solving the bottlenecks in the current MC event-generator toolchains. The novel family of fast matrix-element algorithms (BlockGen) shown in this report is one of the new developments that are better suited to GPU acceleration.
        We report our experience of porting BlockGen using Kokkos, and discuss the performance of the Kokkos version in comparison with the dedicated GPU version in CUDA.

        Speakers: Rui Wang (Argonne National Laboratory (US)), Taylor Childers (Argonne National Laboratory (US))
      • 51
        Performance of modern color decompositions for standard candle LHC tree amplitudes

        For more than a decade, the current generation of fully automated matrix-element generators has provided hard-scattering events with excellent flexibility and good efficiency.
        However, as recent studies have shown, they are a major bottleneck in the established Monte Carlo event-generator toolchains. With the advent of the HL-LHC and ever-rising precision requirements, future developments will need to focus on computational performance, especially at intermediate to large jet multiplicities.
        We present the novel BlockGen family of fast matrix-element algorithms, amenable to GPU acceleration and making use of modern, minimal color decompositions. Moreover, we discuss the performance achieved for standard candle processes such as V+jets and $t\bar{t}$+jets production.

        Speaker: Max Knobbe
      • 52
        Accelerating LHC event generation with simplified pilot runs and fast PDFs

        High-precision calculations are an indispensable ingredient to the success of the LHC physics programme, yet their poor computing efficiency has been a growing cause for concern, threatening to become a paralysing bottleneck in the coming years. We present solutions to address these concerns, focussing on two major components of general-purpose Monte Carlo event generators: the evaluation of parton-distribution functions and the generation of perturbative matrix elements. We show that for the cost-driving event samples employed by the ATLAS experiment to model omnipresent irreducible Standard Model backgrounds, such as weak-boson+jets and top-quark-pair production, these components account for up to 80% of the overall run time. We demonstrate that the computing footprint of LHAPDF and SHERPA can be reduced by factors of around 50 for multi-leg NLO event generation, thereby smashing one of the major milestones set by the HSF event generator working group whilst paving the way towards affordable state-of-the-art event simulation in the HL-LHC era.
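        One generic pattern for cutting repeated PDF-evaluation cost is memoising identical look-ups; the sketch below is purely illustrative and not necessarily the optimisation used in LHAPDF or SHERPA (the toy function is a stand-in, not a real parton density):

```python
from functools import lru_cache
import math

calls = {"n": 0}  # count how many real evaluations happen behind the cache

@lru_cache(maxsize=None)
def pdf_xf(x, q2):
    """Toy stand-in for a PDF grid interpolation (not a real parton density)."""
    calls["n"] += 1
    return x ** -0.2 * (1.0 - x) ** 3 * math.log(q2)

# The same (x, Q2) points recur many times across an event sample;
# only the distinct points are actually evaluated.
points = [(0.01, 100.0), (0.1, 100.0), (0.01, 100.0)] * 100
values = [pdf_xf(x, q2) for x, q2 in points]
```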

        Speaker: Christian Gutschow (UCL (UK))
    • Poster session with coffee break Poster Area (Floor -1) (Villa Romanazzi)

      Poster Area (Floor -1)

      Villa Romanazzi

      • 53
        A comparison of HEPSPEC benchmark performance on ATLAS Grid-Sites versus ideal conditions

        The goal of this study is to understand the observed differences in ATLAS software performance, when comparing results measured under ideal laboratory conditions with those from ATLAS computing resources on the Worldwide LHC Computing Grid (WLCG). The laboratory results are based on the full simulation of a single ttbar event and use dedicated, local hardware. In order to have a common and reproducible base to which to compare, thousands of identical ttbar full simulation benchmark jobs were submitted to hundreds of Grid sites using the HammerCloud infrastructure. The impact of the heterogeneous hardware of the Grid sites and the performance difference of different hardware generations is analysed in detail, and a direct, in depth comparison of jobs performed on identical CPU types is also done. The choice of the physics sample used in the benchmark is validated by comparing the performance on each Grid site measured with HammerCloud, weighted by its contribution to the total ATLAS full simulation production output.
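        The validation step of weighting each site's benchmark result by its production contribution can be sketched as a weighted mean (toy numbers; the form of the weighting is an assumption):

```python
def weighted_mean(scores, weights):
    """Average per-site benchmark scores weighted by production contribution."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Toy numbers: three sites with different benchmark scores and different
# shares of the total full-simulation production output.
site_scores = [12.0, 10.0, 8.0]
site_output_share = [0.5, 0.3, 0.2]
score = weighted_mean(site_scores, site_output_share)
```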

        Speaker: Michael Boehler (Albert Ludwigs Universitaet Freiburg (DE))
      • 54
        A Deep Learning based algorithm for PID study with cluster counting

        Ionization of matter by charged particles is the main mechanism for particle identification in gaseous detectors. Traditionally, the ionization is measured by the total energy loss (dE/dx). The concept of cluster counting, which measures the number of clusters per track length (dN/dx), was proposed in the 1970s. The dN/dx measurement avoids many sources of fluctuation that affect the dE/dx measurement, and can ultimately achieve a resolution two times better than dE/dx.

        The dN/dx measurement requires a highly efficient reconstruction algorithm: one needs to determine the number of peaks associated with the primary electrons in the induced-current waveform of a single detection unit. The main challenges are to handle heavily piled-up single peaks and to discriminate the primary peaks from those of secondary electrons and from noise. A machine-learning-based algorithm has been developed for the cluster counting problem. It consists of a peak-finding stage, based on a recurrent neural network (RNN), which aims to find all peaks in the waveform, and a clustering stage, based on a convolutional neural network (CNN), which determines the number of primary peaks.

        In the talk, the basic idea of cluster counting and the reconstruction algorithm based on machine learning will be presented.
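        As a baseline for the peak-finding task, a simple threshold-based local-maximum counter looks like this (a toy contrast to the RNN approach; pile-up merging nearby peaks is exactly where such logic fails and the ML approach helps):

```python
def find_peaks(waveform, threshold):
    """Count local maxima above threshold (naive baseline, no pile-up handling)."""
    peaks = []
    for i in range(1, len(waveform) - 1):
        if (waveform[i] > threshold
                and waveform[i] >= waveform[i - 1]
                and waveform[i] > waveform[i + 1]):
            peaks.append(i)
    return peaks

# Three well-separated pulses: easy for the baseline.  Overlapping pulses
# (pile-up) merge into one local maximum, which is where the RNN helps.
wf = [0, 1, 5, 1, 0, 0, 4, 1, 0, 2, 6, 2, 0]
peaks = find_peaks(wf, threshold=3)
```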

        Speaker: Dr Guang Zhao (Institute of High Energy Physics)
      • 55
        A distributed infrastructure for interactive analysis: the experience at INFN

        The challenges expected for the HL-LHC era, both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of re-thinking their computing models at many levels. Indeed, a big chunk of the R&D effort of the CMS experiment has been focused on optimizing computing- and storage-resource utilization for data analysis, and Run 3 provides a perfect benchmark for studying new solutions in a realistic scenario. The work shown here focuses on the integration and validation phase of an interactive environment for data analysis, with the peculiarity of providing seamless scaling over Grid resources at Italian T2s, and possibly opportunistic providers such as HPC. In this approach the integration of new resources has proved exceptionally easy in terms of requirements, so computing power can be included dynamically in a very effective way. The presentation will first give an overview of the architectural pillars and the integration challenges. Then the results of a first set of performance measurements will be presented, obtained with a first real-user CMS analysis built on top of the ROOT RDataFrame ecosystem and successfully executed over this infrastructure.

        Speaker: Diego Ciangottini (INFN, Perugia (IT))
      • 56
        An FPGA Implementation of the Hough Transform tracking algorithm for the Phase-II upgrade of ATLAS

        The High Energy Physics world will face challenging trigger requirements in the next decade. In particular, the luminosity increase to $5-7.5 \times 10^{34}$ cm$^{-2}$ s$^{-1}$ at the LHC will push major experiments such as ATLAS to exploit online tracking in their inner detector, reducing 1 MHz of Calorimeter and Muon Spectrometer triggers to 10 kHz of events. The project described here proposes a tuned Hough Transform algorithm implemented on high-end FPGA technology, versatile enough to adapt to different tracking situations. The platform developed allows different datasets to be studied with software "emulating" the firmware, and consequently the hardware performance, and input datasets to be generated from ATLAS simulation. Xilinx FPGAs have been chosen for this implementation, exploiting so far the VC709 commercial board and its PCI Express Generation 3 technology. The system provides the features to process a 200-pile-up event of ATLAS Run 4 in about 10 µs on average, with the possibility of running two events at a time. The best efficiencies reached in simulation are > 95% for single-muon tracking. The project is planned to be proposed for the Event Filter TDAQ upgrade of ATLAS Phase-II.
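        The Hough transform voting scheme that such firmware implements can be sketched in software (a toy 2-D line-finding version with invented binning, not the ATLAS firmware):

```python
import math

def hough_votes(hits, n_theta=64, n_r=64, r_max=10.0):
    """Fill a (theta, r) accumulator: each hit votes for all lines through it."""
    acc = [[0] * n_r for _ in range(n_theta)]
    for x, y in hits:
        for it in range(n_theta):
            theta = math.pi * it / n_theta
            r = x * math.cos(theta) + y * math.sin(theta)
            ir = int((r + r_max) / (2.0 * r_max) * n_r)
            if 0 <= ir < n_r:
                acc[it][ir] += 1
    return acc

# Five collinear hits along y = x: all their votes pile up in one bin,
# so the accumulator maximum equals the number of hits on the track.
hits = [(float(i), float(i)) for i in range(5)]
acc = hough_votes(hits)
best = max(max(row) for row in acc)
```

        On an FPGA the per-hit, per-angle votes are computed in parallel, which is what makes the algorithm a natural fit for the hardware.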

        Speaker: Fabrizio Alfonsi (Universita e INFN, Bologna (IT))
      • 57
        AI Data Quality Monitoring with Hydra

        Hydra is an AI system employing off-the-shelf computer-vision technologies to autonomously monitor data quality. Data quality monitoring is an essential step in modern experimentation, and Nuclear Physics is no exception. Certain failures can be identified through alarms (e.g. electrical heartbeats), while others are more subtle and often require expert knowledge to identify and diagnose. In the GlueX experiment at Jefferson Laboratory, data quality monitoring is a multistep, human-in-the-loop process that begins with shift crews looking at a litany of plots (e.g. occupancy plots) indicating the performance of detector subsystems. With the sheer complexity of the systems and the number of plots needing to be monitored, subtle issues can be, and are, missed. During its more than two years in production, Hydra has lightened the load of GlueX shift takers by autonomously monitoring detector systems. This talk will describe the construction, training, and operation of the Hydra system in GlueX, as well as the ongoing work to develop and deploy the system with other experiments at Jefferson Laboratory and beyond.

        Speaker: Thomas Britton
      • 58
        Applications of supercomputer Tianhe-II in BESIII

        High-energy-physics experiments are pushing forward precision measurements and searching for new physics beyond the Standard Model, making it urgent to simulate and generate massive amounts of data to meet physics requirements. Making good use of the power of existing supercomputers is one of the most active areas in high-energy-physics computing. Taking the BESIII experiment as an illustration, we deploy the offline software BOSS on the top-tier supercomputer Tianhe-II with the help of Singularity. With very limited internet-connection bandwidth and without root privileges, we successfully synchronize and keep the simulation software up to date through CVMFS, and an acceleration of HPC over HTC is realized for the same large-scale task. Two ideas are worth sharing with the community. First, ordinary users constantly meet problems with real-time internet connectivity and with lock contention when loading software; we solve these by deploying a Squid server and using an in-memory FUSE mount on each computing node. Second, we provide an MPI Python interface for high-throughput parallel computation on Tianhe-II. The program handling data output is also specially arranged so that there is no queueing issue in the I/O task. The acceleration rate in simulation reaches 80% so far, with simulation tests run with up to 15k processes in parallel.

        Speaker: Biying Hu (Sun Yat-sen University)
      • 59
        AtlFast3: Fast Simulation in ATLAS for Run 3 and beyond

        AtlFast3 is the next generation of high precision fast simulation in ATLAS that is being deployed by the collaboration and was successfully used for the simulation of 7 billion events in Run 2 data taking conditions. AtlFast3 combines a parametrization-based approach known as FastCaloSimV2 and a machine-learning based tool that exploits Generative Adversarial Networks (FastCaloGAN) for the simulation of hadrons.

        For Run 3, the parametrization of AtlFast3 was fully reworked, and many active developments are ongoing to further enhance the quality of fast simulation in ATLAS. This talk will give a brief overview of AtlFast3 with a focus on FastCaloSimV2 and outline several improvements with respect to the previous simulation tool, AFII. Furthermore, recent advancements in the parametrised simulation, such as the development of a dedicated tune of electromagnetic shower shapes to data, are presented.

        Speaker: Rui Zhang (University of Wisconsin Madison (US))
      • 60
        CMS Tracker Alignment: Legacy results from LHC Run 2 and first results from Run 3

        The inner tracking system of the CMS experiment, consisting of the silicon pixel and strip detectors, is designed to provide a precise measurement of the momentum of charged particles and to perform the primary and secondary vertex reconstruction. The movements of the individual substructures of the tracker detectors are driven by the change in the operating conditions during data taking. Frequent updates in the detector geometry are therefore needed to describe accurately the position, orientation, and curvature of the tracker modules.

        The procedure in which new parameters of the tracker geometry are determined is referred to as the alignment of the tracker. The alignment is performed regularly during data taking using reconstructed tracks from both collision and cosmic-ray data, and it is further refined after the end of data taking. The tracker alignment performance corresponding to the ultimate accuracy of the alignment calibration for the legacy reprocessing of the CMS Run 2 data will be presented. The data-driven methods used to derive the alignment parameters and the set of validations that monitor the performance of the physics observables will be reviewed. The first results obtained with the data taken during 2021 and the most recent set of results from LHC Run 3 will be presented.

        Speaker: Antonio Vagnerini (Università di Torino)
      • 61
        CMS tracking performance in Run 2 and early Run 3 data using the tag-and-probe technique

        Accurate reconstruction of charged-particle trajectories and measurement of their parameters (tracking) is one of the major challenges of the CMS experiment. Precise and efficient tracking is a critical component of the CMS physics program, as it impacts the ability to reconstruct the physics objects needed to understand proton-proton collisions at the LHC. In this work, we present the tracking performance measured in data, where the tag-and-probe technique was applied to $Z\longrightarrow \mu^{+}\mu^{-}$ di-muon resonances for all reconstructed muon trajectories and for the subset of trajectories in which the CMS Tracker is used to seed the measurement. The performance is assessed using LHC Run 2 data at $\sqrt{s}$ = 13 TeV and early LHC Run 3 data at $\sqrt{s}$ = 13.6 TeV.
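        At its core, the tag-and-probe measurement reduces to a passing-probe fraction (a schematic sketch with toy numbers; the real analysis handles backgrounds and fits the resonance):

```python
def tnp_efficiency(probes):
    """Fraction of probe muons matched to a reconstructed track."""
    passing = sum(1 for probe in probes if probe["matched"])
    return passing / len(probes)

# 95 of 100 probes matched to a track: efficiency 0.95 (toy numbers).
probes = [{"matched": True}] * 95 + [{"matched": False}] * 5
eff = tnp_efficiency(probes)
```

        The power of the method is that the tag muon and the Z mass constraint select an unbiased probe sample directly from data, without relying on simulation.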

        Speakers: Brunella D'Anzi (Universita e INFN, Bari (IT)), CMS Collaboration
      • 62
        Commissioning CMS online reconstruction with GPUs

        Building on top of the multithreading functionality that was introduced in Run-2, the CMS software framework (CMSSW) has been extended in Run-3 to offload part of the physics reconstruction to NVIDIA GPUs. The first application of this new feature is the High Level Trigger (HLT): the new computing farm installed at the beginning of Run-3 is composed of 200 nodes, and for the first time each one is equipped with two AMD Milan CPUs and two NVIDIA T4 GPUs. In order to guarantee that the HLT can run on machines without any GPU accelerators - for example as part of the large scale Monte Carlo production running on the grid - the HLT reconstruction has been implemented both for NVIDIA GPUs and for traditional CPUs.

        CMS has undertaken a comprehensive validation and commissioning activity to ensure the successful operation of the new HLT farm and the reproducibility of the physics results with either of the two implementations: some of it has taken place offline, on dedicated Tier-2 centres equipped with NVIDIA GPUs; other activities ran online during the LHC commissioning period, after installing GPUs on a few of the nodes from the Run-2 HLT farm. The final step was the optimisation of the HLT configuration after the installation of the new HLT farm.

        This contribution will describe the steps taken to validate the GPU-based reconstruction and commission the new HLT farm, leading to the successful data taking activities after the LHC Run-3 start up.

        Speakers: CMS collaboration, Marc Huwiler (University of Zurich (CH))
      • 63
        Custom event sample augmentations for ATLAS analysis data

        High Energy Physics (HEP) has been using column-wise data stored in synchronized containers, such as most prominently ROOT’s TTree, for decades. These containers have proven to be very powerful as they combine row-wise association capabilities needed by most HEP event processing frameworks (e.g. Athena) with column-wise storage, which typically results in better compression and more efficient support for many analysis use-cases. The downside, however, is that all events (rows) need to contain the same attributes and therefore extending the list of items to be stored, even if needed only for a subsample of events, can be costly in storage and lead to data duplication.
        The ATLAS experiment has developed navigational infrastructure to allow storing custom data extensions for subsamples of events in separate, but synchronized, containers. These extensions can easily be added to ATLAS standard data products (such as DAOD-PHYS or PHYSLITE), avoiding duplication of those core data products while limiting their size increase. As a proof of principle, a prototype based on the Long Lived Particle search has been implemented. Preliminary results concerning the event size as well as the reading/writing performance implications of this prototype will be presented.
        Augmented data as described above are stored within the same file as the core data. Storing them in dedicated files will be investigated in the future, as this could provide more flexibility to keep augmentations separate from the core data; e.g. certain sites may only want a subset of several augmentations, or augmentations can be archived to disk once their analysis is complete.

        Speaker: Lukas Alexander Heinrich (CERN)
      • 64
        Data Calibration and Processing at Belle II

        The Belle II experiment has been collecting data since 2019 at the second-generation e+/e- B-factory SuperKEKB in Tsukuba, Japan. The goal of the experiment is to explore new physics via high-precision measurements in flavor physics. This is achieved by collecting a large amount of data that needs to be calibrated promptly for fast reconstruction and recalibrated thoroughly for the final reprocessing. To fully automate the calibration process, a Python plugin package, b2cal, has been developed based on the open-source Apache Airflow package, using Directed Acyclic Graphs (DAGs) to describe the ordering of processes and Flask to provide administration and job-submission web pages. Prompt processing and reprocessing are performed at different calibration centers (BNL and DESY, respectively). After calibration, the raw data are reconstructed on the GRID to an analysis-oriented format (mDST), also stored on the GRID, and delivered to the collaboration. This talk will describe the whole procedure, from raw-data calibration to mDST production.

        Speaker: Stefano Lacaprara (INFN sezione di Padova)
      • 65
        Design and implementation of computational storage system based on EOS for HEP data processing

        Computing in high-energy physics is a typical data-intensive application; data analysis in particular requires access to large amounts of data. The traditional computing system adopts a "computing-storage" separation mode, which leads to large data movements during computation and increases transmission latency and network load. Pushing some data-intensive tasks down from the computing nodes to the storage nodes can therefore effectively alleviate this situation: the philosophy is to bring computing as close to the source of the data as possible, in order to reduce latency and bandwidth use. Storage nodes generally have computing resources such as CPUs, necessary for running a distributed file system, but this computing power is often ignored. This paper presents the design and implementation of a computational storage system based on CERN Open Storage (EOS). The system transparently exposes the computational storage functions through the standard POSIX file-system interface (open, read, and write). A plugin implemented in the EOS storage node (FST) executes the specified algorithm or program when it finds special arguments in the filename, for example "&CSS=decode". The plugin can read and write files locally on the FST, then register newly generated files with the EOS name node (MGM). Finally, the paper gives test results showing that the computational storage mode is faster and supports more parallel computing tasks than the traditional mode in applications such as raw-data decoding for the LHAASO experiment: it reduces computation time by 37% for a single task and by 72% for 40 tasks in parallel, compared with the traditional mode.
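        The filename-argument convention, e.g. "&CSS=decode", can be sketched as follows (function names and the toy "decode" operation are invented for illustration, not the actual EOS plugin interface):

```python
def parse_storage_op(path):
    """Split e.g. '/eos/data/run1.raw&CSS=decode' into (real path, operation)."""
    if "&CSS=" in path:
        real, op = path.split("&CSS=", 1)
        return real, op
    return path, None

# Toy operation table; a real plugin would run a decode program on raw data.
OPS = {"decode": lambda data: data.lower()}

def storage_side_read(path, data):
    """Emulate the storage node: run the requested operation next to the data."""
    real, op = parse_storage_op(path)
    if op in OPS:
        return real, OPS[op](data)  # computed on the storage node
    return real, data

real, out = storage_side_read("/eos/lhaaso/run1.raw&CSS=decode", "RAWBYTES")
```

        Because the operation is encoded in the path, clients keep using plain POSIX open/read/write calls while the heavy work happens next to the data.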

        Speakers: Xiaoyu Liu (Central China Normal University CCNU (CN)), Xiaoyu Liu (Institute of High Energy Physics, CAS)
      • 66
        Enabling continuous speedup of CMS Event Reconstruction through continuous benchmarking

        The outstanding performance obtained by the CMS experiment during Run 1 and Run 2 represents a great achievement of seamless hardware and software integration. Among the different software components, the CMS offline reconstruction software is essential for translating the data acquired by the detectors into concrete objects that can easily be handled by analyzers, and it needs to be both reliable and fast. Long Shutdown 2 (LS2), between LHC Run 2 and Run 3, has been instrumental in optimizing the CMS offline reconstruction software and introducing new algorithms, achieving a continuous CPU speedup. To reach these goals, a continuous benchmarking pipeline has been implemented: CPU timing and memory profiling, using the igprof tool, are performed on a regular basis to monitor the footprint of new developments and identify possible areas of performance improvement. The current status of, and achievements obtained by, continuous benchmarking of the CMS offline reconstruction software are described here.

        Speaker: Claudio Caputo (Universite Catholique de Louvain (UCL) (BE))
      • 67
        Evolution of the CMS Submission Infrastructure to support heterogeneous resources in the LHC Run 3

        The landscape of computing power available for the CMS experiment is rapidly evolving, from a scenario dominated by x86 processors deployed at WLCG sites towards a more diverse mixture of Grid, HPC, and Cloud facilities incorporating a higher fraction of non-CPU components, such as GPUs. Using these facilities’ heterogeneous resources efficiently to process the vast amounts of data to be collected in LHC Run 3 and beyond, in the HL-LHC era, is key to CMS achieving its scientific goals.

        The CMS Submission Infrastructure is the main computing resource provisioning system for CMS workflows, including data processing, simulation and analysis. It currently aggregates nearly 400k CPU cores distributed worldwide from Grid, HPC and cloud providers. The Submission Infrastructure, together with other elements in the CMS workload management, has been modified in its strategies and enlarged in its scope to make use of these new resources.

        In this evolution, key questions such as the optimal level of granularity in the description of the resources, or how to prioritize workflows in this new resource mix, must be taken into consideration. In addition, access to many of these resources is considered opportunistic by CMS, so each resource provider may also play a key role in defining particular allocation policies, different from the hitherto dominant system of pledges. All these matters must be addressed in order to ensure the efficient allocation and matchmaking of resources to tasks, maximizing their use by CMS.

        This contribution will describe the evolution of the CMS Submission Infrastructure towards a full integration and support of heterogeneous resources according to CMS needs. In addition, a study of the pool of GPUs already available to CMS Offline Computing will be presented, including a survey of their diversity in relation to CMS workloads, and the scalability reach of the infrastructure to support them.

        Speaker: Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
      • 68
        Fast track seed selection for track following in the Inner Detector Trigger track reconstruction

        During ATLAS Run 2, a large proportion of the CPU time in the online track reconstruction algorithm of the Inner Detector (ID) was dedicated to fast track finding. With the proposed HL-LHC upgrade, where the event pile-up is predicted to reach <μ>=200, track finding will see a further large increase in CPU usage. Moreover, only a small subset of Pixel-only seeds is accepted after the fast track finding procedure, essentially discarding the CPU time used on rejected seeds. Therefore, a computationally cheap track candidate seed pre-selection procedure based on approximate track following was designed, which is described in this report. The algorithm uses a parabolic track approximation in the plane perpendicular to the beamline and a combinatorial Kalman filter, simplified by a reference-related coordinate system, to find the best track candidates. For such candidates, a set of numerical features is created to classify seeds using machine learning techniques, such as Support Vector Machines (SVM) or kernel-based methods. The algorithm was tuned for high identification and rejection of bad seeds, while ensuring no significant loss of track finding efficiency. Current studies focus on implementing the algorithm into the Athena framework for online seed pre-selection, which could be used during Run 3 or potentially be adapted for the ITk geometry for Run 4 of the HL-LHC.
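        As an illustration of the parabolic approximation in the transverse plane: the quadratic through a three-hit seed has a closed form, its quadratic coefficient is proportional to the track curvature in a solenoidal field, and the residual of a further hit against the extrapolation is the kind of cheap numerical feature a seed classifier could use. This is a minimal sketch under those assumptions, not the Athena implementation.

```python
def parabola_through(p1, p2, p3):
    """Coefficients (a, b, c) of y = a + b*x + c*x**2 through three
    (x, y) space points, via Newton divided differences."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d1 = (y2 - y1) / (x2 - x1)          # first divided differences
    d2 = (y3 - y2) / (x3 - x2)
    c = (d2 - d1) / (x3 - x1)           # quadratic term ~ curvature
    b = d1 - c * (x1 + x2)
    a = y1 - b * x1 - c * x1 * x1
    return a, b, c

def extrapolation_residual(seed, hit):
    """Distance between a further hit and the seed parabola: a cheap
    seed-quality feature for the ML pre-selection."""
    a, b, c = parabola_through(*seed)
    x, y = hit
    return abs(y - (a + b * x + c * x * x))
```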

        Speaker: Andrius Vaitkus (University of London (GB))
      • 69
        Faster simulated track reconstruction in the ATLAS Fast Chain

        The production of simulated datasets for use by physics analyses consumes a large fraction of ATLAS computing resources, a problem that will only get worse as increases in the instantaneous luminosity provided by the LHC lead to more collisions per bunch crossing (pile-up). One of the more resource-intensive steps in the Monte Carlo production is reconstructing the tracks in the ATLAS Inner Detector (ID), which takes up about 60% of the total detector reconstruction time [1]. This talk discusses a novel technique called track overlay, which substantially speeds up the ID reconstruction. In track overlay the pile-up ID tracks are reconstructed ahead of time and overlaid onto the ID tracks from the simulated hard-scatter event. We present our implementation of this track overlay approach as part of the ATLAS Fast Chain simulation, as well as a method for deciding in which cases it is possible to use track overlay in the reconstruction of simulated data without performance degradation.

        [1] ATL-PHYS-PUB-2021-012 (60% refers to Run3, mu=50, including large-radius tracking, p11)

        Speaker: William Axel Leight (University of Massachusetts Amherst)
      • 70
        HDTFS: Cost-effective Hadoop Distributed & Tiered File System for High Energy Physics

        As the scale and complexity of High Energy Physics (HEP) experiments increase, researchers face the challenge of large-scale data processing. On the storage side, the Hadoop Distributed File System (HDFS), which supports the "data-centric" processing model, has been widely used in academia and industry. Since HDFS supports Spark and other frameworks for data-local distributed computation, investigating its application in HEP is the basis for running such upper-layer computing in this field. However, HDFS expands cluster capacity by adding cluster nodes, an approach that cannot meet the cost-effectiveness requirements of persisting and backing up massive HEP experimental data. To address this problem, we developed the Hadoop Distributed & Tiered File System (HDTFS), which supports disk-tape storage: it combines the fast access speed of disk with the large capacity, low price, and long retention period of tape, avoiding the high cost of purely horizontal expansion of HDFS clusters. The system provides users with a single global namespace and avoids dependence on external metadata servers for access to the data stored on tape. In addition, tape-layer resources are managed internally, so users do not have to deal with the complexities of tape storage. Experimental results show that this method can effectively address massive data storage for HEP Hadoop clusters.

        Speaker: Xiaoyu Liu (IHEP)
      • 71
        Improved Selective Background Monte Carlo Simulation at Belle II with Graph Attention Networks and Weighted Events

        When measuring rare processes at Belle II, a huge luminosity is required, which means a large number of simulations is necessary to determine signal efficiencies and background contributions. However, this process demands high computation costs, while most of the simulated data, particularly in the case of background, are discarded by the event selection. Filters using graph neural networks are therefore introduced at an early stage to save the resources spent on detector simulation and reconstruction of events discarded at analysis level. In our work, we improved the performance of the filters using graph attention and investigated statistical methods, including sampling and reweighting, to deal with the biases introduced by the filtering.

        Speaker: Boyang Yu
      • 72
        Machine Learning for Real-Time Processing of ATLAS Liquid Argon Calorimeter Signals with FPGAs

        The Phase-II upgrade of the LHC will increase its instantaneous luminosity by a factor of 7 leading to the High Luminosity LHC (HL-LHC). At the HL-LHC, the number of proton-proton collisions in one bunch crossing (called pileup) increases significantly, putting more stringent requirements on the LHC detectors electronics and real-time data processing capabilities.

        The ATLAS Liquid Argon (LAr) calorimeter measures the energy of particles produced in LHC collisions. This calorimeter also has trigger capabilities to identify interesting events. In order to enhance the physics discovery potential of the ATLAS detector in the blurred environment created by pileup, excellent resolution of the deposited energy and accurate determination of the deposition time are crucial.

        The deposited energy is computed in real time using dedicated data acquisition electronic boards based on FPGAs, chosen for their capacity to process large amounts of data with very low latency. The computation currently uses optimal filtering algorithms that assume a nominal pulse shape of the electronic signal. These filter algorithms are adapted to the ideal situation of very limited pileup and no overlap of the electronic pulses in the detector. With the increased luminosity and pileup, however, the performance of the optimal filtering algorithms decreases significantly, and no further extension or tuning of these algorithms can recover the lost performance.

        The back-end electronic boards for the Phase-II upgrade of the LAr calorimeter will use the next generation of high-end Intel FPGAs, with increased processing power and memory. This is a unique opportunity to develop the tools needed to run more complex algorithms on these boards. We developed several neural networks (NNs) with significant performance improvements with respect to the optimal filtering algorithms. The main challenge is to implement these NNs efficiently in the dedicated data acquisition electronics. Special effort was dedicated to minimising the required computational power while optimising the NN architectures.

        Five NN algorithms based on CNN, RNN, and LSTM architectures will be presented. The improvement in energy resolution and in the accuracy of the deposition time compared to the legacy filter algorithms, especially for overlapping pulses, will be discussed. The implementation of these networks in firmware will be shown; two implementation categories, in VHDL and in Quartus HLS code, are considered. The implementation results on Intel Stratix 10 FPGAs, including resource usage, latency, and operating frequency, will be reported. Approximations in the firmware implementations, including the use of fixed-point arithmetic and lookup tables for activation functions, will be discussed, as will implementations using time multiplexing to reduce resource usage. We will show that two of these NN implementations are viable solutions that fit the stringent data processing requirements on latency (O(100 ns)) and bandwidth (O(1 Tb/s) per FPGA) needed for ATLAS detector operation.
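        Two of the firmware approximations mentioned above, fixed-point arithmetic and lookup-table activations, can be sketched in a few lines. The bit width and table size here are illustrative choices, not those of the actual LAr firmware:

```python
import math

FRAC_BITS = 10          # fixed-point format: value ~= int / 2**FRAC_BITS

def to_fixed(x):
    """Quantize a float to a fixed-point integer."""
    return round(x * (1 << FRAC_BITS))

def from_fixed(q):
    """Recover the approximate float from a fixed-point integer."""
    return q / (1 << FRAC_BITS)

# Precomputed sigmoid table over [-8, 8), 256 entries (as in a ROM)
LUT_SIZE, LUT_RANGE = 256, 8.0
SIGMOID_LUT = [
    1.0 / (1.0 + math.exp(-(-LUT_RANGE + i * 2 * LUT_RANGE / LUT_SIZE)))
    for i in range(LUT_SIZE)
]

def sigmoid_lut(x):
    """Piecewise-constant sigmoid approximation via table lookup,
    saturating outside [-8, 8)."""
    i = int((x + LUT_RANGE) * LUT_SIZE / (2 * LUT_RANGE))
    i = max(0, min(LUT_SIZE - 1, i))
    return SIGMOID_LUT[i]
```

        The trade-off is the usual one: coarser tables and fewer fractional bits cost accuracy but save FPGA memory and logic.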

        Speaker: Steffen Stärz (McGill University, (CA))
      • 73
        Machine Learning Techniques for selecting Forward Electrons $(2.5<\eta<3.2)$ with the ATLAS High Level Trigger

        The ATLAS detector at CERN measures proton-proton collisions at the Large Hadron Collider (LHC), allowing us to test the limits of the Standard Model (SM) of particle physics. Forward-moving electrons produced in these collisions are promising candidates for finding physics beyond the SM. However, the ATLAS detector is not designed to measure forward leptons with pseudorapidity $\eta$ above 2.5 with high precision. The ATLAS performance for forward leptons can be improved by enhancing the trigger system, which selects events of interest so as not to overwhelm the data storage with information from around 1.7 billion collisions per second. First studies using the Neural Ringer algorithm to select forward electrons with $2.5<\eta<3.2$ show promising results. The Neural Ringer, which uses machine learning to analyse detector information and distinguish electromagnetic from hadronic signatures, is presented. Additionally, its performance on simulated ATLAS Monte Carlo samples in improving the high-level trigger for forward electrons will be shown.

        Speaker: Meinrad Moritz Schefer (Universitaet Bern (CH))
      • 74
        Monitoring CMS experiment data and infrastructure for next generation of LHC run

        As CMS starts the Run 3 data taking, the experiment’s data management software tools, along with the monitoring infrastructure, have undergone significant upgrades to cope with the conditions expected in the coming years. The challenges of efficient, real-time monitoring of the performance of the computing infrastructure and of data distribution are being met using state-of-the-art technologies that are continuously evolving. In this talk, we describe how we set up monitoring pipelines based on a combination of technologies, such as Kubernetes, Spark/Hadoop and other open-source software stacks. We show how the choice of these components is critical for this new generation of services and infrastructure for CMS data management and monitoring. We also discuss how some of the developed monitoring services, such as data management monitoring, CPU efficiency monitoring, and dataset access and transfer metrics, have been instrumental in taking strategic decisions and increasing the physics harvest through maximal utilization of the computing resources available to us.

        Speaker: Ceyhun Uzunoglu (CERN)
      • 75
        Parametrized simulation of the micro-RWELL response with PARSIFAL software

        PARSIFAL (PARametrized SImulation) is a software tool originally implemented to reproduce the complete response of a triple-GEM detector to the passage of a charged particle, taking into account the involved physical processes by their simple parametrization and thus in a very fast way.
        Robust and reliable software, such as GARFIELD++, is widely used to simulate the transport of electrons and ions in the gas and all their interactions step by step, but it is CPU-time consuming. The implementation of PARSIFAL code was driven by the need to reduce the processing time, while maintaining the precision of a full simulation.
        The software must be initialized with parameters extracted from a GARFIELD++ simulation, which must be run once and for all. It can then be run independently to provide a reliable simulation, from ionization to diffusion, multiplication, signal induction and electronics, simply by sampling from a set of functions that describe the physical effects and depend on the input parameters.
        The code has been thoroughly tested on triple-GEM detectors and the simulation was finely tuned to experimental data collected at testbeam.
        Recently, PARSIFAL has been extended to another detector in the MPGD family, the micro-RWELL, thanks to the modular structure of the code. The main difference in the treatment of the physical processes is the introduction of the resistive plane and its effect on the formation of the signal. For this purpose, the charge spread on the resistive layer has been described following the work of M. S. Dixit and A. Rankin (NIM A518 (2004) 721-727, NIM A566 (2006) 281-285) and the electronics readout (APV-25) was added to the description.
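        For reference, the Dixit-Rankin description models the resistive layer as a two-dimensional diffusive medium: a point charge deposited at $t=0$ spreads with a density of the form (written here for surface resistivity $R$ and capacitance per unit area $C$; this is the standard 2D diffusion Green's function, a sketch of the model rather than a quotation from the cited papers)

```latex
\rho(r,t) \;=\; \frac{RC}{4\pi t}\,
\exp\!\left(-\frac{r^{2}\,RC}{4t}\right),
```

        whose widening Gaussian profile is what determines the signal induced on the readout as a function of time.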
        A fine tuning of the simulation is ongoing to reproduce the experimental data collected during testbeams. A similar strategy already validated for the triple-GEM case is used: the variables of interest for the comparison of the experimental data with simulated results are the cluster charge, cluster size and the position resolution obtained by charge centroid and micro-TPC reconstruction algorithms. In this case, special attention must be paid to the tuning of the resistivity of the resistive layer.
        An illustration of the general code, with the focus on this latest implementation, and a first comparison with experimental data from testbeams are the subject of this contribution.

        Speaker: Lia Lavezzi (Universita e INFN Torino (IT))
      • 76
        Progress towards an improved particle flow algorithm at CMS with machine learning

        The particle-flow (PF) algorithm is of central importance to event reconstruction at the CMS detector, and has been a focus of developments in light of planned Phase-2 running conditions with an increased pileup and detector granularity. Current rule-based implementations rely on extrapolating tracks to the calorimeters, correlating them with calorimeter clusters, subtracting charged energy and creating neutral particles from significant energy deposits. Such rule-based algorithms can be difficult to extend and may be computationally inefficient under high detector occupancy, while also being challenging to port to heterogeneous architectures in full detail.

        In recent years, end-to-end machine learning approaches for event reconstruction have been proposed, including for PF at CMS, with the possible advantage of directly optimising for the physical quantities of interest, being highly reconfigurable to new conditions, while also being a natural fit for deployment on heterogeneous accelerators.

        One of the proposed approaches for machine-learned particle-flow (MLPF) reconstruction relies on graph neural networks to infer the full particle content of an event from the tracks and calorimeter clusters based on a training on simulated samples, and has been recently implemented in CMS as a possible future reconstruction R&D direction to fully map out the characteristics of such an approach in a realistic setting.

        We discuss progress in CMS towards an improved implementation of the MLPF reconstruction, now optimised on generator-level particle information for the first time to our knowledge, thus paving the way to potentially improving the detector response in terms of physical quantities of interest. We show detailed physics validation with respect to the current PF algorithm in terms of high-level physical quantities such as jet and MET resolution. Furthermore, we discuss progress towards deploying the MLPF algorithm in the CMS software framework on heterogeneous platforms, performing large-scale hyperparameter optimization using HPC systems, as well as the possibilities of making use of explainable artificial intelligence (XAI) to interpret the output.

        Speaker: Farouk Mokhtar (Univ. of California San Diego (US))
      • 77
        Secrets Management for CMSWEB

        Secrets management is the process of handling secrets, such as certificates, database credentials, tokens, and API keys, in a secure and centralized way. In the present CMSWEB (the portfolio of CMS internal IT services) infrastructure, only the operators maintain all service and cluster secrets in a secure place. However, if all the relevant secret holders are away, we are left with no choice but to contact them in case of an emergency.

        In order to overcome this issue, we performed an R&D study on the management of secrets and explored various strategies such as Hashicorp Vault, Github credential manager, and SOPS/age. In this talk, we’ll discuss the process by which CMS investigated these strategies and present a feasibility analysis of them. We will also underline why CMS chose SOPS as a solution, reviewing how the features of SOPS with age satisfy our needs, and discuss how other experiments could adopt our solution.

        Speaker: Muhammad Imran (National Centre for Physics (PK))
      • 78
        Stability of the CMS Submission Infrastructure for the LHC Run 3

        The CMS Submission Infrastructure is the main computing resource provisioning system for CMS workflows, including data processing, simulation and analysis. It currently aggregates nearly 400k CPU cores distributed worldwide from Grid, HPC and cloud providers. CMS Tier-0 tasks, such as data repacking and prompt reconstruction, critical for data-taking operations, are executed on a collection of computing resources at CERN, also managed by the CMS Submission Infrastructure.

        All this computing power is harnessed via a number of federated resource pools, supervised by HTCondor and GlideinWMS services. Elements such as pilot factories, job schedulers and connection brokers are deployed in HA mode across several “availability zones”, providing stability to our services via hardware redundancy and numerous failover mechanisms.

        Given the upcoming start of the LHC Run 3, the Submission Infrastructure stability has been recently tested in a series of controlled exercises, performed without interruption of our services. These tests have demonstrated the resilience of our systems, and additionally provided useful information in order to further refine our monitoring and alarming system.

        This contribution will describe the main elements in the CMS Submission Infrastructure design and deployment, along with the performed failover exercises, proving that our systems are ready to serve their critical role in support of CMS activities.

        Speaker: Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas)
      • 79
        The adaptation of a deep learning model to locating primary vertices in the CMS and ATLAS experiments

        Over the past several years, a deep learning model based on convolutional neural networks has been developed to find proton-proton collision points (also known as primary vertices, or PVs) in Run 3 LHCb data. By converting the three-dimensional space of particle hits and tracks into a one-dimensional kernel density estimator (KDE) along the direction of the beamline and using the KDE as an input feature into a neural network, the model has achieved an efficiency of 98% with a low false positive rate. The success of this method motivates its extension to other experiments, including ATLAS and CMS. Although LHCb is a forward spectrometer and ATLAS and CMS are central detectors, both ATLAS and CMS have the necessary characteristics to compute KDEs analogous to the LHCb detector. While the ATLAS and CMS detectors will benefit from higher precision, the expected number of visible PVs per event will be approximately 10 times that for LHCb, resulting in only slightly altered KDEs. The KDE and a few related input features are fed into the same neural network architectures used to achieve the results for LHCb. We present the development of the input feature and initial results across different network architectures. The results serve as a proof-of-principle that a deep neural network can achieve high efficiency and low false positive rates for finding vertices in ATLAS and CMS data.
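        A minimal sketch of the KDE construction described above: each track contributes a Gaussian along the beamline, centred at its z position at the point of closest approach and weighted by its uncertainty. (Pure-Python illustration; the actual feature is histogrammed and folds in additional track information.)

```python
import math

def beamline_kde(track_z, track_sigma_z, z_grid):
    """One-dimensional kernel density estimate along the beamline:
    evaluate the sum of per-track Gaussians at each grid point."""
    kde = []
    for z in z_grid:
        total = 0.0
        for zt, st in zip(track_z, track_sigma_z):
            total += (math.exp(-0.5 * ((z - zt) / st) ** 2)
                      / (st * math.sqrt(2 * math.pi)))
        kde.append(total)
    return kde
```

        Peaks in this one-dimensional density mark primary-vertex candidates, which is the input the neural network learns to interpret.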

        Speaker: Elliott Kauffman (Duke University (US))
      • 80
        Transparent expansion of a WLCG compute site using HPC resources

        With the LHC restarting after more than three years of shutdown, unprecedented amounts of data are expected to be recorded. Even with the WLCG providing a tremendous amount of compute resources to process this data, local resources will have to be used for additional compute power. This, however, makes the landscape in which computing takes place more heterogeneous.

        In this contribution, we present a solution for dynamically integrating non-HEP resources into existing infrastructures using the COBalD/TARDIS resource manager. By providing all resources through conventional CEs as single point-of-entry, the use of these external resources becomes completely transparent for experiments and users.

        In addition, we will discuss experiences with an existing setup, in production for more than a year, that extends the German Tier 2 WLCG site at RWTH Aachen University with a local HPC cluster.

        Speaker: Ralf Florian Von Cube (KIT - Karlsruhe Institute of Technology (DE))
      • 81
        Transparent extension of INFN-T1 with heterogeneous computing architectures

        The INFN-CNAF Tier-1 has been engaged for years in a continuous effort to integrate its computing centre with more types of computing resources. In particular, the challenge of providing opportunistic access to non-standard CPU architectures, such as PowerPC, or to hardware accelerators (GPUs), has been actively explored. In this work, we describe a solution to transparently integrate access to ppc64 CPUs as well as GPUs. This solution has been tested by transparently extending the INFN-T1 Grid computing centre with Power9-based machines and V100 GPUs from the Marconi 100 HPC cluster managed by CINECA. We also discuss further possible improvements and how these will meet the requirements and future plans for the new Tecnopolo centre, where the CNAF Tier-1 will soon be hosted.

        Speaker: Stefano Dal Pra (Universita e INFN, Bologna (IT))
    • Track 1: Computing Technology for Physics Research: 1 Sala Federico II (Villa Romanazzi)

      Sala Federico II

      Villa Romanazzi

      Conveners: Daniele Cesini (Universita e INFN, Bologna (IT)), Marica Antonacci (INFN)
      • 82
        The journey towards HEPscore, the HEP-specific CPU benchmark for WLCG

        HEPscore is a CPU benchmark, based on HEP applications, that the HEPiX Working Group is proposing as a replacement for the currently used HEPSpec06 benchmark, adopted in WLCG for procurement, computing resource pledges and performance studies.
        In 2019, we presented at ACAT the motivations for building a benchmark for the HEP community based on HEP applications. The process from the conception to the implementation and validation of this objective has been inspiring and challenging. In the spirit of the HEP community, it has involved many contributions from software developers, data analysts, experts of the experiments, representatives of several WLCG computing centres, as well as the WLCG HEPscore Deployment Task Force.
        In this contribution, we review this long journey and in particular the technological solutions selected, such as containerization of the HEP applications and cvmfs snapshotting. We update the community on the readiness status of HEPscore, the HEP application mix selected to build HEPscore and the deployment plans for 2023. We describe the current campaign of measurements performed on multiple WLCG sites, intended to study the performance of eleven HEP applications on more than 50 different computer systems.
        Finally, we also cover how to extend the HEPscore adoption to the benchmarking of heterogeneous resources, and how it can include workloads for physics analysis and Machine Learning algorithms.

        Speaker: Domenico Giordano (CERN)
      • 83
        CPU-level resources allocation for optimal execution of multi-process physics code

        During the LHC LS2, the ALICE experiment has undergone a major upgrade of the data acquisition model, evolving from a trigger-based model to a continuous readout. The upgrade allows for an increase in the number of recorded events by a factor of 100 and in the volume of generated data by a factor of 10. The entire experiment software stack has been completely redesigned and rewritten to adapt to the new requirements and to make optimal use of storage and CPU resources. The architecture of the new processing software relies on running parallel processes on multiple processor cores and using large shared memory areas for exchanging data between them.

        Without mechanisms that guarantee job resource isolation, the deployment of multi-process jobs can result in usage that exceeds what was originally requested and allocated. Internally, jobs may launch as many processes as their workflow defines, significantly more than the number of allocated CPU cores. This freedom of execution can be limited by mechanisms such as cgroups, already employed by some Grid sites; however, these sites are a minority. If jobs are allowed to run unconstrained, they may interfere with each other through simultaneous utilization of the resources. Constraint mechanisms in this context improve the fairness of resource utilization, both between ALICE jobs and towards other users in general.

        The efficient use of a worker node's cache memory is closely related to the CPU cores executing the job. An important aspect to consider is the host architecture and the cache topology, i.e. the cache levels, their sizes and their hierarchical connection to individual cores. The memory usage patterns of running tasks, the memory and cache topologies, and the choice of CPU cores to constrain the job to all influence the overall efficiency of the execution, in terms of useful work done per unit of time.

        This paper presents an analysis of the impact of different CPU pinning strategies on the efficiency of the execution of simulation tasks. The evaluation of the different configurations is performed by extracting a set of metrics tightly related to job turnaround and efficient resource utilization. The results are presented both for the execution of a single job on an idle machine and for whole node saturation, analyzing the interference between jobs. Different host architectures are studied for a global and robust assessment.
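        As a toy illustration of a cache-aware pinning strategy (an illustrative scheme, not the one evaluated in the paper), the following assigns each multi-core job a contiguous block of cores that never straddles an L3 cache domain:

```python
def pinning_masks(n_jobs, cores_per_job, cores_per_l3, total_cores):
    """Assign each job a contiguous block of cores, skipping to the
    next L3 domain whenever a job would otherwise straddle one."""
    masks, core = [], 0
    for _ in range(n_jobs):
        # if the job doesn't fit in the rest of this L3 domain, skip ahead
        if core % cores_per_l3 + cores_per_job > cores_per_l3:
            core += cores_per_l3 - core % cores_per_l3
        if core + cores_per_job > total_cores:
            raise ValueError("not enough cores for the requested jobs")
        masks.append(list(range(core, core + cores_per_job)))
        core += cores_per_job
    return masks
```

        For example, three 3-core jobs on a 12-core host with 4-core L3 domains land on cores 0-2, 4-6 and 8-10, so no job shares an L3 slice across domains; the cost is the idle cores left at each domain boundary.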

        Speaker: Marta Bertran Ferrer (CERN)
      • 84
        ML-based tool for RPC currents quality monitoring

        The CMS experiment has 1056 Resistive Plate Chambers (RPCs) in its muon system. Monitoring their currents is the first essential step towards maintaining the stability of the CMS RPC detector performance. An automated monitoring tool has been developed to carry out this task, utilising the ability of Machine Learning (ML) methods to model the behaviour of the chamber currents. Two types of ML approaches are used: Generalized Linear Models (GLMs) and autoencoders. In the GLM case, a set of parameters such as environmental conditions, LHC parameters and working point is used to characterize the behaviour of the current. In the autoencoder case, the currents of all high-voltage channels of the RPC system are used as input, and the autoencoder network is trained to reproduce these inputs on its output neurons. Both approaches show very good predictive capabilities, with an accuracy of the order of 1-2 μA. These predictive capabilities are the basis for the monitoring tool, which is going to be tested during Run 3. All the developed tools are integrated in a framework that can be easily accessed and controlled by a specially developed Web User Interface, allowing the end user to work with the monitoring tool in a simple manner.
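        A one-feature stand-in for the GLM approach can be sketched with ordinary least squares, flagging channels whose measured current deviates from the prediction by more than a few μA (the order of the quoted accuracy). This is illustrative only; the real model uses environmental, LHC and working-point parameters jointly.

```python
def fit_glm_1d(x, y):
    """Ordinary least squares for current = a + b * feature
    (a single-feature stand-in for the multi-feature GLM)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

def flag_anomalies(a, b, x, y, threshold_uA=2.0):
    """Flag channels whose measured current deviates from the model
    prediction by more than the threshold."""
    return [abs(yi - (a + b * xi)) > threshold_uA for xi, yi in zip(x, y)]
```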

        Speaker: Elton Shumka (University of Sofia - St. Kliment Ohridski (BG))
      • 85
        EJFAT: Towards Intelligent Compute Destination Load Balancing

        To increase the science rate for high data rates and volumes, JLab is partnering with ESnet to develop an AI/ML-directed dynamic Compute Work Load Balancer (CWLB) for UDP-streamed data. The CWLB is an FPGA featuring dynamically configurable, low fixed-latency destination switching and high throughput. It effectively provides seamless integration of edge and core computing to support direct experimental data processing for immediate use by JLab science programs and others such as the EIC, as well as by the data centers of the future. The ESnet/JLab FPGA Accelerated Transport (EJFAT) project targets near-future projects requiring high throughput and low latency for both hot and cooled data, for running experiment data acquisition systems and data center use cases alike.

        The essential function of the CWLB data plane is to redirect designated data channel streams sharing a common data event designation to selectable destination hosts as a function of the data event id, and to target host ports as a function of the data channel id. This effects a form of hierarchical horizontal scaling at two levels: first across compute hosts, event by event, for pipelined processing of a series of events; and second across ports on a compute host, so that different data channels may be assigned to different processors for further parallel processing, e.g. reassembly, event reconstruction, physics harvesting, etc.
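        The two-level mapping above can be reduced to a few lines; the modular arithmetic here is an illustrative stand-in for the CWLB's dynamically programmed routing tables.

```python
def route(event_id, channel_id, hosts, base_port, ports_per_host):
    """Two-level load-balancing sketch: the event id selects the
    destination host (pipelining whole events across the farm),
    and the channel id selects the port on that host (fanning
    channels out to different processors)."""
    host = hosts[event_id % len(hosts)]
    port = base_port + (channel_id % ports_per_host)
    return host, port
```

        In the real system the control plane rewrites these mappings in situ as telemetry changes, rather than relying on a fixed modulus.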

        An EJFAT control plane, running externally to the CWLB and using both network and compute-farm telemetry, performs AI-directed and predictive resource allocation, capacity assessment, and scheduling of compute-farm resources, dynamically reconfiguring the CWLB in situ as the operating context and conditions require.
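        The two-level switching described above can be sketched schematically (hypothetical host and port names; the real CWLB table is reprogrammed dynamically by the control plane rather than computed from a fixed modulus):

```python
def route(event_id, channel_id, hosts, ports):
    """Two-level switching: events pipeline across hosts, channels fan out
    across ports on the chosen host."""
    host = hosts[event_id % len(hosts)]     # level 1: host per data event
    port = ports[channel_id % len(ports)]   # level 2: port per data channel
    return host, port

# With two hosts and two ports, consecutive events alternate hosts while
# each channel keeps its own port on whichever host is selected.
print(route(7, 3, ["n1", "n2"], [9000, 9001]))  # ('n2', 9001)
```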

        Speaker: Michael Goodrich
    • Track 2: Data Analysis - Algorithms and Tools Sala Europa (Villa Romanazzi)

      Sala Europa

      Villa Romanazzi

      Conveners: Adriano Di Florio (Politecnico e INFN, Bari), Enrico Guiraud (EP-SFT, CERN)
      • 86
        Performance study of the CLUE algorithm with the alpaka library

        CLUE (CLUsters of Energy) is a fast, fully parallelizable clustering algorithm developed to optimize a crucial step in the event reconstruction chain of future high-granularity calorimeters. The main drawback of the unprecedentedly high segmentation of such detectors is a huge computational load that, in the case of CMS, must be reduced to fit the harsh requirements of the Phase-2 High Level Trigger.

        With the adoption of alpaka as the performance portability library in CMSSW, the CLUE algorithm has been tested on multiple accelerators and hybrid platforms. This work presents the latest results obtained with the alpaka implementation of CLUE, which fully exploits the hardware available on each machine and carries out the task with high performance.

        Speaker: Tony Di Pilato (CASUS - Center for Advanced Systems Understanding (DE))
      • 87
        Neural Estimation of Energy Mover’s Distance for Clustering

        We propose a novel neural architecture that enforces an upper bound on the Lipschitz constant of the neural network (by constraining the norm of its gradient with respect to the inputs). This architecture was useful in developing new algorithms for the LHCb trigger which have robustness guarantees as well as powerful inductive biases leveraging the neural network’s ability to be monotonic in any subset of features. A new and interesting direction for this architecture is that it can also be used in the estimation of the Wasserstein metric (or the Earth Mover’s Distance) in optimal transport using the Kantorovich-Rubinstein duality. In this talk, I will describe how such architectures can be leveraged for developing new clustering algorithms using the Energy Mover’s Distance. Clustering using optimal transport generalizes all previous well-known clustering algorithms in HEP (anti-kt, Cambridge-Aachen, etc.) to arbitrary geometries and offers new flexibility in dealing with effects such as pile-up and unconventional topologies. I will also talk in detail about how this flexibility can be used to develop new algorithms which are more suitable for the Electron-Ion Collider setting than conventional ones.
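        In one dimension the Earth Mover's Distance has a closed form that conveys the idea without an optimal-transport solver; jets live in higher-dimensional spaces, where a full solver (or a neural estimator like the one discussed here) is needed. A minimal sketch of the 1-D equal-weight case:

```python
def emd_1d(xs, ys):
    """Earth Mover's Distance between two equal-size 1-D samples.

    In one dimension with uniform weights, the optimal transport plan
    simply matches sorted points, so the distance is the mean absolute
    difference of the order statistics.
    """
    assert len(xs) == len(ys), "equal-weight case needs equal sizes"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

print(emd_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # shifting all mass by 1 costs 1.0
```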

        Speaker: Ouail Kitouni (Massachusetts Inst. of Technology (US))
      • 88
        Particle Transformer for Jet Tagging

        Jet tagging is a critical yet challenging classification task in particle physics. While deep learning has transformed jet tagging and significantly improved performance, the lack of a large-scale public dataset impedes further enhancement. In this work, we present JetClass, a new comprehensive dataset for jet tagging. The JetClass dataset consists of 100 M jets, about two orders of magnitude larger than existing public datasets. A total of 10 types of jets are simulated, including several types unexplored for tagging so far. Based on the large dataset, we propose a new Transformer-based architecture for jet tagging, called Particle Transformer (ParT). By incorporating pairwise particle interactions in the attention mechanism, ParT achieves higher tagging performance than a plain Transformer and surpasses the previous state-of-the-art, ParticleNet, by a large margin. The pre-trained ParT models, once fine-tuned, also substantially enhance the performance on two widely adopted jet tagging benchmarks.
        https://arxiv.org/abs/2202.03772

        Speaker: Sitian Qian (Peking University (CN))
      • 89
        Boost-Invariant Polynomials: an efficient and interpretable approach to jet tagging

        While modern architectures designed via geometric deep learning achieve high accuracies through Lorentz-group invariance, they demand large amounts of computation, are restricted to a particular classification scheme, and lack interpretability.
        To tackle these issues, we present BIP, an efficient and computationally cheap framework for building representations invariant under rotations, permutations, and boosts along the jet's mean axis. We demonstrate the versatility of our approach by obtaining accuracies in the state-of-the-art range in both supervised and unsupervised jet tagging with several out-of-the-box classifiers.

        Speakers: Mr Ilyes Batatia (Engineering Laboratory, University of Cambridge), Mr Jose M Munoz (EIA University)
    • Track 3: Computations in Theoretical Physics: Techniques and Methods Sala A+A1 (Villa Romanazzi)

      Sala A+A1

      Villa Romanazzi

      Conveners: Daniel Maitre (University of Durham (GB)), Marcello Maggi (Universita e INFN, Bari (IT))
      • 90
        Studying Hadronization by Machine Learning Techniques

        Hadronization is a non-perturbative process whose theoretical description cannot be deduced from first principles. Modeling hadron formation requires several assumptions and various phenomenological approaches. Utilizing state-of-the-art computer vision and deep learning algorithms, it is now possible to train neural networks to learn the non-linear and non-perturbative features of these physical processes.

        Here, I present the latest results of two deep neural networks investigating global and kinematical quantities, namely jet- and event-shape variables. The widely used Lund string fragmentation model is applied as a baseline in √s=7 TeV proton-proton collisions to predict the most relevant observables at other LHC energies. Non-linear QCD scaling properties were also identified and validated against experimental data.

        [1] G. Bíró, B. Tankó-Bartalis, G.G. Barnaföldi; arXiv:2111.15655

        Speaker: Gabor Biro (Wigner Research Centre for Physics (Wigner RCP) (HU))
      • 91
        Invertible Networks for the Matrix Element Method

        For many years, the matrix element method has been considered the perfect approach to LHC inference. We show how conditional invertible neural networks can be used to unfold detector effects and initial-state QCD radiation, to provide the hard-scattering information for this method. We illustrate our approach for the CP-violating phase of the top Yukawa coupling in associated Higgs and single-top production.

        Speaker: Theo Heimel (Heidelberg University)
      • 92
        Quantum-Inspired Machine Learning

        Learning tasks are implemented via mappings of the sampled data set, in both the classical and the quantum framework. The quantum-inspired approach mimics the support-vector-machine mapping into a high-dimensional feature space, provided by the qubit encoding. In our application this scheme is framed as a least-squares problem minimizing the mean-squared-error cost function, implemented by means of measurements. The ability of quantum algorithms to manage a large number of parameters will determine their analysis capability for complex systems, such as the targeted biomedical framework.
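        As a toy illustration of qubit encoding, a scalar feature can be angle-encoded on a single qubit and the induced fidelity kernel evaluated classically; the talk's actual mapping and measurement scheme may differ.

```python
import math

def encode(x):
    """Angle-encode a scalar feature as single-qubit amplitudes
    cos(x/2)|0> + sin(x/2)|1>."""
    return (math.cos(x / 2), math.sin(x / 2))

def kernel(x, y):
    """Fidelity kernel |<phi(x)|phi(y)>|^2 induced by the encoding;
    analytically this equals cos^2((x - y) / 2)."""
    ax, bx = encode(x)
    ay, by = encode(y)
    return (ax * ay + bx * by) ** 2
```

Such a kernel matrix can then be plugged into a classical least-squares (kernel ridge) solver, mirroring the support-vector-machine mapping described above.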

        Speaker: Domenico Pomarico (INFN Sezione di Bari)
    • 93
      Technical Seminar: Accelerated Innovation for a More Sustainable and Open HPC-AI Sala Europa (Villa Romanazzi)

      Sala Europa

      Villa Romanazzi

      Speaker: Andrea Luiselli (Intel Corporation Italia SPA (IT))
    • Welcome cocktail Sala Scuderia (Villa Romanazzi)

      Sala Scuderia

      Villa Romanazzi

    • Plenary Sala Europa (Villa Romanazzi Carducci)

      Sala Europa

      Villa Romanazzi Carducci

      Conveners: David Britton (University of Glasgow (GB)), Lucia Silvestris (Universita e INFN, Bari (IT))
      • 94
        Updates from the organizers
        Speakers: Axel Naumann (CERN), Lucia Silvestris (Universita e INFN, Bari (IT))
      • 95
        Machine Learning in the Search for New Fundamental Physics

        As the search for new fundamental phenomena at modern particle colliders is a complex and multifaceted task dealing with high-dimensional data, it is not surprising that machine learning based techniques are quickly becoming a widely used tool for many aspects of searches. On the one hand, classical strategies are being supercharged by ever more sophisticated tagging algorithms; on the other hand, new paradigms — such as searching for anomalies in a data-driven way — are being proposed. This talk will review some key developments and consider which steps might be needed to maximise the discovery potential of particle physics experiments.

        Speaker: Gregor Kasieczka (Hamburg University (DE))
      • 96
        Practical Quantum Computing for Scientific Applications

        Trapped ions are the leading candidate for realizing practically useful quantum computers, as the system features the highest-performance quantum computational operations. The introduction of advanced integration technologies has provided an opportunity to convert a complex atomic-physics experiment into a stand-alone programmable quantum computer. In this talk, I will discuss recent technological progress that has changed the perception of the trapped-ion system as a scalable quantum computer and enabled commercially viable quantum computers. I will also discuss several application areas where quantum computers can make a practical contribution to the computational frontier in scientific applications.

        Speaker: Jungsang Kim
      • 97
        Microform and Macromolecules: Archiving digital data on analog or biological storage media

        Today, we live in a data-driven society. For decades, we wanted fast storage devices that could quickly deliver data, and storage technologies evolved to meet this requirement. As data-driven decision making becomes an integral part of enterprises, we are increasingly faced with a new need: cheap, long-term storage devices that can safely store the data we generate for tens or hundreds of years to meet legal and regulatory compliance requirements.
        In this talk, we will first explore recent trends in the storage hardware landscape, which show that all current storage media face fundamental limitations that threaten our ability to store, much less process, the data we generate over long time frames. We will then focus on unconventional biological and analog media that have received quite some attention recently: synthetic Deoxyribonucleic acid (DNA) and film. After highlighting the pros and cons of using each as a digital storage medium, I will present our recent work in the EU-funded Future and Emerging Technologies (FET) project OligoArchive, which focuses on overcoming the challenges of using such media to build a deep archival tier for data management systems.

        Speaker: Raja Appuswamy (Eurecom)
    • Poster session with coffee break Area Poster (Floor -1) (Villa Romanazzi)

      Area Poster (Floor -1)

      Villa Romanazzi

      • 98
        A graph neural network for B decays reconstruction at Belle II

        Over the past few years, intriguing deviations from the Standard Model predictions have been reported in measurements of angular observables and branching fractions of $B$ meson decays, suggesting the existence of a new interaction that acts differently on the three lepton families. The Belle II experiment has unique features that allow the study of $B$ meson decays with invisible particles in the final state, in particular neutrinos. The presence of such particles can be deduced from the energy-momentum imbalance obtained after reconstructing the companion $B$ meson produced in the event. This task is complicated by the thousands of possible final states $B$ mesons can decay into, and is currently performed at Belle II by the Full Event Interpretation (FEI), an algorithm based on Boosted Decision Trees and limited to specific, hard-coded decay processes.
        In recent years, graph neural networks have proven to be very effective tools for describing relations in physical systems, with applications in a range of fields. Particle decays can be naturally represented as rooted, acyclic tree graphs, with nodes corresponding to particles and edges representing the parent-child relations between them. In this work, we present a graph neural network approach to generically reconstruct $B$ decays at Belle II by exploiting the information from the detected final-state particles, without formulating any prior assumption about the nature of the decay. This task is performed by reconstructing the Lowest Common Ancestor matrix, a novel representation, equivalent to the adjacency matrix, that allows reconstruction of the decay from the final-state particles alone. Preliminary results show that the graph neural network approach outperforms the FEI by a factor of at least 3.
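        The Lowest Common Ancestor matrix itself is easy to illustrate: given the parent relations of a toy decay tree, each entry is the deepest ancestor shared by a pair of final-state particles (the Belle II work additionally encodes ancestor classes, which this sketch omits):

```python
def lca_matrix(parent, leaves):
    """Matrix of lowest common ancestors for each pair of leaves.

    `parent` maps child -> parent; the root is absent from its keys.
    """
    def ancestors(n):
        chain = [n]                # a node counts as its own ancestor here
        while n in parent:
            n = parent[n]
            chain.append(n)
        return chain

    def lca(a, b):
        seen = set(ancestors(a))
        for n in ancestors(b):     # walk up from b until we hit a's chain
            if n in seen:
                return n
        raise ValueError("nodes share no ancestor")

    return [[lca(a, b) for b in leaves] for a in leaves]

# Toy decay: B -> D (-> K, pi1) plus a prompt pi2.
parent = {"D": "B", "K": "D", "pi1": "D", "pi2": "B"}
m = lca_matrix(parent, ["K", "pi1", "pi2"])
```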

        Speaker: Jacopo Cerasoli (CNRS - IPHC)
      • 99
        Application of Unity for detector modeling in BESIII

        Detector modeling and visualization are essential in the life cycle of a High Energy Physics (HEP) experiment. Unity is a professional multi-media creation software that has the advantages of rich visualization effects and easy deployment on various platforms. In this work, we applied the method of detector transformation to convert the BESIII detector description from the offline software framework into the 3D detector modeling in Unity. By matching the geometric units with detector identifiers, the new event display system based on Unity can be developed for BESIII. The potential for further application development into virtual reality will also be introduced.

        Speaker: Zhijun Li (Sun Yat-Sen University (CN))
      • 100
        Awkward Arrays to RDataFrame and back

        Awkward Arrays and RDataFrame provide two very different ways of performing calculations at scale. The ability to zero-copy convert between them gives users the best of both, and greater flexibility in mixing different packages and languages in their analyses.

        In Awkward Array version 2, the ak.to_rdataframe function presents a view of an Awkward Array as an RDataFrame source. This view is generated on demand and the data is not copied. The column readers are generated based on the run-time type of the views. The readers are passed to a generated source derived from ROOT::RDF::RDataSource.

        The ak.from_rdataframe function converts the selected columns into native Awkward Arrays.

        We discuss the details of the implementation, which exploits JIT techniques, and present examples of analysing data stored in Awkward Arrays via the high-level interface of an RDataFrame: defining columns, applying user-defined filters written in C++, and plotting or extracting the columnar data as Awkward Arrays. We conclude with current limitations and future plans.

        Speaker: Ianna Osborne (Princeton University)
      • 101
        CERNLIB status

        We present a revived version of CERNLIB, the basis of the software ecosystems of most pre-LHC HEP experiments. The effort to consolidate CERNLIB is part of the activities of the Data Preservation for High Energy Physics collaboration to preserve the data and software of past HEP experiments.

        The presented version is based on CERNLIB version 2006, with numerous patches for compatibility with modern compilers and operating systems. The code is publicly available in the CERN GitLab repository, with the full development history starting from the early 1990s. The updates also include a re-implementation of the build system in CMake, making CERNLIB compliant with current best practices and increasing the chances of preserving the code in a compilable state for decades to come.

        The revived CERNLIB project also includes updated documentation, which we believe is a cornerstone for any preserved software depending on it.

        Speaker: Andrii Verbytskyi (Max Planck Society (DE))
      • 102
        Comparing and improving hybrid deep learning algorithms for identifying and locating primary vertices

        Identifying and locating proton-proton collisions in LHC experiments (known as primary vertices, or PVs) has been the topic of numerous conference talks in the past few years (2019-2021). A search over a variety of potential architectures has yielded promising candidates for PV-finder. The UNet model, for example, has achieved an efficiency of 98% with a low false-positive rate, and converges faster than any previous model; comparable results can be obtained with numerous other neural network architectures. While this does not answer the question of how the algorithm learns, it does provide some useful insight into that open question. We present the results of this architectural study of different algorithms and their performance in locating PVs in LHCb data. The goal is to demonstrate progress in developing a performant architecture and to evaluate how the different algorithms learn.

        Speaker: Simon Akar (University of Cincinnati (US))
      • 103
        Continuous Integration for the FairRoot Software Stack

        The FairRoot software stack is a toolset for the simulation, reconstruction, and analysis of high-energy particle physics experiments (currently used, e.g., at FAIR/GSI and CERN). In this work we give insight into recent improvements of Continuous Integration (CI) for this software stack. CI is a modern software engineering method to efficiently assure software quality. We discuss the relevant development workflows and how they were improved through automation. Furthermore, we present our infrastructure, detailing its hardware and software design choices. The entire toolchain is composed of free and open-source software. Finally, we conclude with lessons learned from both an operational and a user perspective, and outline ideas for future improvements.

        Speakers: Dennis Klein (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE)), Dr Christian Tacke (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
      • 104
        Data Management interfaces for CMS experiment: building an improved user experience

        After the successful adoption of Rucio, introduced in 2018 as the new data management system, a subsequent step is to advertise it to users and other stakeholders. With this in mind, one objective is to keep improving the tooling around Rucio. As Rucio introduces a new data management paradigm with respect to the previous model, we begin by tackling the challenges arising from such a shift while trying to minimise the impact on users. We therefore focus on building a monitoring system capable of answering questions that do not naturally fit the current paradigm, while also providing new features and services that push the adoption and benefits of the new implementation further. We present the development and evolution of a set of new interfaces that extend the current monitoring infrastructure, together with a user-dedicated CLI granting users an almost seamless transition and enhancing their daily data management activity. We keep dependencies minimal and ensure decoupling, making these tools potentially useful for other experiments. They form a set of extensions to the Rucio API intended to automate the most frequent use cases, ultimately enhancing the user experience and lowering the barrier for newcomers.

        Speaker: Rahul Chauhan (CERN)
      • 105
        Data Quality Monitoring for the JUNO Experiment

        In a High Energy Physics (HEP) experiment, a Data Quality Monitoring (DQM) system is crucial to ensure the correct and smooth operation of the experimental apparatus during data taking. DQM at the Jiangmen Underground Neutrino Observatory (JUNO) will reconstruct raw data directly from the JUNO Data Acquisition (DAQ) system and use event visualization tools to show the detector performance for high-quality data taking. The strategy of JUNO DQM, as well as its design and performance, will be presented.

        Speaker: Kaixuan Huang
      • 106
        Distributed data processing pipelines in ALFA

        The common ALICE-FAIR software framework ALFA offers a platform for the simulation, reconstruction and analysis of particle physics experiments. FairMQ is a module of ALFA that provides building blocks for distributed data processing pipelines, composed of components communicating via message passing. FairMQ integrates and efficiently utilizes standard industry data transport technologies while hiding the transport details behind an abstract interface. In this work we present the latest developments in FairMQ, focusing on the new and improved features of the transport layer, primarily the shared-memory transport and the generic interface features. Furthermore, we present the new control and configuration facilities, which allow a group of FairMQ components to be controlled programmatically. Additionally, new debugging and monitoring tools are highlighted. Finally, we outline how these tools are used by the ALICE experiment.

        Speaker: Alexey Rybalchenko (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
      • 107
        Evaluating Generative Adversarial Networks for particle hit generation in a cylindrical drift chamber using Fréchet Inception Distance

        We evaluate two Generative Adversarial Network (GAN) models developed by the COherent Muon to Electron Transition (COMET) collaboration to generate sequences of particle hits in a Cylindrical Drift Chamber (CDC). The models are first evaluated by measuring the similarity between distributions of particle-level physical features. We then measure the Effectively Unbiased Fréchet Inception Distance (FID) between distributions of high-dimensional representations obtained with InceptionV3, with a version of InceptionV3 fine-tuned for event classification, and with a 3D Convolutional Neural Network designed specifically for event classification. We also normalize the obtained FID values by the FID between two sets of real samples, putting the scores for different representations on the same scale. This novel relative FID metric is used to compare our GAN models to state-of-the-art natural-image generative models.
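        For intuition, the Fréchet distance has a simple closed form for one-dimensional Gaussians, and the normalisation by a real-vs-real baseline can be shown on scalar features; the actual study uses high-dimensional network activations and full covariance matrices, which this sketch does not attempt.

```python
import math

def frechet_1d(mu1, var1, mu2, var2):
    """Fréchet distance between two 1-D Gaussians (the scalar case of FID)."""
    return (mu1 - mu2) ** 2 + var1 + var2 - 2 * math.sqrt(var1 * var2)

def moments(xs):
    """Sample mean and (population) variance of a feature."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def relative_fid(gen, real_a, real_b):
    """Normalise the generated-vs-real score by a real-vs-real baseline,
    putting scores from different representations on one scale."""
    num = frechet_1d(*moments(gen), *moments(real_a))
    den = frechet_1d(*moments(real_a), *moments(real_b))
    return num / den
```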

        Speakers: Irene Andreou, Noam Mouelle (Imperial College London)
      • 108
        Event Display Development for Mu2e using Eve-7

        The Mu2e experiment will search for the CLFV neutrinoless coherent conversion of a muon to an electron in the field of an aluminium nucleus. A custom offline event display has been developed for Mu2e using TEve, a ROOT-based 3-D event visualisation framework. Event displays are crucial for monitoring and debugging during live data taking, as well as for public outreach. A custom GUI allows event selection and navigation. Reconstructed data such as tracks, hits and clusters can be displayed within the detector geometries upon GUI request. True Monte Carlo trajectories of particles traversing the muon beam line, obtained directly from Geant4, can also be displayed. Tracks are coloured according to their particle ID, and users can select the trajectories to be displayed. Reconstructed tracks are refined using a Kalman filter; the resulting tracks can be displayed alongside truth information, allowing visualisation of the track resolution. The user can remove or add data based on the energy deposited in a detector or the arrival time. This is a prototype; an online event display is currently under development using Eve-7, which allows remote access during live data taking and lets multiple users simultaneously view and interact with the display.

        Speaker: Namitha Chithirasreemadam (University of Pisa)
      • 109
        Experience in SYCL/oneAPI for event reconstruction at the CMS experiment

        The CMS software framework (CMSSW) has recently been extended to perform part of the physics reconstruction with NVIDIA GPUs. To avoid writing a different implementation of the code for each back-end, the decision was made to use a performance portability library, and Alpaka has been chosen as the solution for Run 3.
        In the meantime, different studies have been performed to test the track reconstruction and clustering algorithms on different back-ends such as CUDA and Alpaka.
        With the idea of exploring new solutions, Intel GPUs have been considered as a possible new back-end, and their implementation is currently under development.
        This is achieved using SYCL, a cross-platform abstract C++ programming model for heterogeneous computing. It allows developers to reuse code across different hardware and also to perform custom tuning for a specific accelerator. The SYCL implementation used is the Data Parallel C++ (DPC++) implementation in the Intel oneAPI Toolkit.

        In this work, we will present the performance of physics reconstruction algorithms on different hardware. Strengths and weaknesses of this heterogeneous programming model will also be presented.

        Speaker: Aurora Perego (Universita & INFN, Milano-Bicocca (IT))
      • 110
        Exploring the use of accelerators for lossless data compression in CMS

        The CMS collaboration has a growing interest in the use of heterogeneous computing and accelerators to reduce the costs and improve the efficiency of online and offline data processing: online, the High Level Trigger is fully equipped with NVIDIA GPUs; offline, a growing fraction of the computing power comes from GPU-equipped HPC centres. One topic where accelerators could be used for both online and offline processing is data compression.

        In the past decade a number of research papers exploring the use of GPUs for lossless data compression have appeared in the academic literature, but very few practical applications have emerged. In industry, NVIDIA has recently published nvcomp, a GPU-accelerated data compression library based on closed-source implementations of standard and dedicated algorithms. Other platforms, such as the IBM POWER9 processors, offer dedicated hardware for the acceleration of data compression tasks.

        In this work we review recent developments in the use of accelerators for data compression. After summarising the recent academic research, we measure the performance of representative open- and closed-source algorithms on CMS data and compare it with that of the CPU-only algorithms currently used by ROOT and CMS (lz4, zlib, zstd).
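        The comparison methodology can be mimicked with the compressors in the Python standard library (zlib, lzma and bz2 stand in for the lz4/zstd/nvcomp codecs of the study, and the payload is synthetic and far more uniform than real detector data):

```python
import bz2
import lzma
import zlib

# Repetitive stand-in for a column of detector data.
payload = b"".join(b"hit %05d adc %03d;" % (i, i % 200) for i in range(5000))

# Compare compression ratio per codec on the same payload.
for name, codec in [("zlib", zlib.compress), ("lzma", lzma.compress), ("bz2", bz2.compress)]:
    compressed = codec(payload)
    print("%s: %.1fx" % (name, len(payload) / len(compressed)))
```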

        Speaker: Stefan Rua (Aalto University)
      • 111
        General shower simulation MetaHEP in key4hep framework

        Describing the development of particle cascades in the calorimeter of a high-energy physics experiment relies on the precise simulation of particle interactions with matter. Such simulation is inherently slow and constitutes a challenge for HEP experiments. Furthermore, with the upcoming high-luminosity upgrade of the Large Hadron Collider and a much increased data production rate, the amount of required simulated events will grow accordingly. Several research directions have investigated Machine Learning (ML) based models to accelerate particular calorimeter response simulations. These models typically require a large amount of data and time for training, and the result is a specifically tuned simulation. Meanwhile, meta-learning has emerged in the ML community as a fast learning algorithm using small training datasets. In this contribution, we present MetaHEP, a meta-learning approach to accelerate shower simulation in different calorimeters using highly granular data. We show its application to a calorimeter proposed for the Future Circular Collider (FCC-ee) and its integration into the key4hep framework.

        Speaker: Dalila Salamani (CERN)
      • 112
        Implementation of generic SoA data structure in the CMS software

        GPU applications require a structure-of-arrays (SoA) data layout to achieve good memory-access performance. During the development of the CMS pixel reconstruction for GPUs, the Patatrack developers crafted various techniques to optimise data placement in memory and its access inside GPU kernels. The work presented here gathers, automates and extends those patterns, and offers a simplified, consistent programming interface.

        The work automates the creation of SoA structures, fulfilling technical requirements like cache-line alignment, while optionally providing alignment and cache hints to the compiler and range checking. Read-only products of the CMS software framework (CMSSW) are protected by constant versions of the SoA. A compact description of the SoA minimizes the size of the data passed to GPU kernels. Finally, the user interface is designed to be as simple as possible, providing AoS-like semantics that allow compact and readable notation in the code.

        The results of porting CMSSW code to this SoA will be presented, along with performance measurements.
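        The AoS-like notation over columnar storage can be sketched in a few lines of Python; the real CMSSW implementation generates aligned C++ buffers and GPU-friendly accessors, which this toy omits:

```python
class SoA:
    """Columnar (structure-of-arrays) storage with an AoS-like element view."""

    def __init__(self, **columns):
        lengths = {len(col) for col in columns.values()}
        assert len(lengths) == 1, "all columns must have the same length"
        self.cols = {name: list(col) for name, col in columns.items()}
        self.n = lengths.pop()

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        cols = self.cols

        class View:                 # reads go straight to the columns: no copy
            def __getattr__(self, name):
                return cols[name][i]

        return View()

hits = SoA(x=[1.0, 2.0], y=[0.5, 1.5], charge=[3, 7])
print(hits[0].x, hits[0].charge)    # AoS-like notation over SoA storage
```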

        Speaker: Eric Cano (CERN)
      • 113
        Improving robustness of jet tagging algorithms with adversarial training

        In the field of high-energy physics, deep learning algorithms continue to gain relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst's perspective, obtaining the highest possible performance is desirable, but recently some focus has been placed on studying the robustness of models, i.e. how well they perform under slight distortions of the input features. Especially for tasks that involve many (low-level) inputs, the application of deep neural networks brings new challenges. In the context of jet flavour tagging, adversarial attacks are used to probe a typical classifier's vulnerability and can be understood as a model for systematic uncertainties. A corresponding defense strategy, adversarial training, improves robustness while maintaining high performance. This contribution presents different approaches using a set of attacks of varying complexity. Investigating the loss surface with respect to the inputs and models in question reveals geometric interpretations of robustness that take correlations into account. Additional cross-checks against other, physics-inspired mismodeling scenarios give rise to the presumption that adversarially trained models cope better with simulation artifacts or subtle detector effects.
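        The attack idea can be sketched on a model simple enough to differentiate by hand: for logistic regression the input gradient of the cross-entropy loss is (p - y)·w, giving a one-line Fast Gradient Sign Method. The weights and input below are invented; the talk's attacks on deep taggers are of course richer, and adversarial training then mixes such perturbed inputs back into the training set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method for a logistic classifier.

    The input gradient of the cross-entropy loss is (p - y) * w, so this
    toy attack needs no automatic differentiation.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]

w, b = [2.0, -1.0], 0.0            # hypothetical trained tagger weights
x, y = [1.0, 0.5], 1               # a correctly classified "signal" input
x_adv = fgsm(x, y, w, b, eps=0.3)  # nudged to lower the signal score
```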

        Speakers: Annika Stein (Rheinisch Westfaelische Tech. Hoch. (DE)), Spandan Mondal (RWTH Aachen (DE))
      • 114
        JETFLOW: Generating jets with Normalizing Flows using the jet mass as condition and constraint

        In this study, jets with up to 30 particles are modelled using Normalizing Flows with Rational Quadratic Spline coupling layers. The invariant mass of the jet is a powerful global feature for checking whether the flow-generated data contains the same high-level correlations as the training data. Normalizing flows without conditioning lack the expressive power to reproduce these correlations. Using the mass as a condition for the coupling transformation enhances the model's performance on all tracked metrics. In addition, we demonstrate how to sample the original mass distribution using the empirical cumulative distribution function, and we study the usefulness of including an additional mass constraint in the loss term. On the JetNet dataset, our model shows state-of-the-art performance combined with a general model and stable training.
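        The empirical-CDF sampling mentioned above can be sketched in a few lines: inverse-transform sampling maps uniform draws back through the sorted training masses. The mass distribution below is invented for the example.

        ```python
        import numpy as np

        rng = np.random.default_rng(1)

        # Toy "jet mass" training sample (stand-in for the real distribution).
        masses = rng.gamma(shape=4.0, scale=20.0, size=10_000)

        # Empirical CDF: sort the sample; inverse-transform sampling then maps
        # uniform draws through the interpolated quantile function.
        sorted_m = np.sort(masses)
        quantiles = np.linspace(0.0, 1.0, len(sorted_m))
        u = rng.uniform(size=5_000)
        sampled = np.interp(u, quantiles, sorted_m)
        ```

        The sampled masses can then serve as the condition fed to the coupling layers, so generated jets follow the training mass spectrum by construction.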

        Speaker: Benno Kach (Deutsches Elektronen-Synchrotron (DE))
      • 115
        Machine learning techniques for data quality monitoring at the CMS detector

        The CMS experiment employs an extensive data quality monitoring (DQM) and data certification (DC) procedure. Currently, this approach consists mainly of the visual inspection of reference histograms which summarize the status and performance of the detector. Recent developments in several of the CMS subsystems have shown the potential of computer-assisted DQM and DC using autoencoders, spotting detector anomalies with high accuracy and a much finer time granularity than previously accessible. We will discuss a case study for the CMS pixel tracker, as well as the development of a common infrastructure to host computer-assisted DQM and DC workflows. This infrastructure facilitates accessing the input histograms, provides tools for preprocessing, training, and validation, and generates an overview of potential detector anomalies.
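        A minimal sketch of reconstruction-error-based anomaly flagging, the principle behind the autoencoder approach described above. Here a linear (PCA) projection stands in for the trained autoencoder; all data, dimensions, and thresholds are invented.

        ```python
        import numpy as np

        rng = np.random.default_rng(2)

        # Toy per-lumisection "histograms": good ones lie near a 2D subspace.
        basis = rng.normal(size=(2, 20))
        good = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 20))
        bad = rng.normal(size=(5, 20))          # anomalous shapes off the subspace

        # Linear "autoencoder": project onto the top principal components and
        # flag inputs whose reconstruction error exceeds a threshold.
        mean = good.mean(axis=0)
        _, _, vt = np.linalg.svd(good - mean, full_matrices=False)
        enc = vt[:2]                            # 2-dimensional latent space

        def recon_error(x):
            z = (x - mean) @ enc.T              # encode
            return np.linalg.norm((x - mean) - z @ enc, axis=1)  # decode + residual

        threshold = np.quantile(recon_error(good), 0.99)
        flags = recon_error(bad) > threshold
        ```

        A real deployment would use a nonlinear autoencoder trained per subsystem, but the flagging logic, "reconstruction error above a reference quantile", is the same.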

        Speaker: Rosamaria Venditti (Universita e INFN, Bari (IT))
      • 116
        Machine learning-based vertex reconstruction for reactor neutrinos in JUNO

        The Jiangmen Underground Neutrino Observatory (JUNO), located in southern China, will be the world's largest liquid scintillator (LS) detector. Equipped with 20 kton of LS, 17623 20-inch PMTs, and 25600 3-inch PMTs in the central detector, JUNO will provide a unique apparatus to probe the mysteries of neutrinos, particularly the neutrino mass ordering puzzle. One of the challenges for JUNO is high-precision vertex reconstruction for reactor neutrino events. This talk will present machine learning-based vertex reconstruction in JUNO, particularly the comparison of different machine learning models as well as the optimization of the model inputs for better reconstruction performance.

        Speaker: Wuming Luo (Institute of High Energy Physics, Chinese Academy of Science)
      • 117
        Particle Flow Reconstruction on Heterogeneous Architecture for CMS

        The Particle Flow (PF) algorithm, used for a majority of CMS data analyses for event reconstruction, provides a comprehensive list of final-state particle candidates and enables efficient identification and mitigation methods for simultaneous proton-proton collisions (pileup). The higher instantaneous luminosity expected during the upcoming LHC Run 3 will impose challenges for CMS event reconstruction. This will be amplified in the HL-LHC era, where luminosity and pileup rates are expected to be significantly higher. One of the approaches CMS is investigating to cope with this challenge is to adopt heterogeneous computing architectures and accelerate event reconstruction. In this talk, we will discuss the effort to adapt the PF reconstruction to take advantage of GPU accelerators.

        We will discuss the design and implementation of PF clustering for the CMS Electromagnetic and Hadronic Calorimeters using CUDA, including optimizations of the PF algorithm. The physics validation and performance of the GPU-accelerated algorithms will be demonstrated by comparing these to the CPU-based implementation.

        Speaker: Felice Pantaleo (CERN)
      • 118
        Preliminary Results of Vectorization of Density Functional Theory calculations in Geant4/V for amino acids

        Density Functional Theory (DFT) is a widely used ab initio method for calculating the electronic properties of molecules. Compared with Hartree-Fock methods, DFT offers good approximations at a reduced computational cost. Recently, the DFT method has been used for discovering and analyzing protein interactions by calculating the free energies of these macro-molecules from short to large scales. However, calculating the ground-state energy by DFT for many-body systems of molecules such as proteins, in a reasonable time and with sufficient accuracy, remains a very challenging and CPU-intensive task.
        On the other hand, Geant4 is a toolkit for simulating the passage of particles through matter, with a wide range of specialized methods that include DNA and protein exploration. Unfortunately, the execution time needed to obtain an effective protein analysis is still a strong restriction for CPU processors. In this sense, the GeantV project seeks to exploit the vectorization capabilities of CPUs, designed to tackle computationally intensive workloads on CPU cores. In this work, we present the preliminary results of the partial implementation of DFT in the Geant4 framework and the vectorized GeantV project. We show the advantages and the partial methods used for vectorizing several sub-routines in the calculation of the ground-state energy for some amino acids and other molecules.

        Speaker: Oscar Roberto Chaparro Amaro (Instituto Politécnico Nacional. Centro de Investigación en Computación)
      • 119
        Supporting multiple hardware architectures at CMS: the integration and validation of Power9

        Computing resources in the Worldwide LHC Computing Grid (WLCG) have been based entirely on the x86 architecture for more than two decades. In the near future, however, heterogeneous non-x86 resources, such as ARM, POWER and RISC-V, will become a substantial fraction of the resources provided to the LHC experiments, due to their presence in existing and planned world-class HPC installations. The CMS experiment, one of the four large detectors at the LHC, has started to prepare for this situation, with the CMS software stack (CMSSW) already compiled for multiple architectures. To allow production use, the tools for workload management and job distribution need to be extended to exploit heterogeneous architectures.

        Profiting from the opportunity to exploit the first sizable IBM Power9 allocation available on the Marconi100 HPC system at CINECA, CMS developed all the needed modifications to the CMS workload management system. After a successful proof of concept, a full physics validation was performed in order to bring the system into production. This experience is of very high value for the commissioning of the similar (even larger) Summit HPC system at Oak Ridge, where CMS also expects a resource allocation. Moreover, much of the compute power of these systems is provided via GPUs, which represents an extremely valuable opportunity to exploit the offloading capability already implemented in CMSSW.

        The status of the current integration, including the exploitation of the GPUs, the results of the validation, as well as future plans will be shown and discussed.

        Speaker: Daniele Spiga (Universita e INFN, Perugia (IT))
      • 120
        The CMS Roadmap towards HL-LHC Software and Computing

        The Phase-2 upgrade of CMS, coupled with the projected performance of the HL-LHC, shows great promise in terms of discovery potential. However, the increased granularity of the CMS detector and the higher complexity of the collision events generated by the accelerator pose challenges in the areas of data acquisition, processing, simulation, and analysis. These challenges cannot be solved solely by increments in the computing resources available to CMS, but must be accompanied by major improvements of the computing model and computing software tools, as well as data processing software and common software tools. We present aspects of our roadmap for those improvements, focusing on the plans to reduce storage and CPU needs as well as take advantage of heterogeneous platforms, such as the ones equipped with GPUs, and High Performance Computing Centers. We describe the most prominent research and development activities being carried out in the experiment, demonstrating their potential effectiveness in either mitigating risks or quantitatively reducing computing resource needs on the road to the HL-LHC.

        Speaker: Danilo Piparo (CERN)
      • 121
        The Level-1 Global Trigger for Phase-2: Algorithms, configuration and integration in the CMS offline framework

        The CMS Level-1 Trigger, for its operation during Phase-2 of LHC, will undergo a significant upgrade and redesign. The new trigger system, based on multiple families of custom boards, equipped with Xilinx Ultrascale Plus FPGAs and interconnected with high speed optical links at 25 Gb/s, will exploit more detailed information from the detector subsystems (calorimeter, muon systems, tracker). In contrast to its implementation during Phase-1, information from the CMS tracker is now also available at the Level-1 Trigger and can be used for particle flow algorithms. The final stage of the Level-1 Trigger, called Global Trigger (GT), will receive more than 20 different trigger object collections from upstream systems and will be able to evaluate a menu of more than 1000 cut-based algorithms distributed over 12 boards. These algorithms may not only apply conditions on parameters such as momentum or angle of a particle, but can also do arithmetic calculations, like the invariant mass of a suspected mother particle of interest or the angle between two particles. The Global Trigger is designed as a modular system, with an easily re-configurable algorithm unit, to meet the demand of high flexibility required for shifting trigger strategies during Phase-2 operation of the LHC. The algorithms themselves are kept highly configurable and tools are provided to allow their study from within the CMS offline software framework (CMSSW) without the need for knowledge of the underlying firmware implementation. To allow the reproducible translation of the physicist-designed trigger menu to VHDL for use in the hardware trigger, a tool has been developed that converts the Python-based configuration used by CMSSW to VHDL. In addition to cut-based algorithms, neural net algorithms are being developed and integrated into the Global Trigger framework. 
To make use of these algorithms in hardware, the HLS4ML framework is used, which transpiles pre-trained neural nets, generated in the most commonly used software frameworks, into firmware code. A prototype firmware for a single Global Trigger board has been developed, which includes the de-multiplexing logic, conversion to an internal common object format and distribution of the data over all Super Logic Regions. In this framework 312 algorithms are implemented at a clock speed of 480 MHz. The prototype has been thoroughly tested and verified with the bit-wise compatible C++ emulator. In this contribution we present the Phase-2 Global Trigger with an emphasis on the Global Trigger algorithms, their implementation in hardware, configuration with Python and the novel integration within the CMS offline software framework (CMSSW).

        Speaker: Elias Leutgeb (Technische Universitaet Wien (AT))
      • 122
        Trigger Rate Monitoring Tools at CMS

        With the start of Run 3 in 2022, the LHC has entered a new period, now delivering higher energy and luminosity proton beams to the Compact Muon Solenoid (CMS) experiment. These increases make it critical to maintain and upgrade the tools and methods used to monitor the rate at which data is collected (the trigger rate). Software tools have been developed to allow for automated rate monitoring, and we present several upgrades to these tools which maintain and expand their functionality. These trigger rate monitoring tools allow for real-time monitoring, including alerts which go out to on-call experts in the case of abnormalities. Fits are produced from previously collected data and extrapolate the behavior of the triggers as a function of pile-up (the average number of particle interactions per bunch-crossing). These fits allow for visualization and statistical analysis of the behavior of the triggers and are displayed on the online monitoring system (OMS). The rate monitoring code can also be used for offline data certification and more complex trigger analysis. This presentation will show some of the upgrades to this software, with an emphasis on automation for easier and more consistent upgrades and fixes, and on increased interactivity with the users.
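        The pile-up fits described above can be illustrated with a toy quadratic fit and a simple deviation flag; the rate model, numbers, and threshold below are invented for the example.

        ```python
        import numpy as np

        rng = np.random.default_rng(3)

        # Toy trigger-rate measurements vs pile-up (real fits use collected data).
        pu = np.linspace(20, 60, 40)
        rate = 2.0 + 0.8 * pu + 0.01 * pu**2 + rng.normal(scale=0.5, size=pu.size)

        # Quadratic fit of rate vs pile-up, then extrapolation to higher pile-up.
        coeffs = np.polyfit(pu, rate, deg=2)
        predicted_at_70 = np.polyval(coeffs, 70.0)

        # Flag a measurement as abnormal if it deviates from the fit by > 5 sigma.
        residuals = rate - np.polyval(coeffs, pu)
        sigma = residuals.std()
        abnormal = np.abs(residuals) > 5 * sigma
        ```

        In production the fit function per trigger path and the alerting thresholds are configurable; the sketch only shows the fit-extrapolate-flag pattern.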

        Speaker: John Lawrence (University of Notre Dame (US))
      • 123
        Updates on the Low-Level Abstraction of Memory Access

        Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data structures is ideally decoupled from the rest of a program.
        The low-level abstraction of memory access (LLAMA) is a C++ library that provides a zero-runtime-overhead abstraction layer, underneath which memory layouts can be freely exchanged, focusing on multidimensional arrays of nested, structured data.
        It provides a framework for defining and switching custom memory mappings at compile time to define data layouts, data access and access instrumentation, making LLAMA an ideal tool to tackle memory-related optimization challenges in heterogeneous computing.
        After its scientific debut, several improvements and extensions have been added to LLAMA. This includes compile-time array extents for zero memory overhead, support for computations during memory access, new mappings (e.g. int/float bit-packing or byte-swapping) and more. This contribution provides an overview of the LLAMA library, its recent development and an outlook of future activities.
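        LLAMA itself is a C++ library that fixes the layout at compile time, but the distinction it abstracts over can be illustrated in NumPy: the same logical record stored as Array-of-Structs versus Struct-of-Arrays. The record type and sizes below are invented for the example.

        ```python
        import numpy as np

        n = 100_000

        # Array-of-Structs: the fields of one particle are adjacent in memory.
        aos = np.zeros(n, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4"), ("e", "f4")])

        # Struct-of-Arrays: each field is a contiguous array, friendlier to SIMD.
        soa = {f: np.zeros(n, dtype="f4") for f in ("x", "y", "z", "e")}

        # The same computation expressed against either layout:
        aos["e"] = np.sqrt(aos["x"]**2 + aos["y"]**2 + aos["z"]**2)
        soa["e"] = np.sqrt(soa["x"]**2 + soa["y"]**2 + soa["z"]**2)

        # The stride between consecutive "x" values differs: 16 bytes in AoS
        # (the whole record) vs. 4 bytes in SoA (one float32).
        ```

        LLAMA's contribution is that the access code stays identical while the mapping (AoS, SoA, blocked, bit-packed, byte-swapped, ...) is swapped underneath with zero runtime overhead.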

        Speaker: Bernhard Manfred Gruber (Technische Universitaet Dresden (DE))
      • 124
        Variational AutoEncoders for Anomaly Detection in VBS events within an EFT framework

        We present a machine-learning-based method to detect deviations from a reference model in a way that is almost independent of the theory assumed to describe the new physics responsible for the discrepancies.

        The analysis is based on an Effective Field Theory (EFT) approach: under this hypothesis the Lagrangian of the system can be written as an infinite expansion of terms, where the first ones are those from the Standard Model (SM) Lagrangian and the following terms are higher-dimension operators. The presence of the EFT operators impacts the distributions of the observables by producing deviations from the shapes expected when the SM Lagrangian alone is considered.

        We use a Variational AutoEncoder (VAE) trained on SM processes to identify EFT contributions as anomalies. While SM events are expected to be reconstructed properly, events generated taking into account EFT contributions are expected to be poorly reconstructed, thus accumulating in the tails of the loss function distribution. Since the training of the model does not depend on any specific new physics signature, the proposed strategy makes no specific assumptions on its nature. In order to improve the discrimination performance, we introduce a DNN classifier that distinguishes between EFT and SM events based on the values of the reconstruction and regularization losses of the model. In this second model a cross-entropy term is added to the usual VAE loss, optimizing at the same time the reconstruction of the input variables and the classification. This procedure ensures that the model is optimized for discrimination, at a small price in terms of model independence due to the use of one of the 15 operators of the EFT model in the training.

        In this talk we will discuss in detail the above-mentioned methods using generator-level VBS events produced at the LHC and assuming, in order to compute the significance of possible new physics contributions, an integrated luminosity of $350\,\mathrm{fb}^{-1}$.
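        The combined objective described above (reconstruction plus KL regularization plus cross-entropy) can be sketched as a single loss function. The weighting factors beta and gamma below are illustrative placeholders, not the contribution's actual settings.

        ```python
        import numpy as np

        def combined_loss(x, x_hat, mu, log_var, y_true, y_pred, beta=1.0, gamma=1.0):
            """Schematic total loss for the classifier-augmented VAE:
            reconstruction + KL regularization + cross-entropy discrimination."""
            # Mean-squared reconstruction error of the input variables.
            recon = np.mean((x - x_hat) ** 2)
            # KL divergence of the Gaussian latent posterior from N(0, 1).
            kl = -0.5 * np.mean(1 + log_var - mu**2 - np.exp(log_var))
            # Binary cross-entropy of the SM-vs-EFT classifier output.
            eps = 1e-9
            bce = -np.mean(y_true * np.log(y_pred + eps)
                           + (1 - y_true) * np.log(1 - y_pred + eps))
            return recon + beta * kl + gamma * bce

        # Sanity check: perfect reconstruction, standard-normal latent, and
        # perfect classification give (near-)zero total loss.
        x = np.ones((4, 3))
        loss = combined_loss(x, x, np.zeros((4, 2)), np.zeros((4, 2)),
                             np.array([0.0, 1.0]), np.array([0.0, 1.0]))
        ```

        Jointly minimizing the three terms is what ties the latent representation to the discrimination task, at the model-independence cost noted above.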

        Speaker: Giulia Lavizzari
      • 125
        XRootD caching for Belle II

        The Belle II experiment at the second-generation e+/e- B-factory SuperKEKB has been collecting data since 2019 and aims to accumulate 50 times more data than the first-generation experiment, Belle. To efficiently process these steadily growing datasets of recorded and simulated data, which end up on the order of 100 PB, and to support Grid-based analysis workflows using the DIRAC Workload Management System, an XRootD-based caching architecture is presented. The presented mechanism decreases job waiting time for often-used datasets by transparently adding copies of these files at smaller sites without managed storage. The described architecture seamlessly integrates local storage services and supports the use of dynamic computing resources with minimal deployment effort. This is especially useful in environments with many institutions providing comparatively small numbers of cores and limited personpower.

        This talk will describe the implemented cache at GridKa, a main computing centre for Belle II, as well as its performance and upcoming opportunities for caching for Belle II.

        Speaker: Moritz David Bauer
    • Plenary Sala Europa (Villa Romanazzi Carducci)

      Sala Europa

      Villa Romanazzi Carducci

      Conveners: Domenico Elia (INFN Bari), Michael Poat
      • 126
        Machine learning for phase space integration with SHERPA

        Simulated event samples from Monte-Carlo event generators (MCEGs) are a backbone of the LHC physics programme. However, for Run III, and in particular for the HL-LHC era, computing budgets are becoming increasingly constrained, while at the same time the push to higher accuracies is making event generation significantly more expensive. Modern ML techniques can help with the effort of creating such costly samples in two ways.

        One way is to use inference models to try to learn the event distribution of the entire MCEG toolchain, or parts of it, such that events can then be generated with those "replacement models" in a fraction of the time a full MCEG would require. This ansatz is, however, intrinsically constrained by the available training data. Another way, and the one discussed in this talk, is to keep the MCEG and to use ML "assistant models" to increase the efficiency of certain performance bottlenecks.

        One of those bottlenecks is the sampling of the high-dimensional phase space of complex processes, for which a given distribution must be approximated as closely as possible. This is a very generic problem, so methods developed in entirely different fields of physics, or even outside of physics, can be explored.

        In this talk I will discuss the potential to increase the phase space sampling efficiency using the methods of Neural Importance Sampling and Nested Sampling, and the use of neural-network surrogates of the integrand to increase the efficiency of event unweighting. The application of these methods within the Sherpa generator framework is then reviewed.
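        The unweighting bottleneck that the surrogates target can be sketched with toy weighted events: hit-or-miss unweighting accepts an event with probability w/w_max, so the unweighting efficiency is the mean weight divided by the maximum weight. The weight function below is invented for the example.

        ```python
        import numpy as np

        rng = np.random.default_rng(4)

        # Toy weighted events: weights from a matrix-element-like density.
        x = rng.uniform(size=20_000)
        w = 1.0 + 4.0 * x**3            # event weight (stand-in for |M|^2 / pdf)

        # Hit-or-miss unweighting: accept an event with probability w / w_max.
        w_max = w.max()
        accepted = rng.uniform(size=w.size) < w / w_max

        # Unweighting efficiency = <w> / w_max.  A better sampler (or a cheap
        # surrogate used to pre-reject events) flattens w and pushes this
        # towards 1, reducing wasted full-matrix-element evaluations.
        efficiency = w.mean() / w_max
        ```

        Neural Importance Sampling attacks the same quantity from the other side, by learning a phase-space mapping under which the weights are nearly uniform.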

        Speaker: Enrico Bothmann (University of Göttingen)
      • 127
        Towards Zero-Waste Computing

        “Computation” has become a massive part of our daily lives; even more so in science, where many experiments and analyses rely on massive computation. Under the assumption that computation is cheap and time-to-result is the only relevant metric, we currently use computational resources at record-low efficiency.
        In this talk, I argue that this approach is an unacceptable waste of computing resources. I further define the goal of zero-waste computing and discuss how performance engineering methods and techniques can facilitate this goal. By means of a couple of case studies, I will also demonstrate performance engineering at work, proving how efficiency and time-to-result can co-exist.

        Speaker: Ana Lucia Varbanescu
      • 128
        Lattice QCD on supercomputers with Chinese CPU

        Lattice QCD is an ab initio approach to QCD and plays an indispensable role in understanding the low-energy properties of the strong interaction. The last four decades have witnessed the rapid development of lattice QCD numerical calculations along with the progress of high performance computing (HPC) techniques, and lattice QCD has become one of the most resource-consuming HPC fields. China has built several domestic supercomputers with different hardware architectures, such as the Sunway series, the Tianhe series and Sunrising-1, which provide potentially massive HPC resources for lattice QCD studies.
        This talk will give a brief introduction to the code development and the performance of lattice QCD software on these strikingly different computing systems.

        Speaker: Ying CHEN
    • 1:00 PM
      Lunch break Sala Scuderia (Villa Romanazzi)

      Sala Scuderia

      Villa Romanazzi

    • Track 1: Computing Technology for Physics Research Sala Federico II (Villa Romanazzi)

      Sala Federico II

      Villa Romanazzi

      Conveners: Michael Poat, Marica Antonacci (INFN)
      • 129
        Challenges and opportunities in migrating the CNAF datacenter to the Bologna Tecnopolo

        The INFN Tier1 data center is currently located in the premises of the Physics Department of the University of Bologna, where CNAF is also located. Soon it will be moved to the “Tecnopolo”, the new facility for research, innovation, and technological development in the same city area; it will follow the installation of Leonardo, the pre-exascale supercomputing machine managed by CINECA, co-financed as part of the EuroHPC Joint Undertaking.
        The construction of the new CNAF data center will consist of two phases, corresponding to the computing requirements of LHC: Phase 1, starting from 2023, will involve an IT power of 3 MW, and Phase 2, starting from 2025, involving an IT power up to 10 MW.
        The primary goal of the new data center is to cope with the computing requirements of the data taking of the HL-LHC experiments in the period from 2026 to 2040, while providing, at the same time, computing services for several other INFN experiments, projects, and activities of interest, whether currently in operation, under construction, in advanced design, or not yet defined. The co-location with Leonardo will also open new scenarios, with a close integration between the two systems allowing them to dynamically share resources.
        In this presentation we will describe the new center design, with a particular focus on the status of the migration, its schedule, and the technical challenges we have to face moving the data center without service interruption. On top of this, we will analyze the opportunities that the new infrastructure will open in the context of the PNRR (National Plan for Resilience and Recovery) funding and strategic plans, within and beyond the High Energy Physics domain.

        Speakers: Daniele Cesini (Universita e INFN, Bologna (IT)), Luca dell'Agnello (INFN), Dr Tommaso Boccali (INFN Sezione di Pisa)
      • 130
        A cloud-based computing infrastructure for the HERD cosmic-ray experiment

        The HERD experiment will perform direct cosmic-ray detection at the highest ever reached energies, thanks to an innovative design that maximizes the acceptance, and its placement on the future Chinese Space Station which will allow for an extended observation period.

        Significant computing and storage resources are foreseen to be needed in order to cope with the necessities of a large community driving a big experimental device with an energy reach above PeV for hadrons and multi-TeV for electrons and positrons. For example, at PeV energies Monte Carlo simulations require a massive amount of computing power, and very large simulated data sets are needed for detector performance studies like electron-proton rejection.

        The HERD computing infrastructure is currently being investigated and prototyped in order to provide a flexible, robust and easy to use cloud-based computing and storage platform. It is based on
        technical solutions originally developed by the "Dynamic On Demand Analysis Service" (DODAS) framework in the context of projects such as INDIGO-DataCloud, EOSC-hub and XDC. It allows seamless access to both commercial and institutional cloud resources, in order to efficiently make use of opportunistic resources to cope with high-demand periods (like full dataset reprocessings and specialized Monte Carlo productions), as well as to transparently integrate with on-premise computing resources managed by an HTCondor batch system. The cloud platform also allows for an easy and efficient deployment of collaboration services like a calendar, document server, code repository etc., making use of available, free open-source solutions. Finally, an INDIGO IAM instance provides a Single Sign-On service for access control for the whole infrastructure.

        An overview of the current status and of the future perspectives will be presented.

        Speaker: Nicola Mori (INFN Florence)
      • 131
        The new GPU-based HPC cluster at ReCaS-Bari

        The ReCaS-Bari datacenter has enriched its service portfolio with a new HPC/GPU cluster for Bari University and INFN users. This new service is the best solution for complex applications requiring a massively parallel processing architecture. The cluster is equipped with cutting-edge Nvidia GPUs, like the V100 and A100, suitable for applications able to use all the available parallel hardware. Artificial intelligence, complex model simulation (weather and earthquake forecasts, molecular dynamics and galaxy formation) and all high-precision floating-point applications are possible candidates for execution on the new service. The cluster is composed of 10 machines with a total of 1755 cores, 13.7 TB of RAM, 55 TB of local disk and 38 high-performance GPUs (18 Nvidia A100 and 20 Nvidia V100). Each node can access the 8.3 PB ReCaS-Bari distributed storage system based on GPFS. Applications are executed only within Docker containers, giving the HPC/GPU cluster features like easy application configuration and execution, reliability, flexibility and security. Currently, users can choose among different ready-to-use services, like remote IDEs (Jupyter Notebook and RStudio) through which GPU-based applications can be executed, or a job orchestrator to which complex workflows, represented as DAGs (Directed Acyclic Graphs), can be submitted. The user service portfolio is evolving: if the provided services do not cover the users' needs, user-defined Docker containers can be executed on the cluster. Long-running services and job submission are managed with Marathon and Chronos respectively, two frameworks running along with Apache Mesos. These three tools add high availability, fault tolerance and security on top of the native capacity to manage all compute resources and user requests.
The implemented technological solution allows users to continue to access their own data both from the HTC cluster (based on HTCondor) and from the HPC/GPU cluster, based on Mesos.
        The first phase, in which local beta-testers used the cluster, has concluded successfully. The service is now ready to join the national INFN Cloud federation. Leveraging the INDIGO PaaS orchestrator, it provides multiple ready-to-use frameworks and services (ML_INFN, Apache Spark, JupyterLab, …), a stable and secure authentication layer, and a simple web dashboard that can be used to deploy services on top of a heterogeneous set of resources. An evolution of the service, including a performance evaluation of Kubernetes as a replacement for Apache Mesos, is in the pipeline.
        This contribution will present and discuss the resources and technological solutions of the HPC/GPU cluster in the ReCaS-Bari data center and the most important applications running on the cluster.

        Speaker: Gioacchino Vino (INFN Bari (IT))
      • 132
        Power Efficiency in HEP (x86 vs. arm)

        The power consumption of computing is coming under intense scrutiny worldwide, driven both by concerns about the carbon footprint, and by rapidly rising energy costs.
        ARM chips, widely used in mobile devices due to their power efficiency, are not currently in widespread use as capacity hardware on the Worldwide LHC Computing Grid.
        However, the LHC experiments are increasingly able to compile their workloads on the ARM architecture to take advantage of various HPC facilities (e.g., ATLAS, CMS).

        The work described in this paper attempts to compare the energy consumption of various workloads on two almost identical machines, one with an arm64 CPU and the other with a standard AMD x86_64 CPU, operating in identical conditions.
        This builds on our initial study of two rather dissimilar machines, located at different UK Universities, which produced some interesting, but at times contradictory, results, showing the need to control the comparison more closely.

        The set of benchmarks used include CPU intensive, memory intensive, and I/O bound tasks, ranging from simple scripts, through compiled C programs, to typical HEP workloads (full ATLAS simulations).
        We also plan to test the most recent HEPscore containerized jobs, which are actively being developed to match LHC Run3 conditions and can already target different architectures.

        The results compare both the power consumption and execution time of the same workload on the two different architectures (arm64 and x86_64).
        This will help inform Grid sites whether there are any scenarios where power efficiency can be improved for LHC computing by deploying ARM-based hardware.

        Speaker: Emanuele Simili
    • Track 2: Data Analysis - Algorithms and Tools Sala Europa (Villa Romanazzi)

      Sala Europa

      Villa Romanazzi

      Conveners: Sophie Berkman, Tony Di Pilato (CASUS - Center for Advanced Systems Understanding (DE))
      • 133
        Simultaneous track finding and track fitting by the Deep Neural Network at BESIII

        Track fitting and track hit classification are closely related, and hence the two approaches could benefit from each other. For example, if we know the underlying parameters of a track, then the hits associated with it can be easily identified. On the other hand, if we know the hits of a track, we can obtain the underlying parameters by fitting them. Most existing works take the second approach, classifying track hits and then estimating track parameters.
        Inspired by the above observations and the success of multi-task training, we propose a unified framework to address track fitting and track hit classification simultaneously in an end-to-end fashion. The method takes hits from multiple tracks as inputs, where each hit holds 4-dimensional features, including 2D position, hit time, and deposited charge. We feed these inputs to a backbone network to extract per-hit features. Then the network is divided into two branches. One is a reconstruction branch, which estimates the parameters of each track and its existence. The other is a track segmentation branch, which takes the features learned by PointNet++ together with track features to determine a hit-wise track assignment. In essence, we can assign each hit to its potential track to classify track hits. This method allows us to predict the track parameters of a track candidate while conducting per-track hit classification. This study leverages simulated multi-track samples of the BESIII drift chamber. Preliminary results indicate our framework is able to categorize hits of different tracks and determine the candidate track parameters simultaneously.

        Speaker: Yao Zhang
      • 134
        Hierarchical Graph Neural Networks for Particle Track Reconstruction

        Graph Neural Networks (GNN) have recently attained competitive particle track reconstruction performance compared to traditional approaches such as combinatorial Kalman filters. In this work, we implement a version of Hierarchical Graph Neural Networks (HGNN) for track reconstruction, which creates the hierarchy dynamically. The HGNN creates “supernodes” by pooling nodes into clusters, and builds a “supergraph” which enables message passing among supernodes. A new differentiable pooling algorithm that can maintain sparsity and produce a variable number of supernodes is proposed to facilitate the hierarchy construction. We perform an apples-to-apples comparison between the Interaction Network (IN) and HGNN on track finding performance using node embedding metric learning, which shows that in general HGNNs are more robust against imperfectly constructed input graphs, and more powerful in recognizing long-distance patterns. Equipped with soft assignment, the HGNN also allows assigning a given hit to multiple track candidates. The HGNN model can be used as a node-supernode pair classifier, where supernodes are considered to be track candidates. Under this regime, the pair-classifying HGNN is even more powerful than the node embedding HGNN. We show that the HGNN can not only improve upon the performance of common GNN architectures on embedding and clustering problems but also open up other approaches for GNNs in high energy physics.

        Speaker: Ryan Liu (University of California, Berkeley)
      • 135
        Particle Tracking with Noisy Intermediate-Scale Quantum Computers

        Particle track reconstruction poses a key computing challenge for future collider experiments. Quantum computing carries the potential for exponential speedups, and the rapid progress in quantum hardware might make it possible to address the problem of particle tracking in the near future. The solution of the tracking problem can be encoded in the ground state of a Quadratic Unconstrained Binary Optimization (QUBO) problem. In our study, sets of three hits in the detector are grouped into triplets. True triplets are part of trajectories of particles, while false triplets are random combinations of three hits. By approximating the ground state, the Variational Quantum Eigensolver algorithm aims at identifying true triplets. Different circuits and optimizers are tested for small instances of the tracking problem with up to 23 triplets. Precision and recall are determined in a noiseless simulation and the effects of readout errors are studied. It is planned to repeat the experiments on real hardware and to combine the solutions of small instances to address the full-scale tracking problem.
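        The QUBO encoding can be illustrated with a toy sketch (not the authors' code): diagonal terms bias the selection of individual triplets, off-diagonal terms penalise conflicting pairs, and the ground state is found here by exhaustive enumeration, the classical stand-in for what the Variational Quantum Eigensolver approximates. All bias and conflict values below are invented for the example.

```python
from itertools import product

def qubo_energy(x, Q):
    """Energy x^T Q x of a binary assignment x under QUBO matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def ground_state(Q):
    """Exact ground state by enumeration -- feasible only for tiny instances."""
    n = len(Q)
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))

# Toy instance with 3 triplets: negative diagonal entries reward selecting
# a "good" triplet, the positive off-diagonal entry penalises selecting two
# triplets that share hits inconsistently.
Q = [
    [-1.0, 0.0, 2.0],   # triplet 0: good, but conflicts with triplet 2
    [0.0, -1.0, 0.0],   # triplet 1: good, independent
    [0.0, 0.0, 0.5],    # triplet 2: likely fake (positive bias)
]

best = ground_state(Q)
print(best)  # -> (1, 1, 0): the two true triplets are kept, the fake rejected
```

A VQE replaces the enumeration by preparing a parametrised quantum state and minimising the expectation value of the corresponding Ising Hamiltonian.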

        Speaker: Mr Tim Schwägerl (Humboldt University of Berlin and DESY (DE))
      • 136
        Standalone track reconstruction in LHCb's SciFi detector for the GPU-based High Level Trigger

        As part of the Run 3 upgrade, the LHCb experiment has switched to a two-stage event trigger, fully implemented in software. The first stage of this trigger, running in real time at the collision rate of 30 MHz, is entirely implemented on commercial off-the-shelf GPUs and performs a partial reconstruction of the events.
        We developed a novel strategy for this reconstruction, starting with two independent tracking algorithms in the VELO and SciFi detectors, forming track segments which are then matched and merged into full tracks suitable for selecting events at LHCb. A key ingredient of this sequence is the SciFi tracking algorithm, which was implemented on GPU with special care in order to meet the throughput requirements of a real-time trigger.
        Developing such an algorithm is challenging due to the large number of track hypotheses that need to be tested. We discuss how this challenge was overcome by using the GPU architecture efficiently, and how the efficiency of the new sequence compares to the current baseline reconstruction.

        Speaker: Arthur Hennequin (Massachusetts Inst. of Technology (US))
      • 137
        Navigation, field integration and track parameter transport through detectors using GPUs and CPUs within the ACTS R&D project

        The use of hardware acceleration, particularly of GPGPUs, is one promising strategy for coping with the computing demands in the upcoming high-luminosity era of the LHC and beyond. Track reconstruction, in particular, suffers from exploding combinatorics and thus could greatly profit from the massively parallel nature of GPGPUs and other accelerators. However, classical pattern recognition algorithms and their current implementations, albeit very successfully deployed in the CPU-based software of current LHC experiments, show several shortcomings when adapted to modern accelerator architectures; the geometry, for example, is often characterized by runtime-polymorphic shapes, which are incompatible with common heterogeneous programming platforms. In addition, field integration modules need efficient access to the magnetic field on a variety of devices, and adaptive Runge-Kutta methods may cause thread divergence.

        In order to investigate whether state-of-the-art CPU-based track reconstruction software can be adapted to run efficiently on GPUs, the ACTS project has launched a dedicated R&D program aiming to develop a demonstrator that mirrors the current track reconstruction chain based on seed finding followed by a combinatorial Kalman filter available in the ACTS suite. We demonstrate the implementation and performance of a core component of this chain: the propagation of track parameters and their associated covariances through a non-homogeneous magnetic field, including the navigation through a highly complex geometry with different shapes, together with the application of material effects when passing through detector material. This demonstrator showcases the usage of the detray library for geometry description and navigation, the covfie library for an efficient description and interpolation of a complex magnetic field on different hardware backends, a dedicated algebra plugin that allows using different math implementations, and is based on the vecmem library, which has been developed to handle memory resources on host and device. We demonstrate that it is possible to perform this task using single-source code across multiple devices, and we compare the performance of this heterogeneous reconstruction chain to existing CPU-based code in the ACTS project.

        Speakers: Andreas Salzburger (CERN), Beomki Yeo, Joana Niermann (Georg August Universitaet Goettingen (DE))
    • Track 3: Computations in Theoretical Physics: Techniques and Methods Sala A+A1 (Villa Romanazzi)

      Sala A+A1

      Villa Romanazzi

      Conveners: Leonardo Cosmai, Ryan Moodie (Turin University)
      • 138
        Loop integral computation in the Euclidean or physical kinematical region using numerical integration and extrapolation

        The computation of loop integrals is required in high energy physics to account for higher-order corrections of the interaction cross section in perturbative quantum field theory. Depending on internal masses and external momenta, loop integrals may suffer from singularities where the integrand denominator vanishes at the boundaries, and/or in the interior of the integration domain (for physical kinematics).

        In previous work we implemented iterated integration numerically using one- or low-dimensional adaptive integration algorithms in subsequent coordinate directions, enabling intensive subdivision in the vicinity of singularities. To handle a threshold singularity originating from a vanishing denominator in the interior of the domain, we add a term (for example, $i\delta$) in the denominator, and perform a nonlinear extrapolation on a sequence of integrals obtained for a (geometrically) decreasing sequence of $\delta$ values.

        In addition, UV singularities may arise; these are treated by dimensional regularization, where the space-time dimension $n = 4$ is replaced by $n = 4-2\varepsilon$ for a sequence of $\varepsilon$ values, and a linear extrapolation is applied as $\varepsilon$ tends to zero. The presence of both types of singularities may warrant a double extrapolation. In this paper we devise and apply a strategy for loop integral computations, combining these methods as needed for a set of Feynman diagrams. In view of the compute-intensive nature of the calculation, the code is further multi-threaded to run in a shared-memory environment.
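        The extrapolation step can be illustrated with a toy model (not the authors' code): iterated linear (Richardson-style) extrapolation of a sequence of regulated values obtained on a geometrically decreasing sequence of parameters, each pass removing the next power of the regulator from the error expansion.

```python
def richardson(values, ratio=2.0):
    """Iterated linear extrapolation of a sequence f(d_k), with
    d_{k+1} = d_k / ratio, towards d -> 0.  Each pass removes the next
    power of d from the expansion f(d) = I + c1*d + c2*d**2 + ..."""
    table = list(values)
    power = ratio
    while len(table) > 1:
        table = [(power * b - a) / (power - 1.0) for a, b in zip(table, table[1:])]
        power *= ratio
    return table[0]

# Toy "regulated integral": I(d) = 3.0 + 0.7*d + 0.2*d**2, exact limit 3.0
deltas = [0.8 / 2**k for k in range(4)]
vals = [3.0 + 0.7 * d + 0.2 * d**2 for d in deltas]
print(richardson(vals))  # -> 3.0 up to rounding
```

The actual calculations use nonlinear extrapolation for the $\delta$ sequence, but the principle of combining successively regulated values to cancel regulator-dependent terms is the same.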

        Speaker: Dr Elise de Doncker (Western Michigan University)
      • 139
        Loop Amplitudes from Precision Networks

        Evaluating loop amplitudes is a time-consuming part of LHC event generation. For di-photon production with jets we show that simple Bayesian networks can learn such amplitudes and model their uncertainties reliably. A boosted training of the Bayesian network further improves the uncertainty estimate and the network precision in critical phase space regions. In general, boosted training of Bayesian networks allows us to move between fit-like and interpolation-like regimes of network training.

        Speaker: Anja Butter
      • 140
        Emulation of high multiplicity NLO k-factors

        Evaluation of one-loop matrix elements is computationally expensive and makes up a large proportion of time during event generation. We present a neural network emulator that builds in the factorisation properties of matrix elements and accurately reproduces the NLO k-factors for electron-positron annihilation into up to 5 jets.

        We show that our emulator retains good performance for high multiplicities and that there is a significant speed advantage over more traditional loop provider tools.

        Speaker: Henry Truong
      • 141
        Anomaly searches for new physics at the LHC

        In this talk I will give an overview of our recent progress in developing anomaly detection methods for finding new physics at the LHC. I will discuss how we define anomalies in this context, and the deep learning tools that we can use to find them. I will also discuss how self-supervised representation learning techniques can be used to enhance anomaly detection methods.

        Speaker: Barry Dillon (University of Heidelberg)
      • 142
        First results of Local Unitarity at N3LO

        Local Unitarity provides an order-by-order representation of perturbative cross-sections that realises at the local level the cancellation of final-state collinear and soft singularities predicted by the KLN theorem. The representation is obtained by manipulating the real and virtual interference diagrams contributing to transition probabilities using general local identities. As a consequence, the Local Unitarity representation can be directly integrated using Monte Carlo methods and without the need of infrared counter-terms. I will present first results from this new approach with examples up to N3LO accuracy. I will conclude by giving an outlook on future generalisations of the method applicable to hadronic collisions.

        Speaker: Mr Zeno Capatti (ETH Zürich)
    • Poster session with coffee break Area Poster (Floor -1) (Villa Romanazzi)

      Area Poster (Floor -1)

      Villa Romanazzi

      • 143
        A graph neural network for B decays reconstruction at Belle II

        Over the past few years, intriguing deviations from the Standard Model predictions have been reported in measurements of angular observables and branching fractions of $B$ meson decays, suggesting the existence of a new interaction that acts differently on the three lepton families. The Belle II experiment has unique features that allow the study of $B$ meson decays with invisible particles in the final state, in particular neutrinos. It is possible to deduce the presence of such particles from the energy-momentum imbalance obtained after reconstructing the companion $B$ meson produced in the event. This task is complicated by the thousands of possible final states $B$ mesons can decay into, and is currently performed at Belle II by the Full Event Interpretation (FEI) software, an algorithm based on Boosted Decision Trees and limited to specific, hard-coded decay processes.
        In recent years, graph neural networks have proven to be very effective tools to describe relations in physical systems, with applications in a range of fields. Particle decays can be naturally represented in the form of rooted, acyclic tree graphs, with nodes corresponding to particles and edges representing the parent-child relations between them. In this work, we present a graph neural network approach to generically reconstruct $B$ decays at Belle II by exploiting the information from the detected final state particles, without formulating any prior assumption about the nature of the decay. This task is performed by reconstructing the Lowest Common Ancestor matrix, a novel representation, equivalent to the adjacency matrix, that allows reconstruction of the decay from the final state particles alone. Preliminary results show that the graph neural network approach outperforms the FEI by a factor of at least 3.
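        The Lowest Common Ancestor matrix can be illustrated with a minimal sketch (not the Belle II implementation): for a hypothetical decay B -> (D -> K pi) mu nu, each entry stores the generation (depth in the tree) of the lowest common ancestor of a pair of final-state particles, which is enough to rebuild the decay topology from the leaves alone.

```python
def lca_matrix(parent, leaves):
    """Lowest-common-ancestor matrix for the leaves of a rooted decay tree.

    parent maps each node to its parent (the root maps to None); entry
    (i, j) is the depth of the lowest common ancestor of leaves i and j.
    """
    def ancestors(n):              # path from the root down to node n
        path = []
        while n is not None:
            path.append(n)
            n = parent[n]
        return path[::-1]

    depth = {}
    for leaf in leaves:
        for d, node in enumerate(ancestors(leaf)):
            depth[node] = d

    def lca(a, b):
        common = None
        for x, y in zip(ancestors(a), ancestors(b)):
            if x != y:
                break
            common = x
        return common

    return [[depth[lca(a, b)] for b in leaves] for a in leaves]

# Hypothetical decay: B -> (D -> K pi) mu nu
parent = {"B": None, "D": "B", "K": "D", "pi": "D", "mu": "B", "nu": "B"}
leaves = ["K", "pi", "mu", "nu"]
m = lca_matrix(parent, leaves)
print(m)
# K/pi meet at D (depth 1); any pair involving mu or nu meets at B (depth 0)
```

The network's task is the inverse of this construction: predict the matrix entries from the final-state particles and recover the intermediate nodes from them.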

        Speaker: Jacopo Cerasoli (CNRS - IPHC)
      • 144
        Application of Unity for detector modeling in BESIII

        Detector modeling and visualization are essential in the life cycle of a High Energy Physics (HEP) experiment. Unity is professional multimedia creation software with the advantages of rich visualization effects and easy deployment on various platforms. In this work, we applied the method of detector transformation to convert the BESIII detector description from the offline software framework into a 3D detector model in Unity. By matching the geometric units with detector identifiers, a new event display system based on Unity can be developed for BESIII. The potential for further application development into virtual reality will also be introduced.

        Speaker: Zhijun Li (Sun Yat-Sen University (CN))
      • 145
        Automatic differentiation of binned likelihoods with RooFit and Clad

        RooFit is a toolkit for statistical modeling and fitting used by most experiments in particle physics. As data sets from next-generation experiments grow, processing requirements for physics analysis become more computationally demanding, necessitating performance optimizations for RooFit. One possibility to speed up minimization and add stability is the use of automatic differentiation (AD). Unlike numerical differentiation, whose cost grows linearly with the number of parameters, AD can evaluate the full gradient at a cost that does not scale with the parameter count, making it particularly appealing for statistical models with many parameters. In this talk, we report on one possible way to implement AD in RooFit. Our approach is to add a facility to generate C++ code for a full RooFit model automatically. Unlike the original RooFit model, this generated code is free of virtual function calls and other RooFit-specific overhead. In particular, this code is then used to produce the gradient automatically with Clad. Clad is a source transformation AD tool implemented as a plugin to the clang compiler, which automatically generates the derivative code for input C++ functions. We show results demonstrating the improvements observed when applying this code generation strategy to HistFactory and other commonly used RooFit models. HistFactory is the subcomponent of RooFit that implements binned likelihood models with probability densities based on histogram templates. These models frequently have a very large number of free parameters, and are thus an interesting first target for AD support in RooFit.
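        The cost advantage of reverse-mode AD can be sketched with a minimal, self-contained example (unrelated to Clad's actual source-transformation machinery): a single backward sweep over the computation graph yields the derivative with respect to every input at once.

```python
class Var:
    """Minimal reverse-mode autodiff node: one backward sweep yields the
    gradient w.r.t. all inputs, regardless of how many there are."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents     # pairs of (Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Topologically order the graph, then sweep once in reverse,
        # accumulating chain-rule contributions into each parent.
        order, seen = [], set()
        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

# f(a, b) = a*b + a  ->  df/da = b + 1, df/db = a
a, b = Var(3.0), Var(4.0)
f = a * b + a
f.backward()
print(f.value, a.grad, b.grad)  # -> 15.0 5.0 3.0
```

Numerical differentiation of the same function would need one extra function evaluation per parameter, which is what makes AD attractive for likelihoods with hundreds of nuisance parameters.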

        Speaker: Garima Singh (Princeton University (US))
      • 146
        Awkward Arrays to RDataFrame and back

        Awkward Arrays and RDataFrame provide two very different ways of performing calculations at scale. By adding the ability to zero-copy convert between them, users get the best of both. It gives users greater flexibility in mixing different packages and languages in their analyses.

        In Awkward Array version 2, the ak.to_rdataframe function presents a view of an Awkward Array as an RDataFrame source. This view is generated on demand and the data is not copied. The column readers are generated based on the run-time type of the views. The readers are passed to a generated source derived from ROOT::RDF::RDataSource.

        The ak.from_rdataframe function converts the selected columns into native Awkward Arrays.

        We discuss the details of the implementation exploiting JIT techniques. We present examples of analysis of data stored in Awkward Arrays via a high-level interface of an RDataFrame.

        We show a few examples of the column definition, applying user-defined filters written in C++, and plotting or extracting the columnar data as Awkward Arrays.

        We discuss current limitations and future plans.

        Speaker: Ianna Osborne (Princeton University)
      • 147
        CERNLIB status

        We present a revived version of CERNLIB, the basis for the software
        ecosystems of most pre-LHC HEP experiments. The efforts to
        consolidate CERNLIB are part of the activities of the Data Preservation
        for High Energy Physics collaboration to preserve the data and software of
        past HEP experiments.

        The presented version is based on CERNLIB version 2006 with numerous
        patches for compatibility with modern compilers and operating systems.
        The code is publicly available in the CERN GitLab repository with the
        full development history starting from the early 1990s. The updates also
        include a re-implementation of the build system in CMake to make CERNLIB
        compliant with current best practices and to increase the chances of
        preserving the code in a compilable state for decades to come.

        The revived CERNLIB project also includes updated documentation, which we
        believe is a cornerstone for any preserved software that depends on it.

        Speaker: Andrii Verbytskyi (Max Planck Society (DE))
      • 148
        Comparing and improving hybrid deep learning algorithms for identifying and locating primary vertices

        Identifying and locating proton-proton collisions in LHC experiments (known as primary vertices or PVs) has been the topic of numerous conference talks in the past few years (2019-2021). Efforts to explore a variety of architectures have yielded promising candidates for PV-finder. The UNet model, for example, achieves an efficiency of 98% with a low false-positive rate and converges faster than any previous model; comparable results can be obtained with numerous other neural network architectures. While this does not answer the question of how the algorithm learns, it does provide useful insights into that open question. We present the results of this architectural study of different algorithms and their performance in locating PVs in LHCb data. The goal is to demonstrate progress in developing a performant architecture and to evaluate how the different algorithms learn.

        Speaker: Simon Akar (University of Cincinnati (US))
      • 149
        Continuous Integration for the FairRoot Software Stack

        The FairRoot software stack is a toolset for the simulation, reconstruction, and analysis of high energy particle physics experiments (currently used, e.g., at FAIR/GSI and CERN). In this work we give insight into recent improvements of Continuous Integration (CI) for this software stack. CI is a modern software engineering method to efficiently assure software quality. We discuss relevant development workflows and how they were improved through automation. Furthermore, we present our infrastructure, detailing its hardware and software design choices. The entire toolchain is composed of free and open source software. Finally, this work concludes with lessons learned from an operational as well as a user perspective and outlines ideas for future improvements.

        Speakers: Dennis Klein (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE)), Dr Christian Tacke (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
      • 150
        Data Management interfaces for CMS experiment: building an improved user experience

        After the successful adoption of Rucio, introduced in 2018 as the new data management system, a subsequent step is to advertise it to users and other stakeholders. In this perspective, one of the objectives is to keep improving the tooling around Rucio. As Rucio introduces a new data management paradigm with respect to the previous model, we begin by tackling the challenges arising from this shift in the data model, while trying to minimize the impact on users. We focus on building a monitoring system capable of answering questions that do not naturally fit the current paradigm, while also providing new features and services that push adoption further and extend the benefits of the new implementation. In this regard, we present the development and evolution of a set of new interfaces that extend the current monitoring infrastructure, together with the integration of a user-dedicated CLI granting users an almost seamless transition and an enhanced daily data management experience. We keep dependencies minimal and ensure decoupling, making these tools potentially useful for other experiments. They form a set of extensions to the Rucio API intended to automate the most frequent use cases, eventually enhancing the user experience and lowering the barrier for newcomers.

        Speaker: Rahul Chauhan (CERN)
      • 151
        Data Quality Monitoring for the JUNO Experiment

        In a High Energy Physics (HEP) experiment, a Data Quality Monitoring (DQM) system is crucial to ensure the correct and smooth operation of the experimental apparatus during data taking. DQM at the Jiangmen Underground Neutrino Observatory (JUNO) will reconstruct raw data directly from the JUNO Data Acquisition (DAQ) system and use event visualization tools to show the detector performance for high-quality data taking. The strategy of the JUNO DQM, as well as its design and performance, will be presented.

        Speaker: Kaixuan Huang
      • 152
        Distributed data processing pipelines in ALFA

        The common ALICE-FAIR software framework ALFA offers a platform for simulation, reconstruction and analysis of particle physics experiments. FairMQ is a module of ALFA that provides building blocks for distributed data processing pipelines, composed of components communicating via message passing. FairMQ integrates and efficiently utilizes standard industry data transport technologies, while hiding the transport details behind an abstract interface. In this work we present the latest developments in FairMQ, focusing on the new and improved features of the transport layer, primarily the shared memory transport and the generic interface features. Furthermore, we present the new control and configuration facilities that allow programmatic control of a group of FairMQ components. Additionally, new debugging and monitoring tools are highlighted. Finally, we outline how these tools are used by the ALICE experiment.
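        The message-passing idea can be sketched, in heavily simplified form, with an in-process analogue (threads and queues standing in for FairMQ's ZeroMQ or shared-memory transports); the point is that component code only ever sees the abstract send/receive interface, not the transport behind it.

```python
import threading
import queue

class Channel:
    """Abstract transport: components only see send/receive, not the
    underlying mechanism (here an in-process queue)."""
    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        self._q.put(msg)

    def receive(self):
        return self._q.get()

def device(process, inbox, outbox):
    """Run one pipeline component: receive, process, forward.
    A None message marks end of stream and is propagated downstream."""
    while True:
        msg = inbox.receive()
        if msg is None:
            outbox.send(None)
            break
        outbox.send(process(msg))

# Sampler (main thread) -> processor (worker thread) -> sink (main thread)
inbox, outbox = Channel(), Channel()
worker = threading.Thread(target=device, args=(lambda x: x * x, inbox, outbox))
worker.start()

for i in range(5):
    inbox.send(i)
inbox.send(None)

results = []
while (m := outbox.receive()) is not None:
    results.append(m)
worker.join()
print(results)  # -> [0, 1, 4, 9, 16]
```

Swapping the Channel implementation (e.g. to a zero-copy shared-memory segment) would leave the device code untouched, which is the property FairMQ's abstract interface provides.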

        Speaker: Alexey Rybalchenko (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
      • 153
        Evaluating Generative Adversarial Networks for particle hit generation in a cylindrical drift chamber using Fréchet Inception Distance

        We evaluate two Generative Adversarial Network (GAN) models developed by the COherent Muon to Electron Transition (COMET) collaboration to generate sequences of particle hits in a Cylindrical Drift Chamber (CDC). The models are first evaluated by measuring the similarity between distributions of particle-level, physical features. We then measure the Effectively Unbiased Fréchet Inception Distance (FID) between distributions of high-dimensional representations obtained with: InceptionV3; then a version of InceptionV3 fine-tuned for event classification; and a 3D Convolutional Neural Network that has been specifically designed for event classification. We also normalize the obtained FID values by the FID for two sets of real samples, setting the scores for different representations on the same scale. This novel relative FID metric is used to compare our GAN models to state-of-the-art natural image generative models.
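        In one dimension the Fréchet distance between fitted Gaussians has a simple closed form, which makes the normalization idea easy to sketch (a toy illustration, not the COMET pipeline; the real FID is computed on high-dimensional network representations):

```python
import math
import random

def fid_1d(xs, ys):
    """Frechet distance between 1-D Gaussians fitted to two samples:
    (mu1 - mu2)**2 + (sigma1 - sigma2)**2, the 1-D special case of FID."""
    def fit(s):
        mu = sum(s) / len(s)
        var = sum((v - mu) ** 2 for v in s) / len(s)
        return mu, math.sqrt(var)
    m1, s1 = fit(xs)
    m2, s2 = fit(ys)
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

random.seed(0)
real_a = [random.gauss(0.0, 1.0) for _ in range(5000)]
real_b = [random.gauss(0.0, 1.0) for _ in range(5000)]
fake   = [random.gauss(0.5, 1.3) for _ in range(5000)]

# Normalize by the FID between two real samples: the "floor" set by
# finite statistics.  A relative FID near 1 means the generator is
# indistinguishable from real data at this sample size.
baseline = fid_1d(real_a, real_b)
relative = fid_1d(real_a, fake) / baseline
print(relative > 1.0)
```

Dividing by the real-vs-real score puts FID values obtained with different feature representations on the same scale, which is what enables the cross-representation comparison described in the abstract.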

        Speakers: Irene Andreou, Noam Mouelle (Imperial College London)
      • 154
        Event Display Development for Mu2e using Eve-7

        The Mu2e experiment will search for the CLFV neutrinoless coherent conversion of a muon to an electron in the field of an aluminium nucleus. A custom offline event display has been developed for Mu2e using TEve, a ROOT-based 3-D event visualisation framework. Event displays are crucial for monitoring and debugging during live data taking, as well as for public outreach. A custom GUI allows event selection and navigation. Reconstructed data such as tracks, hits and clusters can be displayed within the detector geometries upon GUI request. True Monte Carlo trajectories of particles traversing the muon beam line, obtained directly from Geant4, can also be displayed. Tracks are coloured according to their particle ID, and users can select the trajectories to be displayed. Reconstructed tracks are refined using a Kalman filter; the resulting tracks can be displayed alongside truth information, allowing visualisation of the track resolution. The user can remove or add data based on the energy deposited in a detector or the arrival time. This display is a prototype; an online event display is currently under development using Eve-7, which allows remote access during live data taking and lets multiple users simultaneously view and interact with the display.

        Speaker: Namitha Chithirasreemadam (University of Pisa)
      • 155
        Experience in SYCL/oneAPI for event reconstruction at the CMS experiment

        The CMS software framework (CMSSW) has recently been extended to perform part of the physics reconstruction with NVIDIA GPUs. To avoid writing a different implementation of the code for each back-end, the decision was made to use a performance portability library, and Alpaka was chosen as the solution for Run 3.
        In the meantime, different studies have been performed to test the track reconstruction and clustering algorithms on different back-ends such as CUDA and Alpaka.
        With the idea of exploring new solutions, Intel GPUs have been considered as a possible new back-end, and their implementation is currently under development.
        This is achieved using SYCL, a cross-platform abstraction C++ programming model for heterogeneous computing. It allows developers to reuse code across different hardware and also perform custom tuning for a specific accelerator. The SYCL implementation used is the Data Parallel C++ library (DPC++) in the Intel oneAPI Toolkit.

        In this work, we will present the performance of physics reconstruction algorithms on different hardware. Strengths and weaknesses of this heterogeneous programming model will also be presented.

        Speaker: Aurora Perego (Universita & INFN, Milano-Bicocca (IT))
      • 156
        Exploring the use of accelerators for lossless data compression in CMS

        The CMS collaboration has a growing interest in the use of heterogeneous computing and accelerators to reduce the costs and improve the efficiency of the online and offline data processing: online, the High Level Trigger is fully equipped with NVIDIA GPUs; offline, a growing fraction of the computing power is coming from GPU-equipped HPC centres. One of the topics where accelerators could be used for both online and offline processing is data compression.

        In the past decade a number of research papers exploring the use of GPUs for lossless data compression have appeared in academic literature, but very few practical applications have emerged. In the industry, NVIDIA has recently published the nvcomp GPU-accelerated data compression library, based on closed-source implementations of standard and dedicated algorithms. Other platforms, like the IBM Power 9 processors, offer dedicated hardware for the acceleration of data compression tasks.

        In this work we review the recent developments on the use of accelerators for data compression. After summarising the recent academic research, we will measure the performance of representative open- and closed-source algorithms over CMS data, and compare it with the CPU-only algorithms currently used by ROOT and CMS (lz4, zlib, zstd).
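        The kind of CPU-side baseline measurement involved can be sketched with Python's built-in zlib (an illustration only; the study itself benchmarks GPU and dedicated-hardware codecs against lz4, zlib and zstd over real CMS data):

```python
import time
import zlib

def compression_stats(data, level):
    """Compression ratio and wall time of zlib at a given level,
    verifying the lossless round trip."""
    t0 = time.perf_counter()
    compressed = zlib.compress(data, level)
    dt = time.perf_counter() - t0
    assert zlib.decompress(compressed) == data   # lossless round trip
    return len(data) / len(compressed), dt

# Repetitive mock "event data" compresses very well; real detector
# payloads typically achieve much lower ratios.
payload = b"hit:123,adc:45;" * 20000
for level in (1, 6, 9):
    ratio, dt = compression_stats(payload, level)
    print(f"level {level}: ratio {ratio:.1f}x in {dt * 1e3:.2f} ms")
```

The trade-off visible even in this toy (higher levels cost more time for marginal ratio gains) is the same one that motivates offloading compression to accelerators in the first place.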

        Speaker: Stefan Rua (Aalto University)
      • 157
        General shower simulation MetaHEP in key4hep framework

        Describing the development of particle cascades in the calorimeter of a high energy physics experiment relies on precise simulation of particle interactions with matter. This simulation is inherently slow and constitutes a challenge for HEP experiments. Furthermore, with the upcoming high luminosity upgrade of the Large Hadron Collider and a much increased data production rate, the amount of required simulated events will increase accordingly. Several research directions have investigated the use of Machine Learning (ML) based models to accelerate the simulation of a particular calorimeter response. These models typically require a large amount of data and time for training, and the result is a specifically tuned simulation. Meanwhile, meta-learning has emerged in the ML community as a fast learning paradigm using small training datasets. In this contribution, we present MetaHEP, a meta-learning approach to accelerate shower simulation in different calorimeters using highly granular data. We show its application using a calorimeter proposed for the Future Circular Collider (FCC-ee) and its integration into the key4hep framework.

        Speaker: Dalila Salamani (CERN)
      • 158
        Implementation of generic SoA data structure in the CMS software

        GPU applications require a structure-of-arrays (SoA) data layout to achieve good memory-access performance. During the development of the CMS Pixel reconstruction for GPUs, the Patatrack developers crafted various techniques to optimise the placement of data in memory and its access inside GPU kernels. The work presented here gathers, automates and extends those patterns, and offers a simplified and consistent programming interface.

        The work automates the creation of SoA structures, fulfilling technical requirements like cache line alignment, while optionally providing alignment and cache hinting to the compiler and range checking. Protection of read-only products of the CMS software framework (CMSSW) is also ensured with constant versions of the SoA. A compact description of the SoA is provided to minimize the size of data passed to GPU kernels. Finally, the user interface is designed to be as simple as possible, providing an AoS-like semantic allowing compact and readable notation in the code.

        The results of porting CMSSW code to this SoA framework will be presented, along with performance measurements.
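        The AoS-like semantics over SoA storage can be sketched in a few lines (a toy Python analogue, not the CMSSW C++ implementation, which additionally handles cache-line alignment, compiler hinting, range checking and const views):

```python
class SoA:
    """Structure-of-arrays container with an AoS-like element proxy:
    storage is one contiguous column per field (the layout GPUs need for
    coalesced access), while user code reads soa[i].x as if the elements
    were structs."""
    def __init__(self, fields):
        self._columns = {name: [] for name in fields}

    def push_back(self, **values):
        for name, col in self._columns.items():
            col.append(values[name])

    def __getitem__(self, i):
        columns = self._columns

        class ElementProxy:
            def __getattr__(self, name):
                return columns[name][i]

        return ElementProxy()

    def column(self, name):
        return self._columns[name]

hits = SoA(["x", "y", "charge"])
hits.push_back(x=1.0, y=2.0, charge=7)
hits.push_back(x=3.0, y=4.0, charge=9)
print(hits[1].x, hits.column("charge"))  # -> 3.0 [7, 9]
```

The proxy object is what gives the "compact and readable notation" the abstract mentions: kernels iterate over whole columns, while analysis-style code keeps per-element syntax.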

        Speaker: Eric Cano (CERN)
      • 159
        Improving robustness of jet tagging algorithms with adversarial training

        In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst’s perspective, obtaining the highest possible performance is desirable, but recently some focus has been placed on studying the robustness of models, to investigate how well they perform under slight distortions of input features. Especially for tasks that involve many (low-level) inputs, the application of deep neural networks brings new challenges. In the context of jet flavor tagging, adversarial attacks are used to probe a typical classifier‘s vulnerability and can be understood as a model for systematic uncertainties. A corresponding defense strategy, adversarial training, improves robustness while maintaining high performance. This contribution presents different approaches using a set of attacks of varying complexity. Investigating the loss surface corresponding to the inputs and models in question reveals geometric interpretations of robustness, taking correlations into account. Additional cross-checks against other, physics-inspired mismodeling scenarios are performed and give rise to the presumption that adversarially trained models can cope better with simulation artifacts or subtle detector effects.

        Speakers: Annika Stein (Rheinisch Westfaelische Tech. Hoch. (DE)), Spandan Mondal (RWTH Aachen (DE))
      • 160
        JETFLOW: Generating jets with Normalizing Flows using the jet mass as condition and constraint

        In this study, jets with up to 30 particles are modelled using Normalizing Flows with Rational Quadratic Spline coupling layers. The invariant mass of the jet is a powerful global feature for checking whether the flow-generated data contains the same high-level correlations as the training data. Normalizing flows without conditioning lack the expressive power to reproduce these correlations. Using the mass as a condition for the coupling transformation enhances the model's performance on all tracked metrics. In addition, we demonstrate how to sample the original mass distribution using the empirical cumulative distribution function, and we study the usefulness of including an additional mass constraint in the loss term. On the JetNet dataset, our model shows state-of-the-art performance combined with a general model and stable training.
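        Sampling from an empirical cumulative distribution function amounts to inverse-transform sampling over the sorted training values. A minimal sketch (without the interpolation a real implementation might add):

```python
import random

def ecdf_sampler(samples):
    """Return a draw() function that samples from the empirical CDF
    of `samples`: draw u ~ U(0,1) and pick the u-quantile value."""
    sorted_s = sorted(samples)
    n = len(sorted_s)
    def draw():
        u = random.random()
        return sorted_s[min(int(u * n), n - 1)]
    return draw

random.seed(0)
masses = [random.gauss(80.0, 5.0) for _ in range(10000)]
draw = ecdf_sampler(masses)
resampled = [draw() for _ in range(10000)]
# resampled masses lie within the support of the training masses
assert min(masses) <= min(resampled) <= max(resampled) <= max(masses)
```

Each sampled mass can then be fed to the flow as the condition for generating the jet constituents.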

        Speaker: Benno Kach (Deutsches Elektronen-Synchrotron (DE))
      • 161
        Machine learning techniques for data quality monitoring at the CMS detector

        The CMS experiment employs an extensive data quality monitoring (DQM) and data certification (DC) procedure. Currently, this approach consists mainly of the visual inspection of reference histograms which summarize the status and performance of the detector. Recent developments in several of the CMS subsystems have shown the potential of computer-assisted DQM and DC using autoencoders, spotting detector anomalies with high accuracy and a much finer time granularity than previously accessible. We will discuss a case study for the CMS pixel tracker, as well as the development of a common infrastructure to host computer-assisted DQM and DC workflows. This infrastructure facilitates accessing the input histograms, provides tools for preprocessing, training and validating, and generates an overview of potential detector anomalies.
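        The anomaly-flagging logic can be illustrated with a deliberately simplified stand-in: here the "reconstruction" is just the average reference histogram, and runs with a large residual are flagged. The CMS workflows use trained autoencoders; the threshold and data below are invented for the example:

```python
from statistics import mean

def flag_anomalous(histograms, threshold=0.1):
    """Flag histograms whose mean squared residual with respect to
    the average reference histogram exceeds `threshold`."""
    ref = [mean(col) for col in zip(*histograms)]
    flagged = []
    for i, h in enumerate(histograms):
        err = mean((a - b) ** 2 for a, b in zip(h, ref))
        if err > threshold:
            flagged.append(i)
    return flagged

good = [[1.0, 2.0, 3.0, 2.0, 1.0]] * 9
bad = [[1.0, 2.0, 0.0, 2.0, 1.0]]   # a "dead region" in one run
assert flag_anomalous(good + bad) == [9]
```

An autoencoder replaces the fixed reference with a learned reconstruction, which is what gives the finer time granularity and per-channel sensitivity described above.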

        Speaker: Rosamaria Venditti (Universita e INFN, Bari (IT))
      • 162
        Machine learning-based vertex reconstruction for reactor neutrinos in JUNO

        The Jiangmen Underground Neutrino Observatory (JUNO), located in southern China, will be the world's largest liquid scintillator (LS) detector. Equipped with 20 kton of LS, 17623 20-inch PMTs and 25600 3-inch PMTs in the central detector, JUNO will provide a unique apparatus to probe the mysteries of neutrinos, particularly the neutrino mass ordering puzzle. One of the challenges for JUNO is high-precision vertex reconstruction for reactor neutrino events. This talk will present machine learning-based vertex reconstruction in JUNO, in particular the comparison of different machine learning models and the optimization of the model inputs for better reconstruction performance.

        Speaker: Wuming Luo (Institute of High Energy Physics, Chinese Academy of Science)
      • 163
        Particle Flow Reconstruction on Heterogeneous Architecture for CMS

        The Particle Flow (PF) algorithm, used for the majority of CMS data analyses for event reconstruction, provides a comprehensive list of final-state particle candidates and enables efficient identification and mitigation methods for simultaneous proton-proton collisions (pileup). The higher instantaneous luminosity expected during the upcoming LHC Run 3 will pose challenges for CMS event reconstruction. These will be amplified in the HL-LHC era, where luminosity and pileup rates are expected to be significantly higher. One of the approaches CMS is investigating to cope with this challenge is to adopt heterogeneous computing architectures to accelerate event reconstruction. In this talk, we will discuss the effort to adapt PF reconstruction to take advantage of GPU accelerators.

        We will discuss the design and implementation of PF clustering for the CMS electromagnetic and hadronic calorimeters using CUDA, including optimizations of the PF algorithm. The physics validation and performance of the GPU-accelerated algorithms will be demonstrated by comparing them to the CPU-based implementation.

        Speaker: Felice Pantaleo (CERN)
      • 164
        Preliminary Results of Vectorization of Density Functional Theory calculations in Geant4/V for amino acids

        Density Functional Theory (DFT) is a widely used ab initio method for calculating the electronic properties of molecules. Compared with Hartree-Fock methods, DFT offers good approximations at substantially lower computational cost. Recently, the DFT method has been used for discovering and analyzing protein interactions by calculating the free energies of these macro-molecules from short to large scales. However, calculating the ground-state energy by DFT for many-body molecular systems such as proteins, in a reasonable time and with sufficient accuracy, is still a very challenging and CPU-intensive task.
        On the other hand, Geant4 is a toolkit for simulating the passage of particles through matter, with a wide range of specialized methods that include DNA and protein exploration. Unfortunately, the execution time needed to obtain an effective protein analysis remains a strong restriction on CPU processors. In this sense, the GeantV project seeks to exploit the vectorization capabilities of CPUs, tackling the heavy computational load on CPU cores. In this work, we present preliminary results of the partial implementation of DFT in the Geant4 framework and the vectorized GeantV project. We show the advantages and the methods used for vectorizing several sub-routines in the calculation of the ground-state energy for some amino acids and molecules.

        Speaker: Oscar Roberto Chaparro Amaro (Instituto Politécnico Nacional. Centro de Investigación en Computación)
      • 165
        Supporting multiple hardware architectures at CMS: the integration and validation of Power9

        Computing resources in the Worldwide LHC Computing Grid (WLCG) have been based entirely on the x86 architecture for more than two decades. In the near future, however, heterogeneous non-x86 resources, such as ARM, POWER and RISC-V, will become a substantial fraction of the resources provided to the LHC experiments, due to their presence in existing and planned world-class HPC installations. The CMS experiment, one of the four large detectors at the LHC, has started to prepare for this situation, with the CMS software stack (CMSSW) already compiled for multiple architectures. To allow for production use, the tools for workload management and job distribution need to be extended to exploit heterogeneous architectures.

        Profiting from the opportunity to exploit the first sizable IBM Power9 allocation, available on the Marconi100 HPC system at CINECA, CMS developed all the needed modifications to its workload management system. After a successful proof of concept, a full physics validation was performed in order to bring the system into production. This experience will be of great value when commissioning the similar (even larger) Summit HPC system at Oak Ridge, where CMS also expects a resource allocation. Moreover, much of the compute power of these systems is provided via GPUs, which represents an extremely valuable opportunity to exploit the offloading capability already implemented in CMSSW.

        The status of the current integration including the exploitation of the GPUs, the results of the validation as well as the future plans will be shown and discussed.

        Speaker: Daniele Spiga (Universita e INFN, Perugia (IT))
      • 166
        The Level-1 Global Trigger for Phase-2: Algorithms, configuration and integration in the CMS offline framework

        The CMS Level-1 Trigger, for its operation during Phase-2 of LHC, will undergo a significant upgrade and redesign. The new trigger system, based on multiple families of custom boards, equipped with Xilinx Ultrascale Plus FPGAs and interconnected with high speed optical links at 25 Gb/s, will exploit more detailed information from the detector subsystems (calorimeter, muon systems, tracker). In contrast to its implementation during Phase-1, information from the CMS tracker is now also available at the Level-1 Trigger and can be used for particle flow algorithms. The final stage of the Level-1 Trigger, called Global Trigger (GT), will receive more than 20 different trigger object collections from upstream systems and will be able to evaluate a menu of more than 1000 cut-based algorithms distributed over 12 boards. These algorithms may not only apply conditions on parameters such as momentum or angle of a particle, but can also do arithmetic calculations, like the invariant mass of a suspected mother particle of interest or the angle between two particles. The Global Trigger is designed as a modular system, with an easily re-configurable algorithm unit, to meet the demand of high flexibility required for shifting trigger strategies during Phase-2 operation of the LHC. The algorithms themselves are kept highly configurable and tools are provided to allow their study from within the CMS offline software framework (CMSSW) without the need for knowledge of the underlying firmware implementation. To allow the reproducible translation of the physicist-designed trigger menu to VHDL for use in the hardware trigger, a tool has been developed that converts the Python-based configuration used by CMSSW to VHDL. In addition to cut-based algorithms, neural net algorithms are being developed and integrated into the Global Trigger framework. 
To make use of these algorithms in hardware, the HLS4ML framework is used, which transpiles pre-trained neural nets, generated in the most commonly used software frameworks, into firmware code. A prototype firmware for a single Global Trigger board has been developed, which includes the de-multiplexing logic, conversion to an internal common object format and distribution of the data over all Super Logic Regions. In this prototype, 312 algorithms are implemented at a clock speed of 480 MHz. The prototype has been thoroughly tested and verified against the bit-wise compatible C++ emulator. In this contribution we present the Phase-2 Global Trigger with an emphasis on the Global Trigger algorithms, their implementation in hardware, their configuration with Python and their novel integration within the CMS offline software framework (CMSSW).
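        The invariant-mass arithmetic mentioned above is, for two (approximately massless) trigger objects given as (pT, eta, phi), a short calculation. The floating-point sketch below illustrates the formula; the FPGA implementation uses fixed-point arithmetic and LUT approximations:

```python
import math

def pair_invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """m^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi)) for massless objects."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))

# back-to-back objects with pT = 45 GeV give m = 90 GeV (Z-like)
m = pair_invariant_mass(45.0, 0.0, 0.0, 45.0, 0.0, math.pi)
assert abs(m - 90.0) < 1e-9
```

A mass-window condition in a trigger menu is then just a cut on this quantity for a chosen object pair.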

        Speaker: Elias Leutgeb (Technische Universitaet Wien (AT))
      • 167
        Trigger Rate Monitoring Tools at CMS

        With the start of Run 3 in 2022, the LHC has entered a new period, now delivering higher-energy and higher-luminosity proton beams to the Compact Muon Solenoid (CMS) experiment. These increases make it critical to maintain and upgrade the tools and methods used to monitor the rate at which data is collected (the trigger rate). Software tools have been developed to allow for automated rate monitoring, and we present several upgrades to these tools that maintain and expand their functionality. They allow for real-time monitoring, including alerts sent to on-call experts in the case of abnormalities. Fits produced from previously collected data extrapolate the behavior of the triggers as a function of pile-up (the average number of particle interactions per bunch crossing). These fits allow for visualization and statistical analysis of trigger behavior and are displayed on the online monitoring system (OMS). The rate monitoring code can also be used for offline data certification and more complex trigger analysis. This presentation will show some of the upgrades to this software, with an emphasis on automation for easier and more consistent upgrades and fixes, and on increased interactivity with the users.
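        The simplest rate-versus-pileup fit is an ordinary least-squares line. The hand-rolled sketch below shows the idea on toy data; the CMS tools fit richer functional forms (and the function name and numbers here are illustrative):

```python
def fit_rate_vs_pileup(pu, rate):
    """Least-squares linear fit: rate = a + b * pileup."""
    n = len(pu)
    mx = sum(pu) / n
    my = sum(rate) / n
    sxx = sum((x - mx) ** 2 for x in pu)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pu, rate))
    b = sxy / sxx            # slope: rate gained per unit pileup
    a = my - b * mx          # intercept: pileup-independent rate
    return a, b

pu = [20, 30, 40, 50, 60]
rate = [2.0 + 0.5 * x for x in pu]      # perfectly linear toy data
a, b = fit_rate_vs_pileup(pu, rate)
assert abs(a - 2.0) < 1e-9 and abs(b - 0.5) < 1e-9
```

A measured rate far from the fit's extrapolation at the current pileup is what triggers an alert to the on-call expert.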

        Speaker: John Lawrence (University of Notre Dame (US))
      • 168
        Updates on the Low-Level Abstraction of Memory Access

        Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data structures is ideally decoupled from the rest of a program.
        The low-level abstraction of memory access (LLAMA) is a C++ library that provides a zero-runtime-overhead abstraction layer, underneath which memory layouts can be freely exchanged, focusing on multidimensional arrays of nested, structured data.
        It provides a framework for defining and switching custom memory mappings at compile time to define data layouts, data access and access instrumentation, making LLAMA an ideal tool to tackle memory-related optimization challenges in heterogeneous computing.
        After its scientific debut, several improvements and extensions have been added to LLAMA. This includes compile-time array extents for zero memory overhead, support for computations during memory access, new mappings (e.g. int/float bit-packing or byte-swapping) and more. This contribution provides an overview of the LLAMA library, its recent development and an outlook of future activities.

        Speaker: Bernhard Manfred Gruber (Technische Universitaet Dresden (DE))
      • 169
        Variational AutoEncoders for Anomaly Detection in VBS events within an EFT framework

        We present a machine-learning based method to detect deviations from a reference model, in an almost independent way with respect to the theory assumed to describe the new physics responsible for the discrepancies.

        The analysis is based on an Effective Field Theory (EFT) approach: under this hypothesis the Lagrangian of the system can be written as an infinite expansion of terms, where the first ones are those of the Standard Model (SM) Lagrangian and the following terms are higher-dimension operators. The presence of the EFT operators impacts the distributions of the observables by producing deviations from the shapes expected when the SM Lagrangian alone is considered.

        We use a Variational AutoEncoder (VAE) trained on SM processes to identify EFT contributions as anomalies. While SM events are expected to be reconstructed properly, events generated taking into account EFT contributions are expected to be poorly reconstructed, thus accumulating in the tails of the loss function distribution. Since the training of the model does not depend on any specific new physics signature, the proposed strategy makes no specific assumptions about its nature. To improve the discrimination performance, we introduce a DNN classifier that distinguishes between EFT and SM events based on the values of the reconstruction and regularization losses of the model. In this second model, a cross-entropy term is added to the usual VAE loss, optimizing the reconstruction of the input variables and the classification at the same time. This procedure ensures that the model is optimized for discrimination, at a small price in model independence due to the use of one of the 15 operators from the EFT model in the training.
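        The combined objective can be sketched as reconstruction loss plus KL regularization plus the added cross-entropy term. The function below is a toy illustration with invented names and an illustrative weighting, not the paper's exact formulation:

```python
import math

def vae_classifier_loss(x, x_hat, mu, logvar, p_sm, is_sm, alpha=1.0):
    """MSE reconstruction + KL(q||N(0,1)) + alpha * binary cross-entropy
    on the classifier's SM probability p_sm."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    kl = -0.5 * sum(1 + lv - m * m - math.exp(lv)
                    for m, lv in zip(mu, logvar))
    y = 1.0 if is_sm else 0.0
    bce = -(y * math.log(p_sm) + (1 - y) * math.log(1 - p_sm))
    return recon + kl + alpha * bce

loss = vae_classifier_loss([1.0, 2.0], [1.1, 1.9],     # input / reconstruction
                           [0.0, 0.0], [0.0, 0.0],     # latent mu / log-variance
                           p_sm=0.9, is_sm=True)
assert loss > 0.0
```

With a standard-normal latent (mu = 0, logvar = 0) the KL term vanishes, so the loss reduces to reconstruction error plus the classification term.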

        In this talk we will discuss these methods in detail using generator-level VBS events produced at the LHC and assuming, in order to compute the significance of possible new physics contributions, an integrated luminosity of $350~\mathrm{fb}^{-1}$.

        Speaker: Giulia Lavizzari
      • 170
        XRootD caching for Belle II

        The Belle II experiment at the second-generation e+/e- B-factory SuperKEKB has been collecting data since 2019 and aims to accumulate 50 times more data than the first-generation experiment, Belle. To efficiently process these steadily growing datasets of recorded and simulated data, which will reach the order of 100 PB, and to support Grid-based analysis workflows using the DIRAC Workload Management System, an XRootD-based caching architecture is presented. This mechanism decreases job waiting time for often-used datasets by transparently adding copies of these files at smaller sites without managed storage. The architecture seamlessly integrates local storage services and supports the use of dynamic computing resources with minimal deployment effort. This is especially useful in environments with many institutions providing comparatively small numbers of cores and limited personpower.

        This talk will describe the implemented cache at GridKa, a main computing centre for Belle II, as well as its performance and upcoming opportunities for caching for Belle II.

        Speaker: Moritz David Bauer
    • Track 1: Computing Technology for Physics Research Sala Federico II (Villa Romanazzi)

      Sala Federico II

      Villa Romanazzi

      Conveners: Dr Stefano Bagnasco (Istituto Nazionale di Fisica Nucleare, Torino), Gioacchino Vino (INFN Bari (IT))
      • 171
        Design and implementation of zstd compression algorithm for high energy physics experiment data processing based on FPGA

        With the continuous increase in the amount of data generated and stored in various scientific fields, such as cosmic ray detection, compression technology becomes more and more important in reducing the requirements for communication bandwidth and storage capacity. Zstandard, abbreviated zstd, is a fast lossless compression algorithm. For zlib-level real-time compression scenarios, it achieves a good compression ratio at a higher speed than similar algorithms. In this paper, we introduce the architecture of a new zstd compression kernel, combine it with the ROOT framework (an open-source data analysis framework used in high energy physics and beyond), and optimize the proposed architecture for the specific use case of LHAASO KM2A data decoding. The optimized kernel is implemented on a Xilinx Alveo U200 board.
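        Measuring the ratio/speed trade-off the abstract refers to is straightforward. The sketch below uses zlib from the Python standard library as a stand-in (zstd itself requires the third-party `zstandard` package), and the payload is an invented repetitive detector-like record:

```python
import time
import zlib

def compression_report(data, level=6):
    """Return (compression ratio, compression time) for zlib at `level`,
    verifying the round trip is lossless."""
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    dt = time.perf_counter() - t0
    assert zlib.decompress(packed) == data   # lossless round trip
    return len(data) / len(packed), dt

# a highly repetitive payload compresses very well
payload = (b"hit:ch=%03d;adc=0512;" % 7) * 5000
ratio, elapsed = compression_report(payload)
assert ratio > 10.0
```

Sweeping `level` (or swapping in a zstd binding) and plotting ratio against time is how the zlib-versus-zstd comparison in the abstract is typically quantified.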

        Speaker: Mr Xuyang Zhou
      • 172
        Precision Cascade: A novel algorithm for multi-precision extreme compression

        Lossy compression algorithms are valuable because of the strong compression they achieve. However, lossy compression has historically presented a trade-off between the retained precision and the resulting size of the compressed data. Previously, we introduced BLAST, a state-of-the-art compression algorithm developed by Accelogic, and presented results demonstrating that BLAST can achieve a compression factor that significantly surpasses the compression algorithms currently available in the ROOT framework. However, the leading concern in utilizing lossy compression is the delayed realization that more precision is necessary: that precision may have been irretrievably lost in an effort to decrease storage size. Thus, there is immense value in retaining higher-precision data in reserve. However, in the era of exabyte computing, it becomes extremely inefficient and costly to duplicate data stored at different compressive precision values. A tiered cascade of stored precision optimizes data storage and resolves these fundamental concerns.

        Accelogic has developed a game-changing compression technique, known as “Precision Cascade”, which enables higher precision to be stored separately without duplicating information. With this novel method, varying levels of precision can be retrieved, potentially minimizing live storage space. Preliminary results from STAR and CMS demonstrate that multiple layers of precision can be stored and retrieved without significant penalty to the compression ratios and (de)compression speeds, when compared to the single-precision BLAST baseline.

        In this contribution, we will present the integration of Accelogic’s “Precision Cascade” into the ROOT framework, with the principal purpose of enabling high-energy physics experiments to leverage this state-of-the-art algorithm with minimal friction. We also present our progress in exploring storage reduction and speed performance with this new compression tool in realistic examples from both STAR and CMS experiments and feel we are ready to deliver the compression algorithm to the wider community.
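        The tiered idea can be illustrated by splitting a double into a low-precision base tier plus a residual tier, so that the higher precision is recovered only when both tiers are read. This is a sketch of the concept only, not Accelogic's proprietary encoding:

```python
import struct

def split_precision(x):
    """Split a float64 into a float32 'base' tier and a float32
    residual tier; base alone is a coarse value, base + residual
    recovers most of the original precision."""
    base = struct.unpack("f", struct.pack("f", x))[0]       # round to float32
    resid = struct.unpack("f", struct.pack("f", x - base))[0]
    return base, resid

x = 3.141592653589793
base, resid = split_precision(x)
# combining the tiers is strictly more precise than the base tier alone
assert abs(x - (base + resid)) < abs(x - base)
```

In a cascade, each additional stored tier refines the previous ones, so no tier duplicates information already held by the tiers below it.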

        Speaker: Yueyang Ying (Massachusetts Inst. of Technology (US))
      • 173
        Portable Programming Model Exploration for LArTPC Simulation in a Heterogeneous Computing Environment: OpenMP vs. SYCL

        The evolution of the computing landscape has resulted in the proliferation of diverse hardware architectures, with different flavors of GPUs and other compute accelerators becoming more widely available. To facilitate the efficient use of these architectures in a heterogeneous computing environment, several programming models are available to enable portability and performance across different computing systems, such as Kokkos, SYCL, OpenMP and others. As part of the High Energy Physics Center for Computational Excellence (HEP-CCE) project, we investigate if and how these different programming models may be suitable for experimental HEP workflows through a few representative use cases. One such use case is the Liquid Argon Time Projection Chamber (LArTPC) simulation, which is essential for LArTPC detector design, validation and data analysis. Following up on our previous investigations [1, 2] of using Kokkos to port the LArTPC simulation in the Wire-Cell Toolkit (WCT) to GPUs, we have explored OpenMP and SYCL as potential portable programming models for WCT, with the goal of making diverse computing resources accessible to LArTPC simulations. In this presentation, we will describe how we utilize relevant features of OpenMP and SYCL for the LArTPC simulation module in WCT. We will also show performance benchmark results on multi-core CPUs and on NVIDIA and AMD GPUs for both the OpenMP and SYCL implementations. Comparisons with different compilers will be given. Advantages and disadvantages of using OpenMP, SYCL and Kokkos in this particular use case will also be discussed.

        Speaker: Dr Meifeng Lin (Brookhaven National Laboratory (US))
      • 174
        Efficient and Accurate Automatic Python Bindings with Cppyy and Cling

        The simplicity of Python and the power of C++ present a difficult choice for a scientific software stack. There have been multiple developments to mitigate the hard language boundaries by implementing language bindings. The static nature of C++ and the dynamic nature of Python make bindings hard for library authors to provide, in particular for features such as template instantiations with user-defined types or more advanced memory management.

        The development of the C++ interpreter Cling has changed the way we can think of language bindings, as it provides an incremental compilation infrastructure available at runtime. That is, Python can interrogate C++ on demand and fetch only the necessary information. This way of providing bindings automatically requires no binding support from the library authors and offers better performance than Pybind11. This approach was pioneered in ROOT with PyROOT and later enhanced by its successor, Cppyy. However, until now, Cppyy relied on the reflection layer of ROOT, which is limited in terms of provided features and performance.

        In this talk we show how basing Cppyy purely on Cling yields better correctness, performance and installation simplicity. We illustrate more advanced language interoperability of Numba-accelerated Python code capable of calling C++ functionality via Cppyy. We outline a path forward for integrating the reflection layer in LLVM upstream which will contribute to the project sustainability and will foster greater user adoption. We demonstrate usage of Cppyy through Cling’s LLVM mainline version Clang-Repl.

        Speaker: Baidyanath Kundu (Princeton University (US))
    • Track 2: Data Analysis - Algorithms and Tools Sala Europa (Villa Romanazzi)

      Sala Europa

      Villa Romanazzi

      Conveners: Claudio Caputo (Universite Catholique de Louvain (UCL) (BE)), Gregor Kasieczka (Hamburg University (DE))
      • 175
        A method for inferring signal strength modifiers by conditional invertible neural networks

        The continuous growth in model complexity in high-energy physics (HEP) collider experiments demands increasingly time-consuming model fits. We show first results on the application of conditional invertible neural networks (cINNs) to this challenge. Specifically, we construct and train a cINN to learn the mapping from signal strength modifiers to observables and its inverse. The resulting network infers the posterior distribution of the signal strength modifiers rapidly and at low computational cost. We present performance indicators of such a setup, including the treatment of systematic uncertainties, and highlight the features of cINNs estimating a signal strength for HEP data on simulations.

        Speaker: Mate Zoltan Farkas (Rheinisch Westfaelische Tech. Hoch. (DE))
      • 176
        Constraining Cosmological Parameters from Dark Matter Halo Abundance using Simulation-Based Inference

        Constraining cosmological parameters, such as the amount of dark matter and dark energy, to high precision requires very large quantities of data. Modern survey experiments like DES, LSST, and JWST are acquiring these data sets. However, the volumes and complexities of these data – variety, systematics, etc. – show that traditional analysis methods are insufficient to exhaust the information contained in these survey data. Specifically, explicit likelihood-based inference as performed with MCMC likelihood fitting is prone to biases because the likelihoods are written as analytic expressions. This calls for a method that can simultaneously process large volumes of data and handle biases in an efficient manner. Simulation-based inference (SBI, or likelihood-free inference) is rapidly gaining popularity for addressing diverse cosmological problems because of its ability to incorporate complex physical processes (statistical fluctuations of cluster properties) and observational effects (non-linear measurement errors) while generating the observables by forward simulations. In this work, we train a normalizing-flow-based machine learning algorithm embedded in the SBI framework on two datasets: one generated by analytical forward models (via CosmoSIS) and one by N-body simulations (the Quijote simulation suite). We use number counts and mean masses of dark matter halos to estimate posteriors of multiple cosmological parameters (e.g., Ωm, Ωb, h, ns, σ8). Our results show that the SBI method constrains the cosmological parameters within 2σ, comparable to state-of-the-art MCMC-based inference methods, and yields a smaller bias for some parameters (h and ns) than MCMC. Furthermore, SBI trained on the Quijote simulation data requires much less computation time when dealing with large datasets than the MCMC method.

        Speaker: Moonzarin Reza (Texas A&M University)
      • 177
        First performance measurements with the Analysis Grand Challenge

        The IRIS-HEP Analysis Grand Challenge (AGC) is designed to be a realistic environment for investigating how analysis methods scale to the demands of the HL-LHC. The analysis task is based on publicly available Open Data and allows for comparing usability and performance of different approaches and implementations. It includes all relevant workflow aspects from data delivery to statistical inference.

        The reference implementation for the AGC analysis task is heavily based on tools from the HEP Python ecosystem. It makes use of novel pieces of cyberinfrastructure and modern analysis facilities in order to address the data processing challenges of the HL-LHC.

        This contribution compares multiple different analysis implementations and studies their performance. Differences between the implementations include the use of multiple data delivery mechanisms and caching setups for the analysis facilities under investigation.

        Speaker: Oksana Shadura (University of Nebraska Lincoln (US))
      • 178
        The Federation - A novel machine learning technique applied on data from the Higgs Boson Machine Learning Challenge

        The Federation is a new machine learning technique for handling large amounts of data in a typical high-energy physics analysis. It utilizes Uniform Manifold Approximation and Projection (UMAP) to create an initial low-dimensional representation of a given data set, which is clustered by using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). These clusters can then be used for a federated learning approach, in which we separately train a classifier on the data of each individual cluster. As a requirement for this approach, we need to apply an imbalanced learning method to the data in the found clusters before the training. By using a Dynamic Classifier Selection method, the Federation can then make predictions for the whole data set. As a proof of concept for this novel technique, open data from the Higgs Boson Machine Learning Challenge is used and comparisons to results from established methods will be presented. We also investigated the issue of handling missing values and the jet-count feature for this data.

        Speaker: Maximilian Mucha (University of Bonn (DE))
    • Track 3: Computations in Theoretical Physics: Techniques and Methods Sala A+A1 (Villa Romanazzi)

      Sala A+A1

      Villa Romanazzi

      Conveners: Anke Biekoetter (IPPP Durham), Domenico Elia (INFN Bari)
      • 179
        Bridge between Classical & Quantum Machine Learning

        Tensor Networks (TN) are approximations of high-dimensional tensors designed to represent locally entangled quantum many-body systems efficiently. In this talk, we will discuss how to use TN to connect quantum mechanical concepts to machine learning techniques, thereby facilitating the improved interpretability of neural networks. As an application, we will use top jet classification against QCD jets and compare performance against state-of-the-art machine learning applications. Finally, we will discuss how to convert these models into Quantum Circuits to be compiled on a quantum device and show that classical TNs require exponentially large bond dimensions and higher Hilbert-space mapping to perform comparably to their quantum counterparts.

        Speaker: Jack Y. Araz (IPPP - Durham University)
      • 180
        Conditional Born machine for Monte Carlo events generation

        The potential exponential speed-up of quantum computing compared to classical computing makes it to a promising method for High Energy Physics (HEP) simulations at the LHC at CERN.
        Generative modeling is a promising task for near-term quantum devices, the probabilistic nature of quantum mechanics allows us to exploit a new class of generative models: quantum circuit Born machine (QCBM).
        These models use the stochastic nature of quantum measurement as random-like sources and have no classical analog.
        More specifically, they produce samples from the underlying distribution of a pure quantum state by measuring a parametrized quantum circuit with probability given by the Born rule
        This work presents an application of Born machines to Monte Carlo simulations and extends their reach to multivariate and conditional distributions.
        Even if generating multivariate distributions with Born machines has already been explored, we propose an alternative circuit design with a reduced connectivity, better suited for NISQ devices.
        Indeed, models are run on (noisy) simulators and IBM Quantum superconducting devices.
        More specifically, Born machines are used to generate muonic force carrier (MFC) events resulting from scattering processes between muons and the detector material in high-energy physics collider experiments. MFCs are bosons appearing in beyond-the-Standard-Model theoretical frameworks and are candidates for dark matter. Empirical evidence suggests that Born machines can reproduce the underlying distribution of datasets coming from Monte Carlo simulations, and are competitive with classical machine learning-based generative models of similar complexity.
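        The core sampling idea, bitstrings drawn with probabilities fixed by the Born rule, can be sketched in plain Python; the single-qubit RY parametrization below is a toy stand-in, not the authors' entangling circuit:

```python
import math
import random
from collections import Counter

def born_machine_sample(thetas, n_samples, rng=random.Random(0)):
    """Draw bitstrings whose probabilities follow the Born rule.

    Toy stand-in for a QCBM: each qubit i is prepared by an RY rotation,
    cos(t/2)|0> + sin(t/2)|1>, so P(bit_i = 1) = sin^2(t/2).  A real Born
    machine measures an entangling parametrized circuit; this product
    state only illustrates how samples arise from quantum measurement.
    """
    p_one = [math.sin(t / 2.0) ** 2 for t in thetas]
    return [tuple(1 if rng.random() < p else 0 for p in p_one)
            for _ in range(n_samples)]

# theta = pi pins the second qubit to |1>; theta = pi/2 gives a fair coin.
counts = Counter(born_machine_sample([math.pi / 2, math.pi], 10000))
```

Training a QCBM then amounts to tuning the angles so that the sampled distribution matches the target (here, Monte Carlo) distribution.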

        Speaker: Michele Grossi (CERN)
      • 181
        Quantum neural networks force fields generation

        Accurate molecular force fields are of paramount importance for the efficient implementation of molecular dynamics techniques at large scales. In the last decade, machine learning methods have demonstrated impressive performance in predicting accurate values of energies and forces when trained on finite-size ensembles generated with ab initio techniques. At the same time, quantum computers have recently started to offer new viable computational paradigms to tackle such problems. On the one hand, quantum algorithms may notably be used to extend the reach of electronic structure calculations. On the other hand, quantum machine learning is also emerging as an alternative and promising path to quantum advantage. Here we follow this second route and establish a direct connection between classical and quantum solutions for learning neural network potentials. To this end, we design a quantum neural network architecture and apply it successfully to different molecules of growing complexity. The quantum models exhibit a larger effective dimension than their classical counterparts and can reach competitive performance, thus pointing towards potential quantum advantages in natural science applications via quantum machine learning.

        Speaker: Oriel Orphee Moira Kiss (Universite de Geneve (CH))
    • Plenary Sala Europa (Villa Romanazzi Carducci)

      Sala Europa

      Villa Romanazzi Carducci

      Conveners: Leonardo Cosmai, Monique Werlen (EPFL - Ecole Polytechnique Federale Lausanne (CH))
      • 182
        Updates from the organizers
        Speakers: Axel Naumann (CERN), Lucia Silvestris (Universita e INFN, Bari (IT))
      • 183
        AI in the SKA Era: learning semantically meaningful classification targets for radio astronomy

        The expected volume of data from the new generation of scientific facilities such as the Square Kilometre Array (SKA) radio telescope has motivated the expanded use of semi-automatic and automatic machine learning algorithms for scientific discovery in astronomy. In this field, the robust and systematic use of machine learning faces a number of specific challenges, including both a lack of labelled data for training (paradoxically although we have too much data we also don't have enough) and an inheritance of abstracted and sometimes subjective classification terminology. In this talk I will discuss our recent work using language models to derive semantic features that can be mapped to astrophysical target classes using non-technical language. This method is domain-agnostic and publicly available, and we hope that it may also prove useful for other scientific fields where expert data labelling is otherwise costly.

        Speaker: Anna Scaife (University of Manchester)
      • 184
        How Good is the Standard Model?

        Strategies to detect data departures from a given reference model, with no prior bias on the nature of the new physics responsible for the discrepancy, might play a vital role in experimental programs where, as at the LHC, increasingly rich experimental data are accompanied by increasingly blurred theoretical guidance on their interpretation. I will describe one such strategy that employs neural networks, leveraging their virtues as flexible function approximants, but builds its foundations directly on the canonical likelihood-ratio approach to hypothesis testing. The algorithm compares observations with an auxiliary set of reference-distributed events, possibly obtained with a Monte Carlo event generator. It returns a p-value, which measures the compatibility of the reference model with the data. It also identifies the most discrepant phase-space region of the dataset, to be selected for further investigation. Imperfections due to mismodelling in the reference dataset can be taken into account straightforwardly as nuisance parameters.
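        The basic workflow, comparing observed data against a reference-distributed sample and returning a p-value, can be sketched with a toy permutation test; a simple difference of means stands in for the trained neural-network test statistic of the talk:

```python
import random
import statistics

def permutation_pvalue(data, reference, n_perm=2000, rng=random.Random(1)):
    """Toy two-sample compatibility test returning a p-value.

    The talk's strategy trains a neural network as a flexible test
    statistic; here a plain difference of means stands in for it, and
    the null distribution is built by permuting labels between the
    observed and reference-distributed samples.
    """
    observed = abs(statistics.fmean(data) - statistics.fmean(reference))
    pooled = list(data) + list(reference)
    n = len(data)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(statistics.fmean(pooled[:n]) - statistics.fmean(pooled[n:]))
        if stat >= observed:
            hits += 1
    return hits / n_perm

gen = random.Random(2)
reference = [gen.gauss(0.0, 1.0) for _ in range(200)]   # "Monte Carlo" sample
data = [gen.gauss(0.5, 1.0) for _ in range(200)]        # shifted "observed" data
p_value = permutation_pvalue(data, reference)           # small p: tension with reference
```

A small p-value flags tension with the reference model; the real method additionally localizes the most discrepant phase-space region.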

        Speaker: Andrea Wulzer (Universita e INFN, Padova (IT))
      • 185
        Achievements and challenges of high-precision Standard Model physics at future e+e- colliders

        The new concepts for future electron-positron colliders, such as the Future Circular Collider, the International Linear Collider and the Circular Electron-Positron Collider, push the state of the art in experimental measurement precision. The tremendous efforts of experimental physicists to test the immense predictive power of the Standard Model are limited by the intrinsic uncertainties of the currently available theoretical calculations.

        The bottleneck is the increasing complexity of Feynman integral calculations. In recent years, modern methods for the reduction and calculation of Feynman integrals have been developed. We will present some of the tools available on the market and highlight the advances in Feynman integral calculations.

        Speaker: Johann Usovitsch
    • Poster session with coffee break Area Poster (Floor -1) (Villa Romanazzi)

      Area Poster (Floor -1)

      Villa Romanazzi

      • 186
        A web based graphical user interface for X-ray computed tomography imaging

        High-performance fourth-generation synchrotron radiation light sources, such as the High Energy Photon Source (HEPS), have been proposed and are being built. The advent of beamlines at fourth-generation synchrotron sources, together with advanced detectors, has pushed the demand for computing resources to the edge of current workstation capabilities. Moreover, the vast data volumes produced by certain experiments make it difficult for users to take their data away, so on-site data analysis services are necessary both during and after experiments. On top of this, most synchrotron light sources have shifted to prolonged remote operation because of the outbreak of a global pandemic, with the need for remote access to the online instrument systems during experiments.
        A data analysis platform with a graphical user interface (GUI), accessible via the browser-based Jupyter notebook framework, was developed to address the above requirements. It aims to provide an interactive and user-friendly tool for the analysis of X-ray synchrotron radiation CT data collected during experiments. The platform allows remote access and quick reconstruction of large datasets from synchrotron radiation CT experiments. Various techniques to subtract background, normalize signals, reconstruct slices, and post-process the images have been made available. Through containerization and container orchestration techniques, the platform can operate on heterogeneous computing resources of different scales.
        This presentation will describe the design and status of the web-based data analysis platform for the CT imaging beamline of HEPS, as well as the future plan for this platform.

        Speaker: Yu Hu
      • 187
        Accelerating ROOT compression with Intel ISA-L library

        ROOT's TTree has been widely used for the analysis and storage of data from various high-energy physics experiments. Event data generated by an experiment are stored in TTree branches and further compressed and archived into a standard ROOT format file. At present, ROOT supports compressed storage of TBasket, the buffer of TBranch, using compression algorithms such as zlib, lzma, lz4 and zstd, and maximizes performance by using different compression algorithms in different scenarios, which is of great significance for the ever-increasing amount of high-energy physics data. With the continuous improvement of hardware technology, it has become possible to accelerate commonly used algorithms at the hardware level. In this work, using ISA-L (the Intel Intelligent Storage Acceleration Library), the compression algorithms of ROOT are extended on Intel x86 machines, enriching the options for ROOT data compression and further improving the overall performance of TTree data compression. Performance tests on an Intel Xeon Silver 4215R CPU indicate that compression speed with the ISA-L library is 25% higher than with the ZSTD algorithm, and the compression ratio is slightly better than ZSTD's, although decompression is slower than with ZSTD. Adding ISA-L support to ROOT gives users more compression options and effectively reduces compression time.
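        The trade-off being measured here, compression time versus compression ratio per codec, can be illustrated with the Python standard library's zlib and lzma (ISA-L and zstd themselves are not in the stdlib; the codecs and the event-like payload below are purely illustrative):

```python
import lzma
import time
import zlib

# Repetitive event-like payload; real TBasket contents would come from ROOT.
payload = b"px:0.12;py:-0.34;pz:7.89;E:12.5|" * 8000

def bench(name, compress, decompress):
    """Time one codec and check the round trip, as a codec comparison would."""
    t0 = time.perf_counter()
    blob = compress(payload)
    seconds = time.perf_counter() - t0
    assert decompress(blob) == payload          # round trip must be lossless
    return {"codec": name,
            "ratio": len(payload) / len(blob),  # > 1 means the data shrank
            "compress_s": seconds}

results = [
    bench("zlib", lambda d: zlib.compress(d, 6), zlib.decompress),
    bench("lzma", lambda d: lzma.compress(d), lzma.decompress),
]
```

The same harness, pointed at ISA-L and zstd bindings, is essentially what a codec evaluation like the one described reports.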

        Speaker: Yu Gao
      • 188
        Advancing Opportunistic Resource Management via Simulation

        Modern high energy physics experiments and similar compute-intensive fields are pushing the limits of dedicated grid and cloud infrastructure. In the past years, research into augmenting this dedicated infrastructure by integrating opportunistic resources, i.e. compute resources temporarily acquired from third-party resource providers, has yielded various strategies to approach this challenge. However, work on this topic is usually driven by the practical need to use specific resource providers for production workflows; in this context, research is ad hoc and relies on impressions gained in unique combinations of resource providers, resource demand and opportunistic resource management. Replicating, or even deliberately preparing, a specific situation in order to investigate opportunistic resource management is extremely challenging or even impossible. Research in the field of opportunistic resource management is therefore extremely limited.

        We propose to tackle this challenge using simulation and to this end present the simulation framework LAPIS, a general purpose scheduling simulator offering programmatic control of resources. We demonstrate this approach by integrating LAPIS with the COBalD/TARDIS resource manager to investigate the behaviour of this resource manager in a simulated environment.
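        The core of such a scheduling simulation can be sketched in a few lines; this FIFO toy with identical slots only illustrates the discrete-event idea and is not the LAPIS API:

```python
import heapq

def simulate(jobs, n_slots):
    """Tiny discrete-event scheduling simulation.

    jobs: list of (arrival_time, duration) tuples.  Jobs are served in
    FIFO order on n_slots identical resource slots; the returned list
    gives each job's finish time.  A framework like LAPIS models far
    richer resources and policies, but the event loop is the same idea.
    """
    slot_free = [0.0] * n_slots        # min-heap: when each slot frees up
    heapq.heapify(slot_free)
    finish = []
    for arrival, duration in sorted(jobs):
        start = max(arrival, heapq.heappop(slot_free))
        end = start + duration
        heapq.heappush(slot_free, end)
        finish.append(end)
    return finish

# Three jobs on two slots: the third job has to wait for a slot to free.
finish_times = simulate([(0.0, 4.0), (0.0, 2.0), (1.0, 1.0)], n_slots=2)
```

Swapping the FIFO rule or the slot model for other policies is what lets such a simulator explore resource-management strategies reproducibly.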

        Speaker: Max Fischer (Karlsruhe Institute of Technology)
      • 189
        AI/ML for PID in the Charged Pion Polarizability Experiment at Jefferson Lab

        A precise measurement of the polarizability of the charged pion provides an important experimental test of our understanding of low-energy QCD. The goal of the Charged Pion Polarizability (CPP) experiment in Hall D at JLab, currently underway, is to make a precision measurement of this quantity through a high-statistics study of the γγ → π+π− reaction near the 2π threshold. The production of Bethe-Heitler electron and muon pairs presents significant backgrounds, which demand high discrimination between e/π and μ/π to select a clean pion-pair signal. Two independent AI/ML projects were developed to classify μ/π and e/π respectively: a TensorFlow Lite model (trained in Python, with inference in C++) for μ/π, and the TMVA package from ROOT for e/π. A new detector, consisting of iron absorbers interspersed with multi-wire proportional chambers, was constructed to enhance the discrimination between muons and pions. Both models were deployed in real-time data monitoring to verify good experimental conditions.

        Speaker: Andrew Schick
      • 190
        Auto-tuning capabilities of the ACTS track reconstruction suite

        The reconstruction of particle trajectories is a key challenge of particle physics experiments, as it directly impacts particle reconstruction and physics performance. To reconstruct these trajectories, different reconstruction algorithms are used sequentially. Each of these algorithms uses many configuration parameters that need to be fine-tuned to properly account for the detector/experimental setup, the available CPU budget and the desired physics performance. Examples of such parameters are cut values limiting the search space of the algorithm, approximations accounting for complex phenomena, or parameters controlling algorithm performance. Until now, these parameters had to be optimised by human experts, which is inefficient and raises issues for the long-term maintainability of such algorithms. Previous experience with machine learning for particle reconstruction (such as the TrackML challenge) has shown that such methods can be easily adapted to different experiments by learning directly from the data. We propose to bring the same approach to the classic track reconstruction algorithms by connecting them to an agent-driven optimiser, which allows us to find the best set of input parameters using an iterative tuning approach. We have so far demonstrated this method on different track reconstruction algorithms within the A Common Tracking Software (ACTS) framework using the Open Data Detector (ODD). These algorithms include trajectory seed reconstruction and selection, particle vertex reconstruction, and the generation of the simplified material map used for trajectory reconstruction. Finally, we present a development plan for a flexible integration of tunable parameters within the ACTS framework to bring this approach to all aspects of trajectory reconstruction.

        Speakers: Corentin Allaire (Université Paris-Saclay (FR)), Rocky Bala Garg (Stanford University (US))
      • 191
        Bayesian method for waveform analysis with GPU acceleration

        One way to improve the position and energy resolution in neutrino experiments is to provide the reconstruction method with high-resolution parameters. These parameters, the photoelectron (PE) hit times and the expected PE count, can be extracted from the waveforms. We developed a new waveform analysis method called Fast Stochastic Matching Pursuit (FSMP). It is based on Bayesian principles, and the candidate solutions are sampled with Markov chain Monte Carlo (MCMC). To accelerate the method, we ported it to GPU, where it analyzes waveforms at 0.01 s per waveform. The method extracts all the information in the waveforms and will benefit high-resolution event reconstruction. With the improved resolution, we can make our way towards our final physics goals.
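        The MCMC engine underlying such a Bayesian fit can be sketched with a minimal Metropolis sampler; the one-dimensional toy posterior for a single hit time below is illustrative, not the FSMP waveform model:

```python
import math
import random

def metropolis(logp, x0, n_steps, step=0.5, rng=random.Random(0)):
    """Minimal Metropolis sampler over a 1-D log-posterior.

    FSMP samples whole waveform interpretations (hit times and PE counts)
    with MCMC; this toy shows only the accept/reject engine.
    """
    samples, x, lp = [], x0, logp(x0)
    for _ in range(n_steps):
        cand = x + rng.gauss(0.0, step)
        lp_cand = logp(cand)
        # Metropolis rule: accept with probability min(1, exp(delta log p)).
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
        samples.append(x)
    return samples

# Toy posterior for one PE hit time: Normal(5.0, 0.3), up to a constant.
chain = metropolis(lambda t: -((t - 5.0) ** 2) / (2 * 0.3 ** 2), 0.0, 20000)
posterior_mean = sum(chain[5000:]) / len(chain[5000:])  # discard burn-in
```

The posterior mean and spread of the chain are exactly the kind of high-resolution parameters the abstract feeds to reconstruction.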

        Speaker: Yuyi Wang (Tsinghua University)
      • 192
        BESIII track reconstruction algorithm based on machine learning

        Track reconstruction (or tracking) plays an essential role in the offline data processing of collider experiments. For the BESIII detector, working in the tau-charm energy region, plenty of effort was made previously to improve the tracking performance with traditional methods, such as pattern recognition and the Hough transform. However, for challenging tasks such as the tracking of low-momentum tracks, tracks from secondary vertices, and tracks at high noise levels, there is still much room for improvement.

        In this contribution, we demonstrate a novel tracking algorithm based on machine learning. In this method, a hit pattern map representing the connectivity between drift cells is established using a large MC sample, on the basis of which we design an optimal graph-construction method; an edge-classifying Graph Neural Network is then trained to distinguish hits on tracks from noise hits. Finally, a clustering method based on DBSCAN is developed to cluster hits from multiple tracks. A track-fitting algorithm based on GENFIT is also studied to obtain the track parameters, in which a deterministic annealing filter is implemented to deal with ambiguities and potential noise.

        The preliminary results on a BESIII MC sample show promising performance, indicating the potential to apply this method to other drift-chamber-based trackers as well, such as the CEPC and STCF detectors under pre-study.
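        The final clustering step can be illustrated with a minimal DBSCAN over two-dimensional hit positions; the toy hits below stand in for GNN-selected drift-chamber hits:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)            # None = not yet visited

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:             # not a core point (yet): noise
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:              # noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:           # j is a core point: keep expanding
                queue.extend(nj)
        cluster += 1
    return labels

# Hits from two track-like groups plus one isolated noise hit.
hits = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.05), (0.3, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.2, 5.05), (5.3, 5.1),
        (10.0, 0.0)]
labels = dbscan(hits, eps=0.5, min_pts=3)
# Two clusters are found; the isolated hit is labelled noise (-1).
```

In the real pipeline each cluster of hits is then handed to the GENFIT-based fit to extract track parameters.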

        Keywords: machine learning, tracking, drift chamber, GNN

        Reference:
        1. Steven Farrell et al., Novel deep learning methods for track reconstruction. arXiv:1810.06111
        2. A Generic Track-Fitting Toolkit. https://github.com/GenFit/GenFit

        Speaker: Ms Xiaoqian Jia (Shandong University)
      • 193
        CaloPointFlow - Generating Calorimeter Showers as Point Clouds

        In particle physics, precise simulations are necessary to enable scientific progress. However, accurate simulations of the interaction processes in calorimeters are complex and computationally very expensive, currently demanding a large fraction of the computing resources available to particle physics. Various generative models have been proposed to reduce this computational cost. Usually, these models interpret calorimeter showers as 3D images in which each active cell of the detector is represented as a voxel. This approach becomes difficult for high-granularity calorimeters due to the greater sparsity of the data.

        In this study, we use this sparseness to our advantage and interpret the calorimeter showers as point clouds. More precisely, we consider each hit as part of a hit distribution depending on a global latent calorimeter shower distribution.

        Our model is based on PointFlow (Yang et al. 2019) and consists of a permutation invariant encoder and two normalizing flows. One flow models the global latent calorimeter shower distribution. The other flow models the distribution of individual hits conditioned on the calorimeter shower distribution.

        We present first results and compare them with state-of-the-art voxel-based methods.

        Speaker: Simon Schnake (DESY / RWTH Aachen University)
      • 194
        Deploying a cache content delivery network for CMS experiment in Spain

        The Xrootd protocol is used by the CMS experiment at the LHC to access, transfer, and store data within Worldwide LHC Computing Grid (WLCG) sites, which run different kinds of jobs on their compute nodes. Its redirector system allows execution tasks to access input data stored at any WLCG site. In 2029 the Large Hadron Collider (LHC) will start the High-Luminosity LHC (HL-LHC) program, when the luminosity will increase by a factor of 10 compared to current values. This scenario will also imply an unprecedented increase in the simulation and collision data to transfer, process, and store in disk and tape systems. The Spanish WLCG sites that support CMS, the PIC Tier-1 and the CIEMAT Tier-2, have explored content delivery network solutions for the Spanish region. One solution under development is the deployment of caches between the two sites that store the data requested by jobs remotely, bringing the data closer to the compute nodes to improve job efficiency and input-data transfer latency. In this contribution, we analyze the impact of deploying physical caches in production in the CMS region between PIC and CIEMAT, quantifying the gains in job efficiency, latency and bandwidth, as well as the potential storage savings.

        Speaker: Carlos Perez Dengra (PIC-CIEMAT)
      • 195
        Development of a lightweight database interface for accessing JUNO conditions and parameters data

        The Jiangmen Underground Neutrino Observatory (JUNO) has a very rich physics program, which primarily aims at determining the neutrino mass ordering and precisely measuring the oscillation parameters. It is under construction in South China at a depth of about 700 m underground. As data taking will start in 2023, a complete data processing chain is being developed beforehand. Conditions and parameters data, as non-event data, are an important part of the data processing chain and are used by reconstruction and simulation. These data can be accessed via Frontier on JUNO-DCI (Distributed Computing Infrastructure), or via databases such as MySQL and SQLite in local clusters.

        In this contribution, the latest developments of a lightweight database interface (DBI) for the JUNO conditions and parameters data management system will be shown. This interface provides a unified method to access data from different backends, such as Frontier, MySQL and SQLite: production jobs can run on JUNO-DCI with Frontier; testing jobs can run in a local cluster with MySQL to validate the conditions and parameters data; fast reconstruction can run in the onsite DAQ environment using SQLite without any connection to a remote database. Modern C++ template techniques are used in DBI: a new backend is defined by a simple struct with two methods, doConnect and doQuery; result sets are bound to std::tuple, and the types of all elements are known at compile time. Finally, DBI is used by high-level user interfaces: data models in the database are mapped to plain C++ classes, so that users can access these objects without knowing DBI.
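        The pluggable-backend idea can be sketched in Python with the stdlib sqlite3 module; the class and method names below mirror the abstract's doConnect/doQuery contract but are otherwise hypothetical, and the real DBI is built from C++ templates rather than duck typing:

```python
import sqlite3

class SQLiteBackend:
    """One pluggable backend.  A MySQL or Frontier backend would expose
    the same two methods (the doConnect/doQuery contract of the abstract,
    sketched here in Python)."""
    def do_connect(self, uri):
        self.conn = sqlite3.connect(uri)
    def do_query(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()

class DBI:
    """Unified front end: user code never sees which backend is in use."""
    def __init__(self, backend, uri):
        self.backend = backend
        backend.do_connect(uri)
    def query(self, sql, params=()):
        return self.backend.do_query(sql, params)

# Toy conditions table (names illustrative) in an in-memory SQLite database.
dbi = DBI(SQLiteBackend(), ":memory:")
dbi.query("CREATE TABLE pmt_calib (pmt_id INTEGER, gain REAL)")
dbi.query("INSERT INTO pmt_calib VALUES (1, 1.02), (2, 0.98)")
rows = dbi.query("SELECT gain FROM pmt_calib WHERE pmt_id = ?", (1,))
```

In the C++ version the same substitution happens at compile time, with results bound to std::tuple instead of Python tuples.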

        Speaker: Tao Lin (Chinese Academy of Sciences (CN))
      • 196
        Differentiating through Awkward Arrays using JAX and a new CUDA backend for Awkward Arrays

        Awkward Array is a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms. Auto-differentiation (also known as “autograd” and “autodiff”) is a technique for computing the derivative of a function defined by an algorithm, which requires the derivative of all operations used in that algorithm to be known.

        The grad-hep group is primarily focused on end-to-end analysis, and they use JAX as their primary library for auto-differentiation. As part of this effort, we developed an interoperability layer between JAX and Awkward Arrays using JAX’s pytrees API. JAX now differentiates most Awkward Array functions, including reducer algorithms. This allows investigators to differentiate through their functions when using Uproot with Awkward Arrays. However, extending JAX’s vectorized mapping APIs is currently not possible because of fundamental differences between the two libraries.
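        What it means to differentiate through ragged, nested data can be illustrated without JAX or Awkward Array, using a tiny forward-mode dual number; this is an illustration of the concept only, whereas the actual work registers Awkward Arrays with JAX's pytree API:

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """Forward-mode dual number: a value together with its derivative."""
    val: float
    der: float = 0.0

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule carries the derivative through the multiplication.
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def scaled_energy_sum(scale, events):
    """Sum of scale * hit over a ragged list of per-event hit lists."""
    return sum((scale * hit for ev in events for hit in ev), Dual(0.0))

# Variable-length events, including an empty one, as in an awkward array.
events = [[1.0, 2.0], [3.0], [], [4.0, 5.0]]
out = scaled_energy_sum(Dual(2.0, 1.0), events)   # seed d(scale)/d(scale) = 1
# out.val is the sum, out.der its derivative with respect to scale.
```

JAX's pytree registration plays the role of the generator expression here: it tells the autodiff engine how to walk the nested structure down to the leaf values.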

        Future work on this might involve testing for a large subset of most commonly used differentiable cases. Currently, testing is carried out on a relatively small number of cases which were developed to catch edge cases.

        We also developed a GPU backend for Awkward Arrays by leveraging CuPy’s CUDA capabilities. Awkward Arrays now has the entire infrastructure to support operations on a GPU. However, many low-level “C” kernels (115 of 204) are yet to be translated to CUDA. Once this is done, Awkward Arrays will have full GPU support, which will indirectly help make auto-differentiation fully deployable on GPUs too.

        Speaker: Anish Biswas (Princeton University (US))
      • 197
        Equivariant Graph Neural Networks for Charged Particle Tracking

        A broad range of particle physics data can be naturally represented as graphs. As a result, Graph Neural Networks (GNNs) have gained prominence in HEP and have increasingly been adopted for a wide array of particle physics tasks, including particle track reconstruction. Most problems in physics involve data that have some underlying compatibility with symmetries. These problems may either require, or at the very least, benefit from models that perform computations and construct representations that reflect these symmetries. In this work, we explore the application of symmetry group equivariance to GNNs within the context of charged particle tracking in pileup conditions similar to those expected at the high-luminosity Large Hadron Collider. In particular, we investigate whether rotationally-equivariant GNNs can perform competitively and yield models that either contain fewer, more expressive learned parameters or are more efficient vis-à-vis data and computational requirements. To our knowledge, this is the first study exploring equivariant GNNs for a track reconstruction use case. Additionally, we perform a side-by-side comparison of equivariant and non-equivariant architectures over evaluation metrics that capture both outright tracking performance as well as the track-building power-to-weight ratio of physics-constrained GNNs.

        Speaker: Ameya Thete (Birla Institute of Technology and Science, Pilani - KK Birla Goa Campus (IN))
      • 198
        Evaluating Portable Parallelization Strategies for Heterogeneous Architectures

        High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of the floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues.

        The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and Alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using an assortment of representative use cases from DUNE, LHC ATLAS and CMS experiments. Central to the project is to develop a list of metrics that evaluate the suitability of each portability layer for the various testbeds. This list includes both subjective ratings, such as the ease of learning the language, and objective criteria such as performance.

        We report on the status of these projects, the development and evaluation of the metrics, as well as the current benchmarks and evaluations of the portability layers for the testbeds under study and recommendations for HEP experiments seeking forward looking portability solutions.

        Speaker: Charles Leggett (Lawrence Berkeley National Lab (US))
      • 199
        Hyperparameter Optimization as a Service on INFN Cloud

        The simplest and often most effective way of parallelizing the training of complex machine learning models is to execute several training instances on multiple machines, possibly scanning the hyperparameter space to optimize the underlying statistical model and the learning procedure.
        Often, such a meta-learning procedure is limited by the ability to securely access a common database organizing the knowledge of previous and ongoing trials. Exploiting opportunistic GPUs provided in different environments represents a further challenge when designing such optimization campaigns.
        In this contribution we discuss how a set of REST APIs can be used to access a dedicated service based on INFN Cloud to monitor and possibly coordinate multiple training instances, with gradient-free optimization techniques, via simple HTTP requests. The service, named Hopaas (Hyperparameter OPtimization As A Service), is made of a web interface and sets of APIs implemented with a FastAPI back-end running through Uvicorn and NGINX in a virtual instance of INFN Cloud. The optimization algorithms are currently based on Bayesian techniques as provided by Optuna. A Python front-end is also made available for quick prototyping.
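        The ask/tell pattern such a service exposes over HTTP can be sketched in-process; random search stands in for Optuna's Bayesian sampler, and the class, bounds, and toy objective below are all illustrative rather than the Hopaas API:

```python
import random

class AskTellOptimizer:
    """Minimal ask/tell optimizer: random search over a box domain.

    A service like Hopaas runs this loop over HTTP: a worker asks for a
    trial, runs the training, then tells the resulting loss back, so many
    workers on opportunistic resources can share one optimization study.
    """
    def __init__(self, bounds, seed=0):
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.trials = []                      # list of (params, loss)

    def ask(self):
        return {k: self.rng.uniform(lo, hi)
                for k, (lo, hi) in self.bounds.items()}

    def tell(self, params, loss):
        self.trials.append((params, loss))

    def best(self):
        return min(self.trials, key=lambda t: t[1])

opt = AskTellOptimizer({"lr": (1e-4, 1e-1), "momentum": (0.0, 0.99)})
for _ in range(50):
    params = opt.ask()
    # Toy objective standing in for a validation loss; optimum at (0.01, 0.9).
    loss = (params["lr"] - 0.01) ** 2 + (params["momentum"] - 0.9) ** 2
    opt.tell(params, loss)
best_params, best_loss = opt.best()
```

Replacing the sampler inside ask() with a Bayesian one (as Optuna provides) is what turns this loop into the meta-learning procedure the abstract describes.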
        We present applications to hyperparameter optimization campaigns performed combining private, INFN Cloud and CINECA resources.

        Speaker: Matteo Barbetti (Universita e INFN, Firenze (IT))
      • 200
        Hyperparameter optimization, multi-node distributed training and benchmarking of AI-based HEP workloads using HPC

        In the European Center of Excellence in Exascale Computing "Research on AI- and Simulation-Based Engineering at Exascale" (CoE RAISE), researchers from science and industry develop novel, scalable Artificial Intelligence technologies towards Exascale. In this work, we leverage European High Performance Computing (HPC) resources to perform large-scale hyperparameter optimization (HPO), multi-node distributed data-parallel training, and benchmarking, using multiple compute nodes, each equipped with multiple GPUs.

        Training and HPO of deep learning-based AI models is often compute resource intensive and calls for the use of large-scale distributed resources as well as scalable and resource efficient hyperparameter search algorithms. We evaluate the benefits of HPC for HPO by comparing different search algorithms and approaches, as well as performing scaling studies. Furthermore, the scaling and benefits of multi-node distributed data-parallel training using Horovod are presented, showing significant speed-up in model training. In addition, we present results from the development of a containerized benchmark based on an AI-model for event reconstruction that allows us to compare and assess the suitability of different hardware accelerators for training deep neural networks. A graph neural network (GNN) model known as MLPF, which has been developed for the task of Machine Learned Particle-Flow reconstruction in High Energy Physics (HEP), acts as the base model for which studies are performed.

        Further developments of AI models in CoE RAISE have the potential to greatly impact the field of High Energy Physics by efficiently processing the very large amounts of data that will be produced by particle detectors in the coming decades. In order to do this efficiently, techniques that leverage modern HPC systems like multi-node training, large-scale distributed HPO as well as standardized benchmarking will be of great use.

        Speaker: Eric Wulff (CERN)
      • 201
        Integration of machine learning-trained models into JUNO's offline software

        The Jiangmen Underground Neutrino Observatory (JUNO) is under construction in South China and will start data taking in 2023. It has a central detector with 20 kt of liquid scintillator, equipped with 17,612 20-inch PMTs (photo-multiplier tubes) and 25,600 3-inch PMTs. The requirement of a 3% energy resolution at 1 MeV makes offline data processing challenging, so several machine learning-based methods have been developed for reconstruction, particle identification, simulation, etc. These methods are implemented with machine learning libraries in Python; however, the offline software is based on a C++ framework called SNiPER. Therefore, how to integrate them and run inference in the offline software is important.

        In this contribution, the integration of machine learning-trained models into JUNO's offline software will be presented. Three methods are explored: using SNiPER's Python binding to share data between C++ and Python; using the native C/C++ APIs of the machine learning libraries, such as TensorFlow and PyTorch; and using ONNX Runtime. Even though SNiPER is implemented in C++, it provides a Python binding via Boost Python. In recent updates of SNiPER, a special data buffer was implemented to share data between C++ and Python, which makes it possible to run machine learning methods in the following way: a C++ algorithm reads event data and converts them to numpy arrays; a Python algorithm then accesses these numpy arrays and invokes machine learning libraries in Python; finally, the C++ algorithm puts the results into the event data. For the native C/C++ APIs of machine learning libraries and for ONNX Runtime, a C++ algorithm is used to convert the event data to the corresponding formats and invoke the C/C++ APIs. The deployment of the three methods is also studied: using SNiPER's Python binding is the most flexible method for users, as they can install any Python libraries with pip by themselves; using the native C/C++ APIs requires users to use the same versions as in the official JUNO software release; using ONNX Runtime only requires users to convert their models to the ONNX format. Comparing the three methods, ONNX is recommended for most users in JUNO. For developing and testing machine learning models in the offline software, developers can choose either of the other two methods.

        Speaker: Tao Lin (Chinese Academy of Sciences (CN))
      • 202
        k4Clue: Having CLUE at future collider experiments

        CLUE is a fast and innovative density-based clustering algorithm that groups the digitized energy deposits (hits) left by a particle traversing the active sensors of a high-granularity calorimeter into clusters with a well-defined seed hit. Outliers, i.e. hits which do not belong to any cluster, are also identified. Its outstanding performance has been proven in the context of the CMS Phase-2 upgrade using both simulated and test beam data.
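        The density-based idea can be illustrated with a toy version of the algorithm (unit hit weights and hand-picked thresholds; not the actual CMS implementation): each hit gets a local density and a distance to its nearest denser hit, dense isolated hits seed clusters, other hits follow their nearest denser neighbour, and sparse isolated hits become outliers.

```python
import math

def clue_like(points, dc=1.0, rho_c=1.5, delta_c=2.0):
    """Toy CLUE-style density clustering on 2D hit positions."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    # local density: number of hits within dc (unit weights for simplicity)
    rho = [sum(1 for j in range(n) if dist(i, j) <= dc) for i in range(n)]
    # distance to (and index of) the nearest hit of higher density
    delta, nearest_higher = [], []
    for i in range(n):
        higher = [(dist(i, j), j) for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        d, j = min(higher) if higher else (math.inf, -1)
        delta.append(d)
        nearest_higher.append(j)
    labels = [-1] * n                                  # -1 marks outliers
    next_label = 0
    for i in sorted(range(n), key=lambda k: -rho[k]):  # densest first
        if delta[i] > delta_c:
            if rho[i] >= rho_c:                        # dense and isolated: a seed
                labels[i] = next_label
                next_label += 1
            # sparse and isolated: stays an outlier
        else:                                          # follower: inherit the label
            labels[i] = labels[nearest_higher[i]]
    return labels

hits = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),     # blob 1
        (10.0, 0.0), (10.5, 0.0), (10.0, 0.5),  # blob 2
        (5.0, 5.0)]                              # isolated hit
labels = clue_like(hits)                         # two clusters plus one outlier
```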

        Initially, CLUE was developed in a standalone repository to allow performance benchmarking of its CPU and GPU implementations, demonstrating the power of algorithmic parallelization in the coming era of heterogeneous computing. In this contribution we outline CLUE's capabilities outside CMS and, more specifically, at future collider experiments. To this end, CLUE was adapted to run in the Key4hep framework (k4Clue): it was integrated into the Gaudi software framework and now supports the EDM4hep data format for inputs and outputs.

        Implementation details and physics performance will be shown not only for several options of highly granular calorimeters for future e+e- linear and circular colliders, but also for the new Open Data Calorimeter detector, a recent extension of the Open Data Tracking detector, whose aim is to build a simulation-on-the-fly testbed for future algorithm R&D.

        Speaker: Erica Brondolin (CERN)
      • 203
        Lamarr: LHCb ultra-fast simulation based on machine learning models

        About 90% of the computing resources available to the LHCb experiment have been spent producing simulated data samples for Run 2 of the Large Hadron Collider. The upgraded LHCb detector will operate at a much-increased luminosity, requiring many more simulated events for Run 3. Simulation is a key necessity for analyses, in order to interpret data in terms of signal and background and to estimate the relevant efficiencies. The amount of simulation required will far exceed the pledged resources, requiring an evolution in technologies and techniques to produce simulated data samples. In this conference contribution, we discuss Lamarr, a Gaudi-based framework to speed up simulation production by parametrizing both the detector response and the reconstruction algorithms of the LHCb experiment.
        Deep Generative Models powered by several algorithms and strategies are employed to effectively parameterize the high-level response of the individual components of the LHCb detector, encoding within neural networks the experimental errors and uncertainties introduced in the detection and reconstruction phases. Where possible, models are trained directly on real data, statistically subtracting any background components through the application of weights.
        Embedding Lamarr in the general LHCb simulation framework (Gauss) makes it possible to combine its execution with any of the available generators in a seamless way. The resulting software package enables a simulation process completely independent of the detailed simulation used to date.

        Speaker: Matteo Barbetti (Universita e INFN, Firenze (IT))
      • 204
        Mock Data Challenge for the JUNO experiment

        The Jiangmen Underground Neutrino Observatory (JUNO) is under construction in South China at a depth of about 700 m underground; data taking is expected to start in late 2023. JUNO has a very rich physics program which primarily aims at determining the neutrino mass ordering and precisely measuring the oscillation parameters.
        The JUNO average raw data volume is expected to be about 2 PB/year and will be transferred from the experimental site to the main computing center (IHEP, Beijing, China) using a dedicated link. When raw data arrive at IHEP, a Data Quality Monitoring (DQM) system will be used to monitor their quality. A so-called Keep-Up Production (KUP) will reconstruct the data, and these processed data will be used for detector-status studies and for some prompt physics analyses. In order to validate the complete data-processing chain, a Mock Data Challenge is being performed and will produce a large-scale Monte Carlo data-set for the JUNO experiment.
        Because the signals are rare, most of the expected JUNO events are backgrounds, coming from the natural radioactivity of the rocks, from cosmic muons, and from the detector itself. There are 17 different components considered in this Mock Data Challenge, and the simulation of each component is performed using the JUNO Distributed Computing Infrastructure (JUNO-DCI). The Monte Carlo output can then be used for the electronics and digitization simulation. However, the electronics simulation needs to simultaneously read a huge amount of data for each background component, which makes the production on the JUNO-DCI particularly challenging. A pre-mixing method is implemented to mix the radioactivity events beforehand so that the number of required input files can be significantly reduced: a radioactivity background event is picked from the existing data files according to the event rates and then saved into a pre-mixed data file.
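        The pre-mixing idea can be sketched as follows; component names, rates, and the event representation are all made up for illustration:

```python
import random

def premix(components, rates, n_events, seed=0):
    """Draw events from each background component in proportion to its rate
    and merge them into a single pre-mixed stream of (component, event)."""
    rng = random.Random(seed)
    names = list(components)
    weights = [rates[name] for name in names]
    mixed = []
    for _ in range(n_events):
        name = rng.choices(names, weights=weights)[0]   # pick a component by rate
        mixed.append((name, rng.choice(components[name])))
    return mixed

stream = premix(
    components={"U238": ["u1", "u2"], "Th232": ["t1"], "muon": ["m1", "m2"]},
    rates={"U238": 5.0, "Th232": 3.0, "muon": 1.0},
    n_events=6,
)
```

The electronics simulation then reads the single pre-mixed stream instead of keeping every component sample open at once.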

        In this contribution, details on the Mock Data Challenge, on the JUNO data-processing logic flow and on the practical challenges to be faced for a successful production will be reported.

        Speaker: Alessandra Carlotta Re (Universita' degli Studi & INFN of Milano (Italy))
      • 205
        Of Frames and schema evolution - The newest features of podio

        The podio event data model (EDM) toolkit provides an easy way to generate a performant implementation of an EDM from a high-level description in YAML format. We present the most recent developments in podio, most importantly the inclusion of a schema evolution mechanism for generated EDMs, as well as the "Frame", a thread-safe, generalized event data container. For the former we discuss some of the technical aspects in relation to supporting different I/O backends and leveraging the schema evolution mechanisms they may already provide. Regarding the Frame, we introduce the basic concept and highlight some of the functionality as well as important aspects of its implementation. We also present some other, smaller new features inspired by the usage of podio for generating different EDMs for future collider projects, most importantly EDM4hep, the common EDM of the Key4hep project. We end with a brief overview of current developments towards a first stable version as well as an outlook on future developments beyond that.
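        Conceptually, schema evolution lets data written with an old schema version be upgraded when read back; the sketch below illustrates the idea with per-version migration steps on dictionary records (this is not podio's actual implementation, which generates C++ from the YAML description):

```python
CURRENT_VERSION = 3
MIGRATIONS = {}

def migration(from_version):
    """Register a function that upgrades a record by one schema version."""
    def register(fn):
        MIGRATIONS[from_version] = fn
        return fn
    return register

@migration(1)
def v1_to_v2(hit):
    hit["energy"] = hit.pop("e")   # v2 renamed the member 'e' -> 'energy'
    return hit

@migration(2)
def v2_to_v3(hit):
    hit["time"] = 0.0              # v3 added a member with a default value
    return hit

def read_hit(record, version):
    """Upgrade a stored record to CURRENT_VERSION, one step at a time."""
    while version < CURRENT_VERSION:
        record = MIGRATIONS[version](record)
        version += 1
    return record

old = {"e": 2.5, "cellID": 42}     # written with schema version 1
new = read_hit(old, version=1)
```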

        Speaker: Thomas Madlener (Deutsches Elektronen-Synchrotron (DESY))
      • 206
        Optimized GPU usage in High Energy Physics applications

        Machine Learning (ML) applications, which have become quite common tools for many High Energy Physics (HEP) analyses, benefit significantly from GPU resources. GPU clusters are important to fulfill the rapidly increasing demand for GPU resources in HEP. The Karlsruhe Institute of Technology (KIT) therefore provides a GPU cluster for HEP, accessible from the physics institute via its batch system and via the Grid. As the exact hardware needs of such applications depend heavily on the ML hyperparameters, a flexible resource setup is necessary to utilize the available resources as efficiently as possible. For this purpose, the multi-instance GPU feature of the Nvidia A100 GPUs was studied. Several neural-network training scenarios performed on the GPU cluster at KIT are discussed to illustrate possible performance gains and the setup that was used.

        Speaker: Tim Voigtlaender (KIT - Karlsruhe Institute of Technology (DE))
      • 207
        Optimizing electron and photon reconstruction using deep learning: application to the CMS electromagnetic calorimeter

        The reconstruction of electrons and photons in CMS depends on topological clustering of the energy deposited by an incident particle in different crystals of the electromagnetic calorimeter (ECAL). These clusters are formed by aggregating neighbouring crystals according to the expected topology of an electromagnetic shower in the ECAL. The presence of upstream material (beampipe, tracker and support structures) causes electrons and photons to start showering before reaching the calorimeter. This effect, combined with the 3.8 T CMS magnetic field, leads to energy being spread over several clusters around the primary one. It is essential to recover the energy contained in these satellite clusters in order to achieve the best possible energy resolution for physics analyses.

        Historically, satellite clusters have been associated with the primary cluster using a purely topological algorithm which does not attempt to remove spurious energy deposits from additional pileup interactions (PU). The performance of this algorithm is expected to degrade during LHC Run 3 (2022+) because of the larger average PU levels and the increasing noise due to the ageing of the ECAL detector. New methods are being investigated that exploit state-of-the-art deep-learning architectures such as Graph Neural Networks (GNN) and self-attention algorithms. These more sophisticated models improve the energy collection and are more resilient to PU and noise.

        This contribution covers the model optimization results and the steps taken to put the model into production within the realistic CMS reconstruction sequence. The impact on the electron and photon energy resolution and tests of the resilience of the algorithm to changing detector conditions are shown.

        Speaker: Davide Valsecchi (ETH Zurich (CH))
      • 208
        Primary Vertex Reconstruction for Heterogeneous Architecture at CMS

        The future development projects for the Large Hadron Collider will bring steady increases in nominal luminosity, with the ultimate goal of reaching a peak luminosity of $5 \times 10^{34}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$. This would result in up to 200 simultaneous proton collisions (pileup), posing significant challenges for the CMS detector reconstruction.

        The CMS primary vertex (PV) reconstruction is a two-step procedure consisting of vertex finding and vertex fitting. First, the Deterministic Annealing algorithm clusters tracks coming from the same interaction vertex. Second, an Adaptive Vertex Fit computes the best estimate of each vertex position. In High-Luminosity LHC (HL-LHC) conditions, due to the high track density, the reconstruction of PVs is expected to be particularly time-consuming (up to 6% of the reconstruction time).

        This work presents a complete study of adapting the CMS primary vertex reconstruction algorithms to run on heterogeneous architectures, which allows us to exploit parallelization techniques to significantly reduce the processing time while retaining similar physics performance. Results obtained for both Run 3 and HL-LHC conditions will be discussed.
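        The vertex-finding step can be illustrated with a toy one-dimensional deterministic-annealing clusterer (a sketch of the general technique, not the CMS code: the real algorithm splits prototypes dynamically and handles outliers, while here the number of vertices is fixed at two):

```python
import math

def da_vertices(z_tracks, n_proto=2, T0=4.0, T_min=0.1, cooling=0.5):
    """Tracks are softly assigned to vertex prototypes with Boltzmann
    weights exp(-(z_i - z_k)^2 / T); as the temperature T is lowered the
    assignments harden and the prototypes settle on the vertex positions."""
    # start prototypes slightly apart so symmetry can break
    z_v = [sum(z_tracks) / len(z_tracks) + 0.01 * k for k in range(n_proto)]
    T = T0
    while T > T_min:
        for _ in range(50):                    # assignment/update sweeps at fixed T
            new = []
            for k in range(n_proto):
                num = den = 0.0
                for z in z_tracks:
                    w = [math.exp(-(z - z_v[m]) ** 2 / T) for m in range(n_proto)]
                    p = w[k] / sum(w)          # soft assignment of track to vertex k
                    num += p * z
                    den += p
                new.append(num / den)          # weighted mean of assigned tracks
            z_v = new
        T *= cooling                           # anneal: lower the temperature
    return sorted(z_v)

tracks = [-1.1, -0.9, -1.0, 3.0, 3.1, 2.9]     # two vertices near z = -1 and z = 3
vertices = da_vertices(tracks)
```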

        Speakers: Adriano Di Florio (Politecnico e INFN, Bari), Giorgio Pizzati (Universita & INFN, Milano-Bicocca (IT))
      • 209
        Pyrate: a novel system for data transformations, reconstruction and analysis for the SABRE experiment

        The pyrate framework provides a dynamic, versatile, and memory-efficient approach to data-format transformations, object reconstruction and data analysis in particle physics. Developed within the context of the SABRE experiment for dark matter direct detection, pyrate relies on a blackboard design pattern where algorithms are dynamically evaluated throughout a run and scheduled by a central control unit. The system intends to improve the user experience, portability and scalability of the offline software systems currently available in the particle physics community, with particular attention to medium- and small-scale experiments. Pyrate is implemented in the Python programming language, allowing easy access to the scientific Python ecosystem and commodity big-data technologies. This presentation addresses the pyrate design and implementation.

        Speaker: Federico Scutti (Swinburne University of Technology)
      • 210
        Real-time alignment procedure at the LHCb experiment for Run3

        The LHCb detector at the LHC is a general-purpose detector in the forward region with a focus on studying decays of c- and b-hadrons. For Run 3 of the LHC (data taking from 2022), LHCb will take data at an instantaneous luminosity of $2 \times 10^{33}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$, five times higher than in Run 2 (2015-2018). To cope with the harsher data-taking conditions, LHCb will deploy a purely software-based trigger with a 30 MHz input rate.
        The software trigger at LHCb is composed of two stages: in the first stage the selection is based on a fast and simplified event reconstruction, while in the second stage a full event reconstruction is used. This leaves room to perform a real-time alignment and calibration after the first trigger stage, which provides an offline-quality detector alignment in the second stage of the trigger. The detector alignment is an essential ingredient for the best detector performance in the full event reconstruction. The alignment of the whole tracking system of LHCb is evaluated in real time by an automatic iterative procedure. This is particularly important for the vertex detector, which is retracted for LHC beam injection and centered around the primary vertex position once stable beam conditions are reached in each fill; it is therefore sensitive to position changes on a fill-by-fill basis.
        The real-time alignment is a fully automatic procedure in the online framework that uses a multi-core farm. It is executed as soon as the required data sample is collected. The alignment tasks are split into two parts: the event reconstruction is parallelized via a multi-threaded process, while the evaluation of the alignment parameters is performed on a single thread after collecting all the needed information from all the reconstruction processes of the first part. The execution of the alignment tasks is under the control of the LHCb Experiment Control System and is implemented as a finite state machine. The procedure is run at the beginning of each LHC fill, and the alignment of the full tracking system (about 300 elements and about 1000 degrees of freedom) takes a few minutes. The parameters are updated immediately in the software trigger. This in turn makes it possible to achieve the optimal performance in the trigger output data, which can be used for physics analysis without a further offline event reconstruction.
        The framework and the procedure for a real-time alignment of the LHCb detector developed for Run 3 data taking are discussed from both the technical and the operational point of view. Specific challenges of this procedure and its performance are presented.

        Speaker: Florian Reiss (University of Manchester (GB))
      • 211
        Reconstructing Particle Decay Trees with Quantum Graph Neural Networks for High Energy Physics

        Quantum Computing and Machine Learning are both significant and appealing research fields. In particular, their combination has led to the emergence of quantum machine learning, a research field which has recently gained enormous popularity. We investigate the potential advantages of this synergy for applications in high energy physics, more precisely in the reconstruction of particle decay trees in particle collision experiments. Due to the larger computational space of quantum computers, this highly complex combinatorial problem is well suited for investigating a potential quantum advantage over the classical scenario. However, current quantum devices are subject to noise and provide only a limited number of qubits. We therefore propose the utilization of a variational quantum circuit within a classical graph neural network, an architecture which has previously been shown to be feasible for the reconstruction of particle decay trees. We evaluate our approach on artificially generated decay trees, on a quantum simulator and on a real quantum computer by IBM Quantum, and compare our results to the purely classical approach. Our proposed approach not only enables the effective utilization of today's quantum devices, but also shows competitive results even in the presence of noise.

        Speaker: Melvin Strobl
      • 212
        Speeding up the CMS track reconstruction with a parallelized and vectorized Kalman-filter-based algorithm during the LHC Run 3

        One of the most challenging computational problems in Run 3 of the Large Hadron Collider (LHC), and even more so in the High-Luminosity LHC (HL-LHC), is expected to be finding and fitting charged-particle tracks during event reconstruction. The methods used so far at the LHC, and in particular at the CMS experiment, are based on the Kalman filter technique. Such methods have been shown to be robust and to provide good physics performance, both in the trigger and offline. In order to improve computational performance, we explored Kalman-filter-based methods for track finding and fitting, adapted for many-core SIMD architectures. This adapted Kalman-filter-based software, called "mkFit", was shown to provide a significant speedup compared to the traditional algorithm, thanks to its parallelized and vectorized implementation. The mkFit software was recently integrated into the offline CMS software framework, in view of its exploitation during Run 3 of the LHC. At the start of LHC Run 3, mkFit will be used for track finding in a subset of the CMS offline track reconstruction iterations, allowing for significant improvements over the existing framework in terms of computational performance, while retaining comparable physics performance. The performance of the CMS track reconstruction using mkFit at the start of LHC Run 3 is presented, together with prospects of further improvement in the upcoming years of data taking.
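        The building block underlying such trackers is the Kalman-filter propagate-and-update step; a textbook one-dimensional sketch (state = position and slope, each detector layer measuring the position) looks as follows. This is purely illustrative and unrelated to mkFit's vectorized implementation:

```python
def kalman_step(x, P, dz, meas, meas_var):
    """x: state [pos, slope]; P: 2x2 covariance; dz: distance to the next
    layer; meas: measured position on that layer with variance meas_var."""
    # propagate: pos' = pos + slope * dz  (transport matrix F = [[1, dz], [0, 1]])
    x = [x[0] + x[1] * dz, x[1]]
    P = [[P[0][0] + 2 * dz * P[0][1] + dz * dz * P[1][1], P[0][1] + dz * P[1][1]],
         [P[0][1] + dz * P[1][1], P[1][1]]]
    # update with a position measurement (projection H = [1, 0])
    S = P[0][0] + meas_var               # innovation variance
    K = [P[0][0] / S, P[0][1] / S]       # Kalman gain
    r = meas - x[0]                      # residual
    x = [x[0] + K[0] * r, x[1] + K[1] * r]
    P = [[(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
         [P[0][1] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
    return x, P

# weak prior, then three hits from a straight track with slope 1
x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for meas in [1.0, 2.0, 3.0]:
    x, P = kalman_step(x, P, dz=1.0, meas=meas, meas_var=0.01)
```

After the three updates the state converges to position 3 and slope 1.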

        Speaker: Manos Vourliotis (Univ. of California San Diego (US))
      • 213
        The Key4hep Turnkey Software Stack: Beyond Future Higgs Factories

        The Key4hep project aims to provide a turnkey software solution for the full experiment life-cycle, based on established community tools. Several future collider communities (CEPC, CLIC, EIC, FCC, and ILC) have joined forces to develop and adapt their workflows to use the common data model EDM4hep and a common framework. Besides the sharing of existing experiment workflows, one focus of the Key4hep project is the development and integration of new experiment-independent software libraries. Ongoing collaborations with projects such as ACTS, CLUE, PandoraPFA and the OpenDataDetector show the potential of Key4hep as an experiment-independent testbed and development platform. In this talk, we present the challenges of an experiment-independent framework, along with the lessons learned from discussions with interested communities (such as LUXE) and recent adopters of Key4hep, in order to discuss how Key4hep could be of interest to the wider HEP community while staying true to its goal of supporting future collider design studies.

        Speaker: Valentin Volkl (CERN)
      • 214
        Track reconstruction using quantum algorithms at LUXE

        LUXE (Laser Und XFEL Experiment) is a proposed experiment at DESY using the electron beam of the European XFEL and a high-intensity laser. LUXE will study Quantum Electrodynamics (QED) in the strong-field regime, where QED becomes non-perturbative. One of the key measurements is the positron rate from electron-positron pair creation, which is enabled by the use of a silicon tracking detector. Precision tracking of positrons becomes very challenging at high laser intensities due to the high rates, which can be computationally expensive for classical computers. The talk will present the latest progress of quantum algorithm-based tracking, which relies on Variational Quantum Eigensolver (VQE) or Quantum Approximate Optimisation Algorithm (QAOA) to reconstruct tracks, and compare the results with classical methods using Graph Neural Networks or a Combinatorial Kalman Filter.

        Speaker: Annabel Kropf (DESY Hamburg)
    • Track 1: Computing Technology for Physics Research Sala Federico II (Villa Romanazzi)

      Sala Federico II

      Villa Romanazzi

      Conveners: Taylor Childers (Argonne National Laboratory (US)), Dr Maria Girone (CERN)
      • 215
        The Virtual Research Environment: towards a comprehensive analysis platform

        One of the objectives of the EOSC (European Open Science Cloud) Future project is to integrate diverse analysis workflows from Cosmology, Astrophysics and High Energy Physics in a common framework. The project's development relies on the implementation of the Virtual Research Environment (VRE), a prototype platform supporting the goals of the Dark Matter and Extreme Universe Science Projects in compliance with FAIR data policies, making use of a common AAI system and leveraging the experiments' data via a reliable and scalable distributed storage infrastructure for multi-science: the Data Lake. The entry point to the platform is a JupyterHub instance sitting on top of a complex Kubernetes (K8s) infrastructure, which provides an interactive GUI for researchers to access and share data, as well as to run notebooks. Data access and browsing are enabled through API calls to the high-level data management and storage orchestration software (Rucio).
        The cluster's functionality, currently allowing data injection, replication, storage and deletion, is being expanded to include a software repository plug-in enabling researchers to directly select computational environments from Docker images, and to host a re-analysis platform (REANA) supporting various distributed computing backends (K8s, HTCondor, Slurm), which allows scientists to spawn and interact with complete re-analysis workflows.
        The goal of the VRE project, bringing together data and software access, workflow reproducibility and an enhanced user interface, is to facilitate scientific collaboration, ultimately accelerating research in various fields.

        Speaker: Elena Gazzarrini (CERN)
      • 216
        Computing for Gravitational-wave Research towards O4

        The LIGO, Virgo and KAGRA gravitational-wave interferometers are getting ready for their fourth observing run (O4), scheduled to begin in March 2023, with improved sensitivities and higher event rates.

        Data from the interferometers are exchanged between the three collaborations and processed by running search pipelines for a range of expected signals, from coalescing compact binaries to continuous waves and burst events, along with sky localisation and parameter estimation pipelines. One of the most important peculiarities of GW computing (and, more generally, of time-domain astrophysics) is that data processing happens both offline and on special low-latency infrastructures, in order to provide timely “event candidate alerts” to other observatories and make multi-messenger astronomy possible.

        Significant efforts have been made in recent years to design and build a common computing infrastructure, both in terms of a common architecture and shared resources, to prepare for growing computing demand and increasingly exploit distributed computing resources. Many custom tools, difficult to maintain, have been replaced by more mainstream tools, more widely adopted in the physics community, in order to streamline workflows and reduce the burden of maintenance and operations.

        We report on these activities, the status of the infrastructure and the plans for the upcoming observation period.

        Speaker: Stefano Bagnasco (Istituto Nazionale di Fisica Nucleare, Torino)
      • 217
        CernVM 5: a versatile container-based platform to run HEP applications

        Since its inception, the minimal Linux image CernVM has provided a portable and reproducible runtime environment for developing and running scientific software. Its key ingredient is the tight coupling with the CernVM-FS client, which provides access to the base platform (operating system and tools) as well as to the experiment application software. Up to now, CernVM images have been designed for full virtualization. The goal of CernVM 5 is to deliver all the benefits of the CernVM appliance while being equally practical as a container and as a full VM. To this end, the CernVM 5 container image consists of a "Just Enough Operating System" (JeOS), with its contents defined by the HEP_OSlibs meta-package commonly used as a base platform in HEP. CernVM 5 further aims at smooth integration of the CernVM-FS client in various container environments (such as Docker, Kubernetes, Podman, Apptainer). Lastly, CernVM 5 uses special build tools and post-build processing to ensure that experiment software stacks using their custom compilers and build chains can coexist with standard system application stacks. As a result, CernVM 5 aims at providing a single, minimal container image that can be used as a virtual appliance for mounting the CernVM-FS client and for running and developing HEP application software.

        Speaker: Jakob Karl Eberhardt (University of Applied Sciences (DE))
    • Track 2: Data Analysis - Algorithms and Tools Sala Europa (Villa Romanazzi)

      Sala Europa

      Villa Romanazzi

      Conveners: Dalila Salamani (CERN), Felice Pantaleo (CERN)
      • 218
        Efficient search for new physics using Active Learning in the ATLAS Experiment

        Searches for new physics set exclusion limits in parameter spaces of typically up to two dimensions. However, the relevant theory parameter space is usually of higher dimension, and only a subspace is covered due to the computing-time requirements of signal-process simulations. An Active Learning approach is presented to address this limitation. Compared to the usual grid sampling, it reduces the number of parameter-space points for which exclusion limits need to be determined. It therefore allows interpretations of searches to be extended to higher-dimensional parameter spaces, raising their value, e.g. via the identification of barely excluded subspaces which motivate dedicated new searches.

        In an iterative procedure, a Gaussian Process is fit to the excluded signal cross-sections. Within the region close to the exclusion contour predicted by the Gaussian Process, Poisson disc sampling is used to determine further parameter-space points for which the cross-section limits are then determined. The procedure is aided by a warm-start phase based on computationally inexpensive, approximate limit estimates such as total signal cross-sections. A Python package, excursion [1], provides the Gaussian Process routine. The procedure is applied to a Dark Matter search performed by the ATLAS experiment, extending its interpretation from a 2- to a 4-dimensional parameter space while keeping the computational effort at a low level.

        [1] https://github.com/diana-hep/excursion
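        The sampling step can be sketched as follows, with a simple dart-throwing variant of Poisson disc sampling and a made-up stand-in for the Gaussian-process prediction (the actual analysis uses the excursion package):

```python
import math
import random

def sample_near_contour(predict, band=0.1, r_min=0.15, n_target=10, seed=1):
    """predict(x, y) -> estimated excluded cross-section minus the limit;
    |predict| < band defines the region close to the exclusion contour.
    Accepted points keep a minimum mutual distance r_min (Poisson disc)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(20000):                       # dart throwing with rejection
        x, y = rng.random(), rng.random()
        if abs(predict(x, y)) > band:            # too far from the contour
            continue
        if any(math.dist((x, y), p) < r_min for p in accepted):
            continue                             # violates the disc spacing
        accepted.append((x, y))
        if len(accepted) == n_target:
            break
    return accepted

# stand-in prediction: the contour is the circle x^2 + y^2 = 0.5
points = sample_near_contour(lambda x, y: x * x + y * y - 0.5)
```

Each accepted point is then passed to the (expensive) limit computation.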

        Speaker: Patrick Rieck (New York University (US))
      • 219
        Temporal Variational Autoencoders and Simulation-based inference for interpolation of light curves of Gravitationally Lensed Quasars

        The Hubble tension presents a crisis for the canonical $\Lambda$CDM model of modern cosmology: it may originate in systematics in data-processing pipelines, or it may come from new physics related to dark matter and dark energy. This crisis can be addressed by studies of time-delayed light curves of gravitationally lensed quasars, which have the capacity to constrain the Hubble constant ($H_0$). A critical task in this analysis is the interpolation of time series with varying duration and irregular time sampling. For this problem, the baseline approach is Gaussian processes (GPs), which can struggle to converge to the maximum likelihood.
        In this work, we compare the interpolation performance of multiple models: GPs inferred with maximum likelihood optimization, GPs inferred with neural density estimation (NDE), and heteroscedastic temporal neural networks. For the NDE approach, a normalizing flow infers the posteriors of GP’s parameters from time series’ encodings independent of duration or time sampling. Of the neural networks, we use spline-based convolutional variational autoencoders (VAEs) and multi-time attention VAEs.
        We validate our methods on simulations of Gaussian processes, on observed lensed-quasar light curves, and on real-world datasets that are baselines for irregularly sampled time-series interpolation. Our analysis shows that the Gaussian processes inferred with neural density estimators outperform the other approaches in interpolation quality.
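        As a point of reference, the GP baseline for interpolating an irregularly sampled light curve can be written in a few lines (fixed kernel hyperparameters, no likelihood optimization; purely illustrative):

```python
import math

def rbf(t1, t2, amp=1.0, scale=2.0):
    """Squared-exponential kernel between two observation times."""
    return amp * math.exp(-((t1 - t2) ** 2) / (2 * scale ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_interpolate(times, values, new_times, noise=1e-4):
    """Posterior mean of a GP at new_times, given irregular observations."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(times)] for i, a in enumerate(times)]
    alpha = solve(K, values)                       # alpha = K^{-1} y
    return [sum(rbf(t, ti) * a for ti, a in zip(times, alpha))
            for t in new_times]                    # k(t, X) . alpha

obs_t = [0.0, 1.3, 2.1, 4.7, 6.0]                  # irregular sampling
obs_y = [math.sin(t) for t in obs_t]               # stand-in "light curve"
pred = gp_interpolate(obs_t, obs_y, [3.0])
```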

        Speaker: Egor Danilov (Fermilab and EPFL)
      • 220
        Galaxy survey data reduction with deep learning

        PAUS is a 40-narrow-band imaging survey using the PAUCam instrument installed at the William Herschel Telescope (WHT). Since the survey started in 2015, this instrument has acquired a unique dataset, performing a relatively deep and wide survey while simultaneously achieving excellent redshift accuracy. The survey is a compromise in performance between a deep spectroscopic survey and wide-field imaging, showing an order of magnitude better redshift resolution than typical broad-band surveys.

        The survey data reduction was designed around classical data-reduction techniques. For example, the redshift template fitting needed a dedicated algorithm to properly handle the PAUS data (Eriksen 2019). While the data reduction and redshift estimation worked, they left room for improvement. In this talk, we detail the different efforts of replacing steps in the PAUS data reduction with deep-learning algorithms. First, deep-learning techniques obtain a 50 per cent reduction in the photo-z scatter for the faintest galaxies. This is achieved through various techniques, including transfer learning from simulations to handle a small data set.

        Furthermore, we have constructed multiple algorithms to improve the data-reduction stage. Estimating the noise from a non-uniform background was handled by BKGNet (Cabayol-Garcia 2019), and the galaxy photometry (light measurement) was introduced with Lumus (Cabayol-Garcia 2021). Recent work includes the effort of directly estimating the galaxy distance from images. In this talk we also discuss the challenges posed by differences between the survey fields, and recent advances in applying unsupervised denoising techniques.

        Speaker: Martin Eriksen
      • 221
        Automatic differentiation of binned likelihoods with RooFit and Clad

        RooFit is a toolkit for statistical modeling and fitting used by most experiments in particle physics. As data sets from next-generation experiments grow, the processing requirements of physics analyses become more computationally demanding, necessitating performance optimizations for RooFit. One possibility to speed up minimization and add stability is the use of automatic differentiation (AD). Unlike numerical differentiation, its computational cost scales linearly with the number of parameters, making AD particularly appealing for statistical models with many parameters. In this talk, we report on one possible way to implement AD in RooFit. Our approach is to add a facility that automatically generates C++ code for a full RooFit model. Unlike the original RooFit model, this generated code is free of virtual function calls and other RooFit-specific overhead. This code is then used to produce the gradient automatically with Clad. Clad is a source-transformation AD tool, implemented as a plugin to the Clang compiler, which automatically generates the derivative code for input C++ functions. We show results demonstrating the improvements observed when applying this code-generation strategy to HistFactory and other commonly used RooFit models. HistFactory is the subcomponent of RooFit that implements binned likelihood models with probability densities based on histogram templates. These models frequently have a very large number of free parameters and are thus an interesting first target for AD support in RooFit.
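        The key property of AD, exact derivatives propagated through ordinary program code, can be illustrated with a minimal forward-mode implementation using dual numbers (Clad itself instead performs source transformation on C++, and gradients for fits are typically computed in reverse mode):

```python
import math

class Dual:
    """a + b*eps with eps**2 = 0: 'val' is the value, 'der' the derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _coerce(self, o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = self._coerce(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __sub__(self, o):
        o = self._coerce(o)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, o):
        return Dual(o).__sub__(self)

    def __mul__(self, o):
        o = self._coerce(o)
        return Dual(self.val * o.val,
                    self.der * o.val + self.val * o.der)   # product rule
    __rmul__ = __mul__

def exp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.der)                              # chain rule

def derivative(f, x):
    """Evaluate f on Dual(x, 1) so the derivative rides along with the value."""
    return f(Dual(x, 1.0)).der

# d/dmu of the residual-squared term (3 - mu)^2 at mu = 1: exactly -4
grad = derivative(lambda mu: (3.0 - mu) * (3.0 - mu), 1.0)
```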

        Speaker: Garima Singh (Princeton University (US))
    • Track 3: Computations in Theoretical Physics: Techniques and Methods Sala A+A1 (Villa Romanazzi)

      Sala A+A1

      Villa Romanazzi

      Conveners: Domenico Pomarico (INFN Sezione di Bari), Joshua Davies (University of Sussex)
      • 222
        lips: complex phase space goes singular and p-adic

        High-multiplicity loop-level amplitude computations involve significant algebraic complexity, which is usually sidestepped by employing numerical routines. Yet, when available, final analytical expressions can display improved numerical stability and reduced evaluation times. It has been shown that significant insights into the analytic structure of the results can be obtained by tailored numerical evaluations. I present new developments on the object-oriented Python package lips (Lorentz invariant phase space) for the generation and manipulation of complex massless kinematics. Phase-space points can be defined at the spinor level over complex numbers ($\mathbb{C}$), finite fields ($\mathbb{F}_p$), and $p$-adic numbers ($\mathbb{Q}_p$). Facilities are also available for the evaluation of arbitrary spinor-helicity expressions in any of these fields. Through the algebraic-geometry submodule, which relies on Singular through the Python interface syngular, one can define and manipulate ideals in spinor variables (either covariant components or invariant brackets). These allow one to identify irreducible varieties, where amplitudes have well-defined zeros and poles, and to fine-tune numerical phase-space points to be on or close to such varieties. Explicit precision tracking in the $p$-adic implementation allows one to perform numerical computations in singular configurations while keeping track of the numerical uncertainty as an $\mathcal{O}(p^k)$ term. As an example application, I will show how to infer valid partial-fraction decompositions from $p$-adic evaluations.
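        As a flavor of the underlying arithmetic, the sketch below evaluates a rational function exactly over $\mathbb{F}_p$ (illustrative only; these are not lips function names):

```python
# Toy illustration of exact evaluation over a finite field F_p (prime p),
# the arithmetic that lets packages like lips sidestep floating-point
# precision loss.  Function names are illustrative, not the lips API.
p = 2**31 - 1                      # a large prime

def inv(a):
    # modular inverse via Fermat's little theorem (valid since p is prime)
    return pow(a % p, p - 2, p)

def eval_ratio(num, den, x):
    """Evaluate num(x)/den(x) exactly in F_p (num, den are callables)."""
    return num(x) % p * inv(den(x)) % p

# the rational function (x^2 + 1)/(x - 3) at x = 10:
val = eval_ratio(lambda x: x * x + 1, lambda x: x - 3, 10)
```

        Every evaluation is exact, so sampled values can be used to probe analytic structure without worrying about round-off.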

        Speaker: Giuseppe De Laurentis (Freiburg University)
      • 223
        Two-loop five-point amplitudes in massless QCD with finite fields

        I will discuss the analytic calculation of two-loop five-point helicity amplitudes in massless QCD. In our workflow, we perform the bulk of the computation using finite field arithmetic, avoiding the precision-loss problems of floating-point representation. The integrals are provided by the pentagon functions. We use numerical reconstruction techniques to bypass intermediate complexity and obtain compact forms for the rational coefficients. I will present results for NLO gluon-initiated diphoton-plus-jet production and NNLO trijet production.
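        A key ingredient of such finite-field workflows is recovering exact rational numbers from their modular images. Below is a minimal sketch of Wang's rational-reconstruction algorithm (an illustration, not the authors' code):

```python
from fractions import Fraction

def rational_reconstruct(a, p):
    """Recover a small fraction r/s from its image a = r * s^(-1) (mod p)
    using a half-extended Euclidean algorithm (Wang's method)."""
    bound = int(p ** 0.5) // 2           # |r|, |s| must be well below sqrt(p)
    r0, r1 = p, a % p
    s0, s1 = 0, 1
    while r1 > bound:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
    return r1, s1                         # numerator, denominator

# round-trip: map 22/7 into F_p, then recover it from the image alone
p = 2**31 - 1
image = 22 * pow(7, p - 2, p) % p         # 22 * inverse(7) mod p
r, s = rational_reconstruct(image, p)
```

        In practice one runs the computation over several primes and reconstructs each coefficient once enough modular images agree.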

        Speaker: Ryan Moodie (Turin University)
    • 1:00 PM
      Group Photo

      In front of the Federico II room

    • 1:15 PM
      Lunch break Sala Scuderia (Villa Romanazzi)

      Sala Scuderia

      Villa Romanazzi

    • Track 1: Computing Technology for Physics Research Sala Federico II (Villa Romanazzi)

      Sala Federico II

      Villa Romanazzi

      Conveners: Raquel Pezoa Rivera (Federico Santa Maria Technical University (CL)), Gioacchino Vino (INFN Bari (IT))
      • 224
        covfie: a compositional library for heterogeneous vector fields

        Vector fields are ubiquitous mathematical structures in many scientific domains including high-energy physics where — among other things — they are used to represent magnetic fields. Computational methods in these domains require methods for storing and accessing vector fields which are both highly performant and usable in heterogeneous environments. In this paper we present covfie, a co-processor-aware vector field library developed by the ACTS community which aims to flexibly and performantly represent vector fields for a wide variety of scientific domains and across a range of programming platforms. To this end, we employ a compositional design philosophy which enables us to meet domain requirements through the composition of simple structures we refer to as vector field transformers. In this work, we detail the design and implementation of our library, and enumerate the different kinds of vector fields that our library supports. Furthermore, we evaluate the performance of our library using a mini-application that renders vector magnitudes of a slice of the ATLAS magnetic field on both an x86-based CPU platform and a CUDA-compatible GPGPU platform; through this mini-application, we demonstrate that different storage methods — all of which can be implemented using our library — can have a significant impact on the performance of client applications.
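        The compositional idea can be illustrated with a hedged Python sketch (covfie itself is a C++ library; the class and method names below are purely illustrative): a field is a storage backend wrapped in composable transformers.

```python
# A vector field = a storage backend wrapped in composable "transformers".

class ArrayBackend:
    """Dense storage: a grid of 2D vectors indexed by integer coordinates."""
    def __init__(self, grid):
        self.grid = grid
    def at(self, i, j):
        return self.grid[i][j]

class Clamp:
    """Transformer: clamp out-of-range indices onto the grid edge."""
    def __init__(self, inner, ni, nj):
        self.inner, self.ni, self.nj = inner, ni, nj
    def at(self, i, j):
        return self.inner.at(min(max(i, 0), self.ni - 1),
                             min(max(j, 0), self.nj - 1))

class Scale:
    """Transformer: map physical coordinates to grid indices (pitch h)."""
    def __init__(self, inner, h):
        self.inner, self.h = inner, h
    def at(self, x, y):
        return self.inner.at(round(x / self.h), round(y / self.h))

grid = [[(float(i), float(j)) for j in range(4)] for i in range(4)]
field = Scale(Clamp(ArrayBackend(grid), 4, 4), h=0.5)   # the composition
```

        Because each layer exposes the same access interface, storage strategies can be swapped underneath without touching client code, which is what makes the storage-method comparisons in the paper possible.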

        Speaker: Stephen Nicholas Swatman (University of Amsterdam (NL))
      • 225
        Speeding up CMS simulations, reconstruction and HLT code using advanced compiler options

        The CMS simulation, reconstruction, and HLT code have delivered an enormous number of events for analysis during Runs 1 and 2 of the LHC at CERN, and are of fundamental importance for the CMS experiment. In this contribution, several ways to improve the efficiency of these procedures will be described, and it will be shown that no particular conceptual or technical blocker has been identified in their implementation.

        Particular attention will be devoted to how CMS simulation, reconstruction, and HLT gain a considerable increase in speed when several CMS sub-libraries are recompiled using advanced compiler options; in this way, the compiler is leveraged to obtain up to a 10% speedup. The focus will be on Link Time Optimization (LTO) and Profile Guided Optimization (PGO): with these tools, several results on improved event-loop time and event throughput will be presented, and the differences between the profiles of the processes will be shown. Moreover, an important feature of the PGO approach will be considered: profiles obtained by running events from one process are enough to speed up many other processes (and a profile obtained with the Phase 1 detector configuration also gives an improvement for Phase 2 processes).

        Speaker: Danilo Piparo (CERN)
      • 226
        Using a DSL to read ROOT TTrees faster in Uproot

        Uproot reads ROOT TTrees using pure Python. For numerical and (singly) jagged arrays, this is fast because a whole block of data can be interpreted as an array without modifying the data. For other cases, such as arrays of std::vector<std::vector<float>>, numerical data are interleaved with structure, and the only way to deserialize them is with a sequential algorithm. When written in Python, such algorithms are very slow.
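        The sequential nature of the problem can be seen in a short sketch: each count must be read before the bytes that follow can be interpreted, so the parse cannot be vectorized. The byte layout below is a simplified stand-in for ROOT's actual serialization, not the real on-disk format:

```python
import struct

def read_doubly_jagged(buf):
    """Sequentially deserialize data laid out as
    [n_outer][n_inner][f f ...][n_inner][f f ...]...
    (big-endian 4-byte counts and floats; a simplified stand-in for
    ROOT's std::vector<std::vector<float>> layout)."""
    pos = 0
    def u32():
        nonlocal pos
        (v,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        return v
    n_outer = u32()
    out = []
    for _ in range(n_outer):
        n_inner = u32()                  # must be read before the floats
        vals = list(struct.unpack_from(">%df" % n_inner, buf, pos))
        pos += 4 * n_inner
        out.append(vals)
    return out

payload = struct.pack(">i i 2f i 1f", 2, 2, 1.5, 2.5, 1, 3.5)
```

        Running this loop per entry in interpreted Python is exactly the slow path that AwkwardForth replaces with a fast virtual machine.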

        We solve this problem by writing the same logic in a language that can be executed quickly. AwkwardForth is a Domain Specific Language (DSL), based on Standard Forth with I/O extensions for making Awkward Arrays, and it JIT-compiles to a fast virtual machine without requiring LLVM as a dependency. We generate code as late as possible to take advantage of optimization opportunities. All ROOT types previously implemented with Python are being converted to AwkwardForth.

        Double and triple-jagged arrays have already been implemented and are 400× faster in AwkwardForth than in Python, with multithreaded scaling up to 1 second/GB because AwkwardForth releases the Python GIL. In this talk, we describe design aspects, performance studies, and future directions in accelerating Uproot with AwkwardForth.

        Speaker: Aryan Roy
      • 227
        Implementing Machine Learning inference on FPGAs: from software to hardware using hls4ml

        In the past few years, using Machine and Deep Learning techniques has become more and more viable, thanks to the availability of tools which allow people without specific knowledge in the realm of data science and complex networks to build AIs for a variety of research fields. This process has encouraged the adoption of such techniques: in the context of High Energy Physics, new algorithms based on ML are being tested for event selection in trigger operations, end-user physics analysis, computing metadata based optimizations, and more. Time-critical applications can benefit from implementing algorithms on low-latency hardware like specifically designed ASICs and programmable micro-electronics devices known as FPGAs. The latter offer a unique blend of the benefits of both hardware and software. Indeed, they implement circuits just like hardware, providing power, area and performance benefits over software, yet they can be reprogrammed cheaply and easily to implement a wide range of tasks, at the expense of performance with respect to ASICs.

        In order to facilitate the translation of ML models to fit in the usual workflow for programming FPGAs, a variety of tools have been developed. One example is the HLS4ML toolkit, developed by the HEP community, which allows the translation of Neural Networks built using tools like TensorFlow to a High-Level Synthesis description (e.g. C++) in order to implement this kind of ML algorithms on FPGAs.

        This paper presents and discusses the activity started at the Physics and Astronomy Department of the University of Bologna and INFN-Bologna, devoted to preliminary studies for the trigger systems of the Compact Muon Solenoid (CMS) experiment at the CERN LHC accelerator. A broader-purpose open-source project from Xilinx (a major FPGA producer) called PYNQ is being tested in combination with the HLS4ML toolkit. PYNQ's purpose is to grant designers the possibility to exploit the benefits of programmable logic and microprocessors using the Python language. This software environment can be deployed on a variety of Xilinx platforms, from IoT devices like the ZYNQ-Z1 board to high-performance ones like Alveo accelerator cards and cloud-based AWS EC2 F1 instances.

        Even though rich documentation can be found on how to use hls4ml, a comprehensive description of the entire workflow from Python to FPGA is still hard to find. This work tries to fill this gap, presenting the hardware and software set-up, together with performance tests on various baseline models used as benchmarks. Whether any overhead causes an increase in latency will also be investigated. Finally, the consistency of the NN predictions with respect to a more traditional way of interacting with the FPGA, using C++ code, will be verified.

        Speaker: Marco Lorusso (Universita e INFN, Bologna (IT))
      • 228
        Extending ADL/CutLang with a new dynamic multipurpose protocol

        Use of declarative languages for HEP data analysis is an emerging, promising approach. One highly developed example is ADL (Analysis Description Language), an external domain-specific language that expresses the analysis physics algorithm in a standard and unambiguous way, independent of frameworks. The most advanced infrastructure that executes an analysis written in the formal ADL syntax is the CutLang (CL) runtime interpreter, based on traditional parsing tools. CL, which was previously presented at this conference, has been further developed in recent years to cope with most LHC analyses. The new additions include full-fledged histogramming and data-MC comparison facilities alongside an interface to a number of well-known limit-setting tools.

        The ADL/CL architecture was thus far prepared and built with a general-purpose programming language, without formal computing expertise, and has grown into a complex monolithic structure. To facilitate maintenance and further development of CL, while making it reusable in other (non-scientific) domains, we designed a protocol called Dynamic Domain Specific eXtensible Language (DDSXL) that modularizes its monolithic structure. The DDSXL protocol provides a set of strict rules that allow each researcher to work in their area of expertise and understand the work done without any expertise in other areas, completely independent of the programming languages and frameworks used.
        DDSXL integrates a domain ecosystem (such as CL) into the development environment with a completely abstract structure using various OOP design patterns and with a set of rules determined through communication over the network. This protocol also integrates numerous programming languages and frameworks, allowing each developer to integrate it into their own module without the need for expertise in technologies from other modules.

        Here, we introduce the latest developments in ADL/CL focusing on the working principles of the DDSXL protocol and integration.

        Speaker: Gokhan Unel (University of California Irvine (US))
    • Track 2: Data Analysis - Algorithms and Tools Sala Europa (Villa Romanazzi)

      Sala Europa

      Villa Romanazzi

      Conveners: Davide Valsecchi (ETH Zurich (CH)), Thomas Owen James (Imperial College (GB))
      • 229
        RDataFrame: a flexible and scalable analysis experience

        The growing amount of data generated by the LHC requires a shift in how HEP analysis tasks are approached. Usually, the workflow involves opening a dataset, selecting events, and computing relevant physics quantities to aggregate into histograms and summary statistics. The required processing power is often so high that the work needs to be distributed over multiple cores and multiple nodes. This contribution establishes ROOT RDataFrame as the single entry point for virtually all HEP data analysis use cases. In fact, the typical steps of an analysis workflow can be easily and flexibly written with RDataFrame. Data ingestion from multiple sources is streamlined through a single interface. Relevant metadata can be made available to the dataframe and used during analysis execution. A declarative API offers the most common operations to the users, while transparently taking care of data processing optimisations. For example, it is possible to inject user-defined code to compute complex quantities, gather them into histograms or other relevant statistics, include large sets of systematic variations and use machine-learning inference kernels. A Pythonic layer allows dynamic injection of Python functions in the main C++ event loop. Finally, any RDataFrame application can seamlessly scale out to hundreds of cores on the same machine or multiple distributed nodes by changing a single line of code. The latest performance validation studies are also included in this contribution to demonstrate the efficiency of the tool in terms of both computational complexity and scalability.
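        The declarative, lazily-evaluated model can be mimicked in a few lines of plain Python (an illustrative sketch of the programming pattern only; the real RDataFrame lives in ROOT and runs a single optimized C++ event loop):

```python
# Declarative pipeline sketch: Filter/Define calls only record operations;
# the terminal action (Count) triggers one pass over the data.

class MiniFrame:
    def __init__(self, rows, ops=()):
        self.rows, self.ops = rows, list(ops)
    def Filter(self, pred):
        return MiniFrame(self.rows, self.ops + [("filter", pred)])
    def Define(self, name, func):
        return MiniFrame(self.rows, self.ops + [("define", name, func)])
    def Count(self):
        n = 0
        for row in self.rows:           # the single event loop
            row = dict(row)
            keep = True
            for op in self.ops:
                if op[0] == "define":
                    row[op[1]] = op[2](row)
                elif op[0] == "filter" and not op[1](row):
                    keep = False
                    break
            n += keep
        return n

events = [{"pt": 10.0}, {"pt": 35.0}, {"pt": 50.0}]
n_pass = (MiniFrame(events)
          .Define("pt2", lambda r: r["pt"] ** 2)
          .Filter(lambda r: r["pt2"] > 1000)
          .Count())
```

        Deferring execution to the terminal action is what lets the engine fuse all operations into one optimized loop and, in the distributed case, ship the same computation graph to many nodes.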

        Speaker: Vincenzo Eduardo Padulano (Valencia Polytechnic University (ES))
      • 230
        A multi-purposed reconstruction method based on machine learning for atmospheric neutrino at JUNO

        The Jiangmen Underground Neutrino Observatory (JUNO) experiment is designed to measure the neutrino mass ordering (NMO) using a 20-kton liquid scintillator detector, to solve one of the biggest remaining puzzles in neutrino physics. Regarding the sensitivity of JUNO's NMO measurement, besides the precise measurement of reactor neutrinos, the independent measurement of atmospheric neutrino oscillation has great potential to enhance the sensitivity in a combined analysis. This relies heavily on the event reconstruction performance at high energies (GeV level), including the angular resolution for the incident neutrino, the energy resolution, and the accuracy of flavor identification.
        In this contribution, we present a multi-purpose reconstruction algorithm for high-energy particles in JUNO based on machine learning methods. This includes extracting effective features from tens of thousands of PMT waveforms, as well as the development of two types of machine learning models (spherical GNN and planar CNN/Transformer). Novel techniques, such as improving the model convergence speed and eliminating reconstruction bias by maintaining rotation invariance, are also discussed. Preliminary results based on JUNO simulation show reconstruction precision at an unprecedented level, demonstrating great application potential for other large liquid scintillator detectors as well.

        Speaker: Teng LI (Shandong University, CN)
      • 231
        A Machine Learning Method for calorimeter signal processing in sPHENIX

        The sPHENIX experiment at RHIC requires substantial computing power for its complex reconstruction algorithms. One class of these algorithms is tasked with processing signals collected from the sPHENIX calorimeter subsystems, in order to extract signal features such as the amplitude, timing of the peak, and the pedestal. These values, calculated for each channel, form the basis of event reconstruction in the calorimeter. The baseline technique used for signal feature extraction is fitting the signal waveforms in individual calorimeter channels with a parametrized function which optimally represents the signal shape. Due to the large channel count in the sPHENIX calorimeters, such a fitting procedure may consume a non-trivial fraction of the total reconstruction time in a given event. To solve this problem, an alternative technique is being explored, based on a Machine Learning algorithm utilizing a Neural Network, in which the training data sample is produced using the traditional fitting technique. Initial results demonstrate an order of magnitude improvement in speed of signal processing while preserving an acceptable level of accuracy. A prototype of a Keras/TensorFlow-based inference application has been created, to be deployed on the worker nodes running sPHENIX event reconstruction software. Comparison with the standard fitting technique has been performed. We present our experience with the design and implementation of the ML-based algorithm for the sPHENIX calorimeter signal processing.
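        The baseline extraction can be sketched as a linear template fit: with a fixed pulse-shape template, amplitude and pedestal follow from solving 2x2 normal equations. The template and numbers below are invented for illustration, not sPHENIX's:

```python
# Fit waveform samples y_t = A * s_t + P (amplitude A, pedestal P) against
# a known pulse-shape template s_t by linear least squares.

def fit_amp_pedestal(samples, template):
    n = len(samples)
    Sss = sum(s * s for s in template)
    Ss = sum(template)
    Ssy = sum(s * y for s, y in zip(template, samples))
    Sy = sum(samples)
    det = n * Sss - Ss * Ss
    # normal equations:  A*Sss + P*Ss = Ssy  and  A*Ss + P*n = Sy
    amp = (n * Ssy - Ss * Sy) / det
    ped = (Sss * Sy - Ss * Ssy) / det
    return amp, ped

template = [0.0, 0.5, 1.0, 0.5, 0.1]            # normalized pulse shape
waveform = [40.0 * s + 3.0 for s in template]   # noiseless toy waveform
a, p_ = fit_amp_pedestal(waveform, template)
```

        A nonlinear peak time turns this into the iterative fit the abstract describes; the NN approach amortizes that per-channel cost into a single inference pass.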

        Speaker: Maxim Potekhin (Brookhaven National Laboratory (US))
      • 232
        Flow-Unet for High Dimensional Image Semantic Segmentation

        Nowadays, medical images play a central role in medical diagnosis: computed tomography, nuclear magnetic resonance, ultrasound, and other imaging technologies have become powerful means of in vivo imaging. Extracting lesion information from these images enables doctors to observe and diagnose lesions more effectively, improving diagnostic accuracy, so the segmentation of medical images has important social value. The achievements of image semantic segmentation show the potential of the Convolutional Neural Network (CNN) for medical image analysis. However, applying existing CNN models frame-by-frame to video neglects the correlation between frames. This article proposes a video semantic segmentation framework based on U-Net in which the feature map of the previous frame is propagated to the next frame via an optical flow field; segmentation accuracy is boosted at the cost of only a slight performance degradation. The framework includes three parts: 1) a segmentation sub-module using U-Net to segment the current frame; 2) an optical-flow feature-extraction module operating on the motion information of the current and previous frames; 3) a correction module, which assigns weights to the segmentation results and optical-flow features to achieve the correction effect. The effectiveness of the proposed method is demonstrated on two public datasets (Drosophila melanogaster electron micrographs, Chaos) and a private Digital Subtraction Angiography (DSA) video dataset.

        Speakers: Mr Yu Hu (Institute of High Energy Physics, CAS), Ms Xiaomeng Qiu (Zhengzhou University)
      • 233
        Hybrid Quantum-Classical Networks for Reconstruction and Classification of Earth Observation Images

        Earth Observation (EO) has seen promising progress in the modern era through an impressive amount of research on applying state-of-the-art Machine Learning (ML) techniques to large datasets. Meanwhile, the scientific community has also extended the boundary of ML to quantum systems, opening a new research area, so-called Quantum Machine Learning (QML), that combines advantages from both ML and Quantum Computing (QC). Recent papers have investigated the application of QML in the EO domain mainly based on Parameterized Quantum Circuits (PQCs), which are regarded as a suitable architecture for quantum neural networks (QNNs) due to their potential to be efficiently simulated on near-term quantum hardware. However, more in-depth contributions are still required, and various challenges must be tackled, such as the large EO image size for current quantum simulators, the trainability of the quantum circuit, etc.
        This work introduces a hybrid quantum-classical model performing reconstruction and classification simultaneously and explores its application to EO image multi-class classification. Moreover, we investigate for the first time the correlation between different PQC descriptors and the training results in a realistic EO use case. The results demonstrate that the hybrid model successfully achieves classification with up to 10 classes, suggesting the potential of QNNs in a realistic context, and also hint at generic approaches for choosing a suitable PQC architecture for a given problem.
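        Training such PQCs typically relies on the parameter-shift rule. As a minimal, self-contained illustration (a one-qubit toy model, not the circuits studied in this work), the sketch below obtains an exact gradient of a circuit expectation value from two shifted evaluations:

```python
import math

# One-parameter "circuit": RY(theta)|0>, then measure <Z>.
def expval(theta):
    # statevector after RY(theta): (cos(theta/2), sin(theta/2))
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return c * c - s * s              # <Z> = |amp0|^2 - |amp1|^2 = cos(theta)

# Parameter-shift rule: the exact gradient from two shifted circuit runs,
# the standard way PQC parameters are trained on quantum hardware.
def grad(theta):
    return 0.5 * (expval(theta + math.pi / 2) - expval(theta - math.pi / 2))

theta = 0.7
g = grad(theta)                       # analytically equals -sin(theta)
```

        Because each partial derivative costs two extra circuit evaluations, gradient cost grows with the number of parameters, which is one reason circuit trainability is listed among the open challenges.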

        Speaker: Su Yeon Chang (EPFL - Ecole Polytechnique Federale Lausanne (CH))
    • Track 3: Computations in Theoretical Physics: Techniques and Methods Sala A+A1 (Villa Romanazzi)

      Sala A+A1

      Villa Romanazzi

      Conveners: Andrea Valassi (CERN), Domenico Colella (University and INFN Bari, Italy)
      • 234
        DMG4: a fully GEANT4-compatible package for the simulation of Dark Matter

        The search for New Physics through Dark Sectors is an exciting possibility to explain, among other things, the origin of Dark Matter (DM). Within this context, the sensitivity study of a given experiment is a key point in estimating its potential for discovery. In this contribution we present the fully GEANT4-compatible Monte Carlo simulation package for production and propagation of DM particles, DMG4. In particular, we discuss the implementation of production cross-sections in its GEANT4-independent sub-package, DarkMatter, and the latest DMG4 release, including a finer application programming interface (API) to GEANT4. We also cover its recent developments with faster and more accurate cross-section computations, sampling methods, an extended energy range, as well as the expansion of the package to $B-L$ and semi-visible models. We finally discuss the improvements in the simulations of New Physics processes specific to muon beams.

        Speaker: Henri Hugo Sieber (ETH Zurich (CH))
      • 235
        Run Dependent Monte Carlo at Belle II

        Belle II is an experiment that has been taking data since 2019 at the asymmetric e+e- SuperKEKB collider, a second-generation B-factory in Tsukuba, Japan. Its goal is to perform high-precision measurements of flavor physics observables. One of the many challenges of the experiment is to have a Monte Carlo simulation with very accurate modeling of the detector, including any variation occurring during data taking. To this end, a dedicated “run dependent” Monte Carlo has been developed, using the detector conditions during data taking as well as beam-induced background collected with random triggers. In this talk, the procedure for setup and processing of run-dependent Monte Carlo at Belle II will be shown.

        Speaker: Alberto Martini (DESY)
      • 236
        Unweighted event generation for multi-jet production processes based on matrix element emulation

        The generation of unit-weight events for complex scattering processes presents a severe challenge to modern Monte Carlo event generators. Even when using sophisticated phase-space sampling techniques adapted to the underlying transition matrix elements, the efficiency for generating unit-weight events from weighted samples can become a limiting factor in practical applications. Here we present the combination of a two-staged unweighting procedure with a factorisation-aware matrix element emulator based on neural networks, which we make accessible in the Sherpa event generation framework. The algorithm can significantly accelerate the unweighting process, while still guaranteeing unbiased sampling from the correct target distribution. We apply, validate and benchmark the approach in high-multiplicity LHC production processes, including $Z/W$+4 jets and $t\bar{t}$+3 jets, where we find speed-up factors of up to 60.
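        The two-staged idea can be sketched as follows: events are first unweighted against a cheap surrogate weight, and survivors are corrected with the exact-to-surrogate weight ratio, so the accepted sample follows the exact distribution while the expensive weight is only evaluated for stage-1 survivors. All weights and bounds below are toy stand-ins, not Sherpa's:

```python
import random

def two_stage_unweight(sample, surrogate_w, exact_w, wmax_s, wmax_ratio, rng):
    """Stage 1: accept against the cheap surrogate weight.  Stage 2: for
    survivors only, evaluate the exact weight and accept with probability
    proportional to the exact/surrogate ratio.  Combined acceptance is
    proportional to exact_w, so the result is unbiased."""
    accepted = []
    for x in sample:
        ws = surrogate_w(x)
        if rng.random() < ws / wmax_s:               # stage 1: surrogate
            ratio = exact_w(x) / ws                  # exact weight needed now
            if rng.random() < ratio / wmax_ratio:    # stage 2: correction
                accepted.append(x)
    return accepted

rng = random.Random(42)
sample = [rng.random() for _ in range(20000)]
# toy target density ~ 2x on [0,1]; the surrogate mis-models it slightly
exact = lambda x: 2.0 * x
surrogate = lambda x: 1.5 * x + 0.25
events = two_stage_unweight(sample, surrogate, exact, 1.75, 2.0, rng)
```

        The speed-up comes from the surrogate rejecting most candidates before the expensive exact weight is ever computed.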

        Speaker: Timo Janssen
      • 237
        Conditional Normalizing Flow for Markov Chain Monte Carlo Sampling in the Critical Region of Lattice Field Theory

        In Lattice Field Theory, one of the key drawbacks of Markov Chain Monte Carlo (MCMC) simulation is the critical slowing down problem. Generative machine learning methods, such as normalizing flows, offer a promising solution to speed up MCMC simulations, especially in the critical region. However, training these models for different parameter values of the lattice theory is inefficient. We address this issue by interpolating or extrapolating the flow model in the critical region. We demonstrate the effectiveness of the proposed method for MCMC sampling in critical regions for multiple parameter values of $\phi^4$ scalar theory and $U(1)$ gauge theory in 1+1 dimensions, and compare its performance against HMC and flow-based methods.
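        As a minimal illustration of how a flow with a tractable density can drive MCMC, the sketch below runs an independence Metropolis-Hastings chain whose proposal stands in for a trained flow (the one-dimensional Gaussians here are toy stand-ins, not a lattice theory):

```python
import math, random

def log_target(x):          # unnormalized target: a standard normal
    return -0.5 * x * x

def propose(rng, sigma=1.2):
    """Stand-in for a flow: draw a sample and return its log-density
    (additive constants omitted; they cancel in the acceptance ratio)."""
    x = rng.gauss(0.0, sigma)
    logq = -0.5 * (x / sigma) ** 2 - math.log(sigma)
    return x, logq

def run_chain(n, rng):
    x, logq_x = propose(rng)
    chain = []
    for _ in range(n):
        y, logq_y = propose(rng)        # independent proposal, no local walk
        log_alpha = (log_target(y) - log_target(x)) + (logq_x - logq_y)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, logq_x = y, logq_y       # accepted moves decorrelate the chain
        chain.append(x)
    return chain

rng = random.Random(1)
chain = run_chain(20000, rng)
```

        Because proposals are drawn globally rather than by a local update, autocorrelation stays low even where local algorithms critically slow down, provided the flow-like proposal stays close to the target.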

        Speaker: Ankur Singha
    • 4:45 PM
      Excursion & Social dinner Bisceglie (Villa Ciardi)

      Bisceglie

      Villa Ciardi

    • Plenary Sala Europa (Villa Romanazzi Carducci)

      Sala Europa

      Villa Romanazzi Carducci

      Conveners: Fons Rademakers (CERN), Lucia Silvestris (Universita e INFN, Bari (IT))
      • 238
        Updates from the organizers
        Speakers: Axel Naumann (CERN), Lucia Silvestris (Universita e INFN, Bari (IT))
      • 239
        Graph Neural Networks and their application in IceCube

        The interpretation of detector data into observables that we can use to perform our physics analyses is an essential part of modern experimental physics. It is also among the fields profiting most from recent advances in machine learning. In this contribution we highlight our event reconstruction efforts using Graph Neural Networks in the IceCube experiment. Using a pulse-based approach, our network can adapt to the irregular geometry of our detector. We show not only speed-ups of orders of magnitude but also improvements in reconstruction resolution of up to 20% compared to our current baseline algorithms. Our goal is to provide an easy-to-use but effective entry into machine-learning-based event reconstruction for any physics purpose: from neutrino oscillations, over beyond-the-Standard-Model searches, to neutrino astronomy. In addition, our software package is compatible not just with the current IceCube experiment, but also with future extensions, like the IceCube Upgrade or Gen2, as well as any neutrino detector.

        Speaker: Martin Han Minh
      • 240
        Foundation Models for Accelerated Discovery

        AI is making an enormous impact on scientific discovery. Growing volumes of data across scientific domains are enabling the use of machine learning at ever increasing scale to accelerate discovery. Examples include using knowledge extraction and reasoning over large repositories of scientific publications to quickly study scientific questions or even come up with new questions, applying AI surrogate models to speed up simulation campaigns and generate critical new data and knowledge, leveraging generative models to construct new hypotheses and make predictions about them, and automating experimentation through robotic labs to enable tighter loops of hypothesis-test cycles. At the same time, new machine learning techniques based on “foundation models” are gaining focus in AI. Foundation models aim to learn “universal representations” from enormous amounts of data, typically using self-supervised or unsupervised training, with the goal to effectively enable subsequent downstream tasks. Prominent examples are large-language models, which have been driving state-of-the-art performance for natural language processing tasks. In this talk, we review how foundation models work by learning representations at scale and show examples of how they can further accelerate scientific discovery. By targeting bottlenecks in the scientific method, we discuss the potential of foundation models to impact a broad set of scientific challenges.

        Speaker: John Smith (IBM T. J. Watson Research Center)
      • 241
        Simpler, faster and bigger: HEP analysis in the LHC Run 3 era

        The production, validation and revision of data analysis applications is an iterative process that occupies a large fraction of a researcher's time-to-publication.
        Providing interfaces that are simpler to use correctly and more performant out-of-the-box not on