HSF WLCG Virtual Workshop

Europe/Zurich
Description

The second edition of the HSF-WLCG virtual workshop series brings us together again to review progress and discuss plans in the key areas of software and computing for HEP.

This workshop will be run with parallel software and computing tracks.

Please note that workshop sessions will be RECORDED and the video made public afterwards.

HSF sessions are being uploaded to YouTube now and you will also find the recording of each talk attached in Indico.

The document summarising outcomes and follow-ups from the workshop is now available.


For software we plan sessions on:

  • Software plenary covering a wide spectrum of HSF activity
  • A focus session on event generation
  • A focus session on detector simulation, co-organised with Geant4
  • An open session, for software R&D where we invite your contributions

HSF Sessions Notebook


The computing sessions will focus on storage and its evolution towards meeting HL-LHC needs.

WLCG Session Notebook  

Organisers:

  • Julia Andreeva, CERN
  • Simone Campana, CERN
  • Philippe Canal, Fermilab
  • Ian Collier, STFC
  • Gloria Corti, CERN
  • Jose Flix Molina, CIEMAT/PIC
  • Alessandra Forti, University of Manchester
  • Michel Jouvin, IJCLab
  • Teng Jian Khoo, Humboldt University / University of Innsbruck
  • David Lange, Princeton University
  • Jonathan Madsen, LBNL
  • Josh McFayden, University of Sussex
  • Helge Meinhard, CERN
  • Maarten Litmaath, CERN
  • Witek Pokorski, CERN
  • Oxana Smirnova, Lund University
  • Graeme A Stewart, CERN
  • Andrea Valassi, CERN
  • Mattias Wadenstein, Umeå
  • Efe Yazgan, National Taiwan University
    • 16:00
      It's the weekend, take a break!
    • 16:00
      It's the weekend, take a break!
    • Computing: Analysis Facilities and implications on storage
    • Software: Diverse R&D (coffee break included within the session)
      Conveners: David Lange (Princeton University (US)), Graeme A Stewart (CERN), Michel Jouvin (Université Paris-Saclay (FR)), Teng Jian Khoo (Humboldt University of Berlin (DE))
      • 62
        Introduction
        Speakers: David Lange (Princeton University (US)), Graeme A Stewart (CERN), Michel Jouvin (Université Paris-Saclay (FR)), Teng Jian Khoo (Humboldt University of Berlin (DE))
      • 63
        Phoenix Event Display

        Visualising HEP experiment data is vital for physicists trying to debug their reconstruction software, examine detector geometry or understand physics analyses, and also for outreach and publicity. Traditionally, experiments used in-house applications that required installation (often as part of a much larger experiment-specific framework). In recent years, web-based event and geometry displays have started to appear; these dramatically lower the barrier to entry but were still typically specific to a single experiment.

        Phoenix was adopted as part of the HSF visualisation activity: it is a TypeScript-based event display framework that uses the popular three.js library for rendering. It is experiment-agnostic by design, with shared common tools (such as custom menus, controls and propagators) and the ability to add experiment-specific extensions. It consists of two packages: a plain TypeScript core library (phoenix-event-display) and an Angular application (a React example is also provided in the documentation). The core library can be adapted to any experiment in a few simple steps. Phoenix has been selected for Google Summer of Code in each of the last two years and is ATLAS's officially supported web event display. This talk will focus on its current status and recent developments, such as WebXR prototypes, interface improvements and the Runge-Kutta propagator.

        Speaker: Edward Moyse (University of Massachusetts (US))
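A Runge-Kutta propagator extrapolates a charged track through a magnetic field. As an illustration only, the Python sketch below performs classical fourth-order Runge-Kutta (RK4) steps for the equations of motion written in path length s, dr/ds = t and dt/ds = (0.2998 q / p) t × B, with t the unit direction. It is not Phoenix's TypeScript implementation; the function names, the unit conventions (metres, GeV, tesla, charge in units of e) and the fixed step size are assumptions made for this sketch.

```python
import math

C_LIGHT = 0.299792458  # GeV/(T*m): p = C_LIGHT * q * B * R


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def derivative(state, field, k):
    """d(state)/ds for state = (x, y, z, tx, ty, tz), t being the unit direction."""
    x, y, z, tx, ty, tz = state
    dt = cross((tx, ty, tz), field((x, y, z)))
    return (tx, ty, tz, k * dt[0], k * dt[1], k * dt[2])


def rk4_step(state, h, field, charge, p):
    """Advance the state by a path length h with one classical RK4 step."""
    k = C_LIGHT * charge / p

    def shifted(deriv, frac):
        return tuple(s + frac * h * d for s, d in zip(state, deriv))

    k1 = derivative(state, field, k)
    k2 = derivative(shifted(k1, 0.5), field, k)
    k3 = derivative(shifted(k2, 0.5), field, k)
    k4 = derivative(shifted(k3, 1.0), field, k)
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def propagate(state, path_length, n_steps, field, charge, p):
    """Propagate with a fixed step size (this sketch does not adapt the step)."""
    h = path_length / n_steps
    for _ in range(n_steps):
        state = rk4_step(state, h, field, charge, p)
    return state
```

For a 1 GeV track in a uniform 2 T field along z, the radius is R = 1/(0.2998 × 2) ≈ 1.67 m, so propagating over a path length of 2πR should bring the track back close to its starting point.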
      • 64
        Discussion
      • 65
        High-throughput data analysis with modern ROOT interfaces

        With the upcoming start of LHC Run III and beyond, HEP data analysis is facing a large increase in average input dataset sizes. At the same time, balancing analysis software complexity with the need to extract as much performance as possible from the latest HPC hardware is still often difficult.
        Recent developments in ROOT significantly lower the barrier to developing high-throughput data analysis applications. This was achieved through a unique combination of ingredients: a high-level, high-performance analysis framework; just-in-time compilation of C++ code for efficient I/O and usability enhancements; automatic generation of Python bindings; and transparent offloading of computations to distributed computing engines such as Spark.
        The resulting simplified data analysis model has enabled a whole range of R&D activities that are expected to deliver further acceleration, such as context-aware caching.
        This talk will provide an overview of recent developments in ROOT as an engine for high-throughput data analysis and of how it is employed in several existing real-world use cases.

        Speakers: Dr Enrico Guiraud (EP-SFT, CERN), Mr Vincenzo Eduardo Padulano (Valencia Polytechnic University (ES)), Mr Stefan Wunsch (KIT - Karlsruhe Institute of Technology (DE))
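One ingredient of RDataFrame's analysis model is that transformations and actions are only booked, and a single event loop runs lazily when the first result is requested. The pure-Python toy below illustrates that lazy-booking pattern. Its method names (`Filter`, `Define`, `Count`, `Sum`, `GetValue`) deliberately echo RDataFrame's, but `LazyFrame` is a hypothetical sketch, not ROOT's API or implementation: it has no just-in-time compilation, no multi-threading and no sharing of common upstream nodes between actions.

```python
class _Result:
    """Placeholder for a booked result; filled when the event loop runs."""

    def __init__(self, graph, init):
        self._graph = graph
        self.value = init
        self.done = False

    def GetValue(self):
        if not self.done:
            _run_event_loop(self._graph)
        return self.value


def _run_event_loop(graph):
    """Run ONE loop over the source that serves every pending action."""
    graph["loops"] += 1
    pending, graph["pending"] = graph["pending"], []
    for event in graph["source"]:
        for steps, reducer, result in pending:
            row, keep = dict(event), True
            for step in steps:
                if step[0] == "define":
                    row[step[1]] = step[2](row)
                elif not step[1](row):
                    keep = False
                    break
            if keep:
                result.value = reducer(result.value, row)
    for _, _, result in pending:
        result.done = True


class LazyFrame:
    """Hypothetical toy data frame: Filter/Define only record steps."""

    def __init__(self, source, graph=None, steps=()):
        self._graph = graph if graph is not None else {
            "source": source, "pending": [], "loops": 0}
        self._steps = steps

    def Filter(self, predicate):
        return LazyFrame(None, self._graph, self._steps + (("filter", predicate),))

    def Define(self, name, fn):
        return LazyFrame(None, self._graph, self._steps + (("define", name, fn),))

    def _book(self, reducer, init):
        result = _Result(self._graph, init)
        self._graph["pending"].append((self._steps, reducer, result))
        return result

    def Count(self):
        return self._book(lambda acc, row: acc + 1, 0)

    def Sum(self, column):
        return self._book(lambda acc, row: acc + row[column], 0.0)
```

Booking `Count()` and `Sum()` on the same filtered frame triggers no work; the first `GetValue()` call runs one loop that fills both results.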
      • 66
        Discussion
      • 67
        bamboo: easy and efficient analysis with python and RDataFrame

        The bamboo analysis framework [1] allows users to write simple declarative analysis code (it effectively implements a domain-specific language embedded in Python) and runs it efficiently using RDataFrame (RDF). Viewed differently, it provides a set of tools to efficiently generate large RDF computation graphs from a minimal amount of user code in Python, for example a simple way to specify selections and outputs that automatically fills a set of histograms with different systematic variations of some input variables.
        It is currently being used for several analyses on the full CMS Run 2 dataset, and thus provides an example of an approach very close to an analysis description language that is nevertheless compatible with the practical needs of modern HEP data analysis (different types of corrections, machine learning inference, user-provided extensions, combining many input samples, scaling out to a batch cluster, etc.).

        [1] https://cp3.irmp.ucl.ac.be/~pdavid/bamboo/

        Speaker: Pieter David (Universite Catholique de Louvain (UCL) (BE))
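bamboo itself generates RDataFrame graphs; the plain-Python sketch below only illustrates the underlying idea that one selection and one output definition can be expanded automatically over a set of systematic variations. The function name, the variation mechanism (a function returning a modified copy of each event) and the fixed-width binning are hypothetical choices for this sketch, not bamboo's interface.

```python
from collections import Counter


def fill_with_variations(events, variations, selection, value, bin_width):
    """Fill one histogram per systematic variation from a single description.

    variations: dict mapping a variation name to a function that returns a
    (possibly modified) copy of the event, e.g. with scaled jet momenta.
    Histograms are Counters mapping bin index -> number of entries.
    """
    histos = {name: Counter() for name in variations}
    for event in events:
        for name, vary in variations.items():
            varied = vary(event)
            if selection(varied):  # the selection is re-evaluated per variation
                histos[name][int(value(varied) // bin_width)] += 1
    return histos
```

For example, passing `{"nominal": lambda e: e, "jesUp": lambda e: {**e, "jet_pt": 1.1 * e["jet_pt"]}}` as `variations` yields one histogram per key, with events migrating across the selection threshold as the variable shifts.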
      • 68
        Discussion
      • 69
        Analysis Description Language for LHC-type analyses

        Physicists aiming to perform an LHC-type analysis today face a number of challenges: in-depth computing knowledge is needed at the programming level to implement the relevant algorithms, and at the system level to interact with the ever-evolving set of analysis frameworks that interface with the analysis object information. Moreover, ambiguity in the configuration of the overall computing environment impairs the reproducibility of previous results. To overcome at least some of these difficulties, we propose the use of an Analysis Description Language (ADL): a domain-specific, declarative language capable of describing the contents of an LHC analysis in a standard and unambiguous way, independent of any computing framework. Such a language decouples computing-intensive aspects such as data access from the actual physics algorithm. It would therefore benefit both the experimental and phenomenological communities by facilitating the design, validation, combination, reproduction, interpretation and overall communication of analysis contents. It would also help to preserve analyses beyond the lifetimes of experiments or analysis software.
        This presentation introduces the ADL concept and summarises the current efforts to make it realistically usable in LHC analyses. In particular, it will present the ongoing work on the transpiler adl2tnm and the interpreter CutLang, the implementation of various example analyses, and documentation and validation efforts.

        Speakers: Gokhan Unel (University of California Irvine (US)), Sezen Sekmen (Kyungpook National University (KR)), Harry Prosper (Florida State University (US))
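To illustrate the decoupling that ADL aims for, the sketch below keeps a toy analysis description as plain data and evaluates it with a generic interpreter that knows nothing about the physics. The description format (region name mapped to a list of `(variable, operator, threshold)` cuts) is hypothetical and is not ADL syntax, and the interpreter is not CutLang; it only shows how a declarative description can be separated from the code that executes it.

```python
import operator

# Comparison operators the toy interpreter understands.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}

# Hypothetical analysis description: pure data, no framework code.
ANALYSIS = {
    "preselection": [("n_leptons", "==", 2), ("met", ">", 50.0)],
    "signal_region": [("n_leptons", "==", 2), ("met", ">", 150.0),
                      ("n_jets", ">=", 1)],
}


def passes(event, cuts):
    """True if the event satisfies every cut in the list."""
    return all(OPS[op](event[var], threshold) for var, op, threshold in cuts)


def cutflow(events, analysis):
    """Count the events passing each region of the description."""
    return {region: sum(passes(e, cuts) for e in events)
            for region, cuts in analysis.items()}
```

Because the description is data, the same `ANALYSIS` could in principle be handed to a different back end (for example one that generates compiled code) without changing its physics content.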
      • 70
        Discussion
      • 71
        podio - latest developments and new features of a flexible EDM toolkit

        Creating efficient event data models (EDMs) for high energy physics (HEP) experiments is a non-trivial task. Past approaches, employing virtual inheritance and possibly featuring deep object hierarchies, have been shown to exhibit severe performance limitations. Additionally, the advent of multi-threading and heterogeneous computing poses further constraints on how to efficiently implement EDMs and the corresponding I/O layer. podio is a C++ toolkit for the creation of EDMs with a fast and efficient I/O layer, using plain-old-data (POD) structures wherever possible. Physicist users are provided with a high-level interface of lightweight handle classes. The podio code generator, which produces all the necessary C++ code from a high-level description in YAML files, has recently been completely reworked to improve maintainability and extensibility. We will briefly discuss the new implementation and present, as a first use case, how it has been used to introduce an additional I/O backend based on SIO, a simple binary I/O library that is also used in LCIO. We will further discuss our first implementation of access to metadata, i.e. data that does not fit into the EDM itself. Finally, we will show how all of these capabilities are put to use in EDM4hep, the EDM for the Key4hep project.

        Speaker: Thomas Madlener (Deutsches Elektronen-Synchrotron (DESY))
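podio generates C++ from YAML; the Python sketch below only mirrors the layering described in the abstract: a generated plain-old-data record per datatype, a collection that owns the storage of those records, and lightweight handle classes that give users a high-level interface. The `DESCRIPTION` dictionary stands in for a YAML datatype entry, and all names (`ExampleHit`, `generate`, `Collection`) are hypothetical rather than podio's API.

```python
from dataclasses import fields, make_dataclass

# Hypothetical datatype description, loosely modelled on a YAML entry.
DESCRIPTION = {"ExampleHit": {"Members": [("cellID", int), ("energy", float)]}}


def _make_handle(name, pod):
    """Build a lightweight handle class exposing read access to POD members."""
    def make_getter(member):
        return property(lambda self: getattr(self._coll[self._idx], member))

    attrs = {f.name: make_getter(f.name) for f in fields(pod)}

    def __init__(self, coll, idx):
        self._coll = coll  # shared storage owned by the collection
        self._idx = idx    # position of this object's POD record

    attrs["__init__"] = __init__
    return type(name, (), attrs)


def generate(description):
    """'Code generation' step: one POD class and one handle class per datatype."""
    generated = {}
    for name, spec in description.items():
        pod = make_dataclass(name + "Data", spec["Members"])
        generated[name] = (pod, _make_handle(name, pod))
    return generated


class Collection:
    """Owns the POD records; hands out handles instead of the records."""

    def __init__(self, pod, handle):
        self._pod, self._handle, self._data = pod, handle, []

    def create(self, **values):
        self._data.append(self._pod(**values))
        return self._handle(self._data, len(self._data) - 1)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._handle(self._data, i)
```

Keeping the data in flat POD records and putting the user-facing interface in separate handles is what lets the I/O layer stream the records directly, independently of the handle classes.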
      • 72
        Discussion
      • 17:16
        Coffee and Cake
      • 73
        Use of auto-differentiation within the ACTS toolkit

        First- and higher-order derivatives are essential for many parts of track reconstruction: in the transport of track parameters through the detector, in several linearization applications, and in determining the detector alignment. While these derivatives are in general well known, they can be complex to derive and even more difficult to validate. Validation is often done by numerical cross-checks using Ridders' algorithm or similar approaches. The rapid development of machine learning applications in recent years has also renewed interest in algorithmic differentiation techniques, which use compiler or runtime techniques to compute exact derivatives from function expressions, surpassing the precision achievable with standard numerical differentiation based on finite differences.
        ACTS is a common track reconstruction toolkit that aims to preserve the track reconstruction software from the LHC era while at the same time providing an R&D testbed for further algorithm and technology research. We present the successful inclusion of the auto-diff library into the ACTS propagation and track-based alignment modules, where it serves as a complementary way to calculate transport Jacobians and alignment derivatives. The implementation within the ACTS software is shown, and the validation and CPU time comparison with respect to the implemented analytical or numerically determined expressions are given.

        Speaker: Mr Benjamin Huth (University of Regensburg)
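Forward-mode algorithmic differentiation can be illustrated with dual numbers, which carry a value and its derivative through every operation so that the chain rule is applied exactly. The sketch below compares this with a central finite difference. It is a minimal Python illustration of the technique, not the auto-diff library used in ACTS, and the test function `f` is arbitrary. To build a Jacobian, one would seed the derivative of each input in turn (or carry a vector of derivatives).

```python
import math


class Dual:
    """Dual number value + deriv*eps with eps**2 == 0 (forward-mode AD)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule, applied exactly rather than approximated.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    if isinstance(x, Dual):
        return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)
    return math.sin(x)


def exp(x):
    if isinstance(x, Dual):
        e = math.exp(x.value)
        return Dual(e, e * x.deriv)
    return math.exp(x)


def f(x):
    """Arbitrary test function: f(x) = sin(x) * exp(x) + 3x."""
    return sin(x) * exp(x) + 3 * x


def ad_derivative(fn, x):
    """Derivative exact up to rounding, obtained by seeding deriv = 1."""
    return fn(Dual(x, 1.0)).deriv


def central_difference(fn, x, h=1e-5):
    """Numerical derivative with O(h**2) truncation error."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)
```

Comparing both results with the analytical derivative f'(x) = (cos x + sin x) e^x + 3 shows the dual-number result agreeing to rounding precision, while the finite difference carries a step-size-dependent error.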
      • 74
        Discussion
      • 75
        Reconstruction for Liquid Argon TPC Neutrino Detectors Using Parallel Architectures

        Neutrinos interact rarely, so identifying them requires large detectors that produce large amounts of data. Processing this data with the available computing power is becoming more challenging as detectors increase in size to reach their physics goals. Liquid argon time projection chamber (TPC) neutrino experiments are planned to grow by a factor of 100 in the next decade relative to currently operating experiments, and modernization of liquid argon TPC reconstruction code, including vectorization and multi-threading, will help to mitigate this challenge. The liquid argon TPC hit-finding algorithm used across multiple experiments through the LArSoft framework has been vectorized and multi-threaded. This speeds up the algorithm by up to a factor of 200 in a standalone version on Intel architectures. The new version of the hit finder has been incorporated back into LArSoft so that experiments can use it. To take full advantage of this parallelism, an experiment workflow is being developed to run LArSoft at a high-performance computing center. This will be used to produce samples as part of a central processing campaign.

        Speaker: Sophie Berkman (Fermi National Accelerator Laboratory)
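A hit finder searches each wire's waveform for signal regions. The simplified sketch below only finds contiguous regions above a threshold and reports their extent, peak tick and integral, processing channels concurrently. Production hit finders such as LArSoft's typically also fit pulse shapes, and the speed-up described in the abstract comes from C++ vectorization and multi-threading, which Python threads do not reproduce (the global interpreter lock limits CPU parallelism). The function names and the threshold are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def find_hits(waveform, threshold):
    """Return (start_tick, end_tick, peak_tick, integral) per region above threshold."""
    samples = list(waveform) + [threshold]  # sentinel closes a region at the end
    hits, start = [], None
    for tick, adc in enumerate(samples):
        if adc > threshold and start is None:
            start = tick
        elif adc <= threshold and start is not None:
            region = samples[start:tick]
            peak = start + max(range(len(region)), key=region.__getitem__)
            hits.append((start, tick - 1, peak, sum(region)))
            start = None
    return hits


def find_hits_all_channels(waveforms, threshold, workers=4):
    """Process channels independently; results keep the input channel order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda wf: find_hits(wf, threshold), waveforms))
```

Because each channel is processed independently, the per-channel loop is the natural unit of work to distribute across threads.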
      • 76
        Discussion
      • 77
        GPU-accelerated machine learning inference for offline reconstruction and analysis workflows in neutrino experiments

        Future neutrino experiments such as DUNE are big-data experiments that will acquire petabytes of data per year. Processing this amount of data is in itself a significant challenge. In recent years, moreover, the use of deep learning in the reconstruction and analysis of data acquired by LArTPC-based experiments has grown substantially. This places even greater strain on the computing resources of these experiments, since the CPU-based systems used for offline processing are not well suited to deep learning inference. To address this problem, we adopt an "as a Service" model in which inference is provided as a web service. We demonstrate the feasibility of this approach by testing it on the full reconstruction chain of ProtoDUNE using fully simulated data, with the GPU-based inference server hosted on the Google Cloud Platform. We present encouraging results from our tests, including detailed studies of scaling behavior. Based on these results, the "as a Service" approach shows great promise as a solution to the growing computing needs of future neutrino experiments associated with deep learning inference tasks.

        Speaker: Tingjun Yang (Fermi National Accelerator Lab. (US))
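In the "as a Service" model the reconstruction job acts as a client that ships inputs to a remote GPU server. One common design choice, assumed here rather than taken from the talk, is to send inputs in batches so that the per-request network overhead is amortised. The sketch below uses a hypothetical in-process `FakeInferenceServer` in place of a real remote service and counts the simulated round trips.

```python
from typing import Callable, List, Sequence


class FakeInferenceServer:
    """In-process stand-in for a remote GPU inference service (hypothetical)."""

    def __init__(self, model: Callable[[Sequence[float]], float]):
        self.model = model
        self.requests = 0  # number of simulated network round trips

    def infer(self, batch: List[Sequence[float]]) -> List[float]:
        self.requests += 1
        return [self.model(x) for x in batch]


def run_inference(server: FakeInferenceServer, inputs: List[Sequence[float]],
                  batch_size: int) -> List[float]:
    """Send inputs in batches of at most batch_size; outputs keep input order."""
    outputs: List[float] = []
    for i in range(0, len(inputs), batch_size):
        outputs.extend(server.infer(inputs[i:i + batch_size]))
    return outputs
```

With ten inputs and a batch size of four, the client makes three requests instead of ten; the best batch size in practice depends on the server, the model and the network.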
      • 78
        Discussion
      • 79
        GPU-based tracking with Acts

        At future hadron colliders such as the High-Luminosity LHC (HL-LHC), tens of thousands of particles can be produced in a single event, resulting in a very challenging tracking environment. The estimated CPU resources required for event processing at the HL-LHC could well exceed the available resources. To mitigate this problem, modern tracking software aims to gain performance by exploiting modern hardware such as multi-core CPUs or GPUs, which can process many threads in parallel.

        The Acts (A Common Tracking Software) project encapsulates the current ATLAS tracking software into an experiment-independent toolkit designed for modern computing architectures. It provides a set of high-level track reconstruction tools that are agnostic to the details of the detector and magnetic field configuration. Particular emphasis is placed on the thread safety of the code in order to support concurrent event processing with context-dependent detector conditions, such as detector alignments or calibrations. Acts also aims to be a research and development platform for studying innovative tracking techniques and exploiting modern hardware architectures. Multi-threaded event processing on multi-core CPUs is supported using the Intel Threading Building Blocks (TBB) library. Acts also provides plugins for heterogeneous computing, such as CUDA and SYCL/oneAPI, and contains example code that can be offloaded to GPUs, for instance the Acts seed finder.

        In this talk, I will present a summary of the R&D activities, based on the Acts software, exploring parallelism and GPU acceleration of elements of track reconstruction such as seed finding, geometry navigation and Kalman fitting. The strategies for the GPU implementation will be shown, and both the achieved performance and the difficulties encountered will be discussed.

        Speaker: Xiaocong Ai (DESY)
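Seed finding builds triplets of space points (bottom, middle, top) that could belong to the same track. The naive sketch below checks only the radial spacing and the consistency of the r-z slope (cot θ) between the two doublets. The Acts seed finder applies further cuts (for example on helix compatibility in the transverse plane and on the impact parameter) and organises space points in a grid; the checks on each candidate triplet are independent, which is the kind of work that maps naturally onto many GPU threads. All names and cut values here are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple, Tuple


class SpacePoint(NamedTuple):
    r: float  # transverse radius
    z: float  # longitudinal coordinate


def cot_theta(a: SpacePoint, b: SpacePoint) -> float:
    """Slope dz/dr of the straight line through two space points."""
    return (b.z - a.z) / (b.r - a.r)


def find_seeds(points: List[SpacePoint], dr_min: float, dr_max: float,
               max_cot_diff: float) -> List[Tuple[SpacePoint, SpacePoint, SpacePoint]]:
    """Naive O(N^3) triplet search (bottom, middle, top).

    Requiring dr_min > 0 guarantees bottom.r < middle.r < top.r, so the
    slopes are well defined.
    """
    seeds = []
    for bottom, middle, top in product(points, repeat=3):
        if not (dr_min <= middle.r - bottom.r <= dr_max
                and dr_min <= top.r - middle.r <= dr_max):
            continue
        if abs(cot_theta(bottom, middle) - cot_theta(middle, top)) <= max_cot_diff:
            seeds.append((bottom, middle, top))
    return seeds
```

A GPU implementation would typically replace the triple loop with a pre-sorted, binned search, but the per-triplet compatibility test stays the same.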
      • 80
        Discussion
      • 81
        Investigating Portable Heterogeneous Solutions with Fast Calorimeter Simulation

        Physicists at the Large Hadron Collider (LHC), near Geneva, Switzerland, are preparing their experiments for the high-luminosity (HL) era of proton-proton collision data-taking. In addition to detector hardware research and development for the upgrades needed to cope with the more than two-fold increase in instantaneous luminosity, physicists are investigating potential heterogeneous computing solutions to address CPU limitations that could be detrimental to an otherwise successful physics program.

        With the advent of supercomputers employing a wide range of architectures and specifications, it is crucial that experiments' software be abstracted as much as possible from the underlying hardware implementation in order to make use of the vast array of these machines. New application programming interfaces (APIs) aim to be architecture-independent, making it possible to write single-source code that can be compiled for virtually any hardware. In this talk, we present the details of our cross-platform software prototyping work with Kokkos, a single-source, performant parallel C++ API that provides hardware backends for a wide range of parallel architectures, including NVIDIA, AMD, Intel, OpenMP and pThreads, and with SYCL, an abstraction layer whose specification is defined by the Khronos Group together with industry-leading members such as Intel. Using ATLAS's new fast calorimeter simulation code, FastCaloSim, as a testbed, we evaluate Kokkos and SYCL in terms of their heterogeneity and their performance with respect to other parallel computing APIs.

        Speakers: Vincent Pascuzzi (Lawrence Berkeley National Lab. (US)), Dr Charles Leggett (Lawrence Berkeley National Lab (US))
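The single-source idea behind Kokkos and SYCL is that a kernel is written once and the execution backend is chosen separately. The Python analogy below expresses that pattern with two interchangeable hypothetical backends (serial and a thread pool) behind a `parallel_for` method. It mirrors the concept of Kokkos' `parallel_for` but is not the Kokkos or SYCL API, the example kernel is not taken from FastCaloSim, and it makes no performance claim.

```python
from concurrent.futures import ThreadPoolExecutor


class SerialBackend:
    """Runs the kernel for each index in order."""

    def parallel_for(self, n, kernel):
        for i in range(n):
            kernel(i)


class ThreadPoolBackend:
    """Runs the kernel for each index on a pool of worker threads."""

    def __init__(self, workers=4):
        self.workers = workers

    def parallel_for(self, n, kernel):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # Consuming the iterator re-raises any exception from a kernel call.
            list(pool.map(kernel, range(n)))


def scale_energies(backend, energies, fractions, cells):
    """Single-source 'kernel': each hit's energy times its cell's fraction.

    Each index writes only out[i], so iterations are independent and any
    backend may run them in any order.
    """
    out = [0.0] * len(energies)

    def kernel(i):
        out[i] = energies[i] * fractions[cells[i]]

    backend.parallel_for(len(energies), kernel)
    return out
```

The kernel body never changes; switching from `SerialBackend()` to `ThreadPoolBackend()` changes only how the iterations are executed, which is the portability property the talk evaluates for real hardware backends.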
      • 82
        Discussion
      • 83
        Closing Remarks
        Speakers: David Lange (Princeton University (US)), Graeme A Stewart (CERN), Michel Jouvin (Université Paris-Saclay (FR)), Teng Jian Khoo (Humboldt University of Berlin (DE))
    • Computing: HPC/cloud storage integration
    • 17:30
      Coffee and cake
    • Computing: Discussion of the WS summary draft and future activities