Welcome to the WLCG/HSF Workshop at DESY Hamburg, May 13-17 2024
Limited remote participation will be possible; please register using the dedicated registration form if you wish to participate remotely. You will need to register to have access to the Zoom connection(s).
Main workshop plenary session
In this contribution, we’ll review the current status of the ROOT project, characterising its structure, available effort and strategic goals. We’ll explain how, in recent years, the energy flowing from the open-source community has changed ROOT and boosted its development, materialising in the form of code, reports, ideas and proposals. We’ll review the recently integrated features that are key for the remainder of Run 3’s analysis and data processing as well as for the HL-LHC era. We’ll focus not only on I/O, statistical tools and analysis, but also on less advertised ROOT components, such as packaging, distribution, Python support and graphics, and on how those interoperate with other tools in the ecosystem. Moreover, we’ll discuss how the project will evolve in the coming years, continuing to be at the heart of CERN’s flagship activity, the LHC, and preparing to support forthcoming and future experiments.
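As an illustration of the declarative analysis interface at the centre of ROOT's analysis offering, the following is a minimal PyROOT/RDataFrame sketch; the file, tree and branch names ("events.root", "Events", "pt", "mass") are placeholders, not taken from the talk.

import ROOT

ROOT.EnableImplicitMT()                              # use all available cores
df = ROOT.RDataFrame("Events", "events.root")        # placeholder tree/file names

# Book a histogram of "mass" for events passing a simple selection;
# the event loop runs lazily, only when a result is requested.
h = df.Filter("pt > 20").Histo1D(
    ("h_mass", "Mass;m [GeV];Events", 100, 0.0, 200.0), "mass"
)

c = ROOT.TCanvas()
h.Draw()                                             # triggers the event loop
c.SaveAs("mass.png")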
Scikit-HEP is a community-driven and community-oriented project with the goal of providing an ecosystem for particle physics data analysis in Python fully integrated with the wider scientific Python ecosystem. The project started in Autumn 2016 and has evolved into a toolset of approximately thirty packages and a few “affiliated” packages.
It expands the typical Python data analysis tools for particle physicists, with packages spanning the spectrum from general scientific libraries for data manipulation to domain-specific libraries. Each package focuses on a particular topic and interacts with other packages in the toolset where appropriate. Interoperability between particle physics tools and the Python scientific ecosystem is an important aspect of the project. Most of the packages are easy to install in many environments; much work has been done to provide binary wheels on PyPI and conda-forge packages. The project has gained interest and momentum over the years, carefully building a user and developer community and engaging collaboration across experiments. Some of the packages are being used by other projects and communities. Utilities started within Scikit-HEP have in the meantime made their way, as contributions, into the wider Scientific Python project, namely the development guide and the repository reviewer.
An overview of the overall project and toolset will be presented, with comments on its history and evolution. Areas of particular relevance to community software, impact and engagement will be stressed. Future developments and matters of sustainability will be discussed.
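A minimal sketch of the kind of cross-package interoperability described above, combining uproot, Awkward Array and hist; the file and branch names are placeholders.

import awkward as ak   # ragged-array manipulation
import hist            # histogramming
import uproot          # ROOT file I/O

# Read a jagged branch from a ROOT TTree into an Awkward Array.
tree = uproot.open("events.root")["Events"]          # placeholder names
arrays = tree.arrays(["Muon_pt"])

# Keep muons with pT > 20 GeV and histogram the leading-muon pT.
selected = arrays["Muon_pt"][arrays["Muon_pt"] > 20]
leading_pt = ak.drop_none(ak.firsts(selected))

h = hist.Hist.new.Reg(50, 0, 200, name="pt", label="Leading muon pT [GeV]").Double()
h.fill(pt=leading_pt)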
Providing and maintaining the necessary tools for studying and developing detectors for future colliders is non-trivial. On the one hand, it requires a substantially sized software stack, with all the complications arising therefrom. On the other hand, the available person power is usually strongly limited. In order to tackle both, the Key4hep project aims at providing a complete software stack that can be used by all future collider communities, e.g. FCC, ILC, CEPC, EIC and the Muon Collider, among others.
In this presentation we will give an overview and status update of the Key4hep project itself, but will also dive deeper into some aspects. These include building and maintaining the stack with the spack package manager, key insights and experiences we gained while developing for different communities simultaneously, and how to connect different existing software tools into a coherent framework. We will also touch on some of the currently ongoing developments and future plans.
The HSF-India initiative aims to establish new and impactful research software collaborations between India, Europe and the United States, with the intent of increasing the engagement of software experts in Asia with the HSF community. The starting point of this collaboration is a series of software workshops focused on building software skills. These workshops are the basis of a mutual training network that enables early-career researchers to pursue impactful research software initiatives in ways that advance their careers in experimental data-intensive science. Other project components include student projects and bidirectional researcher exchange programs. The experimental scope of the project is relatively broad, aiming to bring together researchers across facilities with common research problems spanning experimental high-energy physics, nuclear physics and particle astrophysics. This talk will describe the scope of the initiative, its mechanisms for fostering new collaborations, and ways for interested research groups to get involved.
A number of analyses and performance groups in ATLAS use an analysis framework, written in C++ with Python steering files, called xAODAnaHelpers (xAH). xAH is used to loop over events in a variety of ATLAS analysis data formats, using central software to calibrate, select and correct physics objects. xAH has been chosen as one of the EVERSE (European Virtual Institute for Research Software Excellence) pilot cases representing user analysis software in particle physics, given (a) its widespread use in a large collaboration, (b) the fact that its modular and intuitive interface fits the needs of diverse analysis use cases that require custom calibrations and objects beyond traditional physics analyses, and (c) the challenges that end-user analysis software faces when relying on centrally developed tools that are updated often but still need to retain full backward compatibility for ongoing analyses.
After a brief description of the framework itself, this contribution will focus on the software development and maintenance practices for such a framework and on the development of tutorials for newcomers. It will also discuss plans for future work on software sustainability.
I will describe the current status of the Pythia8 project and some future developments that we are working on. I will also describe services offered by the Pythia8 collaboration, such as online tutorials and our GitLab help desk.
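A minimal sketch of driving Pythia8 from Python, assuming the optional Python bindings are built and importable as pythia8; the settings chosen here are illustrative only.

import pythia8

pythia = pythia8.Pythia()
pythia.readString("Beams:eCM = 13600.")        # pp collisions at 13.6 TeV
pythia.readString("HardQCD:all = on")          # enable hard QCD processes
pythia.readString("PhaseSpace:pTHatMin = 20.")
pythia.init()

for _ in range(10):                            # generate a few events
    if not pythia.next():
        continue
    n_charged = sum(
        1 for i in range(pythia.event.size())
        if pythia.event[i].isFinal() and pythia.event[i].isCharged()
    )
    print("charged multiplicity:", n_charged)

pythia.stat()                                  # cross-section and error statistics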
Phoenix is a TypeScript-based event display framework, created in response to the 2017 HSF community white paper.
It uses industry-standard web tools (such as the popular three.js library for 3D rendering) and runs entirely in the client's web browser. It is experiment-agnostic by design, providing shared common functionality (such as custom menus, controls and propagators), but also supports experiment-specific extensions for geometry and event data. It consists of two packages: a plain TypeScript core library (phoenix-event-display) and an Angular application for the UI (a React example is also provided in the documentation). Phoenix has been selected as a Google Summer of Code project for several years, and its contributors come from a wide variety of backgrounds. Recent developments have focused on improving event navigation and comprehension, with tools to better understand the relative position of objects, as well as native support for common formats such as EDM4HEP.
It is currently used by several experiments, including ATLAS, FCC, LHCb and Belle-II.
Main workshop plenary session
Proposal documents for HSF future organisation
HepMC3 is a library developed to handle simulated collision events from Monte Carlo event generators in High Energy Physics. The library is a successor in spirit to the earlier HepMC library and incorporates multiple ideas that have appeared in the HEP community over the past decade.
This contribution discusses in detail the recent developments of the HepMC3 project, its relation to other projects in the community, and the prospects for future developments.
Simulations of neutrino interactions are playing an increasingly important role in the pursuit of high-priority measurements for the field of particle physics. A significant technical barrier for efficient development of these simulations is the lack of a standard data format for representing individual neutrino scattering events. We propose and define such a universal format, named NuHepMC, as a common standard for the output of neutrino event generators. The NuHepMC format uses data structures and concepts from the HepMC3 event record library adopted by other subfields of high-energy physics. These are supplemented with an original set of conventions for generically representing neutrino interaction physics within the HepMC3 infrastructure.
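As context, a minimal sketch of iterating over a HepMC3 ASCII event file from Python, using the community pyhepmc bindings (a choice made here for illustration; the file name is a placeholder). The NuHepMC conventions are layered on top of this HepMC3 event structure.

import pyhepmc

with pyhepmc.open("events.hepmc") as f:      # placeholder file name
    for event in f:
        # Final-state particles carry HepMC status code 1.
        final_state = [p for p in event.particles if p.status == 1]
        print(event.event_number, "final-state particles:", len(final_state))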
Conditions data is the subset of non-event data that is necessary to process event data. It poses a unique set of challenges, namely a heterogeneous structure and high access rates by distributed computing. As these challenges are similar across various High Energy Physics (HEP) and Nuclear Physics (NP) experiments, the HEP Software Foundation (HSF) hosted a forum to discuss and share experiences from different collaborations. This yielded a white paper on 'best practice' for conditions data access, and a corresponding chapter of the HSF Community White Paper. Based on this experience, the potential for an experiment-agnostic conditions database was evident. An HSF activity was created to publish a white paper on conditions data use cases and requirements, to provide the basis for a conditions database designed as Community Software.
This presentation will discuss the reference implementation, an 'HSF project' that satisfies those use cases and requirements. The reference implementation was developed in collaboration with sPHENIX, serving as the first real world application. This direct feedback provided a clearer understanding of the requirements, and additional implementation recommendations, while using the experts in the HSF activity to maintain its experiment-agnostic nature. In addition to sPHENIX, Belle II has also expressed interest in adopting the HSF reference implementation to benefit from its demonstrated scalability and performance.
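The following is a purely hypothetical sketch of the kind of lookup such a conditions service supports, resolving a (global tag, payload type, run/IOV) triple to a payload location; the endpoint and field names are invented for illustration and do not reflect the reference implementation's actual API.

import requests

BASE_URL = "https://conditions.example.org/api"      # hypothetical endpoint

def payload_url(global_tag: str, payload_type: str, run: int) -> str:
    """Resolve (global tag, payload type, run) to a payload location (hypothetical API)."""
    r = requests.get(
        f"{BASE_URL}/payloadiovs",
        params={"gtName": global_tag, "payloadType": payload_type, "iov": run},
        timeout=10,
    )
    r.raise_for_status()
    return r.json()["payload_url"]                   # hypothetical response field

# e.g. payload_url("sPHENIX_2024_prod", "CALO_CALIB", 12345)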
Gaussino is an experiment-independent simulation package built upon the Gaudi software framework. It provides generic core components and interfaces for a complete HEP simulation application: event generation, detector simulation, geometry, monitoring and output of the simulated data. The generator interface allows for a wide variety of external event generator packages to be used, with an example implementation included for Pythia8. Detector simulation relies on the Geant4 toolkit for particle transport. It also provides a fast simulation interface to offload the simulation of specific sub-detectors to external processes, including GPU-accelerated and machine-learning-based options. Geometry descriptions can be provided through DD4Hep, GDML, experiment-specific software, or simple volumes specified at configuration time. Visualisation of the geometry and simulated data can be performed using the Geant4 visualisation driver or by saving the necessary objects for visualisation with Phoenix. Gaussino ensures a consistent multi-threaded execution between the various components and the underlying Gaudi infrastructure. This talk will focus on the features of Gaussino as a generic standalone application, giving examples for a diverse range of HEP experiments. Finally, the use of Gaussino as a toolkit to build experiment-specific applications will be covered, with LHCb's Gauss as an example.
The increasing computational demand in High Energy Physics as well as increasing concerns about energy efficiency in high performance/throughput computing are driving forces in the search for more efficient ways to utilize available resources. Since avoiding idle resources is key in achieving high efficiency, an appropriate measure is sharing of idle resources of under-utilized sites with fully occupied sites. The software COBalD/TARDIS can automatically, transparently and dynamically (dis)integrate such resources in an opportunistic manner.
However, resource sharing also requires accounting. This is done with AUDITOR (AccoUnting DatahandlIng Toolbox for Opportunistic Resources), a flexible and extensible accounting ecosystem that can cover a wide range of use cases and infrastructures. Accounting data is gathered via so-called collectors and stored in a database. So-called plugins can access the data and can act based on the accounting information.
An HTCondor collector, a Slurm collector and a TARDIS collector are currently available, and a Kubernetes collector is already being worked on.
The APEL plugin, for example, enables the creation of APEL accounting summaries and their transmission to the APEL accounting server. While the original goal of developing AUDITOR was to enable accounting for opportunistic resources managed by COBalD/TARDIS, it can also be used for normal accounting of a WLCG computing resource. Because AUDITOR uses a highly flexible data structure to store accounting data, extensions such as accounting for GPU resources can be added with minimal effort.
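To illustrate the flexible data structure mentioned above, here is a hypothetical sketch of an accounting record with metadata and scored components; the field names are invented for illustration and are not AUDITOR's actual schema.

from datetime import datetime, timezone

# Illustrative only: field names do not follow AUDITOR's actual schema.
record = {
    "record_id": "htcondor-job-1234567",
    "site_id": "SITE-A",
    "user_id": "atlas-pilot",
    "start_time": datetime(2024, 5, 13, 9, 0, tzinfo=timezone.utc).isoformat(),
    "stop_time": datetime(2024, 5, 13, 11, 30, tzinfo=timezone.utc).isoformat(),
    "components": [
        # Each component can carry scores, e.g. a benchmark normalisation factor.
        {"name": "Cores", "amount": 8, "scores": [{"name": "HEPSPEC06", "value": 14.2}]},
        {"name": "GPUs", "amount": 1, "scores": []},
    ],
}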
Intense collaborative work is ongoing on the development and testing of RNTuple, the future HEP columnar storage technology, involving the LHC experiments, DUNE and the ROOT team.
In this contribution we’ll review the status of the RNTuple plan of work, towards the freezing of the specification at the end of the year. We’ll review the new features of RNTuple, as well as the work items already delivered this year and the remaining ones. In particular, we’ll show how the currently available I/O infrastructure already allows writing relevant experiment EDMs in the RNTuple format.
We’ll complement the aforementioned information with the latest performance plots.
The storage, transmission and processing of data is a major challenge across many fields of physics and industry. Traditional generic data compression techniques are lossless, but are limited in performance and require additional computation.
BALER [1,2] is an open-source autoencoder-based framework for the development of tailored lossy data compression models suitable for data from multiple disciplines. BALER models can also be used in FPGAs to compress live data from detectors or other sources, potentially allowing for massive increases in network throughput.
BALER is developed by a cross-disciplinary team of physicists, engineers, computer scientists and industry professionals, and has received substantial contributions from a large number of master’s and doctoral students. BALER has received support from industry, both in providing datasets to develop BALER and in transferring industry best practices.
This presentation will introduce BALER, demonstrate its performance on a range of data types, discuss the involvement of students and industry in the project and lessons learned, and include a live demonstration.
[1] https://arxiv.org/pdf/2305.02283.pdf
[2] https://github.com/baler-collaboration/baler
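As a generic illustration of the autoencoder-based compression idea (plain PyTorch, not BALER's own API): the encoder output below is the compressed representation that would be stored or transmitted, and the network is trained to minimise the reconstruction error.

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features: int = 24, n_latent: int = 6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_latent),              # compressed representation
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 64), nn.ReLU(),
            nn.Linear(64, n_features),            # reconstruction
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.randn(128, 24)                          # stand-in for a batch of event features
loss = nn.functional.mse_loss(model(x), x)        # reconstruction error
loss.backward()                                   # one optimiser step would follow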
A fast turn-around time and ease of use are important factors for systems supporting the analysis of large HEP data samples. We study and compare multiple technical approaches.
This presentation will be about setting up and benchmarking the Analysis Grand Challenge (AGC) [1] using CMS Open Data. The AGC is an effort to provide a realistic physics analysis with the intent of showcasing the functionality, scalability and feature-completeness of the Scikit-HEP Python ecosystem.
I will present the results of setting up the necessary software environment for the AGC and of benchmarking the analysis' runtime on various computing clusters: the SLURM cluster at my home institute, LMU Munich, a SLURM cluster at LRZ (a WLCG Tier-2 site) and the Vispa analysis facility [2], operated by RWTH Aachen.
Each site provides slightly different software environments and modes of operation, which poses interesting challenges for the flexibility of a setup like the one intended for the AGC.
Comparing these benchmarks to each other also provides insights into different storage and caching systems: at LRZ and LMU we have regular Grid storage (HDD) as well as an SSD-based XCache server, while Vispa uses a sophisticated per-node caching system.
[1] https://github.com/iris-hep/analysis-grand-challenge
[2] https://vispa.physik.rwth-aachen.de/
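A sketch of how such an analysis can be scaled out on a SLURM cluster with Dask, assuming dask-jobqueue is available; the partition name and resource requests are site-specific placeholders.

from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    queue="standard",                  # placeholder partition name
    cores=8,
    memory="16GB",
    walltime="02:00:00",
)
cluster.scale(jobs=20)                 # request 20 batch jobs as Dask workers
client = Client(cluster)

# The uproot/coffea-based AGC tasks are then submitted through `client`,
# so the same analysis code can run unchanged against a different
# cluster backend at another site.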
Cloud data lake technologies have been used successfully in industry for the analysis of exabyte-scale datasets. The technologies that underlie this architecture are
We will describe our work using a Trino distributed SQL engine to join selected event data with inference results. We will show how this architecture can eliminate the need to maintain analysis-specific copies of datasets.
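A sketch of the kind of SQL join described above, using the Trino Python client; the host, catalog, schema and table names are placeholders.

import trino

conn = trino.dbapi.connect(
    host="trino.example.org", port=443, user="analyst",
    catalog="lakehouse", schema="physics", http_scheme="https",
)
cur = conn.cursor()
cur.execute("""
    SELECT e.event_id, e.dimuon_mass, i.score
    FROM events e
    JOIN inference_results i ON e.event_id = i.event_id
    WHERE i.score > 0.9
""")
for event_id, mass, score in cur.fetchall():
    print(event_id, mass, score)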
Main workshop plenary session
Contur (Constraints On New Theories Using Rivet) is a public Python package sitting on top of Rivet and Yoda, which allows information on new BSM models to be extracted from particle-level differential cross-section measurements from the LHC. BSM events simulated by a general-purpose MC event generator are "signal injected" into the fiducial phase space of hundreds of measurements simultaneously, allowing a rapid scan of a wide range of model parameters and signatures. Contur takes as input the Yoda histograms from Rivet, and so can interoperate with any generator producing HepMC events. However, it also has convenience methods for parameter scanning using Herwig, and is interfaced to the scanning machinery of Madgraph and GAMBIT.
Experimental High Energy Physics has entered an era of precision measurements. However, measurements of many of the accessible processes assume that the final states' underlying kinematic distribution is the same as the Standard Model prediction. This assumption introduces an implicit model-dependency into the measurement, rendering the reinterpretation of the experimental analysis complicated without reanalysing the underlying data. We present a novel reweighting method in order to perform reinterpretation of particle physics measurements. It makes use of reweighting the Standard Model templates according to kinematic signal distributions of alternative theoretical models, prior to performing the statistical analysis. The generality of this method allows us to perform statistical inference in the space of theoretical parameters, assuming different kinematic distributions, according to a beyond Standard Model prediction. We implement our method as an extension to the pyhf software and interface it with the EOS software, which allows us to perform flavor physics phenomenology studies. Furthermore, we argue that, beyond the pyhf or HistFactory likelihood specification, only minimal information is necessary to make a likelihood model-agnostic and hence easily reinterpretable. We showcase that publishing such likelihoods is crucial for a full exploitation of experimental results.
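For context, a minimal pyhf sketch of the kind of HistFactory-style likelihood that the method above extends: a single-channel counting model with one signal template and an uncertain background (the numbers are illustrative).

import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0, 10.0],             # signal template per bin
    bkg=[50.0, 60.0],               # background template per bin
    bkg_uncertainty=[7.0, 8.0],     # per-bin background uncertainty
)
observations = [53.0, 72.0]
data = observations + model.config.auxdata

# Maximum-likelihood fit of the signal strength and nuisance parameters.
best_fit = pyhf.infer.mle.fit(data, model)
print("best-fit parameters:", best_fit)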
The Python HEP analysis ecosystem and its user base have grown significantly in the last few years, and with them the need for advanced statistical inference tools involving likelihood fits, a core part of most analyses in HEP.
zfit started over five years ago with the goal of providing this capability: a library for model fitting in HEP that is scalable, in terms of model-building complexity and performance, and pythonic, well integrated into the Python ecosystem.
After many iterations with users and a long development process, zfit has reached a mature stage.
In this talk, we will go over the extensive feature set of zfit: from binned and unbinned fits, extensive model building and the ability to create custom models, up to advanced likelihood building, weighted fits and a variety of available minimizers. Thanks to its modern NumPy-like backend, TensorFlow, with just-in-time compilation and the ability to run on CPUs and GPUs, zfit is highly performant. zfit is also well embedded into the Scikit-HEP ecosystem and beyond: it integrates seamlessly with libraries for data loading, plotting and further statistical tools, and allows libraries that build sophisticated models, such as ComPWA, to use zfit for statistical inference.
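A minimal zfit sketch, an unbinned maximum-likelihood fit of a Gaussian, illustrating the model-building and minimizer interfaces mentioned above; the data are randomly generated for illustration.

import numpy as np
import zfit

obs = zfit.Space("x", limits=(-5, 5))

mu = zfit.Parameter("mu", 0.5, -2, 2)
sigma = zfit.Parameter("sigma", 1.2, 0.1, 5)
gauss = zfit.pdf.Gauss(mu=mu, sigma=sigma, obs=obs)

data = zfit.Data.from_numpy(obs=obs, array=np.random.normal(0.0, 1.0, size=10_000))

nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)
result = zfit.minimize.Minuit().minimize(nll)
result.hesse()                      # parameter uncertainties
print(result.params)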
NUISANCE is a neutrino event generator prediction comparison and tuning framework. It facilitates cross-section predictions for the five main event generators in use by the few-GeV neutrino scattering community, enabling non-expert users to compare predictions to over 350 neutrino cross-section measurements, from the historical to the cutting edge.
We are currently in the process of redesigning NUISANCE to meet the needs of the next generation of neutrino experiments. A key goal of this effort is to tightly couple NUISANCE to HepData, which will allow us to offload the responsibility of managing experimental data releases back to the experimental collaborations, via a repository expressly built for the job. A technical requirement for this is the ability to execute analysis code packaged in the HepData releases. This talk will introduce NUISANCE and discuss our approach to solving this problem, which is based on providing a standardised and extensible language-agnostic event-processing framework, with a working implementation in C++, leveraging the HEP standard tools HepMC3 and cling.
In this session we will discuss the aspects of the strategy related to the infrastructure [INFRA]. The open items from the community feedback on the strategy include:
In the age of GPU-accelerated event generation, pivotal community tools like HepMC and Rivet, vital for event generation infrastructure and Monte Carlo event analysis, risk becoming significant bottlenecks in the near future.
We present an adaptable and highly efficient approach to simulating collider events featuring multi-jet final states, encompassing both leading and next-to-leading order QCD calculations. Rooted in an enhanced parton-level event file format with streamlined scalable data management, our technique offers a scalable solution for producing high-precision calculations on HPC clusters using modern hardware architectures. We verify the efficacy of our framework across various processes, notably Higgs boson plus multi-jet production with up to seven jets, and showcase its integration within the Sherpa and Pythia event generators. Augmented by an enhanced interface for data management in massively parallel applications in Rivet4, our approach represents a significant step towards facilitating efficient data-model comparisons and statistical interpretations in collider physics.
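As generic context only, a minimal sketch of reading a conventional Les Houches parton-level event file from Python with the Scikit-HEP pylhe package, assuming its standard event interface; the file name is a placeholder and this is not the enhanced event-file format discussed in the talk.

import pylhe

# Iterate over parton-level events and report the particle multiplicity.
for i, event in enumerate(pylhe.read_lhe("events.lhe")):   # placeholder file name
    print("event", i, "has", len(event.particles), "particles")
    if i >= 4:
        break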
High-precision calculations are crucial for the success of the LHC physics programme. However, the rising computational complexity for high-multiplicity final states is threatening to become a limiting bottleneck in the coming years. At the same time, the rapid deployment of non-traditional GPU-based computing hardware in data centres around the world demands an overhaul of the event generator design.
We propose a flexible and efficient approach for simulating collider events with multi-jet final states, based on the first portable leading-order parton-level event generation framework, along with a GPU-accelerated version of LHAPDF for fast and efficient evaluation of parton distribution functions. Our approach lends itself neatly to most modern GPU-accelerated hardware, allowing computing resources to be better exploited in large-scale production campaigns and paving the way for economically and ecologically sustainable event generation in the high-luminosity era.
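For context, a sketch of the standard CPU-side LHAPDF evaluation that a GPU-accelerated version targets, using the existing Python bindings; the PDF set name is an example choice.

import lhapdf

pdf = lhapdf.mkPDF("CT18NNLO", 0)       # central member of an example PDF set

x, q = 0.01, 100.0                      # momentum fraction and scale in GeV
for pid in (21, 1, 2):                  # gluon, down, up
    print(pid, pdf.xfxQ(pid, x, q))     # returns x * f(x, Q)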
Since 2022, the LHCb detector has been taking data with a full software trigger at the LHC proton-proton collision rate, implemented on GPUs in the first stage and on CPUs in the second stage. This setup allows the alignment and calibration to be performed online and physics analyses to be run directly on the output of the online reconstruction, following the real-time analysis paradigm.
This talk will focus on the first level of the LHCb trigger implementation on GPUs, discuss challenges of using a heterogeneous architecture and report on the experience from the first running periods in 2022 and 2023.
Reconstructing the tracks left by charged particles in modern HEP detectors is one of the most computationally challenging tasks in analyzing the data of modern experiments. During the High-Luminosity LHC era the LHC experiments, including ATLAS, will have to be able to process much more complex data at much higher rates than ever before.
To achieve this, GPU-accelerated code has been developed as an R&D effort within the ACTS project (https://acts.readthedocs.io). With ATLAS preparing to use ACTS for all of its CPU-based track reconstruction during LHC's Run 4, we plan to integrate the GPU-accelerated algorithms and tools from ACTS into ATLAS's offline, and possibly trigger, reconstruction.
In this talk we present the latest status of the ACTS Parallelization R&D effort, with updated (physics and computing) performance figures.
In this session we will discuss the financial part [FIN] of the WLCG strategy. Items to be discussed, based on the feedback on the strategy document:
Main workshop plenary session
WLCG Strategy:
Brief conclusions from the Collaboration Board (minutes urgently needed):
Coffee/Beer discussions:
Main workshop plenary session