
PyHEP.dev 2024 - "Python in HEP" Developer's Workshop

Europe/Brussels
Aachen, Germany

Erholungs-Gesellschaft, Reihstraße 13, 52062 Aachen
Description

PyHEP.dev is an in-person, informal workshop for developers of Python software in HEP to plan a coherent roadmap and set priorities for the upcoming year. It complements the PyHEP Users online workshop, which is intended for both developers and physicists.

Both PyHEP workshops are supported by the HEP Software Foundation (HSF). Further information is on the PyHEP Working Group website.

The agenda will consist of morning kick-off talks and afternoon discussions, in which the discussion groups and topics are self-assigned. Pre-workshop organization is happening via GitHub Issues.

You are encouraged to join the PyHEP WG Gitter channel and/or the HSF forum to receive further information concerning the organisation of the workshop. Workshop updates and information will also be shared on the workshop's Twitter account, @PyHEPConf, in addition to email.

Organising Committee

Eduardo Rodrigues - University of Liverpool (Chair)
Jim Pivarski - Princeton University
Nikolai Hartmann - Ludwig Maximilian University of Munich
Matthew Feickert - University of Wisconsin-Madison

Local Organising Committee

Peter Fackeldey - RWTH Aachen University & ErUM-Data-Hub
Angela Warkentin - ErUM-Data-Hub

The workshop is sponsored by, and organized in cooperation with, the ErUM-Data-Hub. The ErUM-Data-Hub is the central networking and transfer office for digital transformation in research on universe and matter in Germany and is funded by the German Federal Ministry of Education and Research (BMBF).

 

This event is also kindly sponsored by the Python Software Foundation.

Participants
  • Angela Warkentin
  • Azzah Alshehri
  • Benjamin Fischer
  • Eduardo Rodrigues
  • Jim Pivarski
  • Jonas Eschle
  • Judith Steinfeld
  • Nikolai Hartmann
  • Oksana Shadura
  • Peter Fackeldey
  • plus 16 more participants
    • 08:30
      Coffee
    • 1
    • Kick-off talks
      • 2
        Self-introduction: Eduardo Rodrigues
        Speaker: Eduardo Rodrigues (University of Liverpool (GB))
      • 3
        Self-introduction: Juraj Smiesko
        Speaker: Juraj Smiesko (CERN)
      • 4
        Self-introduction: Jan Bürger
        Speaker: Jan Bürger (ErUM-Data-Hub)
      • 5
        Self-introduction: Jim Pivarski
        Speaker: Jim Pivarski (Princeton University)
      • 6
        Self-introduction: Lino Oscar Gerlach
        Speaker: Lino Oscar Gerlach (Princeton University (US))
      • 7
        Self-introduction: Josue Molina
        Speaker: Josue Molina
      • 8
        Self-introduction: Ianna Osborne
        Speaker: Ianna Osborne (Princeton University)
      • 9
        Self-introduction: Máté Farkas
        Speaker: Mate Farkas (Rheinisch Westfaelische Tech. Hoch. (DE))
      • 10
        Self-introduction: Yaroslav Nikitenko
        Speaker: Yaroslav Nikitenko
      • 11
        Fast end-to-end analysis pipelines for the HL-LHC

        We provide an overview of two ongoing projects that aim to ensure the availability of fast and user-friendly solutions for physics analysis pipelines towards the HL-LHC. The Analysis Grand Challenge (AGC) defines an analysis task that captures relevant physics analysis workflow aspects. A variety of implementations have been developed for this task, which makes it possible to probe user experience and interoperability and helps center community discussions around a common benchmark. We will focus on the reference implementation provided by IRIS-HEP, which makes use of many tools in the Python HEP ecosystem and in particular a stack of Scikit-HEP libraries.

        A second project started in 2024 with a specific focus on achieving very large data throughput in a physics analysis context; it is often referred to by its target of sustaining a "200 Gbps" data rate. The project involved collaboration between many areas of expertise, but we describe its user-facing software aspect, which we built with libraries from Scikit-HEP and the surrounding ecosystem.

        Speakers: Alexander Held (University of Wisconsin Madison (US)), Oksana Shadura (University of Nebraska Lincoln (US))
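        The columnar analysis style used by these implementations can be illustrated with a minimal NumPy sketch. This is not taken from the AGC reference implementation (which uses Scikit-HEP libraries such as awkward and hist for jagged arrays and histograms); the variable names and cut values are purely illustrative:

```python
import numpy as np

# Toy event data: one transverse-momentum value per event (hypothetical numbers).
rng = np.random.default_rng(42)
jet_pt = rng.exponential(scale=50.0, size=10_000)  # GeV

# Columnar selection: a boolean mask applied to all events at once,
# instead of an explicit Python loop over events.
selected = jet_pt[jet_pt > 25.0]

# Histogram the selected column, as a stand-in for a hist.Hist fill.
counts, edges = np.histogram(selected, bins=50, range=(0.0, 500.0))

print(f"{selected.size} of {jet_pt.size} events pass the pT cut")
```

        The same mask-then-fill pattern scales from this toy to real NanoAOD-sized inputs, which is what makes the columnar approach attractive for HL-LHC data volumes.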
    • 11:00
      Coffee break
    • Discussion: What is a HEP analysis? What does PyHEP cover?
    • 12:30
      Lunch
    • Discussion: Challenges - AGC & 200Gbps
    • 15:00
      Coffee break
    • Hacking
    • Welcome Reception & Dinner
    • 08:30
      Coffee
    • Kick-off talks
      • 13
        Self-introduction: Peter Fackeldey
        Speaker: Manfred Peter Fackeldey (RWTH Aachen University (DE))
      • 14
        Self-introduction: Stefan Fröse
        Speaker: Stefan Fröse (ErUM-Data-Hub)
      • 15
        Self-introduction: Matthew Feickert
        Speaker: Matthew Feickert (University of Wisconsin Madison (US))
      • 16
        Self-introduction: Jonas Eschle
        Speaker: Jonas Eschle (Syracuse University (US))
      • 17
        Self-introduction: Alexander Held
        Speaker: Alexander Held (University of Wisconsin Madison (US))
      • 18
        Self-introduction: Giordon Holtsberg Stark
        Speaker: Dr Giordon Holtsberg Stark (University of California, Santa Cruz (US))
      • 19
        Self-introduction: Marcel Rieger
        Speaker: Marcel Rieger (Hamburg University (DE))
      • 20
        Self-introduction: Jonas Eppelt
        Speaker: Jonas Eppelt (Karlsruhe Institute of Technology (KIT))
      • 21
        Self-introduction: Alexander Heidelbach
        Speaker: Alexander Heidelbach
      • 22
        Self-introduction: Vincenzo Eduardo Padulano
        Speaker: Dr Vincenzo Eduardo Padulano (CERN)
      • 23
        An overview of the fitting ecosystem

        This talk will give a broad overview of the fitting that we're doing in
        HEP. On the one hand, the talk will cover the variety of fits in HEP, the
        different needs and types of inference, as well as efforts towards
        serialization and standardization. On the other hand, the relevant
        libraries will be covered, that is zfit, pyhf, hepstats, iminuit, and
        general Python packages like SciPy, and how they work together today, as
        well as future plans and technical considerations.

        Speaker: Jonas Eschle (Syracuse University (US))
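        As a concrete example of the kind of inference these libraries perform, here is a minimal unbinned maximum-likelihood fit of a Gaussian with SciPy. This is a toy sketch with made-up numbers, not the API of zfit or iminuit, which wrap this pattern with much richer model-building and uncertainty machinery:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=91.2, scale=2.5, size=5_000)  # toy resonance sample

def nll(params):
    """Unbinned negative log-likelihood of a Gaussian (constants dropped)."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf  # keep the optimizer out of the unphysical region
    return 0.5 * np.sum(((data - mu) / sigma) ** 2) + data.size * np.log(sigma)

result = minimize(nll, x0=[90.0, 2.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(f"fitted mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```

        The HEP fitting libraries discussed in the talk add, on top of this core minimization, composable model definitions, constraint handling, and proper uncertainty estimation.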
      • 24
        evermore: differentiable (binned) likelihoods in JAX

        I'd like to present evermore (https://github.com/pfackeldey/evermore), which focuses on efficiently building and evaluating likelihoods, typically for HEP. Currently, it focuses on binned template fits.
        It supports autodiff, JIT compilation, and vectorization of full fits (even on GPUs).

        Speaker: Manfred Peter Fackeldey (RWTH Aachen University (DE))
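        The kind of binned template likelihood evermore builds can be sketched in NumPy. This is an illustration of the statistical model only, not evermore's actual API (which expresses it in JAX to get autodiff and JIT); all template yields here are invented:

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

# Hypothetical binned templates: per-bin signal and background expectations.
signal = np.array([2.0, 8.0, 15.0, 8.0, 2.0])
background = np.array([20.0, 18.0, 15.0, 12.0, 10.0])
observed = np.array([23, 30, 33, 22, 12])

def nll(mu):
    """Binned Poisson negative log-likelihood for signal strength mu."""
    expected = mu * signal + background
    return np.sum(expected - observed * np.log(expected) + gammaln(observed + 1))

fit = minimize_scalar(nll, bounds=(0.0, 10.0), method="bounded")
print(f"best-fit signal strength mu = {fit.x:.2f}")
```

        In a JAX formulation, the same `nll` becomes differentiable and JIT-compilable, which is what enables vectorized full fits on GPUs.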
    • 25
      Group photo
    • 11:07
      Coffee break
    • Discussion: Building and evaluating likelihoods
    • 12:30
      Lunch
    • Discussion: Statistical model serialisation
    • 15:00
      Coffee break
    • Hacking
    • 08:30
      Coffee
    • Kick-off talks
      • 26
        Plothist - integrating with other histogram libraries (Remote talk)
        Speakers: Cyrille Praz, Tristan Fillinger (KEK / IPNS)
      • 27
        Self-introduction: Saransh Chopra
        Speaker: Saransh Chopra (Princeton University (US))
      • 28
        Self-introduction: Oksana Shadura
        Speaker: Oksana Shadura (University of Nebraska Lincoln (US))
      • 29
        Self-introduction: Nikolai Hartmann
        Speaker: Nikolai Hartmann (Ludwig Maximilians Universität (DE))
      • 30
        b2luigi — bringing batch 2 luigi!

        Workflow managers help structure the code of pipelined jobs by defining and managing dependencies between tasks in a clear and easy-to-understand fashion. This abstraction allows independent tasks to be parallelised automatically, largely independently of the underlying computing systems. Additionally, workflow managers help keep track of different tasks' outputs and inputs.

        b2luigi is an extension of the workflow manager luigi and offers easy integration with batch systems such as HTCondor and LSF, allowing the combination of different systems within one workflow.

        b2luigi also provides additional interfaces tailored for Belle II workflows, allowing smooth interaction with the Belle II analysis software framework and distributed computing. Workflows such as VIBE, an automated Monte Carlo validation framework, the Systematics Framework, and many Belle II physics analyses have been automated using b2luigi.

        As the current maintainers of b2luigi and Belle II users, we look forward to discussing our experiences and plans for this tool at the PyHEP.dev 2024 workshop.

        Speakers: Alexander Heidelbach, Jonas Eppelt (Karlsruhe Institute of Technology (KIT))
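        The dependency-driven execution that luigi-style managers provide can be sketched in plain Python. This is a toy scheduler illustrating the idea, not b2luigi's actual API; the task names are invented:

```python
# Toy task graph: each task names its dependencies; the "scheduler"
# runs a task only after all of its dependencies have completed.
tasks = {
    "skim": [],
    "reconstruct": ["skim"],
    "histogram": ["reconstruct"],
    "fit": ["histogram"],
    "plot": ["histogram"],
}

def run(name, done, order):
    """Depth-first execution honoring dependencies, skipping completed tasks."""
    if name in done:
        return
    for dep in tasks[name]:
        run(dep, done, order)
    order.append(name)  # "run" the task
    done.add(name)

done, order = set(), []
for target in ("fit", "plot"):
    run(target, done, order)
print(order)  # → ['skim', 'reconstruct', 'histogram', 'fit', 'plot']
```

        What b2luigi adds on top of this core idea is dispatching each task to a batch system (HTCondor, LSF) and tracking the task outputs as completion markers.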
      • 31
        End-to-end workflow automation: updates of the luigi analysis workflow package

        Physicists performing data analyses are usually required to steer their individual, complex workflows manually, frequently involving job submission in several stages and interaction with distributed storage systems by hand. This process is not only time-consuming and error-prone, but also leads to undocumented relations between particular workloads, rendering the steering of an analysis a serious challenge, especially for newcomers to the field. In this presentation, I will demonstrate the main components of the Luigi Analysis Workflow (Law) package, which is developed independently of any experiment or of the language of the executed code. Its core consists of flexible, pythonic workflow descriptions, interfaces to remote batch-job and storage systems, as well as a granular environment-sandboxing mechanism. In the second half, I will highlight the recent key changes to the package driven by requests from a user base that has grown steadily over the past years.

        Speaker: Marcel Rieger (Hamburg University (DE))
      • 32
        waluigi - Beyond luigi

        Workflows for research in HEP experiments are not only quite complex but also require sufficient flexibility to adapt to changes in structure, conditions, methodologies, and research interests. This holds especially true for the physics analyses extracting the results and measurements.
        Here, workflow systems, specifically Luigi, have proven to be of great use for managing and organizing the intricate dependencies of the large task graphs that describe such analyses.
        Still, with intensive use comes insight into where the limitations lie. Now, as adoption of such software is rising, is a good point to start thinking about how to improve upon it.
        I present a list of grievances and an idea to address them, both to be discussed and iterated upon. While the "issues" are specific to the principles within Luigi, the current idea implies the need for a new software package: waluigi (Why Another LUIGI).

        Speaker: Benjamin Fischer (RWTH Aachen University (DE))
      • 33
        offloading @ coffea

        Offloading resource-intensive tasks, e.g.:
        - histogram accumulation (memory-intensive)
        - DL algorithms (compute-intensive)

        Speaker: Benjamin Fischer (RWTH Aachen University (DE))
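        Offloading histogram accumulation amounts to merging partial results computed on separate chunks; with fixed binning, that merge is a plain elementwise sum. The NumPy sketch below illustrates the pattern, not coffea's actual accumulator API:

```python
import numpy as np

rng = np.random.default_rng(7)
bins = np.linspace(0.0, 100.0, 21)

# Each worker histograms its own chunk of events independently.
chunks = [rng.uniform(0.0, 100.0, size=1_000) for _ in range(4)]
partials = [np.histogram(chunk, bins=bins)[0] for chunk in chunks]

# The memory-intensive accumulation step is just an elementwise sum,
# so it can run wherever memory (or a GPU) is available.
total = np.sum(partials, axis=0)
print(total.sum())  # every event landed in exactly one bin
```

        Because the merge is associative and commutative, the partial histograms can be combined in any order, on any machine, which is what makes this step a natural candidate for offloading.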
    • 11:00
      Coffee break
    • Discussion: Workflows
    • 34
      Sustainability in computing
      Speaker: Martin Erdmann (Rheinisch Westfaelische Tech. Hoch. (DE))
    • 12:30
      Lunch
    • Discussion: Histogramming
    • 15:00
      Coffee break
    • Hacking
    • 35
      Workshop Dinner

      Ratskeller Aachen, Markt 40, 52062 Aachen

    • 08:30
      Coffee
    • Kick-off talks
      • 36
        Job openings in the ROOT team
        Speaker: Dr Vincenzo Eduardo Padulano (CERN)
      • 37
        Self-introduction: Benjamin Fischer
        Speaker: Benjamin Fischer (RWTH Aachen University (DE))
      • 38
        Self-introduction: Azzah Alshehri
        Speaker: Azzah Aziz Alshehri (University of Glasgow (GB))
      • 39
        File synchronization between Linux systems in Python with yarsync

        Yet Another Rsync is a Python wrapper around the well-established Linux tool rsync, with the simple and familiar interface of git. Python allows us to create a higher-level instrument that is safer and sometimes more efficient than the original binary.

        While many data analysts today heavily use databases and rely on cloud computing, other approaches also have their benefits. Many kinds of data are difficult or time-consuming to represent in relational databases. Files in a user-defined format then become a simpler and more general solution, which is often less expensive and less error-prone. Linux servers hold a considerable share today, and many data analysts also use Linux as a good programming environment. Our approach is inspired by the data analysis workflow in HEP. We will describe creating data repositories with yarsync, the relevant rsync features, and how the tool helps guard against possible problems in data synchronization.

        Speaker: Yaroslav Nikitenko
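        The wrapper idea can be sketched by assembling an rsync invocation in Python. The helper below is hypothetical (it is not yarsync's actual interface); only standard rsync flags are used:

```python
import shlex

def build_rsync_command(src, dst, dry_run=False):
    """Assemble an rsync command line; safer higher-level defaults
    (archive mode, human-readable output) are baked in by the wrapper."""
    cmd = ["rsync", "--archive", "--verbose", "--human-readable"]
    if dry_run:
        cmd.append("--dry-run")  # preview changes, much like 'git status'
    cmd += [src, dst]
    return cmd

cmd = build_rsync_command("data/", "backup:/srv/data/", dry_run=True)
print(shlex.join(cmd))
```

        Building the argument list in Python (rather than interpolating a shell string) avoids quoting bugs, which is one way a wrapper can be safer than calling the binary by hand.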
      • 40
        Architectural framework for data analysis Lena

        The term "architecture" in software has numerous definitions. Ultimately, architecture determines whether your analysis code will be extensible and maintainable. We propose an architecture based on a functional style and the separation of data, logic, and presentation. It is implemented in the free software framework Lena.

        Lena is a general data analysis framework in Python, named after a great Siberian river. It allows the use of any Python constructs and functions, but structures the analysis into reusable sequences and elements. It natively supports metadata (which is important for modern data analysis). It employs lazy evaluation, which makes it suitable for processing data that would not fit into memory, in particular for big data analysis.

        The talk will be of primary interest to those who write large programs and face architectural challenges, and to those who need to create many similar plots automatically. The audience will gain a powerful tool to make their code structured and beautiful, or an understanding of the strengths and weaknesses of an alternative approach to data analysis in Python.

        Speaker: Yaroslav Nikitenko
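        The lazy, sequence-based style described above can be sketched with plain Python generators. This illustrates the general idea, not Lena's actual element classes:

```python
def read_values():
    """Lazily yield data; nothing is materialized up front."""
    for x in range(1_000_000):
        yield x

def select(flow, predicate):
    """Reusable element: pass through only items satisfying the predicate."""
    for x in flow:
        if predicate(x):
            yield x

def transform(flow, func):
    """Reusable element: apply a function to each item in the flow."""
    for x in flow:
        yield func(x)

# Compose reusable elements into a sequence; data streams through one
# item at a time, so arbitrarily large inputs fit in memory.
pipeline = transform(select(read_values(), lambda x: x % 2 == 0),
                     lambda x: x * x)

first_three = [next(pipeline) for _ in range(3)]
print(first_three)  # → [0, 4, 16]
```

        Because each element is a standalone generator, the same `select` or `transform` can be reused across many analyses, which is the structuring benefit the abstract argues for.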
      • 41
        FCCAnalyses: A Framework for Future Circular Collider Physics Performance Studies

        Physics performance analyses provide essential input for defining detector requirements in the Future Circular Collider (FCC) project. To streamline these analyses, we employ FCCAnalyses, a software framework built on top of ROOT's RDataFrame.

        Among the functionalities offered by FCCAnalyses are:
        * A standard set of RDataFrame EDM4hep functions: events can be analysed directly in the EDM4hep event data format, with the ability to easily work with relationships among the data-model objects.
        * A multi-stage analysis workflow: analyses, split into multiple stages, can be run locally or on CERN's HTCondor cluster.
        * Metadata management: the framework manages metadata associated with centrally produced samples, generated in Delphes fast simulation and Geant4 full simulation.

        The framework can be used equally well from Python and C++, and since the analyser functions are written in C++, they can directly employ any of the High Energy Physics (HEP) C++ frameworks.

        Speaker: Juraj Smiesko (CERN)
      • 42
        Bridging Python and Julia for Enhanced Data Analysis

        Let’s discuss the exciting world of combining Python and Julia for data analysis for high-energy physics (HEP) and other data-intensive fields.

        We'll kick things off with a quick overview of why Python is so popular for data analysis and introduce Julia, which is making waves with its incredible performance and suitability for scientific computing.

        Next, I'll show you how we can get the best of both worlds. We'll talk about using PythonCall to bring Python functions and libraries into Julia and how we can embed Julia code right into our Python scripts using JuliaCall. It's easier than you might think!

        I'll walk you through some practical examples where mixing Python and Julia really shines. We'll look at real-world scenarios and see how this combination can speed up our data analysis and make our work more efficient.

        Of course, there are always some bumps in the road, so I'll share some common challenges you might face and how to overcome them. We'll cover best practices for managing dependencies and keeping everything running smoothly.

        Finally, we'll look ahead to the future. There's so much potential for deeper integration and community-driven innovation. I hope to inspire you to explore these possibilities and collaborate with other developers to push the boundaries of what's possible.

        By the end of this talk, you'll have a good grasp of how to mix Python and Julia in your projects and leverage the strengths of both languages to supercharge your data analysis.

        Speaker: Ianna Osborne (Princeton University)
      • 43
        A Deep Dive into PocketCoffea

        PocketCoffea is a Python columnar analysis framework for CMS NanoAOD events, based on coffea. It provides a workflow for HEP analyses using a combination of customizable abstractions and configuration files. The package features dataset-query automation, jet calibration, data processing, histogramming, and plotting. PocketCoffea also provides out-of-the-box support for code execution on various remote clusters.
        In this talk, a detailed overview of PocketCoffea will be given from both a user's and a technical perspective.

        Speaker: Mate Farkas (Rheinisch Westfaelische Tech. Hoch. (DE))
    • 11:05
      Coffee break
    • Discussion: RDataFrame/coffea analyses (at scale)
    • 12:30
      Lunch
    • Discussion: Future of PyHEP.dev
    • 44
      Sightseeing Tour

      Starting and ending at Erholungsgesellschaft (workshop venue)

    • 18:00
      Optional Dinner at "60 Seconds to Napoli" - feel free to join!

      Markt 17, 52062 Aachen

    • 08:30
      Breakfast
    • 45
      Organizing the paper-writing
    • Discussion: Paper-writing
    • 11:00
      Coffee break
    • Discussion: Paper-writing
      • 11:30
        Coffee break
    • 46
      Close-out
    • 12:30
      Lunch