PyHEP 2020 (virtual) Workshop

Name: PyHEP 2020 (virtual) Workshop
Start: 2020-07-13T08:00:00-05:00
End: 2020-07-17T18:15:00-05:00
Location: No location set

13 Jul 2020, 08:00 → 17 Jul 2020, 18:15 US/Central
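The timetable is given in US/Central time, which is CDT (UTC-5) in July 2020. Each session header below also lists the slot in several other zones. Here is a minimal Python sketch of that arithmetic. It assumes fixed July 2020 offsets, and it reads the agenda's "CET" label as Central European Summer Time (UTC+2) and its "CST" label as China Standard Time (UTC+8), which is what the listed times imply. The `convert` helper is hypothetical and written only for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for July 2020 (summer/daylight time where applicable).
# Assumption: the agenda's "CET" means Central European Summer Time (UTC+2)
# and its "CST" means China Standard Time (UTC+8), as the listed times imply.
ZONES = {
    "CET": timezone(timedelta(hours=2)),
    "PDT": timezone(timedelta(hours=-7)),
    "IST": timezone(timedelta(hours=5, minutes=30)),
    "CST": timezone(timedelta(hours=8)),
    "JST": timezone(timedelta(hours=9)),
}

# The timetable itself is in US/Central, i.e. CDT (UTC-5) in July.
US_CENTRAL = timezone(timedelta(hours=-5))


def convert(start, end):
    """Hypothetical helper: render a session span in each zone of ZONES.

    Times that fall on a later calendar day than the US/Central start
    are suffixed with "+N", following the agenda's "+1" convention.
    """
    spans = {}
    for name, tz in ZONES.items():
        parts = []
        for moment in (start, end):
            local = moment.astimezone(tz)
            day_shift = (local.date() - start.date()).days
            suffix = f"+{day_shift}" if day_shift > 0 else ""
            parts.append(f"{local:%H}h{local:%M}{suffix}")
        spans[name] = " - ".join(parts)
    return spans


# Monday 13 July, Atlantic time zone session 1: 08:00 -> 11:25 US/Central.
session = convert(
    datetime(2020, 7, 13, 8, 0, tzinfo=US_CENTRAL),
    datetime(2020, 7, 13, 11, 25, tzinfo=US_CENTRAL),
)
for zone, span in session.items():
    print(zone, span)
# CET 15h00 - 18h25
# PDT 06h00 - 09h25
# IST 18h30 - 21h55
# CST 21h00 - 00h25+1
# JST 22h00 - 01h25+1
```

Fixed offsets keep the sketch self-contained. For arbitrary dates, `zoneinfo.ZoneInfo` (Python 3.9+) with IANA names such as "America/Chicago" would track daylight-saving changes instead.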

Benjamin Krikler (University of Bristol (GB)), Eduardo Rodrigues (University of Liverpool (GB)), Jim Pivarski (Princeton University), Matthew Feickert (Univ. Illinois at Urbana Champaign (US))

Description

The PyHEP workshops are a series of workshops initiated and supported by the HEP Software Foundation (HSF) with the aim of providing an environment to discuss and promote the use of Python in the HEP community at large. Further information is given on the PyHEP WG website.

PyHEP 2020 will be a virtual workshop given the worldwide conditions related to the COVID-19 pandemic. It was originally meant to be held in Austin (Texas), USA, on 11-13 July 2020, co-located with the SciPy 2020 conference on scientific computing in Python, with a slight overlap in time to facilitate inter-community exchanges. We do encourage HEP participation in SciPy, which will also be a virtual event.

PyHEP 2020 will be a forum for the participants and the community at large to discuss developments of Python packages and tools, exchange experiences, and inform the future evolution of community activities. There will be ample time for discussion.

The agenda is composed of plenary sessions:

1) A keynote presentation.
2) Topical sessions.
3) Hands-on tutorials.
4) Presentations following up on topics discussed at PyHEP 2019.

Registration is open until July 10th. There will be *no* workshop fees.

We thank IRIS-HEP, the University of Liverpool, the Python Software Foundation, the UK Software Sustainability Institute and FNAL for their support.

You are encouraged to join the PyHEP WG Gitter channel and/or the HSF forum to receive further information concerning the organisation of the workshop. Workshop updates and information will also be shared on the workshop Twitter account, in addition to email. Follow the workshop at @PyHEPConf and #PyHEP2020.

Organising Committee

Eduardo Rodrigues - University of Liverpool (Chair)
Ben Krikler - University of Bristol (Co-chair)
Jim Pivarski - Princeton University (Co-chair)
Matthew Feickert - University of Illinois at Urbana-Champaign

Local organisation

Chris Tunnell - Rice University
Peter Onyisi - The University of Texas at Austin

Sponsors

The event is kindly sponsored by

pyhep2020-organisation@cern.ch

Participants

1000

Monday 13 July
- 08:00 → 11:25
  Welcome & Analysis fundamentals
  
  ATLANTIC TIME ZONE SESSION 1
  
  15h00 - 18h25 CET, 06h00 - 09h25 PDT, 18h30 - 21h55 IST, 21h00 - 00h25+1 CST, 22h00 - 01h25+1 JST
  
  Conveners: Eduardo Rodrigues (University of Liverpool (GB)), Graeme A Stewart (CERN)
  - 08:00
    
    Welcome and workshop overview 10m
    
    Speaker: Eduardo Rodrigues (University of Liverpool (GB))
    
    EduardoRodrigues_2020-07-13_PyHEP2020.pdf
  - 08:10
    Uproot & Awkward Arrays (TUTORIAL) 1h
    
    Speaker: Jim Pivarski (Princeton University)
    
    Run on Binder
    
    See it on GitHub
    
    YouTube Recording
    
    Questions from Slido:
    
    What are the main benefits of uproot over pyroot? +24 -1
    
    Does opening a file with uproot4.open() read the whole content into memory directly, or does this only happen when specifying trees or branches? +16 -2
    
    Why don't we get rid of .root files altogether and switch to .hdf5? +17 -4
    
    When looking at TTrees in Uproot, you said that a TTree was also a Mapping. What does a Mapping mean? Is it just a dict? +8 -1
    
    Re: Uproot, which parts of ROOT are I/O? Just TFile? Or canvases and such? +11 -4
    
    Is the cache safe for multithreading/multiprocessing out of the box? +9 -3
    
    Does uproot4 have all the features uproot3 has? +6 -2
    
    Can you say a little more about reading 'weird' objects in uproot? Particularly, what custom objects work out-of-the-box, what needs massaging, what won't work? +5 -1
    
    Is there any preference for uproot and awkward compared to using RDataFrame? Which one is preferred in which case? Thanks :) +4 -1
    
    .arrays() has the convenient "cuts" keyword argument, while .array() does not. Is this on purpose? +3
    
    Jim showed e.g. getting an array of the first element in the first 20 events with branch[:20, 0]. What happens if an event has no element in it? +5 -3
    
    Line 57: the result is an array which is no longer jagged, but what if there are elements with 0 sub-elements? Will there be a None, or is the result just shorter? +2 -2
    
    Can we use uproot instead of using ROOT at all? Or are there some cases where ROOT is superior to uproot or PyROOT? +5 -3
    
    Can we run any NumPy-based function under numba's @jit? +3 -1
    
    Are the cut strings parsed as C++ code? +3 -1
    
    Using events = tree.arrays(library="ak", how="zip"), can you deal with e.g. different numbers of muons and electrons in the same "event"? +3 -1
    
    Why do we have awkward arrays? Can't we just open using normal arrays? And what's the difference between jagged, awkward and normal arrays? +3 -1
    
    Is iterating over arrays fast in Python? I thought Python always gets really slow if one uses loops. +4 -2
    
    Are the TTree array alias formulas the same as TFormulas like in TTree::Draw? Or some other syntax? +4 -3
    
    How do you convert a ROOT file to a NumPy array if your ntuple has branches of varying length (for example, the pt of tracks in each vertex, which varies)? +2 -2
    
    How hard is it to write code that's as performant as uproot from C++ and ROOT? Is someone who is proficient at C++ and ROOT at a disadvantage? If so, how? +1
    
    Using iterate, can you control the size of the "chunk", or is it the defined cache? +2 -1
    
    Do you need to manually clear the cache or does it vacate itself when full and you need more? +1 -1
    
    What are the main benefits of uproot over ROOT? +3 -2
    
    Can uproot do everything that ROOT does? What are the limitations of using uproot? +2 -2
    
    Is it possible to create a hierarchy of directories with histograms inside? +1 -1
    
    If one needs to incorporate methods where multiple dataframes/arrays need to be loaded into memory to make histograms, what is the optimal way to do this? 0
  - 09:10
    
    BREAK 30m
  - 09:40
    The NanoEvents object 30m
    
    Speaker: Nick Smith (Fermi National Accelerator Lab. (US))
    
    Run on Binder
    
    View notebook
    
    YouTube Recording
    
    Questions from Slido:
    
    Is NanoEvents useful in experiments other than CMS? What types of files can we access with it? Thanks! +6
    
    How does the NanoEventsObject relate to Numba? Can it be used similarly to awkward arrays?
    
    Can I load with uproot, manipulate with Awkward Array (e.g. pad for ML), and then also use NanoEvents (to select events), rather than loading directly into NanoEvents? +2
    
    What would happen if you add two objects that are not Lorentz vectors, like so: `mmevents.Muon[:, 0] + mmevents.Muon[:, 1]`?
    
    How much of the NanoAOD naming conventions does NanoEvents rely on, beyond splitting by underscore? e.g. lorentz vectors, indices... and can this be customised?
  - 10:10
    Jagged physics data analysis with numba, awkward and uproot on a GPU (TUTORIAL) 45m
    
    Speaker: Joosep Pata (California Institute of Technology (US))
    
    colab notebook link (GPU)
    
    pyhep_gpu.ipynb
    
    YouTube Recording
    
    Questions from Slido:
    
    How do I choose the optimal number of blocks and threads? +6
    
    Can numba/cupy be used for maximum likelihood fitting on the GPU? +5
    
    Is it also possible to use shared memory as a kernel argument, just as in the CUDA C version? +3
    
    What about jax, instead of cupy? +3
    
    Is there a way to check if there is any race condition? +3
    
    The very first thing you did was selecting a GPU kernel. How do you do this outside a Jupyter notebook? How would you do it from a plain Python .py file? +4 -2
    
    Is there any significant advantage of using numba instead of pycuda? +2
    
    can that overflow bin be omitted? (to the right of the end of the histogram?) +2
    
    Is pandas compatible with cupy?
    
    Comment: @Joosep Pata, there is a searchsorted implementation in Thrust; if you expose it, e.g. via ctypes, you have it readily available.
  - 10:55
    TITANIA - how to structure detector monitoring 30m
    
    Speakers: Mr Jakub Kowalski, Maciej Witold Majewski (AGH University of Science and Technology (PL))
    
    Kopia Titania.pdf
    
    YouTube Recording
    
    Questions from Slido:
    
    Are you planning to license your software? If yes, which license?
    
    If you want to change the data backend, in how many places do you need to change the code? +1
    
    What's the plan for TITANIA in the context of the already-existing LHCb monitoring environment (e.g. lb-monet)? Is it meant to complement it, supersede it,...? +2
- 17:00 → 18:35
  Welcome & Analysis platforms
  
  PACIFIC TIME ZONE SESSION 1
  
  15h00 - 16h35 PDT, 00h00+1 - 01h35+1 CET, 03h30+1 - 05h05+1 IST, 06h00+1 - 07h35+1 CST, 07h00+1 - 08h35+1 JST
  
  Convener: Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
  - 17:00
    
    Welcome and workshop overview 10m
    
    Speaker: Jim Pivarski (Princeton University)
    
    Slides, er, notebook-slides
  - 17:10
    
    Rubin Observatory: The software behind the science 40m
    
    KEYNOTE PRESENTATION
    
    Speaker: Nate Lust
    
    PyHEP-NLust.pdf
    
    YouTube Recording
  - 17:50
    
    Ganga: flexible use of virtualisation for user based large scale computations (TUTORIAL) 45m
    
    Speaker: Ulrik Egede (Monash University (AU))
    
    Ganga.pdf
    
    Run on Binder
    
    Tutorial
    
    YouTube Recording
Tuesday 14 July
- 08:00 → 11:25
  Analysis fundamentals & analysis platforms
  
  ATLANTIC TIME ZONE SESSION 2
  
  15h00 - 18h25 CET, 06h00 - 09h25 PDT, 18h30 - 21h55 IST, 21h00 - 00h25+1 CST, 22h00 - 01h25+1 JST
  
  Conveners: Benjamin Krikler (University of Bristol (GB)), Peter Onyisi (University of Texas at Austin (US))
  - 08:00
    
    Python & HEP: a perfect match, in theory 40m
    
    KEYNOTE PRESENTATION
    
    Speaker: David Straub (Lilium GmbH, Munich)
    
    GitHub source
    
    YouTube Recording
  - 08:40
    
    A new PyROOT for ROOT 6.22 30m
    
    Speakers: Enric Tejedor Saavedra (CERN), Stefan Wunsch (KIT - Karlsruhe Institute of Technology (DE))
    
    Binder
    
    Notebook Viewer
    
    PyROOT PyHEP 2020.pdf
    
    YouTube Recording
  - 09:10
    
    resample: use the bootstrap and jackknife from Python 30m
    
    Speaker: Hans Peter Dembinski (Max-Planck-Institute for Nuclear Physics, Heidelberg)
    
    Binder
    
    GitHub
    
    YouTube Recording
  - 09:40
    
    BREAK 30m
  - 10:10
    
    Design Pattern for Analysis Automation on Interchangeable, Distributed Resources using Luigi Analysis Workflows 30m
    
    Speaker: Marcel Rieger (CERN)
    
    law on GitHub
    
    Presentation on GitHub
    
    Run on Binder
    
    YouTube Recording
  - 10:40
    
    ServiceX: On-Demand Data Transformation and Delivery for the Present and HL-LHC Era 45m
    
    Speaker: Kyungeon Choi (University of Texas at Austin (US))
    
    pyhep_ServiceX.pdf
    
    YouTube Recording
- 17:00 → 19:00
  Analysis platforms
  
  PACIFIC TIME ZONE SESSION 2
  
  15h00 - 16h15 PDT, 00h00+1 - 01h15+1 CET, 03h30+1 - 04h45+1 IST, 06h00+1 - 07h15+1 CST, 07h00+1 - 08h15+1 JST
  
  Conveners: Jim Pivarski (Princeton University), Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
  - 17:00
    
    A prototype U.S. CMS analysis facility (TUTORIAL) 45m
    
    Speaker: Oksana Shadura (University of Nebraska Lincoln (US))
    
    Slides (check README)
    
    YouTube Recording
  - 17:45
    
    Integrating Coffea and Work Queue 30m
    
    Speaker: Cami Carballo (University of Notre Dame)
    
    GitHub Link
    
    YouTube Recording
Wednesday 15 July
- 08:00 → 11:00
  Analysis platforms & automatic differentiation
  
  ATLANTIC TIME ZONE SESSION 3
  
  15h00 - 18h00 CET, 06h00 - 09h00 PDT, 18h30 - 21h30 IST, 21h00 - 24h00 CST, 22h00 - 01h00+1 JST
  
  Conveners: Eduardo Rodrigues (University of Liverpool (GB)), Graeme A Stewart (CERN)
  - 08:00
    
    Columnar Analysis at Scale with Coffea (TUTORIAL) 45m
    
    Speaker: Mat Adamec (University of Nebraska Lincoln (US))
    
    Binder
    
    GitHub
    
    YouTube Recording
  - 08:45
    
    High Granularity Calorimeter (HGCAL) test beam analysis using Jupyter notebooks 30m
    
    Speaker: Matteo Bonanomi (LLR, Ecole Polytechnique (FR))
    
    MBonanomi_HGCAL_Notebooks.pdf
    
    Notebook on GitHub
    
    YouTube Recording
  - 09:15
    
    BREAK 30m
  - 09:45
    
    Introduction to automatic differentiation (TUTORIAL) 45m
    
    Speaker: Lukas Alexander Heinrich (CERN)
    
    Run on Binder
    
    Source on GitHub
    
    YouTube Recording
  - 10:30
    
    neos: physics analysis as a differentiable program 30m
    
    Speaker: Nathan Daniel Simpson (Lund University (SE))
    
    Run on Binder
    
    Source on GitHub
    
    YouTube Recording
- 17:00 → 18:00
  Performance
  
  PACIFIC TIME ZONE SESSION 3
  
  15h00 - 16h00 PDT, 00h00+1 - 01h00+1 CET, 03h30+1 - 04h30+1 IST, 06h00+1 - 07h00+1 CST, 07h00+1 - 08h00+1 JST
  
  Conveners: Jim Pivarski (Princeton University), Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
  - 17:00
    
    High-performance Python (TUTORIAL) 1h
    
    Speaker: Henry Fredrick Schreiner (Princeton University)
    
    Run on Binder
    
    Source on GitHub
    
    YouTube Recording
Thursday 16 July
- 08:00 → 11:15
  Fitting & statistics
  
  ATLANTIC TIME ZONE SESSION 4
  
  15h00 - 18h15 CET, 06h00 - 09h15 PDT, 18h30 - 21h45 IST, 21h00 - 00h15+1 CST, 22h00 - 01h15+1 JST
  
  Conveners: Benjamin Krikler (University of Bristol (GB)), Eduardo Rodrigues (University of Liverpool (GB))
  - 08:00
    
    Model building and statistical inference with zfit and hepstats (TUTORIAL) 45m
    
    Speakers: Jonas Eschle (Universitaet Zuerich (CH)), Matthieu Marinangeli (EPFL - Ecole Polytechnique Federale Lausanne (CH))
    
    Binder
    
    GitHub
    
    YouTube Recording
  - 08:45
    
    SModelS – a tool for interpreting simplified-model results from the LHC 30m
    
    Speaker: Wolfgang Waltenberger (Austrian Academy of Sciences (AT))
    
    Run on Binder
    
    Source on GitHub
    
    YouTube Recording
  - 09:15
    
    BREAK 30m
  - 09:45
    
    Tensorflow-based Maximum Likelihood fits for High Precision Standard Model Measurements at CMS 30m
    
    Speaker: Josh Bendavid (CERN)
    
    GitHub
    
    tffit-pyhep-Jul16-2020.pdf
    
    YouTube Recording
  - 10:15
    
    iminuit: Past and Future 30m
    
    Speaker: Hans Peter Dembinski (Max-Planck-Institute for Nuclear Physics, Heidelberg)
    
    Binder
    
    GitHub
    
    YouTube Recording
  - 10:45
    
    zfit - TensorFlow 2.0: dynamic and compiled HPC 30m
    
    Speaker: Jonas Eschle (Universitaet Zuerich (CH))
    
    PyHEP_zfit_TF2.pdf
    
    YouTube Recording
- 17:00 → 18:15
  Fitting & statistics
  
  PACIFIC TIME ZONE SESSION 4
  
  15h00 - 16h15 PDT, 00h00+1 - 01h15+1 CET, 03h30+1 - 04h45+1 IST, 06h00+1 - 07h15+1 CST, 07h00+1 - 08h15+1 JST
  
  Conveners: Jim Pivarski (Princeton University), Mariel Pettee (Yale University (US))
  - 17:00
    
    Machine learning technique for signal-background separation of nuclear interaction vertices in the CMS detector 30m
    
    Speaker: Anna Kropivnitskaya (The University of Kansas (US))
    
    2020.07.16_ML_NI_PyHEP_kropiv.pdf
    
    Binder
    
    GitHub
    
    YouTube Recording
    
    Zenodo DOI
  - 17:30
    
    pyhf: Accelerating analyses and preserving likelihoods (TUTORIAL) 45m
    
    Speaker: Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
    
    Jupyter Book
    
    source GitHub repo
    
    YouTube Recording
    
    Zenodo DOI
Friday 17 July
- 08:00 → 11:00
  HEP analysis ecosystem & performance
  
  ATLANTIC TIME ZONE SESSION 5
  
  15h00 - 18h00 CET, 06h00 - 09h00 PDT, 18h30 - 21h30 IST, 21h00 - 24h00 CST, 22h00 - 01h00+1 JST
  
  Conveners: Eduardo Rodrigues (University of Liverpool (GB)), Hans Peter Dembinski (Max-Planck-Institute for Nuclear Physics, Heidelberg)
  - 08:00
    
    The boost-histogram package 30m
    
    The boost-histogram library provides first-class histogram objects in Python. You can compose axes and a storage to fit almost any problem. You can fill, manipulate, slice, and project them, and pass them to other Scikit-HEP libraries like Uproot4, mplhep, and histoprint. Boost-histogram is meant to be the "NumPy" of histogram libraries that others can build on; the "pandas" of histograms is "Hist", a physicist-friendly front-end that extends and expands boost-histogram to do plotting and more. An early version of Hist is shown for the first time here.
    
    Speakers: Hans Peter Dembinski (Max-Planck-Institute for Nuclear Physics, Heidelberg), Henry Fredrick Schreiner (Princeton University)
    
    Run on Binder
    
    Talk repo
    
    YouTube Recording
  - 08:30
    
    Providing Python Bindings For Complex and Feature-Rich C and C++ Libraries 30m
    
    Speaker: Martin Schwinzerl (University of Graz (AT))
    
    GitHub Repository
    
    pyhep2020_cxx_bindings_final.ipynb
    
    YouTube Recording
  - 09:00
    
    Integrating GPU libraries for fun and profit 30m
    
    Speaker: Adrian Oeftiger (GSI - Helmholtzzentrum für Schwerionenforschung GmbH (DE))
    
    github repo with notebook
    
    slides on github
    
    YouTube Recording
  - 09:30
    
    BREAK 30m
  - 10:00
    
    mplhep: bridging Matplotlib and HEP 30m
    
    Speaker: Andrzej Novak (RWTH Aachen (DE))
    
    mplhep-binder
    
    source GitHub repo
    
    YouTube Recording
  - 10:30
    
    ROOT preprocessing pipeline for machine learning with TensorFlow 30m
    
    Speaker: Matthias Komm (CERN)
    
    github
    
    Run on Binder
    
    YouTube Recording
    
    Zenodo DOI
- 17:00 → 18:15
  Analysis systems
  
  PACIFIC TIME ZONE SESSION 5
  
  15h00 - 16h15 PDT, 00h00+1 - 01h15+1 CET, 03h30+1 - 04h45+1 IST, 06h00+1 - 07h15+1 CST, 07h00+1 - 08h15+1 JST
  
  Conveners: Jim Pivarski (Princeton University), Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
  - 17:00
    
    Integrated Data Acquisition in Python 30m
    
    Speaker: Charles Burton (University of Texas at Austin (US))
    
    burton_pyhep_2020-07-17.pdf
    
    burton_pyhep_2020-07-17.pptx
    
    GitHub Gist of notebook
    
    read_hdf5.ipynb
    
    Run on Binder
    
    YouTube Recording
  - 17:30
    
    ThickBrick: Optimal event selection and categorization in high energy physics (TUTORIAL) 45m
    
    Speaker: Prasanth Shyamsundar (University of Florida)
    
    GitHub repository
    
    Run on Binder
    
    YouTube Recording

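The boost-histogram abstract in the Friday 17 July session describes a histogram as a composition of axes and a storage that can then be filled and projected. Below is a stdlib-only toy sketch of that idea, written for illustration. It is not boost-histogram's implementation or API, and the names `Regular`, `IntStorage`, `WeightStorage` and `Histogram` are hypothetical. The real library provides types such as `bh.axis.Regular` and `bh.storage.Weight()`. This toy is much simpler: it has no underflow/overflow bins, no slicing, and accepts only one scalar weight per fill call.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Regular:
    """Toy regular axis: `bins` equal-width bins covering [start, stop)."""
    bins: int
    start: float
    stop: float

    def index(self, x):
        """Bin index for x, or None when x falls outside [start, stop)."""
        if not (self.start <= x < self.stop):
            return None
        width = (self.stop - self.start) / self.bins
        # min() guards against floating-point rounding just below `stop`.
        return min(int((x - self.start) / width), self.bins - 1)


class IntStorage:
    """Unweighted storage: each cell is an integer entry count."""
    def zero(self):
        return 0

    def add(self, cell, weight):
        return cell + 1  # weight is ignored for plain counts

    def merge(self, a, b):
        return a + b


class WeightStorage:
    """Weighted storage: each cell is (sum of weights, sum of squared weights)."""
    def zero(self):
        return (0.0, 0.0)

    def add(self, cell, weight):
        return (cell[0] + weight, cell[1] + weight * weight)

    def merge(self, a, b):
        return (a[0] + b[0], a[1] + b[1])


class Histogram:
    """Toy N-dimensional histogram composed from axes plus a storage."""
    def __init__(self, *axes, storage=None):
        self.axes = axes
        self.storage = storage if storage is not None else IntStorage()
        self.cells = {
            idx: self.storage.zero()
            for idx in product(*(range(ax.bins) for ax in axes))
        }

    def fill(self, *coords, weight=1.0):
        """Fill one entry per point; coords holds one sequence per axis."""
        for point in zip(*coords):
            idx = tuple(ax.index(x) for ax, x in zip(self.axes, point))
            if None in idx:
                continue  # no flow bins in this toy: out-of-range entries are dropped
            self.cells[idx] = self.storage.add(self.cells[idx], weight)
        return self

    def project(self, *keep):
        """Sum over every axis not listed in `keep` (given by axis position)."""
        out = Histogram(*(self.axes[i] for i in keep), storage=self.storage)
        for idx, cell in self.cells.items():
            key = tuple(idx[i] for i in keep)
            out.cells[key] = self.storage.merge(out.cells[key], cell)
        return out


# 2D weighted histogram; the last point (x=1.5) is outside the x axis and is dropped.
h = Histogram(Regular(4, 0.0, 1.0), Regular(2, 0.0, 2.0), storage=WeightStorage())
h.fill([0.1, 0.3, 0.9, 1.5], [0.5, 1.5, 1.2, 0.1], weight=2.0)
px = h.project(0)
print([px.cells[(i,)] for i in range(4)])
# [(2.0, 4.0), (2.0, 4.0), (0.0, 0.0), (2.0, 4.0)]
```

Keeping the storage as a separate object is what lets the same `fill` and `project` code serve both plain counts and weighted sums. This mirrors the axes-plus-storage separation the abstract describes.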