PyHEP 2020 (virtual) Workshop

US/Central
Benjamin Krikler (University of Bristol (GB)), Eduardo Rodrigues (University of Liverpool (GB)), Jim Pivarski (Princeton University), Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
Description

The PyHEP workshops are a series of workshops initiated and supported by the HEP Software Foundation (HSF) with the aim of providing an environment to discuss and promote the use of Python in the HEP community at large. Further information is given on the PyHEP WG website.

PyHEP 2020 will be a virtual workshop given the worldwide conditions related to the COVID-19 pandemic. It was meant to be held in Austin (Texas), USA, on 11-13 July 2020, co-locating with the SciPy 2020 conference on scientific computing in Python, with a slight overlap in time, to facilitate inter-community exchanges. We do encourage HEP participation in SciPy, which will also be a virtual event.

PyHEP 2020 will be a forum for the participants and the community at large to discuss developments of Python packages and tools, exchange experiences, and inform the future evolution of community activities. There will be ample time for discussion.
 
The agenda is composed of plenary sessions:
1) A keynote presentation.
2) Topical sessions.
3) Hands-on tutorials.
4) Presentations following up from topics discussed at PyHEP 2019.

Registration is open until July 10th. There will be *no* workshop fees.

We thank IRIS-HEP, the University of Liverpool, the Python Software Foundation, the UK Software Sustainability Institute and FNAL for their support.

You are encouraged to join the PyHEP WG Gitter channel and/or the HSF forum to receive further information concerning the organisation of the workshop. Workshop updates and information will also be shared on the workshop Twitter account in addition to email. Follow the workshop at @PyHEPConf and #PyHEP2020.
 

Organising Committee

Eduardo Rodrigues - University of Liverpool (Chair)
Ben Krikler - University of Bristol (Co-chair)
Jim Pivarski - Princeton University (Co-chair)
Matthew Feickert - University of Illinois at Urbana-Champaign

Local organisation

Chris Tunnell - Rice University
Peter Onyisi - The University of Texas at Austin

 

    • 08:00 - 11:25
      Welcome & Analysis fundamentals

      ATLANTIC TIME ZONE SESSION 1

      15h00 - 18h25 CET, 06h00 - 09h25 PDT, 18h30 - 21h55 IST, 21h00 - 00h25+1 CST, 22h00 - 01h25+1 JST

      Conveners: Eduardo Rodrigues (University of Liverpool (GB)), Graeme A Stewart (CERN)
      • 08:00
        Welcome and workshop overview 10m
        Speaker: Eduardo Rodrigues (University of Liverpool (GB))
      • 08:10
        Uproot & Awkward Arrays (TUTORIAL) 1h
        Speaker: Jim Pivarski (Princeton University)

        Questions from Slido:

        • What are the main benefits of uproot over pyroot? +24 -1
        • Does opening a file with uproot4.open() read the whole content into memory directly, or does this only happen when specifying trees or branches? +16 -2
        • Why don't we get rid of .root files altogether and switch to .hdf5? +17 -4
        • When looking at TTrees in Uproot, you said that a TTree was also a Mapping - what does a Mapping mean? Is it just a dict? +8 -1
        • Re: Uproot, which parts of ROOT are I/O? Just TFile? Or canvases and such? +11 -4
        • Is the cache safe for multithreading / multiprocessing out of the box? +9 -3
        • Does uproot4 have all the features uproot3 has? +6 -2
        • Can you say a little more about reading 'weird' objects in uproot? Particularly, what custom objects work out-of-the-box, what needs massaging, what won't work? +5 -1
        • Is there any preference for the uproot and awkward compared to using the RDataFrame? Which one is preferred in which case? Thanks :) +4 -1
        • .arrays() has the convenient "cuts" keyword argument, while .array() does not. Is this on purpose? +3
        • Jim showed e.g. getting an array of the first element in the first 20 events with branch[:20, 0]. What happens if an event has no element in it? +5 -3
        • Line 57: the result is an array, which is not jagged anymore, but what if there are elements with 0 sub-elements? Will there be a None or is the result just shorter? +2 -2
        • Can we use uproot instead of using ROOT at all? Or are there some cases where ROOT is superior to uproot or pyroot? +5 -3
        • Can we run any NumPy-based function under Numba's @jit? +3 -1
        • Are the cut strings parsed as C++ code? +3 -1
        • Using events = tree.arrays(library="ak", how="zip"), can you deal with e.g. different numbers of muons and electrons in the same "event"? +3 -1
        • Why do we have awkward arrays? Can't we just open using normal arrays? And what's the difference between jagged, awkward and normal arrays? +3 -1
        • Is iterating over arrays fast in Python? I thought Python always gets really slow if one uses loops. +4 -2
        • Are the TTree array alias formulas the same as TFormulas like in TTree::Draw? Or some other syntax? +4 -3
        • How do you convert a ROOT file to a NumPy array if your ntuple has branches of different lengths (for example, pt of tracks in each vertex, which varies)? +2 -2
        • How hard is it to write code that's as performant as uproot from C++ and ROOT? Is someone who is proficient at C++ and ROOT at a disadvantage? If so, how? +1
        • Using iterate, can you control the size of the "chunk", or is it set by the defined cache? +2 -1
        • Do you need to manually clear the cache or does it vacate itself when full and you need more? +1 -1
        • What are the main benefits of uproot over ROOT? +3 -2
        • Can uproot do everything that ROOT does? What are the limitations of using uproot? +2 -2
        • Is it possible to create a hierarchy of directories with histograms inside? +1 -1
        • If one needs to incorporate methods where multiple dataframes/arrays need to be loaded into memory to make histograms, what is the optimal way to do this? 0
      • 09:10
        BREAK 30m
      • 09:40
        The NanoEvents object 30m
        Speaker: Nick Smith (Fermi National Accelerator Lab. (US))

        Questions from Slido:

        • Is NanoEvents useful in experiments other than CMS? What types of files can we access with it? Thanks! +6
        • How does the NanoEventsObject relate to Numba? Can it be used similarly to awkward arrays?
        • Can I load with uproot, manipulate with awkward array (e.g. pad for ML), and then also use NanoEvents (to select events), rather than load directly into NanoEvents? +2
        • What would happen if you add two objects that are not Lorentz vectors, like so: `mmevents.Muon[:, 0] + mmevents.Muon[:, 1]`?
        • How much of the NanoAOD naming conventions does NanoEvents rely on, beyond splitting by underscore? e.g. lorentz vectors, indices... and can this be customised?
      • 10:10
        Jagged physics data analysis with numba, awkward and uproot on a GPU (TUTORIAL) 45m
        Speaker: Joosep Pata (California Institute of Technology (US))

        Questions from Slido:

        • How do I choose the optimal number of blocks and threads? +6
        • Can numba/cupy be used for maximum likelihood fitting on the GPU? +5
        • Is it also possible to use shared memory as a kernel argument, just as in the CUDA C version? +3
        • What about jax, instead of cupy? +3
        • Is there a way to check if there is any race condition? +3
        • The very first thing you did, selecting a GPU kernel - how do you do this outside a Jupyter notebook? How would you do it from a plain Python .py file? +4 -2
        • Is there any significant advantage of using numba instead of pycuda? +2
        • Can that overflow bin be omitted (to the right of the end of the histogram)? +2
        • Is pandas compatible with cupy?
        • Comment: @joosep pata, there is a searchsorted implementation in Thrust; if you expose it, e.g. via ctypes, you have it readily available.
      • 10:55
        TITANIA - how to structure detector monitoring 30m
        Speakers: Mr Jakub Kowalski, Maciej Witold Majewski (AGH University of Science and Technology (PL))

        Questions from Slido:

        • Are you planning to license your software? If yes, which license?
        • If you want to change the data backend, in how many places do you need to change the code? +1
        • What's the plan for TITANIA in the context of the already-existing LHCb monitoring environment (e.g. lb-monet)? Is it meant to complement it, supersede it,...? +2
    • 17:00 - 18:35
      Welcome & Analysis platforms

      PACIFIC TIME ZONE SESSION 1

      15h00 - 16h35 PDT, 00h00+1 - 01h35+1 CET, 03h30+1 - 05h05+1 IST, 06h00+1 - 07h35+1 CST, 07h00+1 - 08h35+1 JST

      Convener: Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
    • 08:00 - 11:25
      Analysis fundamentals & analysis platforms

      ATLANTIC TIME ZONE SESSION 2

      15h00 - 18h25 CET, 06h00 - 09h25 PDT, 18h30 - 21h55 IST, 21h00 - 00h25+1 CST, 22h00 - 01h25+1 JST

      Conveners: Benjamin Krikler (University of Bristol (GB)), Peter Onyisi (University of Texas at Austin (US))
    • 17:00 - 19:00
      Analysis platforms

      PACIFIC TIME ZONE SESSION 2

      15h00 - 16h15 PDT, 00h00 - 01h15+1 CET, 03h30+1 - 04h45+1 IST, 06h00+1 - 07h15+1 CST, 07h00+1 - 08h15+1 JST

      Conveners: Jim Pivarski (Princeton University), Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
    • 08:00 - 11:00
      Analysis platforms & automatic differentiation

      ATLANTIC TIME ZONE SESSION 3

      15h00 - 18h00 CET, 06h00 - 09h00 PDT, 18h30 - 21h30 IST, 21h00 - 24h00 CST, 22h00 - 01h00+1 JST

      Conveners: Eduardo Rodrigues (University of Liverpool (GB)), Graeme A Stewart (CERN)
    • 17:00 - 18:00
      Performance

      PACIFIC TIME ZONE SESSION 3

      15h00 - 16h00 PDT, 00h00 - 01h00+1 CET, 03h30+1 - 04h30+1 IST, 06h00+1 - 07h00+1 CST, 07h00+1 - 08h00+1 JST

      Conveners: Jim Pivarski (Princeton University), Matthew Feickert (Univ. Illinois at Urbana Champaign (US))
    • 08:00 - 11:15
      Fitting & statistics

      ATLANTIC TIME ZONE SESSION 4

      15h00 - 18h15 CET, 06h00 - 09h15 PDT, 18h30 - 21h45 IST, 21h00 - 00h15+1 CST, 22h00 - 01h15+1 JST

      Conveners: Benjamin Krikler (University of Bristol (GB)), Eduardo Rodrigues (University of Liverpool (GB))
      • 08:00
        Model building and statistical inference with zfit and hepstats (TUTORIAL) 45m
        Speakers: Jonas Eschle (Universitaet Zuerich (CH)), Matthieu Marinangeli (EPFL - Ecole Polytechnique Federale Lausanne (CH))
      • 08:45
        SModelS – a tool for interpreting simplified-model results from the LHC 30m
        Speaker: Wolfgang Waltenberger (Austrian Academy of Sciences (AT))
      • 09:15
        BREAK 30m
      • 09:45
        Tensorflow-based Maximum Likelihood fits for High Precision Standard Model Measurements at CMS 30m
        Speaker: Josh Bendavid (CERN)
      • 10:15
        iminuit: Past and Future 30m
        Speaker: Hans Peter Dembinski (Max-Planck-Institute for Nuclear Physics, Heidelberg)
      • 10:45
        zfit - TensorFlow 2.0: dynamic and compiled HPC 30m
        Speaker: Jonas Eschle (Universitaet Zuerich (CH))
    • 17:00 - 18:15
      Fitting & statistics

      PACIFIC TIME ZONE SESSION 4

      15h00 - 16h00 PDT, 00h00 - 01h00+1 CET, 03h30+1 - 04h30+1 IST, 06h00+1 - 07h00+1 CST, 07h00+1 - 08h00+1 JST

      Conveners: Jim Pivarski (Princeton University), Mariel Pettee (Yale University (US))
    • 08:00 - 11:00
      HEP analysis ecosystem & performance

      ATLANTIC TIME ZONE SESSION 5

      15h00 - 18h00 CET, 06h00 - 09h00 PDT, 18h30 - 21h30 IST, 21h00 - 24h00 CST, 22h00 - 01h00+1 JST

      Conveners: Eduardo Rodrigues (University of Liverpool (GB)), Hans Peter Dembinski (Max-Planck-Institute for Nuclear Physics, Heidelberg)
      • 08:00
        The boost-histogram package 30m

        The boost-histogram library provides first-class histogram objects in Python. You can compose axes and a storage to fit almost any problem. You can fill, manipulate, slice, and project them, and pass them between other Scikit-HEP libraries like Uproot4, mplhep, and histoprint. Boost-histogram is meant to be the "NumPy" of histogram libraries that others can build on; the "pandas" of histograms is "Hist", a physicist-friendly front-end that extends and expands boost-histogram to do plotting and more. An early version of Hist is shown for the first time here.

        Speakers: Hans Peter Dembinski (Max-Planck-Institute for Nuclear Physics, Heidelberg), Henry Fredrick Schreiner (Princeton University)
      • 08:30
        Providing Python Bindings For Complex and Feature-Rich C and C++ Libraries 30m
        Speaker: Martin Schwinzerl (University of Graz (AT))
      • 09:00
        Integrating GPU libraries for fun and profit 30m
        Speaker: Adrian Oeftiger (GSI - Helmholtzzentrum fur Schwerionenforschung GmbH (DE))
      • 09:30
        BREAK 30m
      • 10:00
        mplhep: bridging Matplotlib and HEP 30m
        Speaker: Andrzej Novak (RWTH Aachen (DE))
      • 10:30
        ROOT preprocessing pipeline for machine learning with TensorFlow 30m
        Speaker: Matthias Komm (CERN)
    • 17:00 - 18:15
      Analysis systems

      PACIFIC TIME ZONE SESSION 5

      15h00 - 16h15 PDT, 00h00 - 01h15+1 CET, 03h30+1 - 04h45+1 IST, 06h00+1 - 07h15+1 CST, 07h00+1 - 08h15+1 JST

      Conveners: Jim Pivarski (Princeton University), Matthew Feickert (Univ. Illinois at Urbana Champaign (US))