EDM4hep Discussion

Europe/Zurich
Zoom

Zoom

Zoom Meeting ID
98484040528
Host
Andre Sailer
Useful links
Join via phone
Zoom URL

EDM4hep Live Notes
==================

Date: July 27, 2021
Indico: https://indico.cern.ch/event/1062644/
This is a document for taking notes during EDM4hep meetings.

Connected: Andre, Thomas, Clement, Juraj, Sanghyun, Valentin, Yizhou, Gerri,

Apologies: Tao, Benedikt, Placido, Frank, 

## Introduction and General Points

* EPS
https://www.eps-hep2021.eu/programme/
    * Abstract Accepted: https://indico.desy.de/event/28202/contributions/105603/
    * July 26-30, 2021 
    * Slide draft circulated
    
* ACAT
    * https://indico.cern.ch/event/855454/
    * submission deadline postponed: Aug 29
 
 * PyHEP (virtual) workshop 2021
     * https://indico.cern.ch/event/1019958/
     * July 5-9
     * CAD related FreeCAD could be interesting for FCC, needs communication with dd4hep developers
         * https://indico.cern.ch/event/1019958/contributions/4419015/
     * RDataFrame on Spark
         * Currently not possible to use key4hep stack on SWAN (needs python + jupyter packages).
     * Julia
         * If someone is interested they are invited to bring things to key4hep
         * Would be nice to have a "working example" (no framework integration)
         * Original "phase space" for Julia has gotten smaller with newer python libraries (numpy, etc)
         * Have Julia in the stack and "monitor the activities"

## Progress and Discussion


## Aleph to EDM4hep

* Effort started, no major problem
* Some places where the data does not completely "fit"
* Need for User extensions
    * Fraction of energy in calo layers

* Sanghyun: storing waveform could be useful

## Podio

###  Support for multiple input sources in parallel
* https://github.com/AIDASoft/podio/issues/190
* Please read and maybe comment
* Presentation by Benedikt at meeting on July 27 (Postponed?)

### Benchmarking

* Discuss how to integrate with validation in Key4hep

### Issues/PRs

* Object assignment operator causes duplicated releasing
    * https://github.com/AIDASoft/podio/issues/200

* Fixed width integer types
    * https://github.com/key4hep/EDM4hep/issues/112
    * https://github.com/AIDASoft/podio/pull/186 for podio support, merged
    * Move to fixed width types in edm4hep?
        * Makes sense and "doesn't hurt us".
    * Doubles/floats? Are they are guaranteed to be fixed size?
        * standard defines only minimum width
        * `<cstdint>` available for floats?
            * boost has fixed width float types
            * In `std` only means to check if they are IEEE 754 compliant and width
            
* Disentangle storage and collection interface and reference collections
    * https://github.com/AIDASoft/podio/pull/197
    * Ready for review
    * Maybe rename to subset collections (same as in LCIO)
    * Store reference/subset collection without storing the original objects?
        * Not possible at the moment, would lead to a dangling reference
        * Possible in LCIO(?)
    
* Generate an additional cmake lists file containing the generated source files
    * https://github.com/AIDASoft/podio/pull/143

#### Constness in python
* Currently solved by `Const` classes
* Also useful for `auto` type deduction, where `const auto` can be necessary otherwise to get something fully correct
* LCIO solved with flag in collections
* Do we need python writing?
    * probably yes, for writing ntuple like data


#### Heap-use-after-free

* https://github.com/AIDASoft/podio/issues/174
* Not a problem in frameworks, but if collections used outside of them
* Deep inside the memory management of podio, so not easy to fix
* Happens more often with clang than with gcc, but could be compiler options.
    * Flagged by address-sanitizer
* Maybe requires deep changes. Change for reference counting of object classes.

#### c++ concepts
* BH: add compile time checks for class behaviours: e.g., movable

#### issue w/ ROOT and (vectors of) non-copyable collections
* happens in ROOT 6.22 
* PM: there is a patch available in LCG repository
    * ROOT team is working on a general solution
* Still relevant for us? 

#### What are the different branches in the root file?
* https://github.com/AIDASoft/podio/issues/169
* Encode more information in the _relation_ branch names?
* Related to use in RDataFrame/RNTupe, directly looking at root file content
* Are branch names an implementation detail?
* backward compatibility "Impossible" (?)

#### Schema Evolution
- https://github.com/AIDASoft/podio/issues/86
- Discussion: https://indico.cern.ch/event/1030566/

#### Multi-Threading

#### "event class" in podio

* Currently being perceived

### PRs
* https://github.com/AIDASoft/podio/pulls

### Meta Data

#### Usage of "metadata" for user defined data
* need to check if current implementation addresses all use cases
* need test use-cases

### EventStore

### Features

* Subset collections?

#### framework ROOTWriter
* Could be refactored to use more standalone things via Gaudi Tools and Services
* Could potentially more easily write `std::vector`s or other "generic" types
    * Ideally put "generic" data types also in standalone podio
    * How to best allow for "extensions" to an EDM? 
    
## LCIOConverters
* https://github.com/key4hep/k4LCIOReader

## EDM4hep
https://github.com/key4hep/EDM4hep/pulls

### cellID decoding functionality in EDM4hep is missing
* https://github.com/key4hep/EDM4hep/issues/115
* pull request to fix it coming

### EDM4hep-Utils
* To be suppressed: https://github.com/key4hep/EDM4hep-utils/issues/2

### User defined collections as proper datatypes
* https://github.com/key4hep/EDM4hep/pull/114
* https://github.com/key4hep/EDM4hep/pull/117 (built on top of this idea)
* Type is `UserFloat` and not `float` -> APIs have to take `UserFloat`
    * A bit "suprising" that it is not possible to simply store a `float`
    * Needs proper documentation
* Something similar in FCC-edm with clash on macOS due to `Float.h` header file
* Benedikt: Be more strict? E.g. only allow to store user data if there is also metadata in the file.
    * For one parameter branch name could serve as "metadata"
    * Would make it more explicit, but also remove a bit of flexibility
    * Use utility class to enforce this
* Would be good to have the simple implementation now, including ints and doubles, and then try it out


### MCParticle Endpoint
* https://github.com/key4hep/EDM4hep/issues/113
* Keep podio simple by not allowing getters/setters to access multiple internal states. Use utility functions for this use case, if needed.


### Gaudi Documentation

* new "readthedocs" style at https://cern.ch/gaudi

### Issues

## AOB

### Next meeting:

* August 10
* Presentation by Benedikt about Multiple Input sources

There are minutes attached to this event. Show them.
    • 1
      Introduction
      Speakers: Andre Sailer (CERN), Frank-Dieter Gaede (Deutsches Elektronen-Synchrotron (DE))
    • 2
      Discussion
      Speaker: Dr All