EDM4hep Discussion

Europe/Zurich
Zoom

Zoom

Zoom Meeting ID
98484040528
Host
Andre Sailer
Useful links
Join via phone
Zoom URL

EDM4hep Live Notes
==================

Date: Aug 10, 2021
Indico: https://indico.cern.ch/event/1065461/
This is a document for taking notes during EDM4hep meetings.

Connected: Tao, Xingtao, Wenxing, Thomas, Raffaela, Joseph, Zou, Sanghyun, Placido, Benedikt, Frank, Juraj, Valentin, Andre

Apologies: Gerri

## Introduction and General Points
    
* ACAT
    * https://indico.cern.ch/event/855454/
    * submission deadline postponed: Aug 29
    * abstracts ready for next Key4hep meeting (Aug 17, 2021) 
    * Check this week if Key4hep + EDM4hep/podio should be merged

## Progress and Discussion


## Aleph to EDM4hep

* Effort started, no major problem
* Some places where the data does not completely "fit"
* Need for User extensions
    * Fraction of energy in calo layers

* Sanghyun: storing waveform could be useful

## Podio

### change class names of const classes
* See presentation
* Use c++ templates for the two hierarchies?
    * More or less equivalent to using the podio templates
* Migration for already existing codebases?
    * Can we do something to help users?
    * (main) customers:
        * k4MarlinWrapper, k4SimDelphes, FCC full simulation 
    * How much effort is it? How much is hidden behind `auto`?
    * Make a separate branch with the changes to see where things break
* Benedikt: Namespaces for mutable classes can lead to very confusing situations
* Nobody objects to changes during meeting, need more feedback from people not connected
* [x] Open issue on github to collect more feedback and see where other users of podio are
    * Send email to edm4hep mailing list
    * https://github.com/AIDASoft/podio/issues/204

###  Support for multiple input sources in parallel
* https://github.com/AIDASoft/podio/issues/190
* Please read and maybe comment
* Presentation by Benedikt at meeting on July 27 (Postponed?)

### Benchmarking

* Discuss how to integrate with validation in Key4hep

### Issues/PRs

* Object assignment operator causes duplicated releasing
    * https://github.com/AIDASoft/podio/issues/200

* Fixed width integer types
    * https://github.com/key4hep/EDM4hep/issues/112
    * https://github.com/AIDASoft/podio/pull/186 for podio support, merged
    * Move to fixed width types in edm4hep?
        * Makes sense and "doesn't hurt us".
    * Doubles/floats? Are they are guaranteed to be fixed size?
        * standard defines only minimum width
        * `<cstdint>` available for floats?
            * boost has fixed width float types
            * In `std` only means to check if they are IEEE 754 compliant and width
            
* Disentangle storage and collection interface and reference collections
    * https://github.com/AIDASoft/podio/pull/197
    * Ready for review
    * Maybe rename to subset collections (same as in LCIO)
    * Store reference/subset collection without storing the original objects?
        * Not possible at the moment, would lead to a dangling reference
        * Possible in LCIO(?)
    
* Generate an additional cmake lists file containing the generated source files
    * https://github.com/AIDASoft/podio/pull/143

#### Constness in python
* Currently solved by `Const` classes
* Also useful for `auto` type deduction, where `const auto` can be necessary otherwise to get something fully correct
* LCIO solved with flag in collections
* Do we need python writing?
    * probably yes, for writing ntuple like data


#### Heap-use-after-free

* https://github.com/AIDASoft/podio/issues/174
* Not a problem in frameworks, but if collections used outside of them
* Deep inside the memory management of podio, so not easy to fix
* Happens more often with clang than with gcc, but could be compiler options.
    * Flagged by address-sanitizer
* Maybe requires deep changes. Change for reference counting of object classes.

#### c++ concepts
* BH: add compile time checks for class behaviours: e.g., movable

#### issue w/ ROOT and (vectors of) non-copyable collections
* happens in ROOT 6.22 
* PM: there is a patch available in LCG repository
    * ROOT team is working on a general solution
* Still relevant for us? 

#### What are the different branches in the root file?
* https://github.com/AIDASoft/podio/issues/169
* Encode more information in the _relation_ branch names?
* Related to use in RDataFrame/RNTupe, directly looking at root file content
* Are branch names an implementation detail?
* backward compatibility "Impossible" (?)

#### Schema Evolution
- https://github.com/AIDASoft/podio/issues/86
- Discussion: https://indico.cern.ch/event/1030566/

#### Multi-Threading

#### "event class" in podio

* Currently being perceived

### PRs
* https://github.com/AIDASoft/podio/pulls

### Meta Data

#### Usage of "metadata" for user defined data
* need to check if current implementation addresses all use cases
* need test use-cases

### EventStore

### Features

* Subset collections?

#### framework ROOTWriter
* Could be refactored to use more standalone things via Gaudi Tools and Services
* Could potentially more easily write `std::vector`s or other "generic" types
    * Ideally put "generic" data types also in standalone podio
    * How to best allow for "extensions" to an EDM? 
    
## LCIOConverters
* https://github.com/key4hep/k4LCIOReader

## EDM4hep
https://github.com/key4hep/EDM4hep/pulls

### Missing TrackerHitPlane type from LCIO
* https://github.com/key4hep/EDM4hep/issues/121
* Data can be stored in `edm4hep::TrackerHit` but converting back to LCIO might break algorithms that expect a dedicated class
    * Conversion is working in both directions.
* Could be solved with a "wrapper TrackerHit" class that can be used in `OneToManyRelations` etc that just stores the `Obj*` of the underlying class and forwards function calls 
    * Avoids to have virtual base classes
* Start now with adding a `TrackerHitPlane` to edm4hep to be able to continue with the converters

### cellID decoding functionality in EDM4hep is missing
* https://github.com/key4hep/EDM4hep/issues/115
* pull request to fix it coming

### EDM4hep-Utils
* To be suppressed: https://github.com/key4hep/EDM4hep-utils/issues/2

### User defined collections as proper datatypes
* https://github.com/key4hep/EDM4hep/pull/114
* https://github.com/key4hep/EDM4hep/pull/117 (built on top of this idea)
* Type is `UserFloat` and not `float` -> APIs have to take `UserFloat`
    * A bit "suprising" that it is not possible to simply store a `float`
    * Needs proper documentation
* Something similar in FCC-edm with clash on macOS due to `Float.h` header file
* Benedikt: Be more strict? E.g. only allow to store user data if there is also metadata in the file.
    * For one parameter branch name could serve as "metadata"
    * Would make it more explicit, but also remove a bit of flexibility
    * Use utility class to enforce this
* Would be good to have the simple implementation now, including ints and doubles, and then try it out


### MCParticle Endpoint
* https://github.com/key4hep/EDM4hep/issues/113
* Keep podio simple by not allowing getters/setters to access multiple internal states. Use utility functions for this use case, if needed.


### Gaudi Documentation

* new "readthedocs" style at https://cern.ch/gaudi

### Issues

## AOB

### Next meeting:

    * August 24
    * Presentation by Benedikt about Multiple Input sources

 
There are minutes attached to this event. Show them.
    • 09:00 09:05
      Introduction 5m
      Speakers: Andre Sailer (CERN), Frank-Dieter Gaede (Deutsches Elektronen-Synchrotron (DE))
    • 09:05 09:15
      podio - Better class names(?) 10m
      Speaker: Thomas Madlener (Deutsches Elektronen-Synchrotron (DESY))
    • 10:00 10:01
      Discussion 1m
      Speaker: Dr All