EDM4hep Discussion

Europe/Zurich
Zoom

Zoom

Zoom Meeting ID
98484040528
Host
Andre Sailer
Useful links
Join via phone
Zoom URL

EDM4hep Live Notes
==================

Date: Aug 24, 2021
Indico: https://indico.cern.ch/event/1067644/
This is a document for taking notes during EDM4hep meetings.

Connected: Benedikt, Tao, Teng, Wenxing, Jiaheng, Thomas, Raffaela, Frank, Xingtao, Placido, Andre, Clement, Juraj

Apologies:

## Introduction and General Points
    
* ACAT
    * https://indico.cern.ch/event/855454/
    * submission deadline Aug 29 (Sunday)
    * Abstract Draft: https://demo.hedgedoc.org/mqb0OWTeT7OTx-b5cKFRbg#

## Progress and Discussion


## Aleph to EDM4hep

* Effort started, no major problem
* Some places where the data does not completely "fit"
* Need for User extensions
    * Fraction of energy in calo layers

* Sanghyun: storing waveform could be useful

## Podio

## multiple input sources
* see presentation
* Is it enough to be able to read independent events and "move" collections from one event to another?
    * Could be done by the MultiSourceReader. Question is do we need to expose this low level functionality. How would dependencies be handled?
    * Ownership transfer of collections is definitely necessary.
    * Tracking of dependencies of collections; Get's complicated pretty soon, e.g. what happens with dependencies of dependencies?
        * User knows best
    * Need to be explicit of what are the things we guarantee and which we don't
* Direct access (aka EventHeader). Podio or edm4hep feature?
    * In principle it is EDM specific.
    * Unique "event tag" should be podio feature
* Object comparison uses ObjectID
    * Mainly used to check if we are referring to the same object
    * Up to the user to not abuse this too much
* Focus on providing the necessary building blocks to implement more complicated features

### change class names of const classes
* See presentation
* Use c++ templates for the two hierarchies?
    * More or less equivalent to using the podio templates
* Migration for already existing codebases?
    * Can we do something to help users?
    * (main) customers:
        * k4MarlinWrapper, k4SimDelphes, FCC full simulation
    * How much effort is it? How much is hidden behind `auto`?
    * Make a separate branch with the changes to see where things break
* Benedikt: Namespaces for mutable classes can lead to very confusing situations
* Nobody objects to changes during meeting, need more feedback from people not connected
* [x] Open issue on github to collect more feedback and see where other users of podio are
    * Send email to edm4hep mailing list
    * https://github.com/AIDASoft/podio/issues/204

###  Support for multiple input sources in parallel
* https://github.com/AIDASoft/podio/issues/190
* Please read and maybe comment
* Presentation by Benedikt at meeting on July 27 (Postponed?)

### Benchmarking

* Discuss how to integrate with validation in Key4hep

### Issues/PRs

* Object assignment operator causes duplicated releasing
    * https://github.com/AIDASoft/podio/issues/200

* Fixed width integer types
    * https://github.com/key4hep/EDM4hep/issues/112
    * https://github.com/AIDASoft/podio/pull/186 for podio support, merged
    * Move to fixed width types in edm4hep?
        * Makes sense and "doesn't hurt us".
    * Doubles/floats? Are they are guaranteed to be fixed size?
        * standard defines only minimum width
        * `<cstdint>` available for floats?
            * boost has fixed width float types
            * In `std` only means to check if they are IEEE 754 compliant and width
            
* Disentangle storage and collection interface and reference collections
    * https://github.com/AIDASoft/podio/pull/197
    * Ready for review
    * Maybe rename to subset collections (same as in LCIO)
    * Store reference/subset collection without storing the original objects?
        * Not possible at the moment, would lead to a dangling reference
        * Possible in LCIO(?)
    
* Generate an additional cmake lists file containing the generated source files
    * https://github.com/AIDASoft/podio/pull/143

#### Constness in python
* Currently solved by `Const` classes
* Also useful for `auto` type deduction, where `const auto` can be necessary otherwise to get something fully correct
* LCIO solved with flag in collections
* Do we need python writing?
    * probably yes, for writing ntuple like data


#### Heap-use-after-free

* https://github.com/AIDASoft/podio/issues/174
* Not a problem in frameworks, but if collections used outside of them
* Deep inside the memory management of podio, so not easy to fix
* Happens more often with clang than with gcc, but could be compiler options.
    * Flagged by address-sanitizer
* Maybe requires deep changes. Change for reference counting of object classes.

#### c++ concepts
* BH: add compile time checks for class behaviours: e.g., movable

#### issue w/ ROOT and (vectors of) non-copyable collections
* happens in ROOT 6.22
* PM: there is a patch available in LCG repository
    * ROOT team is working on a general solution
* Still relevant for us?

#### What are the different branches in the root file?
* https://github.com/AIDASoft/podio/issues/169
* Encode more information in the _relation_ branch names?
* Related to use in RDataFrame/RNTupe, directly looking at root file content
* Are branch names an implementation detail?
* backward compatibility "Impossible" (?)

#### Schema Evolution
- https://github.com/AIDASoft/podio/issues/86
- Discussion: https://indico.cern.ch/event/1030566/

#### Multi-Threading

#### "event class" in podio

* Currently being perceived

### PRs
* https://github.com/AIDASoft/podio/pulls

### Meta Data

#### Usage of "metadata" for user defined data
* need to check if current implementation addresses all use cases
* need test use-cases

### EventStore

### Features

* Subset collections?

#### framework ROOTWriter
* Could be refactored to use more standalone things via Gaudi Tools and Services
* Could potentially more easily write `std::vector`s or other "generic" types
    * Ideally put "generic" data types also in standalone podio
    * How to best allow for "extensions" to an EDM?
    
## LCIOConverters
* https://github.com/key4hep/k4LCIOReader

## EDM4hep
https://github.com/key4hep/EDM4hep/pulls

### Missing TrackerHitPlane type from LCIO
* https://github.com/key4hep/EDM4hep/issues/121
* Data can be stored in `edm4hep::TrackerHit` but converting back to LCIO might break algorithms that expect a dedicated class
    * Conversion is working in both directions.
* Could be solved with a "wrapper TrackerHit" class that can be used in `OneToManyRelations` etc that just stores the `Obj*` of the underlying class and forwards function calls
    * Avoids to have virtual base classes
* Start now with adding a `TrackerHitPlane` to edm4hep to be able to continue with the converters

### cellID decoding functionality in EDM4hep is missing
* https://github.com/key4hep/EDM4hep/issues/115
* pull request to fix it coming

### EDM4hep-Utils
* To be suppressed: https://github.com/key4hep/EDM4hep-utils/issues/2

### User defined collections as proper datatypes
* https://github.com/key4hep/EDM4hep/pull/114
* https://github.com/key4hep/EDM4hep/pull/117 (built on top of this idea)
* Type is `UserFloat` and not `float` -> APIs have to take `UserFloat`
    * A bit "suprising" that it is not possible to simply store a `float`
    * Needs proper documentation
* Something similar in FCC-edm with clash on macOS due to `Float.h` header file
* Benedikt: Be more strict? E.g. only allow to store user data if there is also metadata in the file.
    * For one parameter branch name could serve as "metadata"
    * Would make it more explicit, but also remove a bit of flexibility
    * Use utility class to enforce this
* Would be good to have the simple implementation now, including ints and doubles, and then try it out


### MCParticle Endpoint
* https://github.com/key4hep/EDM4hep/issues/113
* Keep podio simple by not allowing getters/setters to access multiple internal states. Use utility functions for this use case, if needed.


### Gaudi Documentation

* new "readthedocs" style at https://cern.ch/gaudi

### Issues

## AOB

### Next meeting:

*  Sep 7, * August 24
* Presentation by15:00 Benedikt about Multiple Input sourcesCEST**

There are minutes attached to this event. Show them.