EDM4hep Discussion

Europe/Zurich
Zoom

Zoom

Zoom Meeting ID
98484040528
Host
Andre Sailer
Useful links
Join via phone
Zoom URL

EDM4hep Live Notes
==================

Date: Apr 15, 2025
Indico: https://indico.cern.ch/event/1533730/

Connected: Juraj, Andre, Thomas, Leonhard, Sanghyun, Aurora, Juan, Sun, Pere, Brieuc

Apologies: Frank, Mateusz, 


### Upcoming workshops / conferences
https://github.com/orgs/key4hep/projects/4/views/1
    
## Progress and discussion

### dynamic loading / path issue in k4FWCore / podio / ROOT / spack?
* https://github.com/key4hep/key4hep-spack/issues/736
* [ ] Check differences in build output for podio before and after break

## Podio
* https://github.com/AIDASoft/podio/pulls
* https://github.com/AIDASoft/podio/issues
* https://github.com/orgs/AIDASoft/projects/2/views/1
 
### Merged PRs
* Add a c++ implementation for `podio-dump` [#620](https://github.com/AIDASoft/podio/pull/620)
* Add types that can be used to get all the data (or links) types [#761](https://github.com/AIDASoft/podio/pull/761)
    * Define a compile-time usable typelist of all types defined by a datamodel. Can be used to remove boilerplate code, see e.g. [k4FWCore#305](https://github.com/key4hep/k4FWCore/pull/305)
* Add `typeName` member to link collections [#748](https://github.com/AIDASoft/podio/pull/748), [#762](https://github.com/AIDASoft/podio/pull/762)
    * Harmonize behavior with generated collections
* CMake cleanup and typo fixes [#763](https://github.com/AIDASoft/podio/pull/763), [#759](https://github.com/AIDASoft/podio/pull/759)

### Silence deprecation warning due to RNTuple API stabilization 
* https://github.com/AIDASoft/podio/pull/757
* All features that we use are now moved outside of the `Experimental` namespace
* Ready for review & merge (LCG picks up patch from PR)


### [WIP] Add ccache to workflows to speed up CI and add more Key4hep workflows
* https://github.com/AIDASoft/podio/pull/764
* Needs updates to `run-lcg-view` action
* Triggered by [this comment](https://github.com/AIDASoft/podio/pull/620#issuecomment-2793451009)
    * Lost compatibility with gcc11 in [#626](https://github.com/AIDASoft/podio/pull/626)
    * Switched to c++20 at the same time
    * [ ] Document new requirement or try to fix?
    * #626 only made tests fail to build, #620 also breaks "normal" build
    * [ ] Open issue to announce drop of gcc11 support
    * spack CI still uses gcc11

### Store the collection information in a struct instead of a tuple
* https://github.com/AIDASoft/podio/pull/711
* Makes `TTree` version store information as `struct` instead of `std::tuple`
* RNTuple version already stores info without `std::tuple`, **but in different format**
* [x] Harmonize formats
* Now stored as a `vector` of structs. Previosly was AoS (effectively) for RNTuple
* Now also store the data format that hits the disk (i.e. `vector<XYZData>`)
* [ ] Let Pere double check if this is as desired
    
### Add `RVec` to root dictionaries
* https://github.com/AIDASoft/podio/pull/750
* Triggered by https://github.com/key4hep/EDM4hep/issues/416
* `RVec` dictionaries are currently (partially) defined via FCCAnalyses
* [ ] Where should the go (formally)?
    * FCCAnalyses is effectively "polluting" the global environment. If we provide this, it should be done centrally
    * In principle our files are not part of the public API (no schema evolution, no link resolution)
    * No easy way to figure out where dictionary definitions come from via ROOT.
        * E.g. bisecting `LD_LIBRARY_PATH`
* Features are used nevertheless. Users don't care about all the features always
    * Other use cases: ML related studies partially use uproot
    * Reality: Users do what they want, regardless of what we officially support
* We could improve usability with plain ROOT
    * E.g. more schema evolution support (that we partially need to do in any case)
* [ ] Discuss again in slightly larger round

### Formalize common interface collections with a concept
* https://github.com/AIDASoft/podio/pull/758
* Better compile time checks for collection interace parts that are commonly generated but go beyond `CollectionBase`
* Make it public?
    * Currently keep it as a test to make sure that all collections provide the same interface. We have had some issues with inconsistencies in the past 
* [ ] Clarifying (documentation) comment missing
    

## EDM4hep
* https://github.com/key4hep/EDM4hep/pulls
* https://github.com/key4hep/EDM4hep/issues
* https://github.com/orgs/key4hep/projects/5 


### Merged PRs
* Use link collection `typeName` [#414](https://github.com/key4hep/EDM4hep/pull/414)
* Run pre-commit on Key4hep to have a recent enough clang-format [#425](https://github.com/key4hep/EDM4hep/pull/425)
* add missing units in datamodel yaml file [#422](https://github.com/key4hep/EDM4hep/pull/422)

### Clarification of trackerHitPlane members
* https://github.com/key4hep/EDM4hep/issues/421
* TrackerHitPlane in LCIO originally meant for tracking without geometry(?)
* Directions that span the measurement plan are a property of the sensor, not the hit?
* Is TrackerHitPlane incomplete or overlapping with information from sensor to do proper tracking
* [ ] Ping Frank and clarify documentation

### Pointer persistency in EDM4hep
* https://github.com/key4hep/EDM4hep/issues/418
* Interplay, e.g. with Pandora that expects a pointer
    * Pandora is not touching this, just using this to transport information along
    * E.g. could be `std::any` and Handle is kept alive because `std::any` is constructed by value (via new member)
* Need to have a way for EDM4hep
    * Current workarounds would require filling a temporary vector to keep pointers to handles alive
    * No really user friendly way at the moment 
* Do we need a nice way? Should only appear in few central places
    * [ ] Document workaround with vector
    * [ ] Point uses to issue and make them indicate they use this
    * [ ] Decide on number of actual users whether it's worth the effort to do a "proper solution"

### Limit the RNTuple backwards compatibility to podio 1.2 
* https://github.com/key4hep/EDM4hep/pull/413
* Necessary after https://github.com/AIDASoft/podio/pull/711

### [WiP] Replace the spin by a helicity property for the MCParticle
* https://github.com/key4hep/EDM4hep/pull/404
* Debugging special schema evolution that ideally stores `spin.z` into `helicty`
    * [x] Do we want / need this? -> Yes, at least would be worthwile to see if schema evolution can deal with this
    * Quite a few existing LCIO files with `spin`, but could be absorved in the LCIO to EDM4hep converter
    * [ ] Are there existing EDM4hep files with this?
* [ ] Need to check where this breaks downstream
* `9` as default value for "unset", why?
    * Historically what some generators have been using

### add python bindings for PIDHandler
* https://github.com/key4hep/EDM4hep/pull/397
* Some details still to be clarified(?)
* Lifetimes with `std::optional` (and others) are not behaving intuitively on the python side (with cppyy)
    * Could add throwing overloads for wrappers because there is better support for that. But naming could become confusing because we have `get_optional` in other cases and would need a different name here for the throwing version.
* Many use cases dictated by podio usage of `optional` (e.g. parametes). 
* [ ] Potentially need to discuss / decide on how our interfaces should look like in general

### Use pre-built hooks in pre-commit
* https://github.com/key4hep/EDM4hep/pull/402
* Streamlines setup
    * "Looks like other repositories"
    * clang-tidy not easily packaged (no issue for EDM4hep, but other packages use it)
* [x] Can pre-commit use a different config file than `.pre-commit-config.yaml`
    * Yes, can use the `--config` flag
    
## Converter & MarlinWrapper

* Ensure that all EDM4hep collections are discovered for conversion [#233](https://github.com/key4hep/k4MarlinWrapper/pull/233)
* Replace InitDD4hep processor with Gaudi serivces during steering file conversion [#230](https://github.com/key4hep/k4MarlinWrapper/pull/230)
* CellID encoding string missing in conversion from LCIO -> EDM4hep
    * Standalone: https://github.com/key4hep/k4EDM4hep2LcioConv/issues/114
    * MarlinWrapper: https://github.com/key4hep/key4hep-tutorials/pull/21
    * [ ] `MetaDataHandle` read access behaving differently between `PodioDataSvc` and `IOSvc`? 

## AoB

* No time information in `SimCalorimeterHit` raised (again) from US FCC tutorial.
    * Even if calculation is not uniquely defined, how to store it?
    * Digitization can store information directly into `CalorimeterHit`
    * Timing (performance) studies might be interested in a truth time which would have to re-compute the *sim* time from the `CaloHitContribution` every time
    * Could use a `UserData` collection that runs in parallel to cache results and not have to re-compute it

* Generator output as EDM4hep?
    * Generator authors will most likely not switch to EDM4hep, they have HepMC3 as their common standard
    * Have `k4GeneratorsConfig` for converting HepMC3 to EDM4hep
    * Ongoing discussion to upstream that to HepMC3 (because HepMC3 does not have a proper plug-in system to keep it on our end)
    * Maybe keep this converter in Key4hep land for easier synchronization

## Next meeting:
* Apr 29, 09:00 CET

There are minutes attached to this event. Show them.