EDM4hep Discussion
EDM4hep Live Notes
==================
Date: Feb 4, 2025
Indico: https://indico.cern.ch/event/1511117/
Connected: Mateusz, Juan, Leonhard, Thomas, Andre, Swathi, Aurora, Pere, Frank
Apologies:
## Introduction and General Points
### Upcoming workshops / conferences
https://github.com/orgs/key4hep/projects/4/views/1
## Progress and discussion
## Podio
* https://github.com/AIDASoft/podio/pulls
* https://github.com/AIDASoft/podio/issues
* https://github.com/orgs/AIDASoft/projects/2/views/1
### New tag `v01-02`
* Last tag with C++17 support
### Merged PRs
* Improve the detection of schema changes and add tests [#715](https://github.com/AIDASoft/podio/pull/715)
* Make collection iterators fulfill *LegacyInputIterator* and `input_iterator` [#626](https://github.com/AIDASoft/podio/pull/626)
* Make `LinkCollectionIterator` fulfill `std::input_iterator` concept [#725](https://github.com/AIDASoft/podio/pull/725)
* Require c\++20 and update to c\++20 [#698](https://github.com/AIDASoft/podio/pull/698)
* Make sure RNTupleReader builds with ROOT > 6.34 [#719](https://github.com/AIDASoft/podio/pull/719)
* Add support for reading several rntuple files [#708](https://github.com/AIDASoft/podio/pull/708)
* Improve exception message for untracked relations [#728](https://github.com/AIDASoft/podio/pull/728)
* Make RelationRange a proper view [#727](https://github.com/AIDASoft/podio/pull/727)
### Make it possible to read only a subset of available collections in readers
* https://github.com/AIDASoft/podio/pull/504
* Non-existent collections -> exception
* Needs a bump to ROOT 6.32 for RNTuple support as previous versions don't deal well with switching on / off fields in a model
* Allows ignoring collections that cannot be read (e.g. due to schema changes)
* Ready for review / merge
### Add globbing support in `makeReader` and `CreateDataSource`
* https://github.com/AIDASoft/podio/pull/729
* via `glob.h` on c++ side
* via `glob.glob` on python side
* Globbing results can differ between implementations (e.g. python vs. c++ vs. ROOT)
* Needs to be part of podio?
* Python could be done on the user side since `glob.glob` is easily available
* A bit more cumbersome to get `glob.h` to work as expected
* [ ] Make it a public utility
* Would allow users to check what a pattern expands to
* Already present; needs to be moved out of the `detail` namespace into a public namespace so that users can call it directly
* Single argument `makeReader` keeps globbing, vector argument version doesn't glob
* Can't simply go through ROOT functionality because we open the first file
* E.g. TChain would expand globs
* [ ] Remove globbing from python reader
* Keep possibility to pass list
* [ ] Harmonize python and c++ API for reading?
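
The point that globbing can be done on the user side in python can be sketched as below. This is a minimal illustration using only the standard library; the helper name `expand_inputs` is hypothetical and not part of podio. It expands the patterns explicitly, so the caller can inspect the result before handing the list to a reader.

```python
import glob


def expand_inputs(patterns):
    """Expand glob patterns into a flat list of file names.

    Hypothetical user-side helper (not a podio function). Matches of each
    pattern are sorted so that the order does not depend on the file system.
    """
    if isinstance(patterns, str):
        patterns = [patterns]
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if not matches:
            # Fail early instead of silently passing on an empty list
            raise FileNotFoundError(f"no files match {pattern!r}")
        files.extend(matches)
    return files
```

The returned list can then be passed to a reader that accepts a list of files, which keeps the expansion visible and independent of how c++ or ROOT would interpret the same pattern.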
### [WIP] Store the collection information in a struct instead of a tuple
* https://github.com/AIDASoft/podio/pull/711
* Makes `TTree` version store information as `struct` instead of `std::tuple`
* RNTuple version already stores info without `std::tuple`, **but in different format**
* [ ] Harmonize formats
## EDM4hep
* https://github.com/key4hep/EDM4hep/pulls
* https://github.com/key4hep/EDM4hep/issues
* https://github.com/orgs/key4hep/projects/5
### Merged PRs
* Switch to templated links and remove explicit declarations from YAML file [#373](https://github.com/key4hep/EDM4hep/pull/373)
* Install old schema libraries [#391](https://github.com/key4hep/EDM4hep/pull/391)
* Install rootmap for old schema dictionaries [#394](https://github.com/key4hep/EDM4hep/pull/394)
* Remove the colorFlow member from the MCParticle [#389](https://github.com/key4hep/EDM4hep/pull/389)
* Add documentation and convenience overload for ParticleID utilities [#395](https://github.com/key4hep/EDM4hep/pull/395)
* Make the event number 64 bits and unsigned [#398](https://github.com/key4hep/EDM4hep/pull/398)
* Use YAML multiline strings for extra code [#401](https://github.com/key4hep/EDM4hep/pull/401)
### Add python bindings for PIDHandler
* https://github.com/key4hep/EDM4hep/pull/397
* Some details still to be clarified(?)
* Lifetimes with `std::optional` (and others) are not behaving intuitively on the python side (with cppyy)
### Use pre-built hooks in pre-commit
* https://github.com/key4hep/EDM4hep/pull/402
* Streamlines setup
* "Looks like other repositories"
### Add `edm4hep::Tensor` type for use in ML training and inference
* https://github.com/key4hep/EDM4hep/pull/388
* Via EIC colleagues
* Do we need metadata attached to this somehow? How to know where things are in this tensor?
* E.g. shape parameters for Clusters
* Very (too?) generic
* What is the use case for EIC currently?
* Support for pytorch?
* "Waste" of unused data type should be rather small as vector will just be empty
* Cannot do a `union` in podio generated datamodels
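
To make the question "how to know where things are in this tensor" concrete, here is a minimal sketch of a generic tensor stored as a shape plus flat data vectors. The class `FlatTensor`, its member names, and the row-major layout are assumptions for illustration only; they are not taken from the PR.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class FlatTensor:
    """Hypothetical stand-in: a shape plus one flat vector per element type."""

    shape: list
    float_data: list = field(default_factory=list)
    int_data: list = field(default_factory=list)  # stays empty if unused

    def flat_index(self, *indices):
        """Return the row-major offset of a multi-dimensional index."""
        if len(indices) != len(self.shape):
            raise IndexError("wrong number of indices")
        offset = 0
        for i, n in zip(indices, self.shape):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            offset = offset * n + i
        return offset


# Two clusters with three shape parameters each (made-up numbers)
t = FlatTensor(shape=[2, 3], float_data=[0.1, 0.2, 0.3, 1.1, 1.2, 1.3])
assert len(t.float_data) == prod(t.shape)
value = t.float_data[t.flat_index(1, 2)]  # third parameter of the second cluster
```

Nothing in such a layout says what axis 1 means or which parameter sits at which position, which is why some form of attached metadata may be needed.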
### Drift chamber digi
* https://github.com/key4hep/EDM4hep/pull/385
* Add new `SensitiveWireHit` that results from drift chamber digitization
* Before ambiguity resolution (no real 3D position defined)
* Distance to the wire defines a circle around it (given by time of arrival and drift velocity)
* Added to `TrackerHit` interface
* Position is not well defined for `SensitiveWireHit` but need `position` for interface
* `type` and `quality` unused at the moment in `SensitiveWireHit` but probably OK to leave in
* Position is that of the wire closest to the hit.
* `position` could be renamed to something more specific and then `getPosition` can be defined in `ExtraCode`
* Might still need some adjustments
* Used in digitization but reconstruction will potentially require some changes
* Working in extension becomes cumbersome for low level types
* Need to redefine things *on top of it* for working with it
* Essentially creating a parallel data format
* Might be workable if interfaces work across extensions
* No uncertainties stored yet
* Attach uncertainties even if they are the same for all hits? -> For agnostic datamodel, yes.
* Check other hits for consistency, e.g. can `position` / `direction` be expressed in 2D + 3D position of the wire (from `cellID`)?
* [ ] Try to find names for uncertainties to see if concepts are clear
* Usable for other "drift chamber like" detectors, e.g. straw tubes
* `TrackerHitPlane` has a `position` in 3D which is redundant (i.e. can be used without `cellID`)
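
The statement that the hit is only known as a circle around the wire can be sketched as follows. This assumes a constant drift velocity, which is a simplification, and all names (`WireHit`, `drift_radius_mm`) are hypothetical rather than taken from the proposed `SensitiveWireHit`.

```python
from dataclasses import dataclass


@dataclass
class WireHit:
    """Hypothetical hit: only the wire (via cell_id) and a drift time are known."""

    cell_id: int
    time_ns: float  # drift time, after subtracting any time offset


def drift_radius_mm(hit, drift_velocity_mm_per_ns):
    """Radius of the circle around the wire, assuming a constant drift velocity."""
    if hit.time_ns < 0:
        raise ValueError("negative drift time")
    return hit.time_ns * drift_velocity_mm_per_ns
```

Before ambiguity resolution, any point on that circle is compatible with the measurement, so a single 3D `position` cannot represent the hit without further input.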
### RawHits removed from TrackerHits
* https://github.com/key4hep/EDM4hep/issues/382
* Breaks usage for EIC
* Can probably be brought back as `podio::ObjectID` instead of `edm4hep::ObjectID`
* [ ] Confirm schema evolution is technically possible
* [ ] Decide on timeline for this (after EDM4hep v01-00)
* [ ] Can we use `links` for this?
### Added SimDRCalorimeterHit for dual-readout
* https://github.com/key4hep/EDM4hep/pull/380
* Cannot use `SimCalorimeterHit` because crystals create two different signals **in the same volume**
* Clarify whether time is needed in CalorimeterHit or whether it can be taken from the contribution
* Do we need an interface that unifies this with `SimCalorimeterHit`?
* No immediate use case
* Does the digitization use the differences between scintillation and Cherenkov photons?
* Otherwise we will just duplicate information uselessly
* Need to understand the digitization in order to judge this better
* Bring Lorenzo into the loop to compare with how the fiber-based DR handled this
### Zenodo Badge, Author / Contributor list
* https://github.com/key4hep/EDM4hep/pull/375
* [ ] Prepare a `CITATION.cff` file to point to some publication (once it's ready).
* Decide on a case-by-case basis whether inclusion in the author list is warranted
### Run information storage
* Created issue for further discussion: https://github.com/key4hep/EDM4hep/issues/386
## Converter & MarlinWrapper
## AoB
* DD4hep EDM4hepReader merged
* https://github.com/AIDASoft/DD4hep/pull/1371
* [ ] Documentation for PIDHandler should be more accessible
* e.g. from some "loose pages" tab on the EDM4hep doxygen main page?
## Next meeting:
* Feb 17, 09:00 CET