EDM4hep Live Notes
==================
Date: April 05, 2022
Indico: https://indico.cern.ch/event/1151858
This is a document for taking notes during EDM4hep meetings.
Connected: Graeme, Benedikt, Thomas, Frank, Clement, Juraj, Placido, Tao Lin, Valentin, Weidong, Wenxing, Xunwu
Apologies: Andre
## Introduction and General Points
### Upcoming workshops / conferences
* ECFA Higgs Factories: 1st Topical Meeting on Reconstruction
- https://indico.cern.ch/event/1124095/
- May 04-05 at DESY (i.e. in person)
- Key4hep expert meeting May 05-06
* keep in person, not allow for remote participants (or at least not foresee explicit remote participation)
* FCC detectors workshop
* June 20-22
* Key4hep hackathon
* June 21-23 at CERN (in person), <https://indico.cern.ch/event/1148507/>
* Please register
## Progress and discussion
## Podio
* https://github.com/AIDASoft/podio/pulls
* https://github.com/AIDASoft/podio/issues
#### Changes to dereferenced collection iterators are not reflected in the underlying collection
* https://github.com/AIDASoft/podio/issues/281
* Need to discuss what we want to support
* What should actually happen when we assign a new handle to an existing one?
* Things that look reasonable should work as expected.
* Everything that doesn't work should fail at compile time!
#### Make Collection iterators usable with stl algorithms
* https://github.com/AIDASoft/podio/pull/273
* From Wouter (EIC), quite some discussion already there
* We can make the iterators [`std::bidirectional_iterator`](https://en.cppreference.com/w/cpp/iterator/bidirectional_iterator) -> Things like `std::find`, etc. work
* Only work properly if we do not try to mutate the underlying collection (e.g. `std::fill` will compile but will not work as intended)
* No easy fix to make that work (or not compile)
* Having things compile but not behave as expected is an invitation for trouble.
* `delete`-ing things manually is quite a bit of maintenance burden
* Document properly and merge or block?
* Not having this might force users out of podio ecosystem, collections could become storage only facilities
#### Allow setting default initializers on member definition
* https://github.com/AIDASoft/podio/issues/266
* https://github.com/AIDASoft/podio/pull/268
* proposal, using `gcc` under the hood to verify valid initialization.
* Validation can become dominating for generator runtime
* Allow syntax in yaml file and make it non-mandatory there
* [x] Default initialize everything under the hood, if not set by the user
- We already do this! (via `{}`)
* How to proceed? (with or without extended validation?)
* Doing this at generation time is overkill, because it is in any case caught at compilation (or at the latest CI).
#### Allow easier extension of data models (e.g. EIC builds on edm4hep)
* https://github.com/AIDASoft/podio/issues/263
* Discuss at Key4hep meeting next week, when Wouter can be present
* Mentioned that they still want this, but they have a workaround for the moment, so not extremely urgent
#### Remove `std::string` from allowed member types
* https://github.com/AIDASoft/podio/pull/276
* Don't seem to have (m)any use case
* Simplifies code generation (and potentially default initialization)
#### Templated access function for `GenericParameters`
* https://github.com/AIDASoft/podio/pull/262
* Add templated `[gs]etValue` methods to `GenericParamters` and deprectaed non-templated ones
* Preparatory work for `Frame`
#### Frame
* https://github.com/AIDASoft/podio/pull/278
* Started breaking up my initial prototype into easier to digest pieces and make it less of a prototype
* Plan to keep chunks small(ish) to make it possible to review them
* Make dedicated branch until a first version is finished?
* Currently only in-memory, no I/O yet
* Need to touch quite a bit of podio for that, will probably break "everything"
* External locking policy (e.g. removing internal locking)
* If you are working on a single thread, you are probably not looking for the most performance
#### Licence
* podio is licenced as GPL currently
* Need to confirm with all contributors (one pending) if changing is OK
#### Schema evolution
* Staged approach: Introduce the basic infrastructure and add the easy cases (adding/dropping members first)
* More complicated changes (e.g. re-ordering members) later
* Technically: transform "old" POD read from file to "new" POD. Users only see one version
## EDM4hep
* https://github.com/key4hep/EDM4hep/pulls
* https://github.com/key4hep/EDM4hep/issues
### Missing schema evolution
- Missing schema evolution is becoming an issue for some studies as PRs are blocked (e.g. `dN/dx` for FCC studies)
- Benedikt currently on parental leave -> unclear timeline as to when schema evolution arrives
- How to proceed?
* Use UserData for now and see if we can add that with schema evolution later.
* Do things really break for older files? E.g. just adding a member at the end?
- [ ] Try with ROOT I/O (only concern for this)
- Removing members works with ROOT
- Definitely breaks with SIO
- [ ] Check if adding a VectorMember works
- [ ] Can RDataFrame work with this directly?
* Have two builds with different EDM4hep tags / versions?
* Need to find someone who can work on schema evolution now.
* This should be unblocked by first stage of schema evolution by end of April
### Extending EDM4hep
* (Radical proposal of) splitting EDM4hep into a "core EDM4hep" and a few extensions that serve different purposes.
* What is the overhead of having unused types
* [ ] Have to check, but probably not too much (library size, computational overhead)
* Potentially less maintenance burden with a core EDM and a few extensions (but could also be different)
* Breaks promise of having one common EDM4hep.
* Large danger of fragmenting the EDM phase space
* Not having the extensibility is part of the EDM4hep design
* How would you solve it technically?
* Make podio able to generate code using "external" EDM. Then link together different libraries
* Current pain point is speed of progress
* Slow progress can become frustrating
* Current situation also a special case because missing schema evolution is a blocking (technical) factor and not the discussions that will be there for future additions
* We need to strike the right balance between extensibility and stability
* Stability should not prohibit physics studies
* But stable EDM is something we want to have and that is crucial for success of future studies
### Where to store Event/Run number?
* See discussion in: https://github.com/AIDASoft/DD4hep/pull/903
* Keep in `EventHeader` for now and check again once `Frame` is here, if there is a more fitting place to put this
#### SimCalorimeterHit and CaloHitContribution: missing fields used in EIC production
* https://github.com/key4hep/EDM4hep/issues/147
* From datamodel POV simple to add it
* Some physics questions behind it, e.g. electronics, timing. I.e. what do you actually fill into that field
* Could also store the MC truth time
* Could also depend on simulation, e.g. do you need each Geant4 hit (digitial calorimeters)
* Not necessarily the lowest time of all CaloHitContributions
* LCIO also covers use case from detailed simulations for Calorimeter testbeam studies. In the linear collider world `SimCalorimeterHit` is not yet digitized, and just holds all data from the Geant4 simulation that migth be necessary for digitization (which is done at reconstruction time).
#### PDG field in ReconstructedParticle is desirable in EIC workflows
* https://github.com/key4hep/EDM4hep/issues/146
* Think about renaming `type` to `PDG`
* [x] Check usage first
* [ ] Schema evolution necessary?
* What happens, e.g. for Jets?
* not setting type https://github.com/iLCSoft/MarlinFastJet/blob/4bee01f859c26b7faee90317947b9d8ee2e5c373/include/FastJetUtil.h#L556-L577
#### RecoParticleVertexAssociation: rename relation members for consistency between association
* https://github.com/key4hep/EDM4hep/pull/134
* Make naming consistent over all associations
* Potential to get confusing (without proper associations in podio, see https://github.com/AIDASoft/podio/pull/257)
* Which member should be `from` and which one should be `to`?
* "Higher level" should be `from` and "lower level" (i.e. MC) should be `to`
* Want a templated version and maybe some utitlities to make working with associations easier (e.g. create a map on the fly from the contents of the collection like in LCIO)
#### track length in LCIO / iLCSoft
* No track length data member in Track, computed on the fly via utilities from TrackStates and TrackerHits
* Wasn't needed in LC
* Stored as PID (vector of floats) in Track at the moment
* Can use `UserData` for this, but `UserData` is not part of the EDM in the end
### AOB:
* EDM4hep meeting with EIC compatible time slot. Check with them on April 26 for their opinion
* Collect list of potential topics for expert meeting / Key4hep workshop?
- e.g. track length discussion
- discuss with people present
## Next meeting:
* No meeting on May 3 (ECFA reconstruction workshop + expert meeting + holiday in China)
* Try to put together minutes / summary for people not present
* May 17