EDM4hep Live Notes
==================

Date: April 05, 2022
Indico: https://indico.cern.ch/event/1151858
This is a document for taking notes during EDM4hep meetings.

Connected: Graeme, Benedikt, Thomas, Frank, Clement, Juraj, Placido, Tao Lin, Valentin, Weidong, Wenxing, Xunwu

Apologies: Andre

## Introduction and General Points

### Upcoming workshops / conferences
    
* ECFA Higgs Factories: 1st Topical Meeting on Reconstruction
    - https://indico.cern.ch/event/1124095/
    - May 04-05 at DESY (i.e. in person)
    - Key4hep expert meeting May 05-06
        * keep in person, not allow for remote participants (or at least not foresee explicit remote participation)

* FCC detectors workshop
    * June 20-22

* Key4hep hackathon
    * June 21-23 at CERN (in person), <https://indico.cern.ch/event/1148507/>
    * Please register
    
   
## Progress and discussion

## Podio
* https://github.com/AIDASoft/podio/pulls
* https://github.com/AIDASoft/podio/issues

#### Changes to dereferenced collection iterators are not reflected in the underlying collection
* https://github.com/AIDASoft/podio/issues/281
* Need to discuss what we want to support
    * What should actually happen when we assign a new handle to an existing one?
    * Things that look reasonable should work as expected.
* Everything that doesn't work should fail at compile time!


#### Make Collection iterators usable with stl algorithms
* https://github.com/AIDASoft/podio/pull/273
* From Wouter (EIC), quite some discussion already there
* We can make the iterators [`std::bidirectional_iterator`](https://en.cppreference.com/w/cpp/iterator/bidirectional_iterator) -> Things like `std::find`, etc. work
* Only work properly if we do not try to mutate the underlying collection (e.g. `std::fill` will compile but will not work as intended)
    * No easy fix to make that work (or not compile)
    * Having things compile but not behave as expected is an invitation for trouble.
    * `delete`-ing things manually is quite a bit of maintenance burden
    * Document properly and merge or block?
    * Not having this might force users out of podio ecosystem, collections could become storage only facilities

#### Allow setting default initializers on member definition
* https://github.com/AIDASoft/podio/issues/266
* https://github.com/AIDASoft/podio/pull/268
    * proposal, using `gcc` under the hood to verify valid initialization.
    * Validation can become dominating for generator runtime
* Allow syntax in yaml file and make it non-mandatory there
* [x] Default initialize everything under the hood, if not set by the user
    - We already do this! (via `{}`)
* How to proceed? (with or without extended validation?)
    * Doing this at generation time is overkill, because it is in any case caught at compilation (or at the latest CI).

#### Allow easier extension of data models (e.g. EIC builds on edm4hep)
* https://github.com/AIDASoft/podio/issues/263
* Discuss at Key4hep meeting next week, when Wouter can be present
    * Mentioned that they still want this, but they have a workaround for the moment, so not extremely urgent
    
#### Remove `std::string` from allowed member types
* https://github.com/AIDASoft/podio/pull/276
* Don't seem to have (m)any use case
* Simplifies code generation (and potentially default initialization) 

#### Templated access function for `GenericParameters`
* https://github.com/AIDASoft/podio/pull/262
* Add templated `[gs]etValue` methods to `GenericParamters` and deprectaed non-templated ones
* Preparatory work for `Frame`
    
#### Frame
* https://github.com/AIDASoft/podio/pull/278
* Started breaking up my initial prototype into easier to digest pieces and make it less of a prototype
    * Plan to keep chunks small(ish) to make it possible to review them
    * Make dedicated branch until a first version is finished?
* Currently only in-memory, no I/O yet
    * Need to touch quite a bit of podio for that, will probably break "everything"
* External locking policy (e.g. removing internal locking)
    * If you are working on a single thread, you are probably not looking for the most performance
    
#### Licence
* podio is licenced as GPL currently
* Need to confirm with all contributors (one pending) if changing is OK

#### Schema evolution
* Staged approach: Introduce the basic infrastructure and add the easy cases (adding/dropping members first)
* More complicated changes (e.g. re-ordering members) later
* Technically: transform "old" POD read from file to "new" POD. Users only see one version


## EDM4hep
* https://github.com/key4hep/EDM4hep/pulls
* https://github.com/key4hep/EDM4hep/issues

### Missing schema evolution
- Missing schema evolution is becoming an issue for some studies as PRs are blocked (e.g. `dN/dx` for FCC studies)
- Benedikt currently on parental leave -> unclear timeline as to when schema evolution arrives
- How to proceed?
    * Use UserData for now and see if we can add that with schema evolution later.
    * Do things really break for older files? E.g. just adding a member at the end?
        - [ ] Try with ROOT I/O (only concern for this)
            - Removing members works with ROOT
        - Definitely breaks with SIO
        - [ ] Check if adding a VectorMember works
        - [ ] Can RDataFrame work with this directly?
     * Have two builds with different EDM4hep tags / versions?
     * Need to find someone who can work on schema evolution now.
     * This should be unblocked by first stage of schema evolution by end of April
     
### Extending EDM4hep
* (Radical proposal of) splitting EDM4hep into a "core EDM4hep" and a few extensions that serve different purposes.
* What is the overhead of having unused types
    * [ ] Have to check, but probably not too much (library size, computational overhead)
    * Potentially less maintenance burden with a core EDM and a few extensions (but could also be different)
* Breaks promise of having one common EDM4hep.
    * Large danger of fragmenting the EDM phase space
    * Not having the extensibility is part of the EDM4hep design
* How would you solve it technically?
    * Make podio able to generate code using "external" EDM. Then link together different libraries
* Current pain point is speed of progress
    * Slow progress can become frustrating
    * Current situation also a special case because missing schema evolution is a blocking (technical) factor and not the discussions that will be there for future additions
* We need to strike the right balance between extensibility and stability
    * Stability should not prohibit physics studies
    * But stable EDM is something we want to have and that is crucial for success of future studies


### Where to store Event/Run number?
* See discussion in: https://github.com/AIDASoft/DD4hep/pull/903
* Keep in `EventHeader` for now and check again once `Frame` is here, if there is a more fitting place to put this

#### SimCalorimeterHit and CaloHitContribution: missing fields used in EIC production
* https://github.com/key4hep/EDM4hep/issues/147
* From datamodel POV simple to add it
* Some physics questions behind it, e.g. electronics, timing. I.e. what do you actually fill into that field
    * Could also store the MC truth time
* Could also depend on simulation, e.g. do you need each Geant4 hit (digitial calorimeters)
* Not necessarily the lowest time of all CaloHitContributions
* LCIO also covers use case from detailed simulations for Calorimeter testbeam studies. In the linear collider world `SimCalorimeterHit` is not yet digitized, and just holds all data from the Geant4 simulation that migth be necessary for digitization (which is done at reconstruction time).


#### PDG field in ReconstructedParticle is desirable in EIC workflows
* https://github.com/key4hep/EDM4hep/issues/146
* Think about renaming `type` to `PDG`
    * [x] Check usage first
    * [ ] Schema evolution necessary?
* What happens, e.g. for Jets?
    * not setting type https://github.com/iLCSoft/MarlinFastJet/blob/4bee01f859c26b7faee90317947b9d8ee2e5c373/include/FastJetUtil.h#L556-L577
    
#### RecoParticleVertexAssociation: rename relation members for consistency between association
* https://github.com/key4hep/EDM4hep/pull/134
* Make naming consistent over all associations
* Potential to get confusing (without proper associations in podio, see https://github.com/AIDASoft/podio/pull/257)
    * Which member should be `from` and which one should be `to`?
    * "Higher level" should be `from` and "lower level" (i.e. MC) should be `to`
    * Want a templated version and maybe some utitlities to make working with associations easier (e.g. create a map on the fly from the contents of the collection like in LCIO)
    
#### track length in LCIO / iLCSoft
* No track length data member in Track, computed on the fly via utilities from TrackStates and TrackerHits
* Wasn't needed in LC
* Stored as PID (vector of floats) in Track at the moment
* Can use `UserData` for this, but `UserData` is not part of the EDM in the end

### AOB:
* EDM4hep meeting with EIC compatible time slot. Check with them on April 26 for their opinion
* Collect list of potential topics for expert meeting / Key4hep workshop?
    - e.g. track length discussion
    - discuss with people present

## Next meeting:
* No meeting on May 3 (ECFA reconstruction workshop + expert meeting + holiday in China)
    * Try to put together minutes / summary for people not present
* May 17