EDM4hep Discussion
Vidyo
EDM4hep Live Notes
==================
Date: October 27, 2020
Indico: https://indico.cern.ch/event/969468/
This is a document for taking notes during EDM4hep meetings.
Connected: Andre, Frank, Placido, Valentin, Graeme, Clement, Thomas, Benedikt, Gerri
Apologies: Tao, Weidong, Xingtao
## Introduction and General Points
## Progress and Discussion
### HSF podio Presentation
* Thomas submitted the abstract
### Spack installation
* Build entire stack once in a while?
* When needed, versioned or dated
### Nightly Builds and CI
* Entire build CI set up by Joseph
* docker image: key4hep-container on dockerhub
* http://dev1.bitquant.com.hk:8011/
* add tests?
* Now possible to build with commits as versions
* Also built for CVMFS
* /cvmfs/sw.hsf.org/key4hep/setup.sh
* Can this repository be part of the defaults?
* https://gitlab.cern.ch/key4hep/k4-deploy/
* mirror of k4-spack in order to use CERN's gitlab/openstack infrastructure
* (ask Valentin to get access)
* [x] publish on CVMFS, need way to copy new versions and create a "setup.sh" script
* bundle package: use "rolling release model" and tag by date, or pin versions
* [ ] Symlink to latest setup.sh not fully there yet
* EDM4hep CI
* Solved: building podio on the fly, but ddsim is not picking up the new library
* lib instead of lib64 exported
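
The "rolling release model" with date tags and a symlink to the latest `setup.sh` could look roughly like the sketch below. It is only an illustration: the function name `publish_release`, the directory layout and the placeholder `setup.sh` content are assumptions, not the actual key4hep deployment scripts.

```python
import os
import tempfile
from datetime import date
from pathlib import Path


def publish_release(root: Path, day: date) -> Path:
    """Create a date-tagged release directory and repoint 'latest' at it."""
    release = root / day.isoformat()  # e.g. <root>/2020-10-27
    release.mkdir(parents=True, exist_ok=True)
    # Placeholder script; a real one would export the environment of the stack.
    (release / "setup.sh").write_text(f"# environment for release {day.isoformat()}\n")

    # Create the new symlink under a temporary name and rename it over the
    # old one, so that 'latest' never points at a missing directory.
    tmp_link = root / "latest.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    tmp_link.symlink_to(release.name)  # relative target inside root
    os.replace(tmp_link, root / "latest")
    return release


if __name__ == "__main__":
    demo_root = Path(tempfile.mkdtemp())
    publish_release(demo_root, date(2020, 10, 27))
    print((demo_root / "latest" / "setup.sh").read_text())
```

Users would then always source `<root>/latest/setup.sh`, while pinned setups keep pointing at a dated directory.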
## Podio
### PRs
* https://github.com/AIDASoft/podio/pulls
### Using SIO as backend
* Frank and Thomas looking into it
* https://github.com/AIDASoft/podio/pull/130
* IO and tests working
* Small differences in behaviour of ROOTReader/Writer and SIOReader/Writer
* Split test datamodel into small libraries separating SIO/ROOT/Datamodel?
* How to load podioSioIO on demand?
* standalone GaudiPluginManager
* https://github.com/hegner/PM4hep
* Add option to make RootIO optional at build/CMake time
* CMake: better isolation, reduce number of IFs
* CMake functions to create libraries for different backends
* Exercised with EDM4hep, make a PR with the changes
* podio is backward compatible
* Ready for review
* Approved
### Issue with ROOT and (vectors of) non-copyable collections
* happens in ROOT 6.22
* PM: there is a patch available in LCG repository
* ROOT team is working on a general solution
### Multi-Threading
* More than one event "in flight" should be possible
* TM: ROOT reader seems to be capable, ROOT writer maybe not
* FG: easier to handle with "event class"
* BH: *Big Picture*: transient part and persistent part
* transient: operations on part in memory. Whatever was put into the event cannot be changed any more. Could also allow changes of existing collection based on policy, but shouldn't mix
* persistent: reader: take buffer out of a file -> PODs -> put into event, reader is done; reader can read as many events as needed, all separate entities. Output system: takes ownership of events, drives multiple writers (different parts, different output formats)
* FG: How does, e.g., a digitiser announce its output collection?
* BH: digitiser "commits" to the reader/writer and then others can consume its output. Partly implemented in podio, partly in Gaudi. At the moment Input/Output has to be declared at the start for scheduling reasons.
* FG: Should be able to use podio outside of Gaudi
* BH: How strict do we have to be? Do you have to declare before or not?
* FG: Declaration makes things easier for multi-threading
* BH: also for IO parts.
* TM: All sounds reasonable
* FG: In LCIO/Marlin, events could be different, but usually are not. Could also attach conditions event by event, which wouldn't work in the "declaration" way. Would need some changes to conditions treatment
* BH: declaration allows for dependency tree (DAG) for scheduling
* FG: Feature of podio, or feature of framework?
* BH: Feature of the framework; do not bind too tightly to a given scheduling approach. All objects that will be altered should be declared so that the right policy can be set
* FG: Allow for non-const access to collections?
* GS: podio shouldn't bind to any multi-threading model. Allow locking of objects in events; this lets the framework support mutable and immutable objects.
* FG: at object or collection level? LCIO was at object level
* GS: at the level, where the processors or algorithms access data. Maybe collection level.
* BH: probably at collection level. If two algorithms would lock a collection, this would give conflicts.
* FG: If something is read from a file, is it immutable?
* BH: CMS is very strict. E.g., if skimming is involved this caused a lot of hassle. So for standalone podio (user w/o framework), skimming is a lot easier with mutable collections
* FG: "eventModifier" lets one easily fix mistakes, e.g., filling missing parameters
* BH: Want to support this kind of use case
* GS: From ATLAS experience: rename object on read and store again
* BH: provenance tracing gets complicated if things are renamed. Need to properly document
* PF: keep original and modified object? Flag as modified and later sort things out?
* BH: Probably more complicated
* GG: Should list use cases to know what we need to support. Can we get input from the experiments on what they are doing these days, now that they have experience with multi-threading?
* BH: Might be useful. But for multi-threading making things mutable doesn't work.
* GG: Fixing things isn't a day-to-day workflow. Should understand what is being done in the experiments.
* FG: To make things immutable, flag the collections. Just support the occasional repair.
* BH: Should look at potential non-LHC users, support smaller experiments.
* BH: Non-mutable is the default, mutable corner case.
* FG: yes, now should see how this affects API, event store, decoupling from readers/writers
* BH: now what about persistency: defining ownership of the PODs. Reader creates the PODs -> event store owns the PODs -> writer owns the PODs. What does the handover look like? Bookkeeping? Output queue?
* FG: Agree with the approach. That is what is understood by "event": the thing that owns the data. How to define the API, classes, code?
* BH: Complicated piece is interaction with the framework. Who creates the event? Who deletes the event? Facade on top?
* VV: Make things flexible about who owns. E.g., Collection owns its data?
* BH/FG: Collection owns its data, something else owns the collection.
* TM: Do we really only have collections to pass around, or also std::vector of MCParticles? Use case not foreseen? Sometimes want to "sort" particles
* VV: Agree, use case does show up. Though makes memory more complicated
* BH: pointers in vectors can become invalid, references to objects in vector can be complicated. Create example to see what needs to be done (->VV/TM)
* VV: Not being able to use the types causes some limits for edm4hep
* FG: example?
* VV: RDataFrame assumes one can fill container with type. Being forced to use collections makes it tricky to use RDataFrame.
* FG: Collections don't fulfil container contracts?
* VV: Not about collections, but about MCParticle type, because it doesn't own its data, but points to POD
--> Open issue(s) for further discussion
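
The declaration approach discussed earlier in this section (every algorithm announces its inputs and outputs, which gives a dependency DAG for scheduling) can be sketched as follows. This is a minimal illustration in Python, not podio or Gaudi code: the algorithm names, collection names and the `schedule` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declarations: each algorithm lists the collections it reads
# and the collections it produces.
ALGORITHMS = {
    "ParticleFlow": {"inputs": ["Tracks", "CaloClusters"], "outputs": ["RecoParticles"]},
    "Tracking": {"inputs": ["TrackerHits"], "outputs": ["Tracks"]},
    "Digitiser": {"inputs": ["SimTrackerHits"], "outputs": ["TrackerHits"]},
    "Clustering": {"inputs": ["CaloHits"], "outputs": ["CaloClusters"]},
}


def schedule(algorithms):
    """Order algorithms so every collection is produced before it is consumed."""
    producer = {}
    for name, decl in algorithms.items():
        for coll in decl["outputs"]:
            if coll in producer:
                raise ValueError(
                    f"{coll!r} is declared as output by {producer[coll]} and {name}"
                )
            producer[coll] = name

    # Map each algorithm to the set of algorithms it depends on. Inputs
    # without a producer are assumed to come from the reader.
    graph = {
        name: {producer[c] for c in decl["inputs"] if c in producer}
        for name, decl in algorithms.items()
    }
    # Raises graphlib.CycleError if the declarations are circular.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(schedule(ALGORITHMS))
```

Declaring two producers for the same collection is rejected up front, which is one way to catch the conflict mentioned above when two algorithms want to alter the same collection.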
* BH: Some things can be decoupled (RDataFrame), discuss/work on Output with TM
* BH: If there is multi-threading in the library, how to glue it to the framework? Assume control, library, tasks?
* FG: Do not do it in podio, give examples, but the library itself should not do automatic multi-threading.
* BH: but use thread-safe containers, use from C++ standard? Not implement thread-safe queue?
* FG: thread-safe queue as an example
* VV: Mix podio and framework, example for thread-safe container sounds good.
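
As a rough illustration of the "thread-safe queue as an example" idea, the sketch below hands events from a reader thread to a writer thread through a bounded queue. It uses Python's `queue.Queue` for brevity; an actual podio example would be in C++ and would need its own queue. The event layout and all function names are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel: tells the writer that no more events will arrive


def reader(n_events, out):
    """Create events and hand each one over; the reader keeps no reference."""
    for i in range(n_events):
        event = {"number": i, "collections": {"MCParticles": [i, i + 1]}}
        out.put(event)  # blocks while the queue is full
    out.put(_DONE)


def writer(inp, written):
    """Take ownership of each event from the queue and 'write' it."""
    while True:
        event = inp.get()
        if event is _DONE:
            break
        written.append(event["number"])


def run(n_events):
    handover = queue.Queue(maxsize=4)  # bounded: limits the events in flight
    written = []
    t_reader = threading.Thread(target=reader, args=(n_events, handover))
    t_writer = threading.Thread(target=writer, args=(handover, written))
    t_reader.start()
    t_writer.start()
    t_reader.join()
    t_writer.join()
    return written


if __name__ == "__main__":
    print(run(10))
```

The queue is the only shared object, so ownership of an event is unambiguous: it belongs to whichever side currently holds it.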
#### "event class" in podio
* Currently being conceived
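
One possible shape for such an event class, following the points made in the multi-threading discussion (the collection owns its data, the event owns the collection, immutable by default with mutability as an opt-in corner case), is sketched below. This is not the podio design; the class and method names are hypothetical and Python is used only to keep the sketch short.

```python
class Collection:
    """Owns its data; becomes read-only once an event takes ownership."""

    def __init__(self, items=(), mutable=False):
        self._items = list(items)
        self._mutable = mutable  # opt-in corner case, e.g. for repairs
        self._frozen = False

    def freeze(self):
        self._frozen = True

    def append(self, item):
        if self._frozen and not self._mutable:
            raise TypeError("collection is immutable once stored in an event")
        self._items.append(item)

    def __len__(self):
        return len(self._items)


class Event:
    """Owns collections by name; a name can be used only once per event."""

    def __init__(self):
        self._collections = {}

    def put(self, name, collection):
        if name in self._collections:
            raise KeyError(f"collection {name!r} already exists in this event")
        collection.freeze()
        self._collections[name] = collection

    def get(self, name):
        return self._collections[name]


if __name__ == "__main__":
    event = Event()
    hits = Collection([1, 2])
    hits.append(3)           # still allowed: not yet stored in an event
    event.put("Hits", hits)  # from here on, hits is read-only
    print(len(event.get("Hits")))
```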
### Meta Data
#### Usage of "metadata" for user defined data
* need to check if current implementation addresses all use cases
* need test use-cases
### Issues following MetaData Developments
* cannot write out event data previously read from file
* Issue: https://github.com/AIDASoft/podio/issues/103
* Test: https://github.com/AIDASoft/podio/pull/102
* TM: has a fix, but it breaks the framework core handling of collections
* there seems to be a fix in the Gaudi ROOTWriter in FCC-EDM !?
* historical reason; fixes and developments should be backported to podio
* ...
#### Writing second file with another tree
* https://github.com/key4hep/K4FWCore/issues/10
* this problem only happens in the Gaudi framework
* if user tries to write an additional ROOT file
* to be addressed
### EventStore
### Schema Evolution
- Version for object descriptions, etc.
- Open issue: https://github.com/AIDASoft/podio/issues/86
### Features
* Subset collections?
## LCIOConverters
* Conversion from LCIO to EDM4hep is almost complete, but we need more testing for the three associations:
* MCRecoParticleAssociation
* MCRecoCaloAssociation
* MCRecoTrackerAssociation
* LCIOInput, an algorithm wrapper in Gaudi/K4FWCore
* https://github.com/ihep-sft-group/LCIOInput
* K4LCIOReader:
* https://github.com/ihep-sft-group/K4LCIOReader
* Used to reconstruct clusters with Pandora
* Publish Pandora Interface on github
* Working on a Spack recipe for the CEPC software, using `spack install cepcsw` to install CEPCSW.
* https://github.com/key4hep/k4-spack/pull/73
## EDM4hep
https://github.com/key4hep/EDM4hep/pulls
### Tracker Hit
* Q: Tracker hit input to tracking algorithms?
* A: In LCIO there are different tracker hits: Planar and Cylindrical, using inheritance. Still needs to be addressed how to do this in EDM4hep.
* Q: Is inheritance needed?
* Q: What to use for drift chambers?
* Open issue:
### Need review of EDM4hepDelphes output
* Output is not stored if no tracks or towers created (?)
* Allow also other types to be part of the list
* Allow construction in a more general way (VV)
* Need to assume there is a complete list of reconstructed particles?
* Relations:
* FCCSW: Issue with more than one RecoParticle pointing to one MCParticle
* In LCIO to Delphes, conversion happens later than in EDM4hep, maybe Delphes does de-duplication?
* TODO: open issue in edm4hep with reproducer (VV)
### PRs
* Fix delphes EDM4Hep plugin
* https://github.com/key4hep/EDM4hep/pull/79
* Merged
* Add a mention of the example to the key4hep docs:
* https://github.com/key4hep/key4hep-doc
### Release 1.0
* Need:
* ~~Plugin~~
* ~~Eventheader~~
* ~~Meta Data (Event / Run Parameters)~~
## AOB
### Dual Read-out calorimeter for FCC
* Special data structure used for that simulation
* Tried to use edm4hep, some issues
* Present in a future meeting
### Conditions handling in Belle2
* Benedikt, or Martin Ritter
### Feedback from FCC tutorial for Snowmass
* Common question: What are the different branches in the ROOT file?
* ``uproot`` was also used to access the ROOT file
* Basically re-implemented event store
* Associations
* Try to get this into, e.g., the podio repository
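
To illustrate what "re-implementing the event store" for associations amounts to when reading flat arrays, here is a minimal sketch for a single event. The branch names and the index-based layout are assumptions made for the example and are not taken from the actual EDM4hep file format; plain Python lists stand in for the arrays a tool like ``uproot`` would return.

```python
# Hypothetical flat branches for one event. The names and the layout are
# assumptions for illustration, not the real EDM4hep branch structure.
BRANCHES = {
    "MCParticles.PDG": [11, -11, 22],
    "RecoParticles.energy": [45.1, 44.7],
    "MCRecoAssociations.recIndex": [0, 1],
    "MCRecoAssociations.simIndex": [1, 0],
}


def resolve_associations(rec_index, sim_index, reco, sim):
    """Turn parallel index arrays into (reco, sim) value pairs."""
    if len(rec_index) != len(sim_index):
        raise ValueError("association index arrays must have the same length")
    return [(reco[r], sim[s]) for r, s in zip(rec_index, sim_index)]


if __name__ == "__main__":
    pairs = resolve_associations(
        BRANCHES["MCRecoAssociations.recIndex"],
        BRANCHES["MCRecoAssociations.simIndex"],
        BRANCHES["RecoParticles.energy"],
        BRANCHES["MCParticles.PDG"],
    )
    for energy, pdg in pairs:
        print(f"reco energy {energy} <-> MC PDG {pdg}")
```

A shared helper of this kind, kept next to the data model, would save each analysis from rewriting the index bookkeeping.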