EDM4hep Live Notes
==================
Date: March 9, 2021
Indico: https://indico.cern.ch/event/1014178/
This is a document for taking notes during EDM4hep meetings.
Connected: Joseph, Weidong, Tao, Wenxing, Andre, Jiaheng, Thomas, Placido, Benedikt, Paul, Frank, Teng, Valentin, Birgit, Wenxing, Clement, Gerri
Apologies:
## Introduction and General Points
## HSF Metadata Discussion
* See slides
* From the data analysis working group meetings
* Frank: What is missing on the edm4hep/podio side is metadata describing unstructured data inside edm4hep, e.g., a list of floats for particle ID, together with metadata giving meaning to the numbers.
* Benedikt: In the LHC experiments, this is solved by having the configuration for the algorithm which produced these numbers.
* Frank: How much policy needs to be defined in Key4hep?
* Benedikt: podio should provide the means to store the information, e.g., strings, tuples
* Weidong: In ATLAS they extract a subset of the data for specific analysis groups. Can we support that in EDM4hep?
* Benedikt: The question is if this still happens in EDM4hep
* Weidong: Extracting from EDM4hep and adding userdata.
* Benedikt: Problem is that file level metadata gets lost
* Frank: Should store as much metadata as possible close to the data, i.e., in the file
* Benedikt: Also makes debugging easier.
* Paul: But also don't want to read all files to do metadata manipulations
* Benedikt: Paul, what is your take on the conditions?
* Paul: For Belle II the central database stores organisational information: software versions and which conditions to use. The actual conditions are stored elsewhere. A fixed schema, as done in ATLAS, is not so great.
* Frank: For LC, conditions have not been used for physics studies, but the related test beams did need and use a conditions database.
* Paul: Should fix the APIs; what is underneath can change later, i.e., start with a file and later move to something else
* Benedikt: Need a full feature set
* Clement: Could start with studying mis-aligned geometry. Next can fix the particleID information for jet flavour tagging?
* Frank: Need to be flexible for flavour tagging or jet clustering. The LCIO solution comes with utilities to pick up the desired value, using generic named parameters holding vectors of strings/floats/ints
* Benedikt: Can you show a few concrete examples?
* Frank: Yes, together with Thomas
* Gerri: Should have a look at DDCond
* Clement: Maybe can ask Markus to present this?
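
A minimal sketch of the "named parameters" idea discussed above: metadata stored alongside the data gives meaning to an otherwise unstructured list of floats. This is plain Python, not the podio, EDM4hep or LCIO API; the algorithm name `ExampleTagger`, the parameter names and the helper `pid_value` are all hypothetical and only illustrate the lookup.

```python
from typing import Dict, List

# Hypothetical metadata: for each PID algorithm, the names of the
# parameters in the order in which the floats are stored.
pid_metadata: Dict[str, List[str]] = {
    "ExampleTagger": ["b_prob", "c_prob", "light_prob"],
}


def pid_value(algorithm: str, name: str, values: List[float],
              metadata: Dict[str, List[str]] = pid_metadata) -> float:
    """Return the float that belongs to the named parameter."""
    names = metadata[algorithm]
    if len(names) != len(values):
        raise ValueError("metadata and stored values have different lengths")
    return values[names.index(name)]


# Unstructured floats as they could be stored with a particle ID object.
stored = [0.7, 0.2, 0.1]
c_prob = pid_value("ExampleTagger", "c_prob", stored)
```

Keeping the names in the same file as the values follows the point made in the discussion that metadata should live close to the data.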
### Event weights
* Clement: Reached limitations in EDM4hep to store them, e.g. from Delphes
* Birgit: Problem is that the event header is not stored in files from k4SimDelphes
* Thomas: So we just have to store the event header, or do we need more information than is available from Delphes?
* Clement: Not sure, probably the information was properly propagated from the generator to Delphes. But we can only store one weight at the moment
* Thomas: Can we get the weight from Delphes? Then it is easy to store it via k4SimDelphes, otherwise we probably need to use the framework.
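
To make the "only one weight" limitation concrete, here is a rough sketch of one possible layout: a list of weights per event, with the weight names stored once as metadata. It is not the EDM4hep event header; `EventHeaderSketch`, `weight_names` and `weight` are hypothetical names, and whether Delphes provides more than one weight is an open question from the discussion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventHeaderSketch:
    """Hypothetical event header holding several weights per event."""
    event_number: int
    weights: List[float] = field(default_factory=list)


# Hypothetical run-level metadata: one name per entry in `weights`.
weight_names = ["nominal", "scale_up", "scale_down"]


def weight(header: EventHeaderSketch, name: str,
           names: List[str] = weight_names) -> float:
    """Look up a weight by name via the run-level metadata."""
    return header.weights[names.index(name)]


header = EventHeaderSketch(event_number=1, weights=[1.0, 1.1, 0.9])
nominal = weight(header, "nominal")
```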
## Progress and Discussion
## Podio
### Benchmarking
[Thomas] Added some scripts and tools for almost automatic benchmarking, plus some results using k4SimDelphes to "generate" data for I/O. Could be made fully automatic with a bit more work, if we want that.
* https://github.com/tmadlener/podio_benchmarking
* Example results: https://github.com/tmadlener/podio_benchmarking/tree/master/results/k4SimDelphes/ee_Z_bbbar
* https://root-forum.cern.ch/t/serious-degradation-of-i-o-performance-from-6-20-04-to-6-22-06/43584
### Issues/PRs
#### Make a tag after all the warnings are fixed
#### Heap-use-after-free
* https://github.com/AIDASoft/podio/issues/174
* Not a problem inside frameworks, only if collections are used outside of them
* Deep inside the memory management of podio, so not easy to fix
* Happens more often with clang than with gcc, but could be compiler options.
* Flagged by address-sanitizer
* Compare with the DD4hep "Handle"s
* LINK:
#### C++ concepts
* BH: add compile time checks for class behaviours: e.g., movable
#### Issue with ROOT and (vectors of) non-copyable collections
* Happens in ROOT 6.22
* PM: There is a patch available in the LCG repository
* ROOT team is working on a general solution
#### What are the different branches in the root file?
* Related to use in RDataFrame
* Encode more information in the _relation_ branch names?
#### Multi-Threading
See minutes of https://indico.cern.ch/event/969468/
--> Open issue(s) for further discussion
* Thomas and Benedikt discussing and started to work, nothing to report yet
#### "event class" in podio
* Currently being conceived
### PRs
* https://github.com/AIDASoft/podio/pulls
### Meta Data
#### Usage of "metadata" for user defined data
* need to check if current implementation addresses all use cases
* need test use-cases
### EventStore
### Schema Evolution
- Version for object descriptions, etc.
- Open issue: https://github.com/AIDASoft/podio/issues/86
### Features
* Subset collections?
## LCIOConverters
* https://github.com/key4hep/k4LCIOReader
## EDM4hep
https://github.com/key4hep/EDM4hep/pulls
### EDM4hep tools
https://github.com/key4hep/EDM4hep-utils
### Issues
### Need review of EDM4hepDelphes output
* Output is not stored if no tracks or towers created (?)
* Allow also other types to be part of the list
* Allow construction in a more general way (VV)
* Need to assume there is a complete list of reconstructed particles?
* Relations:
* FCCSW: Issue with more than one RecoParticle pointing to one MCParticle
* In LCIO to Delphes, conversion happens later than in edm4hep; maybe Delphes does de-duplication?
* TODO: open issue in edm4hep with reproducer (VV)
--> Moved to separate repository
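
As a starting point for the reproducer mentioned in the relations TODO, a small sketch that finds MC particles referenced by more than one reconstructed particle. It assumes the relations have already been extracted into plain `(reco_index, mc_index)` pairs; the function name `shared_mc_particles` and the input format are hypothetical, and no EDM4hep or Delphes API is used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shared_mc_particles(
    relations: Iterable[Tuple[int, int]]
) -> Dict[int, List[int]]:
    """Map each MC index to its reco indices, keeping only MC particles
    that are pointed to by more than one reco particle."""
    by_mc: Dict[int, List[int]] = defaultdict(list)
    for reco_idx, mc_idx in relations:
        by_mc[mc_idx].append(reco_idx)
    return {mc: recos for mc, recos in by_mc.items() if len(recos) > 1}


# Reco particles 0 and 2 both point to MC particle 5.
relations = [(0, 5), (1, 7), (2, 5)]
duplicates = shared_mc_particles(relations)
```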
### PRs
https://github.com/key4hep/EDM4hep/pulls/
## AOB
### Next meeting:
* March 23, 2021
### TODO
### New tags for k4SimDelphes
* Updated podio
* Updated Delphes version
* Update spack fork and main spack