EDM4hep Discussion

Europe/Zurich
Vidyo

EDM4hep Live Notes
==================

Date: October 27, 2020
Indico: https://indico.cern.ch/event/969468/

This is a document for taking notes during EDM4hep meetings.

Connected: Andre, Frank, Placido, Valentin, Graeme, Clement, Thomas, Benedikt, Gerri

Apologies: Tao, Weidong, Xingtao

## Introduction and General Points


## Progress and Discussion

### HSF podio Presentation

* Thomas submitted the abstract

### Spack installation

* Build entire stack once in a while?
    * When needed, versioned or dated

### Nightly Builds and CI

* Entire build CI set up by Joseph
    * docker image: key4hep-container on dockerhub
    * http://dev1.bitquant.com.hk:8011/
    * add tests?

* Now possible to build with commits as versions

* Also built for CVMFS
    * /cvmfs/sw.hsf.org/key4hep/setup.sh
        * Can this repository be part of the defaults?
    * https://gitlab.cern.ch/key4hep/k4-deploy/
        * mirror of k4-spack in order to use CERN's gitlab/openstack infrastructure
        * (ask Valentin to get access)
    * [x] publish on CVMFS, need way to copy new versions and create a "setup.sh" script
    * bundle package: use "rolling release model" and tag by date, or pin versions
    * [ ] Symlink to latest setup.sh not fully there yet

* EDM4hep CI
    * Solved: building podio on the fly, but ddsim is not picking up the new library
        * lib instead of lib64 exported

## Podio

### PRs
* https://github.com/AIDASoft/podio/pulls


### Using SIO as backend

* Frank and Thomas looking into it
    * https://github.com/AIDASoft/podio/pull/130
    * IO and tests working
    * Small differences in behaviour of ROOTReader/Writer and SIOReader/Writer
    * Split test datamodel into small libraries separating SIO/ROOT/Datamodel?
        * How to load podioSioIO on demand?
        * standalone GaudiPluginManager
        * https://github.com/hegner/PM4hep
    * Add option to make RootIO optional at build/CMake time
    * CMake: better isolation, reduce number of IFs
    * CMake functions to create libraries for different backends
        * Exercised with EDM4hep, make a PR with the changes
        * podio is backward compatible
    * Ready for review
    * Approved

### Issue with ROOT and (vectors of) non-copyable collections
* happens in ROOT 6.22
* PM: there is a patch available in LCG repository
    * ROOT team is working on a general solution

 


### Multi-Threading

* More than one event "in flight" should be possible
* TM: root reader seems to be capable, root writer maybe not
* FG: easier to handle with "event class"

* BH: *Big Picture*: transient part and persistent part
    * transient: operations on part in memory. Whatever was put into the event cannot be changed any more. Could also allow changes of existing collection based on policy, but shouldn't mix
    * persistent: reader: take buffer out of a file -> PODs -> put into event, reader is done; reader can read as many events as needed, all separate entities. Output system: takes ownership of events, drives multiple writers (different parts, different output formats)
* FG: How does, e.g., a digitiser announce its output collection?
* BH: digitiser "commits" to the reader/writer and then others can consume its output. Partly implemented in podio, partly in Gaudi. At the moment Input/Output has to be declared at the start for scheduling reasons.
* FG: Should be able to use podio outside of Gaudi
* BH: How strict do we have to be? Do you have to declare before or not?
* FG: Declaration makes things easier for multi-threading
* BH: also for IO parts.
* TM: All sounds reasonable
* FG: In LCIO/Marlin, events could be different, but usually are not. Could also attach conditions event per event, which wouldn't work in the "declaration" way. Would need some changes to conditions treatment
* BH: declaration allows for dependency tree (DAG) for scheduling
* FG: Feature of podio, or feature of framework?
* BH: Feature of the framework; podio should not bind too tightly to a given scheduling approach. Should declare all objects that will be altered to set the policy to the right one
* FG: Allow for non-const access to collections?
* GS: podio shouldn't bind to any multi-threading model. Allow locking of objects in events, which lets the framework support mutable and immutable objects.
* FG: at object or collection level? LCIO was at object level
* GS: at the level, where the processors or algorithms access data. Maybe collection level.
* BH: probably at collection level. If two algorithms would lock a collection, this would give conflicts.
* FG: If something is read from a file, is it immutable?
* BH: CMS is very strict. E.g., if skimming is involved this caused a lot of hassle. So for standalone podio (user w/o framework), skimming is a lot easier with mutable collections
* FG: "eventModifier" lets one easily fix mistakes, e.g., filling missing parameters
* BH: Want to support these kinds of use cases
* GS: From ATLAS experience: rename object on read and store again
* BH: provenance tracing gets complicated if things are renamed. Need to properly document
* PF: keep original and modified object? Flag as modified and later sort things out?
* BH: Probably more complicated
* GG: Should list use-cases to know what we need to support. Can we get input from the experiments on what they are doing these days, now that they have experience with multi-threading?
* BH: Might be useful. But for multi-threading making things mutable doesn't work.
* GG: Fixing things isn't a day-to-day workflow. Should understand what is being done in the experiments.
* FG: To make things immutable, flag the collections. Just support the occasional repair.
* BH: Should look at potential non-LHC users, support smaller experiments.
* BH: Non-mutable is the default, mutable corner case.
* FG: yes, now should see how this affects API, event store, decoupling from readers/writers
* BH: now what about persistency: defining ownership of the PODs. Reader creates the PODs -> event store owns the PODs -> writer owns the PODs. What does the handover look like? Bookkeeping? Output queue?
* FG: Agree with the approach. That is what is understood by "event": the thing that owns things. How to define the API, classes, code?
* BH: Complicated piece is interaction with the framework. Who creates the event? Who deletes the event? Facade on top?
* VV: Make things flexible about who owns. E.g., Collection owns its data?
* BH/FG: Collection owns its data, something else owns the collection.
* TM: Do we really only have collections to pass around, or also std::vector of MCParticles? Use case not foreseen? Sometimes want to "sort" particles
* VV: Agree, the use case does show up. Though it makes memory handling more complicated
* BH: pointers in vectors can become invalid, references to objects in vector can be complicated. Create example to see what needs to be done (->VV/TM)
* VV: Not being able to use the types causes some limits for edm4hep
* FG: example?
* VV: RDataFrame assumes one can fill container with type. Being forced to use collections makes it tricky to use RDataFrame.
* FG: Collections don't fulfil container contracts?
* VV: Not about collections, but about MCParticle type, because it doesn't own its data, but points to POD
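The ownership chain discussed above (reader creates the PODs, the event owns the collections, the writer takes over the event) could be prototyped with plain `std::unique_ptr` handovers. The following is only a minimal sketch to make the discussion concrete: `HitData`, `Collection`, `Event`, `readEvent` and `writeEvent` are hypothetical names invented for illustration and are not podio classes or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are podio types.
struct HitData { int cellID; double energy; };  // plain-old-data payload

struct Collection {                             // a collection owns its PODs
  std::vector<HitData> data;
};

class Event {                                   // the event owns its collections
public:
  // Takes ownership of the collection; returns false if the name is taken.
  bool put(const std::string& name, std::unique_ptr<Collection> coll) {
    return m_colls.emplace(name, std::move(coll)).second;
  }
  // Read-only access only: immutable is the default.
  const Collection* get(const std::string& name) const {
    auto it = m_colls.find(name);
    return it == m_colls.end() ? nullptr : it->second.get();
  }
private:
  std::unordered_map<std::string, std::unique_ptr<Collection>> m_colls;
};

// "Reader": fills an event from (here, fake) buffers and hands it over.
std::unique_ptr<Event> readEvent(int nHits) {
  auto coll = std::make_unique<Collection>();
  for (int i = 0; i < nHits; ++i) coll->data.push_back({i, 0.5 * i});
  auto evt = std::make_unique<Event>();
  evt->put("hits", std::move(coll));
  return evt;
}

// "Writer": takes ownership of the event, which is destroyed on return.
std::size_t writeEvent(std::unique_ptr<Event> evt) {
  const Collection* hits = evt->get("hits");
  return hits ? hits->data.size() : 0;
}
```

Passing the event by value as a `std::unique_ptr` makes the handover explicit in the signature: after `writeEvent(std::move(evt))` the caller's pointer is empty, so no bookkeeping is needed to know who deletes the event. Whether a framework facade would sit on top of this is left open, as in the discussion.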

--> Open issue(s) for further discussion

* BH: Some things can be decoupled (RDataFrame), discuss/work on Output with TM
* BH: If there is multi-threading in the library, how to glue it to the framework? Assume control, library, tasks?
* FG: Do not do it in podio, give examples, but the library itself should not do automatic multi-threading.
* BH: but use thread-safe containers, use from c++ standard? Not implement thread-safe queue?
* FG: thread-safe queue as an example
* VV: Mix podio and framework, example for thread-safe container sounds good.
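As a starting point for the "thread-safe queue as an example" idea, a minimal sketch using only the C++ standard library is shown below. It is not part of podio; `ThreadSafeQueue` and `sumEvents` are hypothetical names, and a real output queue would carry events rather than integers.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical example class, not part of podio.
template <typename T>
class ThreadSafeQueue {
public:
  void push(T value) {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_queue.push(std::move(value));
    }
    m_cond.notify_one();
  }
  // Signal that no more items will be pushed.
  void close() {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_closed = true;
    }
    m_cond.notify_all();
  }
  // Blocks until an item is available; returns false once the queue is
  // closed and fully drained.
  bool pop(T& out) {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return !m_queue.empty() || m_closed; });
    if (m_queue.empty()) return false;
    out = std::move(m_queue.front());
    m_queue.pop();
    return true;
  }
private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::queue<T> m_queue;
  bool m_closed = false;
};

// Demo: a producer thread pushes event numbers 1..nEvents, the caller
// drains the queue and returns the sum of what it received.
int sumEvents(int nEvents) {
  ThreadSafeQueue<int> q;
  std::thread producer([&q, nEvents] {
    for (int i = 1; i <= nEvents; ++i) q.push(i);
    q.close();
  });
  int sum = 0;
  int item = 0;
  while (q.pop(item)) sum += item;
  producer.join();
  return sum;
}
```

The library side stays passive here: it offers a container that is safe to share, while the threads themselves are created by the caller, in line with the point that podio should not do automatic multi-threading.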

#### "event class" in podio

* Currently being conceived

### Meta Data

#### Usage of "metadata" for user defined data
* need to check if current implementation addresses all use cases
* need test use-cases

### Issues following MetaData Developments

* cannot write out event data previously read from file
    * Issue: https://github.com/AIDASoft/podio/issues/103
    * Test: https://github.com/AIDASoft/podio/pull/102
    * TM: has a fix, but it breaks the framework core's handling of collections

* there seems to be a fix in the Gaudi ROOTWriter in FCC-EDM !?
    * historical reasons; fixes and developments should be backported to podio
    * ...

#### Writing second file with another tree

* https://github.com/key4hep/K4FWCore/issues/10
* this problem only happens in the Gaudi framework
    * if user tries to write an additional ROOT file
    * to be addressed

### EventStore

### Schema Evolution

- Version for object descriptions, etc.
- Open issue: https://github.com/AIDASoft/podio/issues/86

### Features

* Subset collections?

## LCIOConverters

* Conversion from LCIO to EDM4hep is almost completed, but we need more testing for the three associations:
    * MCRecoParticleAssociation
    * MCRecoCaloAssociation
    * MCRecoTrackerAssociation
* LCIOInput, an algorithm wrapper in Gaudi/K4FWCore
    * https://github.com/ihep-sft-group/LCIOInput
* K4LCIOReader:
    * https://github.com/ihep-sft-group/K4LCIOReader
* Used to reconstruct clusters with Pandora
    * Publish Pandora Interface on github

* working on spack recipe for CEPC software. Using `spack install cepcsw` to install CEPCSW.
    * https://github.com/key4hep/k4-spack/pull/73

## EDM4hep
https://github.com/key4hep/EDM4hep/pulls

### Tracker Hit

* Q: Tracker hit input to tracking algorithms?
* A: LCIO has different tracker hits, planar and cylindrical, implemented via inheritance. How to do this in EDM4hep still needs to be addressed.
* Q: Is inheritance needed?
* Q: What to use for drift chambers?
* Open issue:


### Need review of EDM4hepDelphes output

* Output is not stored if no tracks or towers created (?)
    * Allow also other types to be part of the list
    * Allow construction in a more general way (VV)
* Need to assume there is a complete list of reconstructed particles?

* Relations:
    * FCCSW: Issue with more than one RecoParticle pointing to one MCParticle
    * In LCIO to Delphes, conversion happens later than in EDM4hep; maybe Delphes does de-duplication?
    * TODO: open issue in edm4hep with reproducer (VV)


### PRs

* Fix delphes EDM4Hep plugin
    * https://github.com/key4hep/EDM4hep/pull/79
    * Merged
    * Add a mention of the example to the key4hep docs:
        * https://github.com/key4hep/key4hep-doc

### Release 1.0

* Need:
    * ~~Plugin~~
    * ~~Eventheader~~
    * ~~Meta Data (Event / Run Parameters)~~

## AOB

### Dual Read-out calorimeter for FCC

* Special data structure used for that simulation
* Tried to use edm4hep, some issues
* Present in a future meeting

### Conditions handling in Belle2
* Benedikt, or Martin Ritter

### Feedback from FCC tutorial for Snowmass

* Common question: What are the different branches in the ROOT file?
* ``uproot`` was also used to access the ROOT file
    * Basically re-implemented event store
        * Associations
    * Try to get this into, e.g., the podio repository

* 09:00-09:05 Introduction (5m). Speakers: Andre Sailer (CERN), Frank-Dieter Gaede (Deutsches Elektronen-Synchrotron (DE))
* 09:25-10:00 Discussion (35m). Speaker: Dr All