2nd WG meeting (online)


16.6.2023 — Online Meeting Data Generation WG

Meeting coordinates: Zoom (online), 16 June 2023, 14:00-15:30

Present: Adnan Ghribi (CEA/CNRS), Adrian Oeftiger (GSI), Andrea Santamaria Garcia (LAS/KIT), Andrew Mistry (GSI), Barbara Dalena (CEA/IRFU), Chenran Xu (KIT), Francis Osswald (CNRS/IPHC), Gianluca Valentino (UM), Hayg Guler (CNRS/IJCLab), Kevin Cassou (CNRS), Simon Hirlaender (PLUS)

Minute Taking: Adnan Ghribi

Moderation: Adrian Oeftiger

Discussion aspects:

Andrew Mistry joins us from the nuclear physics data community and introduces us to the way they manage metadata.

Presentation by Andrew (to put in the box)

Highlights/questions:

Data publication: interim solution vs. final solution at the European level?

Zenodo as an interim solution for publishing data sets with a DOI, for sizes below 50 GB (up to 200 GB with paid options, according to Kevin)

Kevin: his group has large PIC simulation data sets from the plasma wakefield community and, facing this challenge for the first time, is looking to publish data at the TB scale or beyond

Adnan: a portal to explore data before downloading it, with storage for 5 years. Some links to projects working on this:



Adrian: CERN's NXCALS integrates data storage with small evaluation kernel capabilities. It was implemented a couple of years ago and CERN is still gathering experience with it; could this be an interesting solution to explore at the international level?

Adnan: feature engineering is just as important as metadata labeling (it becomes dynamic during the evaluation of data-driven models)

Adnan: discipline-oriented databases: do we need to create our own for the accelerator community?

Adnan: note the HMC booklet guide to better metadata: https://oceanrep.geomar.de/id/eprint/55270/7/2022_HMC-metadata_in_briefs_1_web.pdf

Adnan: it could be useful to have that table of the metadata structure for reference

Andrea: in contact with MT DMA (data management) regarding the openPMD standard. Also in contact with the Helmholtz Metadata Collaboration (HMC), which also runs project calls: https://helmholtz-metadaten.de/en/projects/hmc-project-calls. Ongoing project: B2SHARE, provided by EUDAT, run as a local instance of that repository, so that metadata can be searched and the data found locally: https://b2share.eudat.eu/

Adrian: our initiative was included in a presentation at the JENA workshop and was well received.

Chenran: KIT uses a solution that first "publishes" the data internally to a local copy of the database and only then makes it available externally by connecting to the public database (running the same software), which then assigns a DOI

Adrian: suggested using a hackathon format to organise our work around the study cases over the coming years of the programme, e.g. one group implementing data publication/management strategies and another implementing active learning strategies, with all groups working on the same study case.

Kevin: IDRIS/Nvidia hackathon, deadline 30 June, starting in September: https://www.ins2i.cnrs.fr/fr/cnrsinfo/appel-projets-pour-beneficier-de-laccompagnement-dingenieurs-en-intelligence-artificielle

Adrian: a hackathon that we could organise as an outcome of the white paper and within the call

Adrian: reviews the white paper. Reminder: please fill in your study case and institute.

Adnan: please indicate whether your data catalogues are ready or still need to be generated.

Adrian: Review of the study case "Exploring Resonance Diagrams"

Francis: Review of the study case "Enhanced emittance evaluation"

Gianluca: Review of the study case "Surrogate modeling of beam losses in the LHC collimation hierarchy"

Chenran: Review of the study case "Surrogate Modelling of low-energy linac"

Kevin, Andrea, Adnan, Barbara and Hayg will add their study cases soon

Adrian: discusses workshop organisation.
