1st WG meeting (online)


26.5.2023 — Online Meeting Data Generation WG

Meeting coordinates: Zoom room (online), 26 May 2023, 16:00-17:30

Present: Adnan Ghribi (CEA/CNRS), Adrian Oeftiger (GSI), Amelia Pollard (ASTEC/STFC), Andrea Santamaria Garcia (LAS/KIT), Barbara Dalena (CEA/IRFU), Chenran Xu (KIT), Gianluca Valentino (UM), Hayg Guler (CNRS / IJCLab), Pierre Schnitzer (HZB), Samuel Marini (CEA/CNRS), Tatiana Pieloni (EPFL)

Minute Taking & Moderation: Adrian Oeftiger

Purpose: first online meeting (after the kickoff at IPAC23) to brainstorm the direction of the data generation & simulations working group within a response to the EU Horizon 2024 INFRA-TECH call (INFRA-2024-TECH-01-01)

Discussion aspects:

Adrian: an idea for a programme for this WG: share data from different relevant projects among collaborators and investigate a family of methods. A lot of the data comes from simulations, some also from experiments. Push for active learning methods to improve on grid and random parameter searches.

Pierre: it is important to generate metadata; a successful outcome of the proposal depends a lot on user stories (“what do you want to do?”)

Adrian: emphasise the ecological and economic aspects of active learning, as well as the deeper reach / more focused results of guided parameter searches.

Andrea: active learning looks very relevant to a lot of beam dynamics studies, such as space charge, dynamic aperture studies, FCC design

Tatiana: a lot of simulations are run repeatedly for very similar, if not the same, parameters, e.g. for FCC design by different institutes and people; this could be much more ecological. She presents some aspects of the xboinc project, an existing and funded project in which they want to use active learning. Tatiana emphasises comparison to machine experience and the merging of machine & simulation data. She brings up data augmentation, e.g. they need more simulations to cover the parameter space instead of (in addition to?) running more (costly LHC) machine experiments.

Idea: a shared infrastructure between institutes, i.e. an interface that is consulted with the simulation parameters before a run is launched on HPC hardware and that points to existing results. This could avoid a lot of duplicate simulations.
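
A minimal sketch of how such a lookup could work, assuming simulation parameters can be serialised as a flat dictionary. The class `SimulationRegistry`, the function `parameter_key`, the parameter names and the result location are all hypothetical and were not discussed in the meeting; a real service would be shared between institutes instead of living in memory.

```python
import hashlib
import json


def parameter_key(params: dict) -> str:
    """Return a stable hash for a set of simulation parameters."""
    # Sorting the keys makes the hash independent of the order
    # in which the parameters were specified.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SimulationRegistry:
    """Hypothetical in-memory stand-in for a shared lookup service."""

    def __init__(self):
        self._results = {}

    def register(self, params: dict, result_location: str) -> None:
        # Record where the results for this parameter set are stored.
        self._results[parameter_key(params)] = result_location

    def lookup(self, params: dict):
        # Return the location of existing results, or None if this
        # parameter set has not been simulated yet.
        return self._results.get(parameter_key(params))


registry = SimulationRegistry()
# Placeholder parameter names and location, for illustration only.
registry.register({"n_turns": 100000, "intensity": 1.2e11}, "site-a:/results/run_0001")

# Consult the registry before submitting a new job to HPC hardware.
existing = registry.lookup({"intensity": 1.2e11, "n_turns": 100000})
missing = registry.lookup({"intensity": 2.0e11, "n_turns": 100000})
```

This sketch only catches exactly identical parameter sets. Finding "very similar" runs would additionally need a tolerance or distance measure on the parameters, which is left open here.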

Adnan: can build on existing & funded projects and find weaknesses to suggest improvements

Barbara: these kinds of approaches can be applied to dynamic aperture & nonlinear optics design just as well as to plasma acceleration; she underlines the similarity at the metadata level

Adnan: feature engineering is a crucial but complex task for obtaining successful results; involve a company with dedicated specialists in the call response?

Adrian: at universities, too, a lot of interesting experts from other (mathematical/statistical) departments could help and join this proposal

Adnan: suggests writing a joint white paper to gather exactly which (physics) cases we would like to study from the different collaborators’ perspectives; this could help in the discussions with external experts

Adnan: suggests involving Andrew Mistry (GSI), who is an expert on data management and already works in EUROLABS & on nuclear science data management.

Action Adrian: get in touch with Andrew

Adrian: the benefit of orienting the WG proposal programme along defined study cases from different institutes would be that a common database is built organically during the programme. When every study case gets looked at by the other collaborators (to try out different methods), we will need to exchange data and will thereby necessarily fill the database with life (since each participant gains an individual net benefit!).

In the white paper, action for everyone: gather study cases with data type, input/output dimensions and number of samples; also the methods we want to try?

Andrea: highlights data augmentation and adversarial data generation

Tatiana: suggests checking for existing platforms for this type of data exchange; which platform should be used?

Action Adrian: will open an Overleaf document for the white paper draft; topics of study cases:

Heavy / extended simulation studies

Data augmentation for machine data

Metadata discussion

Amy: mentions data exchange platforms on a peer-to-peer basis

Action items:

Adrian: invite Andrew Mistry as data management expert to the next meeting

Adrian: open an Overleaf document for the white paper draft

Everyone: add to the white paper draft a description of a relevant study case that can be carried out (not necessarily exclusively) within the framework of this proposal:

data type (simulation/experiment)

input/output dimensions

number of samples (order of magnitude)

methods to try during data generation
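
To keep the study case descriptions comparable between institutes, the four items above could be collected in a small structured record. The following is only a sketch: the class `StudyCase`, its field names and all example values are hypothetical placeholders, not an agreed format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StudyCase:
    """Hypothetical record for one study case in the white paper."""

    title: str
    data_type: str          # "simulation" or "experiment"
    input_dim: int          # number of input parameters
    output_dim: int         # number of output quantities
    n_samples_order: int    # order of magnitude, e.g. 4 means ~1e4 samples
    methods: list = field(default_factory=list)

    def __post_init__(self):
        # Reject data types other than the two named in the action item.
        if self.data_type not in ("simulation", "experiment"):
            raise ValueError("data_type must be 'simulation' or 'experiment'")


# Placeholder values for illustration only, not a real study case.
example = StudyCase(
    title="placeholder study case",
    data_type="simulation",
    input_dim=6,
    output_dim=1,
    n_samples_order=4,
    methods=["active learning"],
)

# Serialise to JSON so that the record can be exchanged between institutes.
record = json.dumps(asdict(example), indent=2)
```

A plain JSON record like this could also serve as a starting point for the metadata discussion.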

