KM3NeT ML meeting
A big thanks to Mehdy B. for the minutes!
Lukas Hennig
Need to produce a new MC dataset in order to benchmark our new ML models
Test the robustness of data/MC agreement and improve it
IceCube has this kind of test
Two papers from IceCube, one of them on GNNs was published in Science
Data/MC agreement tests: why are they important for our models?
Models might be sensitive to MC mismodelling, so we need to pay extra attention with neural-network models
Problems:
- Multivariate disagreements: some of the disagreements are only visible when looking at all the variables together
- Use the usual metrics for classification problems (ROC curves, AUC); an AUC close to random guessing indicates agreement (see the sketch after this list)
- Variable importance is discussed, but what kind of importance, and with respect to which scores?
Provide additional features describing the pulses
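A common way to make such a multivariate data/MC test concrete is to train a classifier to separate real data from MC events and check how far its AUC is from random guessing. The sketch below illustrates the idea; it is not code shown in the talk, and the features are placeholders.

```python
# Minimal sketch of a multivariate data/MC agreement test: train a classifier to
# separate real data from MC; an AUC close to 0.5 (random guessing) means the
# classifier cannot tell them apart, i.e. good agreement across all variables at once.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def data_mc_agreement_auc(X_data: np.ndarray, X_mc: np.ndarray) -> float:
    """Return the cross-validated AUC of a data-vs-MC classifier (0.5 = agreement)."""
    X = np.vstack([X_data, X_mc])
    y = np.concatenate([np.ones(len(X_data)), np.zeros(len(X_mc))])  # 1 = data, 0 = MC
    clf = GradientBoostingClassifier()
    scores = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, scores)

# Toy check: identical distributions should give an AUC close to 0.5.
rng = np.random.default_rng(0)
print(data_mc_agreement_auc(rng.normal(size=(1000, 5)), rng.normal(size=(1000, 5))))
```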
Questions:
- Change calibrations? Yes, this can be done.
- Do we understand the differences between the real data and the Monte Carlo data? How can we understand our Monte Carlo modelling at the graph level?
The outputs of the GNN come from the data and not from the MC simulation.
The different systematics tell us whether a model is sensitive to certain features; for instance, the CNN is not sensitive to time features.
- How much statistics do we use? Should the standard ML production be bigger, since we need statistics for the models?
Data generation proposal for a first assessment:
- 1 standard ML prod
- 1 ML prod with +10% PMT efficiency
- 1 ML prod with -10% PMT efficiency
- 1 ML prod with +10% light absorption
- 1 ML prod with -10% light absorption
- 1 ML prod with wrong time calibration
Ivan Mozún
Questions:
- The transferred model achieves a 20% better AUROC than a model trained from scratch.
- Different versions of the MC: are you aware there was a bug in the data?
- Santiago: do you know what kind of effect you should get?
- Slide 7: the black line is the ROC-AUC achieved with the transformer on ORCA115. With the fine-tuned model you can achieve very good track/shower performance when training on ORCA6.
- Weight the events by energy.
- Davit: the number of hits differs between configurations, so how do you handle the sequence length? The sequence is fixed to 300 hits; the distribution of hits was examined and 300 was chosen because it was optimal (see the sketch after this list).
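For illustration, a minimal sketch (not Ivan's actual code) of fixing the hit sequence length at 300: longer events are truncated and shorter ones are zero-padded, with a mask marking the real hits. The simple truncation rule used here is an assumption.

```python
import numpy as np

MAX_HITS = 300  # chosen from the hit-multiplicity distribution, per the discussion

def fix_sequence_length(hits: np.ndarray, max_hits: int = MAX_HITS):
    """Pad/truncate one event's (n_hits, n_features) array to (max_hits, n_features)."""
    n_hits, n_features = hits.shape
    padded = np.zeros((max_hits, n_features), dtype=hits.dtype)
    mask = np.zeros(max_hits, dtype=bool)
    n_keep = min(n_hits, max_hits)
    padded[:n_keep] = hits[:n_keep]  # truncate events with more than max_hits hits
    mask[:n_keep] = True             # True where a real hit is present
    return padded, mask

# Example: a toy event with 12 hits and 4 features (e.g. x, y, z, time).
padded, mask = fix_sequence_length(np.random.rand(12, 4))
print(padded.shape, int(mask.sum()))  # (300, 4) 12
```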
Santiago
Questions:
- How do you plan to implement domain adversarial training? (A sketch of one standard approach follows below.)
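For reference, a minimal sketch of the standard gradient-reversal-layer approach to domain adversarial training (Ganin et al., DANN). This is one common way to implement it, not necessarily the plan discussed; the network sizes and names are placeholders.

```python
# The feature extractor is trained so a domain classifier cannot tell MC from
# real data, while the task head (e.g. track/shower) is trained on labelled MC.
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None  # reverse (and scale) the gradient

class DANN(nn.Module):
    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.task_head = nn.Linear(64, n_classes)  # trained on MC labels
        self.domain_head = nn.Linear(64, 2)        # data vs MC

    def forward(self, x, lambd: float = 1.0):
        z = self.features(x)
        return self.task_head(z), self.domain_head(GradReverse.apply(z, lambd))

# Both cross-entropy losses are summed; the reversed gradient pushes the
# feature extractor towards domain-invariant representations.
model = DANN(n_features=16, n_classes=2)
task_logits, domain_logits = model(torch.randn(8, 16))
```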
Questions & Answers:
- Ivan: GraphNeT is more flexible; the only thing we would keep from the Orca side is km3pipe, but even that is not necessary because we can use uproot and tools from km3io or OrcaSong.
- Lukas: GraphNeT is better than OrcaNet because of its larger user community and more tools. A master's student who had to do a few-month research project was given the task of implementing tau neutrino identification with GraphNeT, and he did pretty well.
- ECAP / Jutta: Would like to advocate for a writer from GraphNeT to the KM3NeT internal format.
- Antonin: Maintenance of km3pipe and km3io is needed, but we also need to find people within the group to help maintain those packages.
- Ivan: GraphNeT works only for supervised learning; it is easy to take utilities from GraphNeT, such as data loading or data featurization, and use them for KM3NeT.
- Lukas said he has a master's student who is trying to reproduce what you have done; he is not trying to optimize it and works with the hyperparameters that Lukas has found. When he has finished his report, he will make it available to the collaboration.
- Santiago to Jutta: If we go the way of implementing our own readers or writers, should we make these tools open?
- Jutta: GraphNeT uses an SQLite format; this would make our tools more understandable and more accessible.
- Ivan: For GraphNeT they take the ROOT files directly and transform them into GraphNeT files. There is a repo where we have our own branch where everything is implemented; it is the KM3NeT branch of the repo.
- Jutta: We can maintain our own version of GraphNeT, but it would be unofficial; what would be great is for our changes to be taken into the official version of GraphNeT. A way forward would be to maintain the readers and writers as a KM3NeT package, so that we control their version and the version of GraphNeT they are compatible with.
- Ivan: It is not easy to keep track of changes both in GraphNeT and in KM3NeT and to get the KM3NeT changes into the official branch of GraphNeT. So yes, it would be better to do it ourselves.
- Jutta: Go from ROOT file to SQLite file with nothing in between (a rough sketch of such a conversion is appended at the end of these minutes). We are the only ones using ROOT files. Maintaining something that is not part of the collaboration will be trickier than doing it ourselves.
- Ivan: There is a mix of dependencies in all the possibilities.
- Jutta: If it is our own writer, we can handle the dependencies more easily.
- Antonin: GraphNeT seems a good way to go, to lower the barrier for starting and developing. Pause the meetings now and restart the ML meetings in the second half of August or early September. Aim to have an update on rolling out MLflow then and to discuss it more widely with people, but it is looking good.
- Jutta: End-of-August meeting with IceCube, thinking of using this method for neutrino oscillations, or machine learning and data formats. It should be on the last Friday of August.
- Antonin: There is an AI summer school that week in Caen, but we can prepare the meeting so that ML and data formats are discussed. Will try to be available in the afternoon.
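Appendix to the readers/writers discussion: a rough sketch of the "ROOT file to SQLite with nothing in between" idea, using uproot and pandas. This is not the GraphNeT converter, and the tree and branch names below are placeholders, not the real KM3NeT schema.

```python
import sqlite3

import pandas as pd
import uproot

def root_to_sqlite(root_path: str, sqlite_path: str,
                   tree_name: str = "E", branches=("dom_id", "t", "tot")) -> None:
    """Dump selected branches of one ROOT tree into an SQLite table named 'hits'."""
    with uproot.open(root_path) as f:
        df = f[tree_name].arrays(list(branches), library="pd")  # pandas DataFrame
    with sqlite3.connect(sqlite_path) as con:
        df.to_sql("hits", con, if_exists="replace")

# Usage (placeholder paths):
# root_to_sqlite("run.offline.root", "run.db")
```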