Analysis Facility Pilot (Weekly Discussion)

Name: Analysis Facility Pilot (Weekly Discussion)
Start: 2025-03-07T15:00:00+01:00
End: 2025-03-07T16:00:00+01:00
Location: CERN

Friday 7 Mar 2025, 15:00 → 16:00 Europe/Zurich

513/1-024 (CERN)

Description

Useful information and links:

e-mail list: cern-analysis-facility@cern.ch

Overall description and useful information

Mattermost Channel

Workbook

Minutes

- 15:00 → 16:00
  
  Discussion with Ricardo on ml.cern.ch and the AF (1h)
  
  Discussion on the differences in scope and technology of ml.cern.ch and the AF pilot.
  
  The AF team's current understanding:
  
  The effort behind serving Kubeflow is mainly to provide:
  
  ML pipelines that are not possible elsewhere, e.g. distributed training
  
  Automated hyperparameter optimisation
  
  Model serving (use cases for classic ML but also for LLMs)
  
  Notebooks with GPUs (as in SWAN) are interesting, but not the main focus.
  
  Work is ongoing on dynamic GPU allocation: the idea is to aggregate all GPUs in a common pool so that services like SWAN, batch and ml.cern.ch draw resources from the same place.
  This should reduce the time resources sit idle or underused and make the overall service more (cost) efficient.
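
The pooling idea can be illustrated with a toy model. This is only a sketch: the `GpuPool` class, its methods and the GPU counts are hypothetical and do not describe the actual allocation mechanism discussed in the meeting. It only shows why a shared pool can reduce idle time: GPUs released by one service become available to another straight away.

```python
from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """Toy model of a shared GPU pool (hypothetical, for illustration only)."""

    total: int
    allocated: dict = field(default_factory=dict)  # service name -> GPUs held

    @property
    def free(self) -> int:
        """Number of GPUs not currently held by any service."""
        return self.total - sum(self.allocated.values())

    def request(self, service: str, count: int) -> bool:
        """Grant `count` GPUs to `service` if enough are free."""
        if count <= 0 or count > self.free:
            return False
        self.allocated[service] = self.allocated.get(service, 0) + count
        return True

    def release(self, service: str, count: int) -> None:
        """Return up to `count` GPUs held by `service` to the pool."""
        held = self.allocated.get(service, 0)
        returned = min(count, held)
        if returned == held:
            self.allocated.pop(service, None)
        else:
            self.allocated[service] = held - returned


# Illustrative numbers only: 8 GPUs shared by three services.
pool = GpuPool(total=8)
pool.request("swan", 2)
pool.request("ml.cern.ch", 4)
batch_ok = pool.request("batch", 4)     # refused: only 2 GPUs are free
pool.release("swan", 2)                 # notebook session ends
batch_retry = pool.request("batch", 4)  # granted from the shared pool
```

With static per-service partitions, the two GPUs freed by the notebook service would stay idle until that same service needed them again. In the shared pool they can serve the waiting batch request.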
  
  We should discuss how ML workflows could work end to end and which environment the user is expected to use for each step. How different are these environments from the user's point of view? Is this likely to work for experienced people in the experiments as well as for newcomers (PhD students etc.)?
  
  -- Preparation of training data
     -- AOD --> DAODs --> ntuples from data
     -- MC generation of training data
  -- Developing a model
  -- Training the model with the data from step 1
     -- including hyperparameter optimisation
  -- Testing the model with production data
     -- see first step
  -- Making sense of the results (plots etc.)
  -- Loop over all steps
  -- Scaling up to a production analysis
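
To make the discussion of the steps above concrete, here is a minimal sketch of the loop in plain Python. Everything in it is an assumption made for illustration: the function names, the toy linear model and the generated data are hypothetical stand-ins. In a real analysis each stage would run on experiment software and on services such as SWAN, batch or ml.cern.ch.

```python
import random


def prepare_data(n, seed):
    """Stand-in for data preparation: toy (x, y) pairs with y ~ 3x plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]


def train(data, learning_rate, epochs=50):
    """Stand-in for model development and training: fit y = w * x by gradient descent."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return w


def mean_squared_error(w, data):
    """Stand-in for model testing: mean squared error of the fit on `data`."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Preparation of training data (here: generated toy samples).
training = prepare_data(200, seed=1)
validation = prepare_data(100, seed=2)
production_like = prepare_data(100, seed=3)

# Training including a small hyperparameter scan over the learning rate.
scan = {lr: mean_squared_error(train(training, lr), validation)
        for lr in (0.01, 0.1, 0.5)}
best_lr = min(scan, key=scan.get)
best_w = train(training, best_lr)

# Testing the model on data prepared the same way as in the first step.
test_mse = mean_squared_error(best_w, production_like)

# Making sense of the results (a printout instead of plots).
print(f"best learning rate: {best_lr}, fitted slope: {best_w:.3f}, test MSE: {test_mse:.4f}")
```

In this toy version every stage runs in one process and one environment. The open question in the notes is what happens when each stage lives in a different service with a different environment.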