Analysis Facility Pilot (Weekly Discussion)

Europe/Zurich
513/1-024 (CERN)

Description

Useful information and links:

e-mail list: cern-analysis-facility@cern.ch

Overall description and useful information

Mattermost Channel

Workbook

Minutes

Zoom Meeting ID: 61085982895
Host: Markus Schulz
Alternative host: Ben Jones
    • 15:00 - 16:00
      Discussion with Ricardo on ml.cern.ch and the AF (1h)

      Discussion on the differences in scope and technology of ml.cern.ch and the AF pilot.

      The AF team's current understanding:

      The effort behind serving Kubeflow is mainly to provide:

      ML pipelines that are not possible elsewhere, e.g. distributed training

      Automated hyperparameter optimisation

      Model serving (use cases for classic ML but also for LLMs)

      Notebooks with GPUs (as in SWAN) are of interest, but not the main focus.

      Work is ongoing on dynamic GPU allocation: the idea is to aggregate all GPUs in a common pool so that services such as SWAN, batch and ml.cern.ch draw resources from the same place.
      This should reduce the time resources sit idle or underused and make the overall service more (cost-)efficient.
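
      To make the pooling argument concrete, here is a minimal arithmetic sketch comparing idle GPUs under fixed per-service quotas with a single shared pool. The quota and demand numbers and the function names are invented for illustration only; they do not describe the real allocation at CERN or the actual scheduling mechanism.

```python
def idle_static(quotas, demand):
    """Idle GPUs when each service can only use its own fixed quota."""
    return sum(max(quotas[s] - demand[s], 0) for s in quotas)


def idle_pooled(quotas, demand):
    """Idle GPUs when all quotas are merged into one shared pool."""
    total = sum(quotas.values())
    return max(total - sum(demand.values()), 0)


# Hypothetical numbers, chosen only to illustrate the effect.
quotas = {"SWAN": 8, "batch": 16, "ml.cern.ch": 8}
demand = {"SWAN": 2, "batch": 20, "ml.cern.ch": 10}

print("idle with fixed quotas:", idle_static(quotas, demand))
print("idle with shared pool: ", idle_pooled(quotas, demand))
```

      With these assumed numbers, fixed quotas leave 6 GPUs idle in SWAN while batch and ml.cern.ch are short of capacity; the shared pool leaves none idle.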

      We should discuss how ML workflows could work end to end and which environment the user is expected to use for each step. How different are these environments from the user's point of view? Is this likely to work for experienced people in the experiments as well as for newcomers (PhD students etc.)?

      -- Preparation of training data
         -- AOD --> DAODs --> ntuples from data
         -- MC generation of training data
      -- Developing a model
      -- Training the model with the data from step 1
         -- including hyperparameter optimisation
      -- Testing the model with production data
         -- see first step
      -- Making sense of the results (plots etc.)
      -- Loop over all steps
      -- Scaling up to a production analysis