AI RCS Strategy Workshop

Name: AI RCS Strategy Workshop
Start: 2025-09-15T08:09:00+02:00
End: 2025-09-19T18:00:00+02:00
Location: CERN

15–19 Sept 2025

Europe/Zurich timezone

Kubeflow backed by CVMFS: Efficient ML Model distribution for the Grid

16 Sept 2025, 12:20

40/S2-A01 - Salle Anderson (CERN)

5. Infrastructure for AI Deployment

Valentin Volkl (CERN)

The infrastructure to deploy both training data and final models in a distributed computing environment such as the WLCG is essential for making optimal use of ML/AI in offline computing. CVMFS is the de facto standard for deploying software binaries and could bring its advantages to ML operations, in particular with respect to software preservation.

As ML models used for inference are commonly stored in OCI registries, CVMFS can use existing container tools to cache and distribute them, integrating with other platforms such as Kubeflow. This is therefore not a reinvention of existing industry tools, but an enhancement of state-of-the-art tools. However, since the access pattern of these model files differs from that of other software binaries, proxies and caches need to be tuned to work effectively for this use case. A central “model-registry.cern.ch” repository will be created as a service for the community, similar to unpacked.cern.ch, to make its use similarly accessible and transparent.
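To illustrate what "accessible and transparent" could look like from a grid job, here is a minimal Python sketch of mapping an OCI reference to a path inside the proposed repository. Only the repository name comes from the abstract. The directory layout (registry/namespace/name:tag, modelled on how unpacked.cern.ch exposes container images), the function names, and the example reference are hypothetical, since the repository does not exist yet.

```python
from pathlib import Path
from typing import Optional

# Repository name taken from the abstract; the layout beneath it is an
# assumption modelled on unpacked.cern.ch, not a published convention.
DEFAULT_REPO_ROOT = Path("/cvmfs/model-registry.cern.ch")


def cvmfs_model_path(oci_ref: str, repo_root: Path = DEFAULT_REPO_ROOT) -> Path:
    """Map an OCI reference ('registry/namespace/name:tag') to a repository path."""
    # Accept an optional 'oci://' scheme prefix.
    ref = oci_ref[len("oci://"):] if oci_ref.startswith("oci://") else oci_ref
    parts = ref.split("/")
    # Require at least a registry and a name, and refuse path tricks.
    if len(parts) < 2 or any(p in ("", ".", "..") for p in parts):
        raise ValueError(f"not a usable OCI reference: {oci_ref!r}")
    # Follow the usual container convention of defaulting to the 'latest' tag.
    if ":" not in parts[-1]:
        parts[-1] += ":latest"
    return repo_root.joinpath(*parts)


def resolve_model(oci_ref: str, repo_root: Path = DEFAULT_REPO_ROOT) -> Optional[Path]:
    """Return the model directory if it is published in the repository, else None."""
    path = cvmfs_model_path(oci_ref, repo_root)
    return path if path.exists() else None
```

With this layout, a job would call `resolve_model("registry.example/ml/tagger:v1")` and, on `None`, fall back to pulling the artifact from the OCI registry directly. Files read through the returned path would be fetched and cached on demand by the CVMFS client, which is where the proxy and cache tuning mentioned above would matter.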

CERN group / Experiment

EP-SFT

Working area	Area 5: Infrastructure for AI Deployment
Project goals	Improve ML operations in distributed environments like the grid; Integrate CVMFS with industry ML-Ops tools in order to leverage its efficiency and data preservation capabilities.
Timeline	2 months: deployment of a prototype “model-registry.cern.ch” repository and first benchmarks; 6 months: prototype integration with the Kubeflow container registry / mlflow.cern.ch and further performance engineering; 1 year: production release, with feedback from operators and the community integrated, documentation, and an investigation of new registry-side publication mechanisms.
Available person power	0.1 FTE
Additional person power request	1 Graduate (over 1 year)
Is this an already ongoing activity?	No
Indicative hardware resource needs	Publisher machine and S3 storage (provided centrally by IT)

Valentin Volkl (CERN)
