Mar 17 – 19, 2026
University of Oslo
Europe/Zurich timezone

Fueling the AI Factory: The Central Role of the Metadata Catalog

Mar 19, 2026, 11:55 AM
15m
Gamle Festsal (University of Oslo)

Karl Johans gate 47
AI and Storage

Speaker

Jean-Thomas Acquaviva (DDN Storage)

Description

The rise of industrial "AI Factories" exposes a critical bottleneck that transcends raw storage performance: managing the data itself.
As AI/ML pipelines ingest and process vast, heterogeneous datasets, the complexity of data discovery, lineage, and governance becomes a primary inhibitor to scaling operations. Traditional storage systems fail to answer vital questions: "Where is the verified, compliant dataset for training?", "What is the exact data lineage of this deployed model?", and "How do we optimize data placement across a distributed infrastructure?"
This presentation details the design and implementation of the intelligent metadata catalog at the heart of the European project DaFab. We demonstrate that a metadata-driven approach is the key to unlocking efficient, reproducible, and sovereign AI. We will cover: (1) the use of semantic search and active metadata for federated data discovery; (2) automated data lineage and versioning to ensure model reproducibility and compliance; and (3) how the catalog integrates with the storage layer to orchestrate data movement, respecting data gravity to optimize for performance and cost.
By treating metadata as a primary asset, the DaFab catalog transforms the storage infrastructure from a passive repository into an active, intelligent component of the modern AI factory. The technology underlying the DaFab metadata catalog is Rucio, an open-source technology that originated at CERN. We will conclude by putting this approach in perspective with industrial solutions.

Suggested Contribution Type Regular Talk (15-30 min)

Author

Jean-Thomas Acquaviva (DDN Storage)
