19–21 Mar 2025
LMU
Europe/Zurich timezone

Using Federated Data Infrastructure for a European Open Web Index

20 Mar 2025, 17:15
15m
LMU

Presentation CS3 federations and synergies with eResearch infrastructures. Data sharing infrastructures

Speaker

Mohamad Hayek

Description

In an era where web search is a cornerstone of the global digital economy, an open, impartial and transparently produced web index is a key opportunity for Europe and beyond. Currently, the landscape is dominated by a select few gatekeepers who provide their web search services with minimal scrutiny from the general public. Moreover, web data has emerged as a pivotal element in the development of AI systems, particularly Large Language Models, whose efficacy depends on both the quantity and the quality of the data available. Consequently, restricted access to web data and search capabilities severely curtails innovation, particularly for smaller innovators and researchers who lack the resources to manage petabyte-scale platforms.
In this talk, we present the OpenWebSearch.eu project, which is currently developing the core of a European Open Web Index (OWI) as a basis for a new Internet search in Europe. We focus mainly on the setup of a Federated Data Infrastructure that draws on geographically distributed data and computing resources at top-tier supercomputing centres across Europe. This infrastructure builds on MinIO/S3, iRODS, EUDAT services (B2SAFE, B2HANDLE) and our previous work on the LEXIS Platform for distributed computing and data management. The resulting system supports efficient execution of complex processing and indexing workflows.

Author

Co-authors

Andreas Wagner (CERN)
Jan Martinovič (VSB - Technical University of Ostrava)
Katja Mankinen (CSC – IT Center for Science)
Martin Golasowski
Megi Sharikadze
Michael Granitzer
Ms Noor Afshan Fathima (CERN)
Saber Zerhoudi (University of Passau)
Sebastian Heineking (Webis-Group)
Stephan Hachinger (Leibniz Supercomputing Centre (LRZ) of the BAdW)
