EOS 2025 Workshop
The 9th EOS workshop is in preparation to bring together the EOS community.
The two-and-a-half-day in-person event is organized to provide a platform for exchange between developers, users and sites running EOS. We particularly welcome newcomers to join the community.
The workshop takes place at CERN.
This workshop is part of TechWeekStorage25 "Spotlight on Storage & Data Technologies at CERN", taking place from 24th to 28th of March 2025.
The workshop will cover a wide range of topics related to EOS development, operations, deployments, applications, collaborations and diverse use-cases!
Agenda Highlights:
- EOS Project Roadmap
- EOS Development and Operations at CERN
- EOS Deployment and Operations world-wide
Recordings
All presentations will be recorded and published with the prior agreement of the speaker.
Fees
Participation in the workshop is free of charge.
Registrations
Registration is open to anyone at this link.
If you are interested in joining the EOS community, this is the perfect occasion!
We look forward to having you at the in-person workshop in March 2025 during TechWeek25!
Your CERN EOS team.
-
-
15:00
→
16:05
-
15:00
EOS 5.2 and 5.3 Status / Overview 20m
This presentation will give a short overview of the past releases and significant changes, new features and bug fixes.
Speaker: Elvin Alin Sindrilaru (CERN)
-
15:20
EOS and XRootD HTTP improvements 25m
With the continuous growth in the use of the HTTP protocol for file transfers within the WLCG community, several enhancements and optimisations have been introduced to the EOS HTTP and XRootD HTTP stacks.
From updates to the SciTags and packet marking specifications to addressing libcurl internal modifications, 2024 presented a number of challenges that required targeted solutions.
This presentation will provide an overview of the key changes and new features implemented to enhance the handling of HTTP file transfers in EOS and XRootD.
Speaker: Cedric Caffy (CERN)
-
15:45
Storage Tiering in EOS 20m
We will give an overview of new features for storage tiering in EOS version 5.3.
Speaker: Andreas Joachim Peters (CERN)
-
16:05
→
16:25
-
16:25
→
17:35
-
16:25
QClient Improvements for the next fastest Metadata 20m
Every operation that modifies or queries metadata in the persistent metadata store QuarkDB goes through QClient. We look at some current bottlenecks and at the improvements that v5.3 offers with various configurations.
Speaker: Mr Abhishek Lekshmanan (CERN)
-
16:45
Advancements in FSCK for EOS 20m
One of the critical components in EOS is fsck, responsible for scanning, verifying, and repairing inconsistencies in the filesystem.
This talk will provide an in-depth exploration of fsck in EOS, covering its architecture, scanning mechanisms, and repair strategies. We will discuss recent improvements, including the introduction of a best-effort mode, and enhancements in erasure-coded file scanning, which significantly boost performance while minimizing the impact on the running instance.
Speaker: Gianmaria Del Monte (CERN)
-
17:05
XRootD File Cloning 15m
This talk explains a software development motivated by an EOS use case: file cloning to facilitate updates of erasure-coded files.
Speaker: David Smith (CERN)
-
17:20
Status of the S3 Interface for EOS 15m
We will present an overview of the current state of the S3 gateway for EOS.
Speaker: Andreas Joachim Peters (CERN)
-
18:30
→
22:30
Dinner: EOS Auberge de Meyrin
Auberge de Meyrin
Avenue de Vaudagne 13bis, 1217 Meyrin
-
18:30
Social Dinner 2h 30m
Social Dinner in Meyrin Village.
-
-
09:30
→
11:00
Operational Tools & Configuration: EOS 40/S2-D01 - Salle Dirac
-
09:30
Deploying an EOS Instance from Scratch: A Practical Guide 20m
EOS is a powerful and flexible storage system, but setting up a new instance from scratch requires a solid understanding of its configuration and operational best practices. This talk will provide a step-by-step guide to deploying EOS, covering key components and essential configurations.
We will walk through the setup process, including storage provisioning, replication, erasure coding, and balancing strategies. The session will also touch on best practices for performance tuning and ensuring reliability in production environments.
This talk is ideal for system administrators and operators looking to gain practical insights into EOS deployment, whether for testing, small-scale clusters, or large production environments.
Speaker: Gianmaria Del Monte (CERN)
-
09:50
Data Federations with EOS 20m
Data federation with EOS offers various approaches to seamlessly integrate and manage distributed storage across heterogeneous environments. This presentation explores multiple federation techniques and namespace aggregation with remote EOS instances. We will discuss the advantages and trade-offs of each method, considering factors such as performance, scalability, security, and ease of management. Real-world use cases and best practices will be highlighted to help organisations choose the most suitable strategy for their needs.
Speaker: Luca Mascetti (CERN)
-
10:10
A Distributed Probe for EOS: Real-Time Availability Monitoring and Alerting 20m
Ensuring the availability of EOS instances is crucial for large-scale storage operations. To enhance monitoring and incident response, we have developed a new distributed probe designed to detect instance malfunctions and alert operators in real time.
This talk will introduce the architecture and functionality of the probe, which runs across multiple nodes to provide redundancy and reliability. Alerts are dispatched via multiple channels, including SMS, email, Mattermost, and CERN IT’s General Services Availability. Additionally, all availability events are published on a NATS-based pub-sub channel, enabling future integrations with operational tools such as the EOS Diagnostic Tool.
Speaker: Gianmaria Del Monte (CERN)
-
10:30
Diagnostic tool for submitting useful information for future debugging 25m
For a stuck or non-responsive EOS MGM, some simple diagnostic information can go a long way. We look at a new eos-diagnostic-tool that dumps stack traces and similar information for submitting useful bug reports. We also invite discussion on how to improve the tooling in the future.
Speaker: Abhishek Lekshmanan (CERN)
-
11:00
→
11:20
-
11:20
→
12:20
-
11:20
A Distributed Storage Odyssey: from CentOS7 to ALMA9 20m
On the 30th of June 2024, the end of CentOS 7 support marked a new era for the operation of the multi-petabyte distributed disk storage system used by CERN physics experiments. The EOS infrastructure at CERN is composed of approximately 1000 disk servers and 50 metadata management nodes. Their transition from CentOS 7 to Alma 9 was not as straightforward as anticipated.
This presentation explains this transition. From changes in the supported certificate and Kerberos key signature lengths and algorithms, to OpenSSL library hiccups and Linux kernel crashes, the EOS operations team had to take on a range of challenges to ensure a seamless operating system transition of the infrastructure while keeping the CERN experiments’ data transfers uninterrupted.
Speaker: Cedric Caffy (CERN)
-
11:40
Evaluating Jumbo frames performance across LHC experiments 20m
This work presents an evaluation of JUMBO frame tests conducted at CERN to assess their impact on data transfer performance across different physics workflows. Preliminary internal tests were carried out to analyze potential benefits and challenges, followed by collaborative testing involving the ATLAS, CMS, and LHCb experiments. The goal was to measure the advantages of JUMBO frames in terms of efficiency and throughput while identifying and resolving any issues arising from their deployment. The study provides insights into the feasibility of JUMBO frames for large-scale scientific data transfers, aiming to optimize network performance for high-energy physics experiments.
Speaker: Dr Maria Arsuaga Rios (CERN)
-
12:00
Refurbishing the Meyrin Data Centre: Storage Juggling and Operations 20m
The 50-year-old Meyrin Data Centre (MDC) remains indispensable due to its strategic geographical location and unique electrical power resilience, even though CERN IT recently commissioned the Prévessin Data Centre (PDC), doubling the organization’s hosting capacity in terms of electricity and cooling. The Meyrin Data Centre (Building 513) retains an essential role for the CERN Tier-0 Run 4 commitments, notably as the primary hosting location for the tape archive and the disk storage. The inevitable investments in the infrastructure (UPS and cooling) are now triggering the refurbishment of the two main rooms where all the storage equipment is hosted. This presentation will delve into the architectural advancements and operational strategies implemented for and during the Meyrin Data Centre refurbishment. We will explore how these developments will impact our storage and how the storage operations team will ensure EOS’s performance, scalability, and reliability in the coming years.
Speaker: Octavian-Mihai Matei (CERN)
-
14:00
→
15:30
-
14:00
EOS for Physics at CERN: Operational Insights, Achievements, and Future Directions 25m
This work presents an overview of the EOS operations at CERN, focusing on its role in supporting physics data processing and storage. EOS is a high-performance distributed storage system designed to handle the vast volumes of scientific data generated by CERN experiments. This study examines key performance metrics, recent achievements, and strategic objectives for the current year, emphasizing improvements in efficiency, reliability, and scalability. Special attention is given to the impact of EOS on physics workflows, ensuring seamless data access and analysis. By evaluating past accomplishments and future goals, this work highlights the continuous evolution of EOS to meet the growing demands of physics research at CERN.
Speaker: Dr Maria Arsuaga Rios (CERN)
-
14:25
EOS Status at IHEP 20m
In this talk, we share our experiences with EOS at IHEP, including the migration from CentOS 7 to AlmaLinux 9, the construction of the ALICE EOS instance, and the dual-site deployment of the LHCb T1 EOS.
Speaker: Dr Yujiang BI (Institute of High Energy Physics, Chinese Academy of Sciences)
-
14:45
EOS site report of the Joint Research Centre 20m
The Joint Research Centre (JRC) of the European Commission is running the Big Data Analytics Platform (BDAP) to enable the JRC projects and scientists to store, process, and analyze a wide range and large amount of data, and to share and disseminate data products.
EOS is the main system of BDAP for storing scientific data. The BDAP services are actively used by more than 100 JRC projects, covering a wide range of data analytics activities. The EOS instance at JRC was implemented in 2016 and currently has a gross capacity of 43 PB. It is composed of heterogeneous commodity hardware components, which have been extended noticeably over time.
The talk will present the EOS service at JRC as storage back-end of the Big Data Analytics Platform. The presentation covers the EOS setup, configuration and current status. It describes the activities over the last year, presents experiences made and issues discovered, and gives an outlook of planned activities during 2025.
Speaker: Armin Burger
-
15:05
Planning an EOS Data Federation to deal with Climate Change using AI 20m
The National Institute for Space Research - INPE (Brazil) is leading a research program: Intelligent Early Warning System for Climate Extremes - SIPEC. The project aims at predicting the likelihood of climate extremes months in advance, using diverse sources of data coming from satellites and an array of intelligent sensors spread across the country. Such data streams will feed both classical meteorological models and AI machine learning algorithms for the ultimate early warning of climate extremes.
Because many institutions produce the large amounts of data needed to train the ML algorithms, and the scientists dealing with different parts of the problem work in different places, we are implementing an EOS Data Federation in Brazil. The EOS family of tools, in addition to being capable of dealing with large volumes of distributed data, also takes care of the security controls that determine who has access to which portions of the datasets.
Speakers: Dr Paulo Nobre (INPE), Wanderley Mendes (INPE)
-
15:30
→
15:50
-
15:50
→
17:05
-
15:50
Cloud-Native EOS Deployment for ATLAS T2 on Kubernetes 20m
I will discuss our Kubernetes-based EOS deployment as it approaches production readiness for our ATLAS T2 site, as well as our evaluation of EOS for several astronomy projects.
Speaker: Ryan Taylor (University of Victoria (CA))
-
16:10
CERNBox and EOSHPM status update 20m
CERNBox and EOS HOME/PROJECT(/MEDIA) operational issues seen in 2024 and expected in 2025.
Speakers: Jan Iven (CERN), Diogo Castro (CERN)
-
16:30
You still have those QDB backups, right? (Practical example of disaster recovery of EOS deployment) 20m
In December 2024, the EOS cluster at Purdue University suffered a security incident which wiped out all metadata of our production deployment. In this brief talk we will give a step-by-step example of what it takes to recover from such a setback, and discuss best backup practices.
Speaker: Stefan Piperov (Purdue University (US))
-
-
09:30
→
10:40
Benchmarking & Hardware Evolution: EOS 40/S2-D01 - Salle Dirac
-
09:30
Analysis Benchmarking with EOS/RNTuple 30m
This presentation will report on the benchmarking results of various EOS setups at CERN using the new RNTuple framework.
Speaker: Andreas Joachim Peters (CERN)
-
10:00
XRootD Update & Parallel Socket Benchmarking 20m
Speaker: Guilherme Amadio (CERN)
-
10:20
Storage Hardware at CERN 20m
Current & future storage hardware at CERN.
Speaker: Luca Mascetti (CERN)
-
10:40
→
11:00
-
11:00
→
12:00
-
11:00
The EOS Development Workplan & Roadmap 20m
We will outline the EOS development roadmap, highlighting key milestones, upcoming features, and future plans. This presentation will provide insights into ongoing improvements, strategic goals, and the evolving direction of EOS.
Speakers: Andreas Joachim Peters (CERN), Elvin Alin Sindrilaru (CERN)
-
11:20
Discussion, Proposals, Feature Requests 40m
Survey Topics
- Ansible Configuration for EOS
- SquashFS as small File Repository
-
14:00
→
16:00
-
14:00
How to benchmark EOS with bash? 30m
Speaker: Andreas Joachim Peters (CERN)
-
14:30
How to set up authentication Front-ends? 30m
Speaker: Elvin Alin Sindrilaru (CERN)
-
15:00
How to configure TLS & ZTN in XRootD? 30m
Speaker: Guilherme Amadio (CERN)
-
15:30
How to transition from MQ to no MQ? 30m
Speaker: Elvin Alin Sindrilaru (CERN)