EOS 2025 Workshop

Europe/Zurich
40/S2-D01 - Salle Dirac (CERN)

Andreas Joachim Peters (CERN), Jakub Moscicki (CERN), Luca Mascetti (CERN)
Description




The 9th EOS workshop is in preparation to bring together the EOS community.

The two-and-a-half-day in-person event is organized to provide a platform for exchange between developers, users and sites running EOS. We particularly welcome newcomers to join the community.

The workshop takes place at CERN.


This workshop is part of TechWeekStorage25 "Spotlight on Storage & Data Technologies at CERN", taking place from 24th to 28th of March 2025.

The workshop will cover a wide range of topics related to EOS development, operations, deployments, applications, collaborations and diverse use-cases!

Agenda Highlights:

  • EOS Project Roadmap
  • EOS Development and Operations at CERN
  • EOS Deployment and Operations world-wide

Recordings

All presentations will be recorded and published with the prior agreement of the speaker.

Fees

Participation in the workshop is free of charge.

Registrations

Registration is open to anyone at this link.

If you are interested in joining the EOS community, this is the perfect occasion!

We look forward to having you at the in-person workshop in March 2025 during TechWeek25!


Your CERN EOS team.

Surveys
EOS Workshop Proposal
Webcast
There is a live webcast for this event
Zoom Meeting ID
67689194660
Host
Jakub Moscicki
Alternative host
Michael Davis
    • 15:00 16:05
      Development: EOS 40/S2-D01 - Salle Dirac

      • 15:00
        EOS 5.2 and 5.3 Status / Overview 20m

        This presentation will give a short overview of the past releases and significant changes, new features and bug fixes.

        Speaker: Elvin Alin Sindrilaru (CERN)
      • 15:20
        EOS and XRootD HTTP improvements 25m

        With the continuous growth in the use of the HTTP protocol for file transfers within the WLCG community, several enhancements and optimisations have been introduced to the EOS HTTP and XRootD HTTP stacks.

        From updates to the SciTags and packet marking specifications to addressing libcurl internal modifications, 2024 presented a number of challenges that required targeted solutions.

        This presentation will provide an overview of the key changes and new features implemented to enhance the handling of HTTP file transfers in EOS and XRootD.

        Speaker: Cedric Caffy (CERN)
      • 15:45
        Storage Tiering in EOS 20m

        We will give an overview of new features for storage tiering in EOS version 5.3.

        Speaker: Andreas Joachim Peters (CERN)
    • 16:05 16:25
      Coffee Break 20m 40/S2-D01 - Salle Dirac

    • 16:25 17:35
      Development: EOS 40/S2-D01 - Salle Dirac

      • 16:25
        QClient Improvements for the next fastest Metadata 20m

        Every operation that modifies or queries metadata in the persistent metadata store QuarkDB goes through QClient. We look at some current bottlenecks and the improvements that v5.3 offers with various configurations.

        Speaker: Mr Abhishek Lekshmanan (CERN)
      • 16:45
        Advancements in FSCK for EOS 20m

        One of the critical components in EOS is fsck, responsible for scanning, verifying, and repairing inconsistencies in the filesystem.

        This talk will provide an in-depth exploration of fsck in EOS, covering its architecture, scanning mechanisms, and repair strategies. We will discuss recent improvements, including the introduction of a best-effort mode, and enhancements in erasure-coded file scanning, which significantly boost performance while minimizing the impact on the running instance.

        Speaker: Gianmaria Del Monte (CERN)
      • 17:05
        XRootD File Cloning 15m

        This talk explains a software development motivated by an EOS use case: file cloning to facilitate updates of erasure-coded files.

        Speaker: David Smith (CERN)
      • 17:20
        Status of the S3 Interface for EOS 15m

        We will present an overview of the current state of the S3 gateway for EOS.

        Speaker: Andreas Joachim Peters (CERN)
    • 18:30 22:30
      Dinner: EOS Auberge de Meyrin

      Auberge de Meyrin

      Avenue de Vaudagne 13bis 1217 Meyrin
      • 18:30
        Social Dinner 2h 30m

        Social Dinner in Meyrin Village.

    • 09:30 11:00
      Operational Tools & Configuration: EOS 40/S2-D01 - Salle Dirac

      • 09:30
        Deploying an EOS Instance from Scratch: A Practical Guide 20m

        EOS is a powerful and flexible storage system, but setting up a new instance from scratch requires a solid understanding of its configuration and operational best practices. This talk will provide a step-by-step guide to deploying EOS, covering key components and essential configurations.

        We will walk through the setup process, including storage provisioning, replication, erasure coding, and balancing strategies. The session will also touch on best practices for performance tuning and ensuring reliability in production environments.

        This talk is ideal for system administrators and operators looking to gain practical insights into EOS deployment, whether for testing, small-scale clusters, or large production environments.

        Speaker: Gianmaria Del Monte (CERN)
      • 09:50
        Data Federations with EOS 20m

        Data federation with EOS offers various approaches to seamlessly integrate and manage distributed storage across heterogeneous environments. This presentation explores multiple federation techniques and namespace aggregation with remote EOS instances. We will discuss the advantages and trade-offs of each method, considering factors such as performance, scalability, security, and ease of management. Real-world use cases and best practices will be highlighted to help organisations choose the most suitable strategy for their needs.

        Speaker: Luca Mascetti (CERN)
      • 10:10
        A Distributed Probe for EOS: Real-Time Availability Monitoring and Alerting 20m

        Ensuring the availability of EOS instances is crucial for large-scale storage operations. To enhance monitoring and incident response, we have developed a new distributed probe designed to detect instance malfunctions and alert operators in real time.

        This talk will introduce the architecture and functionality of the probe, which runs across multiple nodes to provide redundancy and reliability. Alerts are dispatched via multiple channels, including SMS, email, Mattermost, and CERN IT’s General Services Availability. Additionally, all availability events are published on a NATS-based pub-sub channel, enabling future integrations with operational tools such as the EOS Diagnostic Tool.

        Speaker: Gianmaria Del Monte (CERN)
      • 10:30
        Diagnostic tool for submitting useful information for future debugging 25m

        For a stuck or unresponsive EOS MGM, some simple diagnostic information can go a long way. We look at a new eos-diagnostic-tool that dumps stack traces and other information to support useful bug reports. We also invite discussion on how to improve the tooling in the future.

        Speaker: Abhishek Lekshmanan (CERN)
    • 11:00 11:20
      Coffee Break 20m 40/S2-D01 - Salle Dirac

    • 11:20 12:20
      Site Evolution: EOS 40/S2-D01 - Salle Dirac

      • 11:20
        A Distributed Storage Odyssey: from CentOS7 to ALMA9 20m

        On the 30th of June 2024, the end of CentOS 7 support marked a new era for the operation of the multi-petabyte distributed disk storage system used by CERN physics experiments. The EOS infrastructure at CERN is composed of approximately 1000 disk servers and 50 metadata management nodes. Their transition from CentOS 7 to Alma 9 was not as straightforward as anticipated.

        This presentation explains this transition. From changes in the supported certificate and Kerberos key signature lengths and algorithms, to OpenSSL library hiccups and Linux kernel crashes, the EOS operations team had to take on different challenges to ensure a seamless operating system transition of the infrastructure while maintaining uninterrupted CERN experiments’ data transfers.

        Speaker: Cedric Caffy (CERN)
      • 11:40
        Evaluating Jumbo frames performance across LHC experiments 20m

        This work presents an evaluation of jumbo frame tests conducted at CERN to assess their impact on data transfer performance across different physics workflows. Preliminary internal tests were carried out to analyze potential benefits and challenges, followed by collaborative testing involving the ATLAS, CMS, and LHCb experiments. The goal was to measure the advantages of jumbo frames in terms of efficiency and throughput while identifying and resolving any issues arising from their deployment. The study provides insights into the feasibility of jumbo frames for large-scale scientific data transfers, aiming to optimize network performance for high-energy physics experiments.

        Speaker: Dr Maria Arsuaga Rios (CERN)
      • 12:00
        Refurbishing the Meyrin Data Centre: Storage Juggling and Operations 20m

        The 50-year-old Meyrin Data Centre (MDC) remains indispensable due to its strategic geographical location and unique electrical power resilience, even though CERN IT recently commissioned the Prévessin Data Centre (PDC), doubling the organization’s hosting capacity in terms of electricity and cooling. The Meyrin Data Centre (Building 513) retains an essential role for the CERN Tier-0 Run 4 commitments, notably as the primary hosting location for the tape archive and the disk storage. The inevitable investments in the infrastructure (UPS and cooling) are now triggering the refurbishment of the two main rooms where all the storage equipment is hosted. This presentation will delve into the architectural advancements and operational strategies implemented for and during the Meyrin Data Centre refurbishment. We will explore how these developments will impact our storage and how the storage operations team will ensure EOS’s performance, scalability, and reliability in the coming years.

        Speaker: Octavian-Mihai Matei (CERN)
    • 14:00 15:30
      Site Reports: EOS 40/S2-D01 - Salle Dirac

      • 14:00
        EOS for Physics at CERN: Operational Insights, Achievements, and Future Directions 25m

        This work presents an overview of the EOS operations at CERN, focusing on its role in supporting physics data processing and storage. EOS is a high-performance distributed storage system designed to handle the vast volumes of scientific data generated by CERN experiments. This study examines key performance metrics, recent achievements, and strategic objectives for the current year, emphasizing improvements in efficiency, reliability, and scalability. Special attention is given to the impact of EOS on physics workflows, ensuring seamless data access and analysis. By evaluating past accomplishments and future goals, this work highlights the continuous evolution of EOS to meet the growing demands of physics research at CERN.

        Speaker: Dr Maria Arsuaga Rios (CERN)
      • 14:25
        EOS Status at IHEP 20m

        In this talk, we share our experience with EOS at IHEP, including the migration from CentOS 7 to AlmaLinux 9, the construction of ALICE EOS, and the dual-site deployment of LHCb T1 EOS.

        Speaker: Dr Yujiang BI (Institute of High Energy Physics, Chinese Academy of Sciences)
      • 14:45
        EOS site report of the Joint Research Centre 20m

        The Joint Research Centre (JRC) of the European Commission is running the Big Data Analytics Platform (BDAP) to enable the JRC projects and scientists to store, process, and analyze a wide range and large amount of data, and to share and disseminate data products.

        EOS is the main system of BDAP for storing scientific data. The BDAP services are actively used by more than 100 JRC projects, covering a wide range of data analytics activities. The EOS instance at JRC was implemented in 2016 and currently has a gross capacity of 43 PB. It is composed of heterogeneous commodity hardware components and has been extended noticeably over time.

        The talk will present the EOS service at JRC as the storage back-end of the Big Data Analytics Platform. The presentation covers the EOS setup, configuration and current status. It describes the activities over the last year, presents experience gained and issues discovered, and gives an outlook on planned activities during 2025.

        Speaker: Armin Burger
      • 15:05
        Planning an EOS Data Federation to deal with Climate Change using AI 20m

        The National Institute for Space Research - INPE (Brazil) is leading a research program: Intelligent Early Warning System for Climate Extremes - SIPEC. The project aims at predicting the likelihood of climate extremes months in advance, using diverse sources of data coming from satellites and an array of intelligent sensors spread across the country. Such data streams will feed both classical meteorological models and AI machine learning algorithms for the ultimate early warning of climate extremes.

        Given the number of institutions producing the large amounts of data needed to train the ML algorithms, and the scientists working on different parts of the problem at different places, we are implementing an EOS Data Federation in Brazil. The EOS family of tools, in addition to being capable of dealing with large volumes of distributed data, also takes care of security controls over who has access to which portions of the datasets.

        Speakers: Dr Paulo Nobre (INPE), Wanderley Mendes (INPE)
    • 15:30 15:50
      Coffee Break 20m 40/S2-D01 - Salle Dirac

    • 15:50 17:05
      Site Reports 40/S2-D01 - Salle Dirac

      • 15:50
        Cloud-Native EOS Deployment for ATLAS T2 on Kubernetes 20m

        I will discuss our Kubernetes-based EOS deployment as it approaches production readiness for our ATLAS T2 site, as well as our evaluation of EOS for several astronomy projects.

        Speaker: Ryan Taylor (University of Victoria (CA))
      • 16:10
        CERNBox and EOSHPM status update 20m

        CERNBox and EOS HOME/PROJECT(/MEDIA) operational issues seen in 2024 and expected in 2025.

        Speakers: Jan Iven (CERN), Diogo Castro (CERN)
      • 16:30
        You still have those QDB backups, right? (Practical example of disaster recovery of EOS deployment) 20m

        In December of 2024, the EOS cluster at Purdue University suffered a security incident which wiped out all metadata of our production deployment. In this brief talk we will give a step-by-step example of what it takes to recover from such a setback, and discuss the best backup practices.

        Speaker: Stefan Piperov (Purdue University (US))