5th Rucio Community Workshop

Europe/London
Private Dining Room, County South Building (Lancaster University, UK)

Description

Rucio is a software framework that provides functionality to organize, manage, and access large volumes of scientific data using customisable policies. The data can be spread across globally distributed locations and across heterogeneous data centers, uniting different storage and network technologies as a single federated entity. Rucio offers advanced features such as distributed data recovery and adaptive replication, and is highly scalable, modular, and extensible. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is continuously extended to support LHC experiments and other diverse scientific communities.
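
A minimal sketch of how such a policy can be declared through the Rucio Python client; the scope, dataset name, and RSE expression below are illustrative placeholders, not real endpoints:

    # Declare a replication rule: Rucio's daemons then create, monitor,
    # and repair the requested copies without further user action.
    from rucio.client import Client

    client = Client()
    client.add_replication_rule(
        dids=[{'scope': 'user.jdoe', 'name': 'dataset_2022'}],  # placeholder DID
        copies=2,                           # keep two replicas
        rse_expression='tier=1&type=TAPE',  # any Tier-1 tape endpoint
        lifetime=30 * 86400,                # rule expires after 30 days
    )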

For this 5th edition of our workshop we will meet at Lancaster University, UK. The workshop is co-located with the Worldwide LHC Computing Grid (WLCG) Workshop (Nov 7-9).

We have created a mailing list, to which you can subscribe, where we will send more details about the program in the coming weeks.


There is also a Slack channel for workshop discussion on the Rucio Slack workspace (Invitation Link): join #workshop.

Follow us on Twitter @RucioData

  • Thursday, 10 November
    • 08:30 09:00
      Registration 30m Private Dining Room, County South Building

    • 09:00 10:00
      Welcome and Introduction Private Dining Room, County South Building

      Welcome and introduction to the workshop, along with State of the Union of Rucio

    • 10:00 10:30
      Coffee Break 30m Private Dining Room, County South Building

    • 10:30 12:30
      Community Talks Private Dining Room, County South Building

      Talks from experiments and organizations that are actively using Rucio

      Convener: Michael Kirby (Fermi National Accelerator Laboratory)
      • 10:30
        Championing Scientific Data Management: Rucio and the ESCAPE project experience 20m

        The computing needs of Research Infrastructures are rapidly evolving, pointing to a challenging future involving experiments, facilities, and service providers. The ESCAPE project acted as an anchor for diverse scientific communities to address next-generation distributed computing needs, including data management and data access. ESCAPE provided a fully working framework (the Data Lake), fostering cross-fertilisation and knowledge transfer to address arising needs in DOMA. Rucio, as a high-level data management system, is at the core of the framework, and has been evaluated and adopted broadly within the RIs involved in ESCAPE. This talk will provide a summary of the project, its goals and achievements, and a forward look to the future.

        Speaker: Xavier Espinal (CERN)
      • 10:50
        The Rucio Experience: ATLAS & CMS 20m
        Speakers: Eric Vaandering (Fermi National Accelerator Lab. (US)), Mario Lassnig (CERN)
      • 11:10
        Rucio and the interTwin project (Remote) 20m

        InterTwin is an EU-funded project that has only just started (September 2022). The project will work with domain experts to build the technology to support the emerging field of scientific digital twins. It will develop, deploy, and “road harden” a blueprint for an infrastructure to support a diverse set of science use-cases in the domains of radio telescopes (MeerKAT), particle physics (CERN/LHC and lattice QCD), gravitational waves (Einstein Telescope), and climate research and environment monitoring (e.g., predicting flooding and other extreme weather from climate change).

        By adopting the Data Lake model from ESCAPE, interTwin data management will be based on Rucio. Some of the challenges will involve gaining compatibility with HPC facilities (e.g., EuroHPC) and with a related project, Destination Earth, which takes a rather different approach to data management. In addition, new science use-cases may bring new ways of approaching data management.

        In this talk I will give an overview of the interTwin project and how we hope Rucio will help solve the different science use-cases.

        Speaker: Paul Millar
      • 11:30
        On the intended use of Rucio for LSST 20m

        The Vera C. Rubin Observatory is preparing to execute the most ambitious astronomical survey ever attempted, the Legacy Survey of Space and Time. Due to start operations in late 2024 and run for 10 years, it will scan the southern sky nightly, collecting images of the entire visible sky every 4 nights. Celestial objects will be detected in those images to progressively build an astronomical catalog of 20 billion galaxies and 17 billion stars and their associated physical properties.

        In this contribution we will present how we envisage using Rucio and FTS to drive continuous push-based replication of raw and derived data among the 3 Rubin data facilities (one in the US and two in Europe) where images are to be stored and processed; an illustrative sketch follows below. We will also address the ongoing development work to integrate Rucio with the LSST-specific data management tools.

        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
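
        A hedged sketch of what push-based replication among facilities could look like with Rucio rules; the RSE names and scope below are hypothetical, not actual Rubin endpoints:

          # One rule per data facility: new files attached to the open
          # dataset are pushed out automatically by Rucio's daemons.
          from rucio.client import Client

          client = Client()
          for rse in ['US_DF', 'FR_DF', 'UK_DF']:  # hypothetical facility RSEs
              client.add_replication_rule(
                  dids=[{'scope': 'raw', 'name': 'nightly_images'}],
                  copies=1,
                  rse_expression=rse,
              )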
      • 11:50
        Fermilab Rucio Operations 20m

        In this talk I will discuss Fermilab's operational situation with regard to the deployment and support of Rucio for DUNE, ICARUS, and Rubin Observatory operations. The current state of the deployments and future development plans will be explained, elucidating the experiments' needs for the service going forward.

        Speaker: Brandon White
    • 12:30 13:30
      Lunch Break 1h Private Dining Room, County South Building

    • 13:30 15:15
      Technology Sessions: Tech Stack Private Dining Room, County South Building

      Discussions of specific technologies and implementations within Rucio.

      Convener: Martin Barisits (CERN)
      • 13:30
        Rucio & Cloud Storage 15m
        Speaker: Mario Lassnig (CERN)
      • 13:45
        CERN Tape Archive status and plans 15m

        CTA, the CERN Tape Archive, is the tape archival solution in production for LHC Run-3.

        This presentation will focus on the current CTA production deployment and will provide an up-to-date snapshot of CTA achievements since the start of Run-3.

        It will also cover production deployment plans.

        Speaker: Mr Julien Leduc (CERN)
      • 14:00
        Database Overview 15m
        Speaker: Cedric Serfon (Brookhaven National Laboratory (US))
      • 14:15
        The New Rucio WebUI (Remote) 15m

        The good old Rucio WebUI has served our community well over the last several years. However, it lacks the architecture, frameworks, and technologies to sustain it over the next decade. That is where the effort to revamp the old WebUI steps in. This talk discusses the plan, progress, and what users can expect from the new Rucio WebUI.

        Speaker: Mayank Sharma (University of Texas at Arlington (US))
      • 14:30
        Multi-VO Rucio 15m

        Multi-VO Rucio allows a single instance of Rucio to serve several VOs. Each VO is separated into its own namespace, providing complete separation between the VOs' data, and each VO retains its own customisability through policy packages; a minimal sketch follows below.

        Work is underway to ensure Multi-VO capabilities for the new WebUI, along with integration with the DIRAC workflow management software. The Multi-VO instance of Rucio is also in a transition phase, moving to a Kubernetes deployment.

        Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
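
        A minimal sketch of the separation, assuming a server whose rucio.cfg enables multi-VO mode (multi_vo = True in the [common] section); the short VO name 'abc' is a placeholder:

          from rucio.client import Client

          # Each client binds to one VO; the scopes, accounts, and RSEs
          # of other VOs on the same instance stay invisible to it.
          client = Client(vo='abc')
          print(list(client.list_scopes()))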
      • 14:45
        Study of the integration of Rucio with DIRAC (Remote) 15m

        This talk presents the work done during the ESCAPE project by the CTA team to evaluate the integration of the ESCAPE Data Lake with the DIRAC workload manager. The work was distributed across different teams:
        • PIC in Barcelona was in charge of providing the Data Lake and storage infrastructure;
        • the DIRAC team set up the DIRAC instance at CC-IN2P3;
        • LAPP was in charge of testing the integration by running processing jobs on CTA raw data.

        The integration of DIRAC and Rucio is done through the “Rucio catalog” interface, whose implementation was mostly done for the Belle II experiment; a sketch follows below. As part of this study, we highlight which methods are missing from the implementation to meet CTA's requirements.

        Speaker: Frederic Gillardo (Centre National de la Recherche Scientifique (FR))
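
        A hedged sketch of the catalogue plug-in from the DIRAC side, following the Belle II integration; the catalogue name and the path below are assumptions for illustration:

          from DIRAC.Resources.Catalog.FileCatalog import FileCatalog

          # Route catalogue operations through the Rucio implementation
          # of DIRAC's file catalogue interface.
          fc = FileCatalog(catalogs=['RucioFileCatalog'])
          result = fc.listDirectory('/cta/raw')  # placeholder LFN path
          if not result['OK']:
              print(result['Message'])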
      • 15:00
        Discussion 15m
    • 15:15 15:45
      Coffee Break 30m Private Dining Room, County South Building

    • 15:45 17:15
      Technology Sessions: Discussion of Metadata Private Dining Room, County South Building

      Discussions of specific technologies and implementations within Rucio.

      Convener: Rob Barnsley
      • 15:45
        SIG Metadata update 15m
        Speaker: Rob Barnsley
      • 16:00
        IVOA and beyond 15m
        1. A brief overview of what the International Virtual Observatory Alliance (IVOA) is and what it has achieved.
        2. Describing work in progress looking at adapting IVOA services to include metadata from Rucio.
        3. Exploring opportunities for the future, using the IVOA architecture as a model for other domains.
        Speaker: David Morris
      • 16:15
        MetaCat - Metadata Catalog for Rucio-based Data Management Systems (Remote) 15m

        Metadata management is one of the three major areas of scientific data management, along with replica management and workflow management. Metadata is the information describing the data stored in a data item such as a file or an object; it includes the data item's provenance, recording conditions, format, and other attributes. MetaCat is a metadata management database designed and developed for High Energy Physics experiments. As Rucio is becoming a popular choice for the replica management component, MetaCat was designed to be conceptually compatible with Rucio and able to work alongside it.

        The main objectives of MetaCat are:

        • Provide a flexible mechanism to store and manage file and dataset metadata of arbitrary complexity
        • Provide a mechanism to retrieve the metadata for a file or a dataset
        • Efficiently query the metadata database for files or datasets matching user-defined criteria expressed in terms of the metadata
        • Provide a transparent mechanism to access external metadata sources, logically incorporating external metadata into queries without copying it

        One of MetaCat's features is MQL, a metadata query language developed specifically for this application; an illustrative query follows below. This talk will discuss the architecture, functionality, and features of MetaCat, as well as the current status of the project.

        Speaker: Igor Mandrichenko (Fermi National Accelerator Lab. (US))
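
        An illustrative MQL query through the MetaCat web API client; the server URL, dataset, and metadata field are placeholders, and the query is a simplified sketch of the language, not its definitive syntax:

          from metacat.webapi import MetaCatClient

          client = MetaCatClient('https://metacat.example.org:9443/api')
          # Select files in a dataset whose metadata match a predicate.
          for f in client.query("files from prod:all where run_type = 'physics'"):
              print(f['namespace'], f['name'])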
      • 16:30
        Metadata tests in Belle II 15m
        Speaker: Cedric Serfon (Brookhaven National Laboratory (US))
      • 16:45
        Discussion 30m
    • 17:15 18:15
      Technology Sessions: Discussion of Tokens Private Dining Room, County South Building

      Discussions of specific technologies and implementations within Rucio.

      Convener: Cedric Serfon (Brookhaven National Laboratory (US))
      • 17:15
        DUNE Rucio Deployment and Plans for the Token Era (Remote) 10m

        The DUNE Distributed Data Management system currently has 16 Rucio Storage Elements spread around the world, all using X.509 authentication. We have local physics group storage deployed at Fermilab that is using token authentication. We will present our current schema for file permissions at distributed storage elements, and how we expect this to change in the era of tokens. We will also mention some details of the US-based CILogon token issuer and the US software timeline, and how these considerations can and should impact the testing plan for Rucio and FTS3.

        Speakers: Doug Benjamin (Brookhaven National Laboratory (US)), Steven Timm (Fermi National Accelerator Lab. (US))
      • 17:25
        Rucio Token plans 20m
        Speakers: Martin Barisits (CERN), Dimitrios Christidis (CERN)
      • 17:45
        Summary from WLCG AAI session (Remote) 10m
        Speaker: Maarten Litmaath (CERN)
      • 17:55
        Discussion 20m
    • 19:30 22:00
      Workshop Dinner 2h 30m

      The Borough, Lancaster
      https://www.theboroughlancaster.co.uk/

  • Friday, 11 November
    • 09:00 10:30
      Technology Sessions: Data Transfers Private Dining Room, County South Building

      Discussions of specific technologies and implementations within Rucio.

      Convener: Mario Lassnig (CERN)
      • 09:00
        Automated Network Services for Exascale Data Movement 15m

        As distributed scientific collaborations reach the exabyte scale, moving data becomes more challenging, and the capacity to manage priorities among the different transfer requests over the network becomes a necessity. Network services that build bandwidth-guaranteed paths and fixed routes can be used to prioritize transfer requests. We have built a system that creates such priority paths on demand, triggered by the insertion of rules in Rucio; a sketch follows below.

        Speaker: Diego Davila Foyo (Univ. of California San Diego (US))
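
        A sketch of the trigger side, with placeholder names: the rule carries an elevated priority, and an external service (not shown) reacts to the rule's creation by provisioning a bandwidth-guaranteed path for the matching transfers:

          from rucio.client import Client

          client = Client()
          client.add_replication_rule(
              dids=[{'scope': 'cms', 'name': 'urgent_dataset'}],  # placeholder
              copies=1,
              rse_expression='T1_US_FNAL',  # placeholder destination
              priority=5,  # highest transfer priority, propagated to FTS
          )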
      • 09:15
        ALTO/TCN: Rucio/FTS Control with Deeper Network Visibility (Remote) 15m
        Speaker: Y. Richard Yang
      • 09:30
        Network Packet Marking and Flow Labeling: the Technical Details 15m

        Analyzing HEP traffic flows in detail is critical for understanding how the various complex systems developed by the LHC experiments are actually using the network. In this talk we will describe the work of the Research Networking Technical Working Group in the area of packet marking and flow labeling. The goal is to enable identification of any research and education network traffic anywhere along its path. We will describe our current status, the technical details of marking and labeling, and our near-term planned activities.

        Speaker: Shawn Mc Kee (University of Michigan (US))
      • 09:45
        Transfers: the new bearings of the Conveyor 15m

        The transfer subsystem of Rucio, also known as the 'conveyor', was heavily reworked in the past year. This talk starts with a high-level overview of Rucio transfers, followed by a deep dive into the architectural changes and improvements implemented in recent Rucio releases.

        Speaker: Radu Carpa (CERN)
      • 10:00
        FTS 2023: Plans and direction 15m

        FTS presentation about the current state of affairs and planned changes for 2023 and after.

        Speaker: Mihai Patrascoiu (CERN)
      • 10:15
        Discussion 15m
    • 10:30 11:00
      Coffee Break 30m Private Dining Room, County South Building

    • 11:00 12:30
      Community Talks Private Dining Room, County South Building

      Talks from experiments and organizations that are actively using Rucio

      Convener: Eric Vaandering (Fermi National Accelerator Lab. (US))
      • 11:00
        SKA Regional Centre Data Management Update 18m

        The SKA Observatory will be the largest radio telescope ever built, and will produce science data products at a rate of approximately 2PB per day. Our user community will therefore access and process this data through a distributed network of data centres (SKA Regional Centres, or SRCs), currently being developed across the SKAO member countries. We are currently investigating the use of Rucio by SRCs to manage distributed data. We will provide a brief update on our experiences with tokens, metadata, and our next steps.

        Speaker: Rohini Joshi
      • 11:18
        Belle II 18m
        Speaker: Cedric Serfon (Brookhaven National Laboratory (US))
      • 11:36
        Rucio framework in the Bulk Data Management System for the CTA Archive (Remote) 18m

        In this talk we will describe the operational and conceptual design of the bulk archive management system involved in the prototyping activities of the Cherenkov Telescope Array Observatory (CTAO) Bulk Archive. This archive will take care of the storage and management of the lower-level data products coming from the Cherenkov telescopes, including their cameras, auxiliary subsystems, and simulations. Scientific raw data produced at the two CTAO telescope sites, one in the Northern hemisphere and one in the Southern, will be transferred to four off-site data centers, where the data will be accessed and automatically reduced to higher-level data products. The Archive system will provide a set of tools based on the OAIS (Open Archival Information System) standard, including a data transfer system, a general replicated catalog to be queried, an easy interface to retrieve and access data, and a customized, versatile data organization depending on user requirements. With our partners we identified Rucio as the best framework candidate to provide the basis for such features and archival services, in accordance with the CTA Observatory requirements.

        Speaker: Dr Georgios Zacharis (INAF/OAR/CTA-BDMS)
      • 11:54
        Rucio & dCache as a multi-site solution for the Science Data Centre hosting inhomogeneous solar data 18m

        The Science Data Centre (SDC) is a new strategic infrastructure at the Leibniz-Institut für Sonnenphysik (KIS) in Freiburg, Germany, for archiving and disseminating raw and calibrated ground-based high-resolution multiwavelength spectropolarimetric and imaging data, obtained primarily at the German Solar Telescopes in Tenerife, Spain, and soon from other observatories. Additionally, the SDC develops data analysis tools and generates high-level science-ready products, e.g. multi-dimensional sets of physical and statistical parameters characterising the solar atmosphere, such as vector magnetic fields, Doppler velocities, temperature, and their evolution. Being produced by a wide variety of different instruments, solar data is very diverse, inhomogeneous, and metadata-heavy.

        We are transitioning our original single-site, in-house solution based on MongoDB and GridFS into a multi-site solution based on Rucio and dCache; the challenges (illustrated by the sketch below) include:

        • Implementing a suitable naming scheme for existing and future data (including versioning) within Rucio's flat namespace,
        • mapping our tailored instrument-dependent data grouping into observations to Rucio containers,
        • tying MongoDB, which we still use for organising and storing metadata and volatile data, to Rucio, and
        • implementing site- and access-method-independent embargoes using X.509 authentication and dCache.

        The SDC concept based on Rucio and dCache also serves as a prototype and testbed for the upcoming ESFRI 4m-class European Solar Telescope (EST).

        Speaker: Peter Caligari (Leibniz-Institut fuer Sonnenphysik (KIS))
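
        A sketch of one possible mapping onto Rucio's namespace; the scope, naming scheme, and metadata key are hypothetical, not the SDC's actual conventions:

          from rucio.client import Client

          client = Client()
          # One container per observation, with the version encoded in
          # the name to cope with the flat namespace.
          client.add_container(scope='gris', name='obs_20221110_001_v1')
          client.attach_dids(scope='gris', name='obs_20221110_001_v1',
                             dids=[{'scope': 'gris',
                                    'name': 'obs_20221110_001_map01_v1'}])
          # Mirror selected MongoDB metadata into Rucio, e.g. for embargoes.
          client.set_metadata(scope='gris', name='obs_20221110_001_v1',
                              key='embargoed', value='True')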
      • 12:12
        RUCIO service for Gamma-ray astronomy projects at PIC (Remote) 18m

        PIC (Port d'Informació Científica) is a data center based in Barcelona, managed and funded by a public consortium of two Spanish institutes, CIEMAT and IFAE. It is mainly dedicated to supporting Tier-1 activities for the WLCG experiments. Other activities span disciplines such as cosmology, with groups in the Euclid mission and the PAU Survey, and gamma-ray astronomy, supporting the MAGIC Data Center since 2009 and, more recently, CTA activities to operate and support the CTA Large-Sized Telescopes and the CTA Observatory computing infrastructures.

        In the context of the ESCAPE H2020 project, PIC has been involved in Work Package 2 to test and deploy solutions to build a Data Lake prototype for European ESFRI collaborations such as WLCG, CTA, SKA, KM3NeT, and JIVE, among others. Our main contributions focused on developing solutions for CTA use cases for long-haul data transfers from the observatory to the off-site data centers using Rucio. Additionally, in collaboration with IN2P3, we implemented an integration of Rucio and DIRAC.

        We have developed a dedicated infrastructure for test and production purposes, deployed on a Kubernetes platform and taking advantage of CI/CD pipelines in GitLab. This has been applied in the context of MAGIC and CTA, taking into account the needs, data volumes, and data organization of each project separately.

        The results and the learning process have been very successful. On the one hand, we completed a full development of a new version of the MAGIC data transfer system based on Rucio, including the dedicated infrastructure, the client scripts, the monitoring dashboard, and the user interface providing high-level information about the data transfer system. On the other hand, we have successfully fulfilled CTA requirements to implement data transfers from the on-site computing facilities to the off-site data centers, implementing rules for data replication and detailed monitoring of the transfers.

        This contribution aims to present the work we did in the context of the ESCAPE project, the results obtained, and the outlook for a post-ESCAPE scenario where we can expand the usage of Rucio to a multi-project environment.

        Speaker: Jordi Delgado
    • 12:30 13:30
      Lunch 1h Private Dining Room, County South Building

    • 13:30 14:30
      Discussion and Closing Private Dining Room, County South Building

      Discussion and closing of the workshop