3rd Rucio Community Workshop

US/Central
One West (WH1W) (Fermilab)

Description

Rucio is a software framework that provides functionality to organize, manage, and access large volumes of scientific data using customisable policies. The data can be spread across globally distributed locations and across heterogeneous data centers, uniting different storage and network technologies as a single federated entity. Rucio offers advanced features such as distributed data recovery or adaptive replication, and is highly scalable, modular, and extensible. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is continuously extended to support LHC experiments and other diverse scientific communities.

Important Travel Advisory:
In case you were planning to come in person to Fermilab, please take note of Fermilab's travel restrictions due to the Coronavirus (COVID-19) situation.

In particular, we’d like to ask you to connect remotely if you are traveling from, transiting through, or have visited in the last 14 days any of the following countries:

  • China
  • Iran
  • South Korea
  • Italy
  • Japan

We sincerely apologize for the inconvenience this will cause.
In case of questions feel free to contact the organizers at rucio-workshop-2020-loc@cern.ch.

For the 3rd Rucio Community Workshop we will meet at the Fermilab LPC (LHC Physics Center), close to Chicago, USA. The Workshop will be devoted to the information exchange between the Rucio developers, service administrators and the various interested communities in order to collect feedback and requirements.

On Monday March 9th there will be a WLCG DOMA Face to Face meeting at Fermilab.

On Thursday/Friday March 12-13, directly after the workshop, there will be a Rucio Coding Camp for Rucio developers and people interested in Rucio development. 

We set up a mailing list to which you can subscribe and where we will send more details about the program in the coming weeks.

We also created a Slack channel dedicated to the workshop discussion on the Rucio Slack workspace (Invitation Link). Join #workshop.

All participants must follow Fermilab's code of conduct:

http://directorate-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=174

We would really appreciate it if you could fill out the following participant feedback form, whether you attended this workshop live or offline. It will help us improve the LPC events offering:

https://forms.gle/hVdTVFTPzSk3stYK8

Participants
  • Akanksha Ahuja
  • Andres Moya Ignatov
  • Andrew Bohdan Hanushevsky
  • Andrew Malone Melo
  • Andrew Norman
  • Anil Panta
  • Arfath Pasha
  • Aristeidis Fkiaras
  • Armin Nairz
  • Arturo Sanchez Pineda
  • Bo Jayatilaka
  • Brandon White
  • Brian Yanny
  • Carlos Fernando Gamboa
  • Cedric Serfon
  • Chih-Hao Huang
  • David Schultz
  • Dimitrios Christidis
  • Dmitry Litvintsev
  • Donata Mielaikaite
  • Doug Benjamin
  • Douglas Tucker
  • Edward Karavakis
  • Eli Chadwick
  • Elizabeth Sexton-Kennedy
  • Eric Neilsen
  • Eric Vaandering
  • Evan Shockley
  • Federica Legger
  • Gabriele Benelli
  • Gabriele Gaetano Fronze'
  • Greg Daues
  • Hironori Ito
  • Hugo Alberto Becerril Gonzalez
  • Jennifer Kathryn Adelman-Mc Carthy
  • Joaquin BOGADO
  • Judith Stephen
  • Julien Leduc
  • Justas Balcas
  • Katy Ellis
  • Kenneth Herner
  • Kevin Pedro
  • Kevin Retzke
  • Liang Zhang
  • Lorena Lobato Pardavila
  • Marc Weinberg
  • Marguerite Belt Tonjes
  • Maria Acosta Flechas
  • Mario Lassnig
  • Markus Elsing
  • Martin Barisits
  • Matthew Snyder
  • Michel Villanueva
  • Michelle Butler
  • Michelle Gower
  • Muhammad Aleem Sarwar
  • Nick Smith
  • Oscar Fernando Garzon Miguez
  • Panos Paparrigopoulos
  • Paschalis Paschos
  • Paul James Laycock
  • Rafael Arturo Rocha Vidaurri
  • Robert Illingworth
  • Robert William Gardner Jr
  • Rohini Joshi
  • Sahar Allam
  • Shreyas Bhat
  • Siarhei Padolski
  • Simone Campana
  • Stefan Piperov
  • Stefano Belforte
  • Steven Timm
  • Thomas Beermann
  • Tobias Wegner
  • Tomas Javurek
  • Vincent Garonne
  • Vivek Nigam
  • Wilko Kroeger
  • Yujun Wu
  • Yuyi Guo
    • 09:00 10:30
      Welcome & Introduction One West (WH1W)

      Convener: Gabriele Benelli (Brown University (US))
    • 10:30 10:35
      Group Photo 5m
    • 10:35 11:00
      Break 25m
    • 11:00 12:30
      Community reports One West (WH1W)

      Convener: Bo Jayatilaka (Fermi National Accelerator Lab. (US))
      • 11:00
        ATLAS (Remote) 20m
        Speakers: David Michael South (Deutsches Elektronen-Synchrotron (DE)), Mario Lassnig (CERN)
      • 11:20
        ESCAPE Project (Remote) 20m

        The ESCAPE European Union funded project aims at integrating facilities of astronomy, astroparticle and particle physics into a single collaborative cluster or data lake. The data requirements of such a data lake are at the exabyte scale and the data should follow the FAIR principles (Findable, Accessible, Interoperable, Re-usable). To fulfill those requirements, significant R&D is foreseen with regard to data orchestration, management and access. To set up the ESCAPE data lake, Rucio will be used as a reference implementation. We are therefore contributing to the Rucio development, integration and commissioning effort, particularly for the functionalities needed by the ESCAPE partners.

        Speaker: Aristeidis Fkiaras (CERN)
      • 11:40
        DUNE Data Management Experience with Rucio 20m

        The DUNE collaboration has been using Rucio since 2018 to transport data to our many European remote storage elements. We currently have 13.8 PB of data under Rucio management at 13 remote storage elements. We present our experience thus far, as well as our future plans to make Rucio our sole file location catalog.
        We will present our planned data discovery system, and the role of Rucio in the data ingest system and data delivery of files to jobs. We will describe the associated metadata service which is in development. Finally, we will describe some of the unique challenges of configuring Rucio for the tape-backed dCache/Enstore disk store at Fermilab.

        Speaker: Steven Timm (Fermi National Accelerator Lab. (US))
      • 12:00
        Using Rucio for LCLS (Remote) 20m

        We will describe our plans for using Rucio within the data management system at the Linac Coherent Light Source (LCLS) at SLAC. An overview of the LCLS data management system will be presented, along with the role Rucio will play in cataloging, distributing and archiving the data files. We are still in the testing phase but plan to use Rucio in production within the next few months.

        Speaker: Wilko Kroeger (SLAC National Accelerator Laboratory)
    • 12:30 13:30
      Lunch break 1h
    • 13:30 14:40
      Technical Discussions: Operations & Deployment One West (WH1W)

      Convener: Dr Kenneth Richard Herner (Fermi National Accelerator Laboratory (US))
      • 13:30
        Kubernetes & Rucio (Remote) 15m
        Speaker: Thomas Beermann (Bergische Universitaet Wuppertal (DE))
      • 13:45
        Tales From ATLAS DDM Operations (Remote) 15m
        Speaker: Dimitrios Christidis (University of Texas at Arlington (US))
      • 14:00
        CRIC: Computing Resource Information Catalogue as a topology system for computing infrastructures and an interface for effortless Rucio configuration (Remote) 20m

        CRIC is a high-level information system which provides flexible, reliable and complete topology and configuration description for a large-scale distributed heterogeneous computing infrastructure. CRIC aims to facilitate distributed computing operations for HEP experiments and consolidate WLCG topology information. Being a topology framework, CRIC offers a generic solution with out-of-the-box interfaces, APIs, authentication and authorisation mechanisms, advanced logging and much more. Every community, small or big, can take advantage of CRIC’s capabilities. In close collaboration with the Rucio team, CRIC can provide interfaces to configure Rucio and tie this configuration with the actual topology of the computing infrastructure of any Rucio user. Configuring RSEs, running on top of the same physical storage, through CRIC can drastically minimise the number of attributes that need to be filled by Rucio operators. The complex transfer matrix between all the RSEs can be bootstrapped and maintained through a simple table and all the information regarding users and permissions can be organised through CRIC’s A&A system and propagated into Rucio.
        The contribution describes the overall CRIC architecture, the new lightweight-CRIC standalone service that can be easily installed and how with minimum effort one can fully exploit Rucio’s capabilities using the CRIC framework.

        Speaker: Panos Paparrigopoulos (CERN)
      • 14:20
        Discussion 20m
    • 14:40 15:30
      Community reports One West (WH1W)

      Convener: Nick Smith (Fermi National Accelerator Lab. (US))
      • 14:40
        Belle II (Remote) 20m
        Speaker: Cedric Serfon (Brookhaven National Laboratory (US))
      • 15:00
        Data Management Needs in Cancer Research (Remote) 20m

        MSKCC's Computational Oncology group performs prospective and retrospective studies on a number of cancer types with a focus on cancer evolution. The data being collected and managed for research comes from many sources. Broadly, the data may be categorized into molecular, imaging and clinical data types. The studies tend to be cross-sectional and longitudinal. Users require heterogeneous permissions to the data with varying levels of control. The storage and compute infrastructure is expected to span on-premise clusters, public and private clouds. This presentation will elaborate on the group's data management needs in comparison to Rucio's current feature set.

        Speaker: Arfath Pasha (MSKCC)
    • 09:00 09:45
      Keynote: ESnet: DOE's data circulatory system 45m One West (WH1W)

      Speaker: Inder Monga (ESNet)
    • 09:45 10:30
      Community reports One West (WH1W)

      Convener: Robert Illingworth (Fermi National Accelerator Lab. (US))
      • 09:45
        Rucio at RAL and the UK 20m
        Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
      • 10:05
        CMS Transition to Rucio 20m

        An update on the CMS transition to Rucio, expected to be completed this year, will be given.

        Results of scale tests, data consistency work, and improvements in the Kubernetes infrastructure will be the focus of this talk.

        Speaker: Eric Vaandering (Fermi National Accelerator Lab. (US))
    • 10:30 11:00
      Break 30m
    • 11:00 11:20
      Community reports One West (WH1W)

      Convener: Eric Vaandering (Fermi National Accelerator Lab. (US))
      • 11:00
        EGI data management requirements and plans (Remote) 20m

        The data management requirements coming from the EGI and EOSC-Hub user communities have identified Rucio (together with a data transfer engine) as one of the possible solutions for their needs. Since the 2nd Rucio workshop, a number of enhancements and new developments (first and foremost the support for OIDC and the Kubernetes deployment improvements) have been implemented, and they make it easier to integrate Rucio into the EGI and EOSC environment. In this talk we would like to highlight the progress made and the desired functionalities and integration activities that are still missing.

        Speaker: Andrea Manzi
    • 11:20 12:30
      Technical Discussions: WFMS One West (WH1W)

      Convener: Eric Vaandering (Fermi National Accelerator Lab. (US))
      • 11:20
        iDDS: A New Service with Intelligent Orchestration and Data Transformation and Delivery (Remote) 20m

        The Production and Analysis system (PanDA system) has continuously been evolving in order to cope with rapidly changing computing infrastructure and paradigm. The system is required to be more dynamic and proactive to integrate emerging workflows such as data carousel and active learning, in contrast to conventional HEP workflows such as Monte-Carlo simulation and data reprocessing.

        Intelligent Data Delivery Service (iDDS) is an experiment agnostic service to orchestrate workload management and data management systems, in order to transform and deliver data and let clients consume data in near real-time. iDDS has been actively developed by ATLAS and IRIS-HEP. iDDS has a modular structure to separate core functions and workflow-specific plugins to meet a diversity of requirements in various workflows, simplify the development and operation of new workflows, and provide a uniform monitoring view. The goal of iDDS is the seamless integration of new workflows as well as to address performance issues and suboptimal resource usage in existing workflows.

        This talk will give an architecture overview of iDDS and report on the orchestration of PanDA and Rucio for optimal storage usage in data carousel, dynamic task chaining in the ATLAS production system with instant decision making for active learning, data streaming with on-demand marshaling to minimize data delivery from data ocean to analysis facilities and users, integration of iDDS with other workload management systems, and plans for the future.

        Speaker: Wen Guan (University of Wisconsin (US))
      • 11:40
        Rucio-DIRAC Integration at Belle II (Remote) 10m
        Speaker: Cedric Serfon (Brookhaven National Laboratory (US))
      • 11:50
        Discussion 20m
    • 12:30 13:30
      Lunch break 1h
    • 13:30 15:30
      Operational Intelligence meeting Racetrack (WH7XO)

      Conveners: Federica Legger (Universita e INFN Torino (IT)), Panos Paparrigopoulos (CERN), Alessandro Di Girolamo (CERN)
      • 13:30
        Operational Intelligence - General Introduction 20m

        In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets require a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals, by allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters and operational teams is needed to efficiently manage such heterogeneous infrastructures. On the other hand, logging information from computing services and systems is being archived on ElasticSearch, Hadoop, and NoSQL data stores. Such a wealth of information can be exploited to increase the level of automation in computing operations by using adequate techniques, such as machine learning (ML), tailored to solve specific problems. The Operational Intelligence project is a joint effort from various WLCG communities aimed at increasing the level of automation in computing operations. We discuss how state-of-the-art technologies can be used to build general solutions to common problems and to reduce the operational cost of the experiment computing infrastructure.

        Speaker: Federica Legger (Universita e INFN Torino (IT))
      • 13:50
        Automation of Rucio operations 25m
        Speaker: Dimitrios Christidis (University of Texas at Arlington (US))
      • 14:15
        Framework Design 25m

        This contribution describes how and why we decided to create the "OpInt Framework", what it offers and how we architected it. Last year we began the development of the "Rucio OpInt" project in order to optimise the operational effort and minimize human interventions in distributed data management.
        When we brought "Rucio OpInt" to the Operational Intelligence forum, we realized that there were a lot of shared requirements with other projects and that there was a need for a framework to host all those shared components. After researching the open source market and realizing there was no out-of-the-box solution, we decided to architect our own, which offers APIs, authentication, authorization, source data fetching mechanisms and machine learning pipelines to the whole OpInt community.

        Speaker: Panos Paparrigopoulos (CERN)
      • 14:40
        JobsBuster 25m

        Reliable automation of the root cause analysis procedure is an essential prerequisite for the Operational Intelligence deployment. This kind of data processing is important as an input for automatic decision making and has its own value as an instrument for offloading shifters' operations. The failure rate in distributed computing, for instance in the ATLAS experiment, is on the order of tens of thousands of jobs a day. This is why manual problem identification requires significant effort. We created a prototype system, called Jobs Buster, which finds the least common denominator of computational job failures. In this talk, we provide an overview of this system, its current status and development plans.

        Speaker: Siarhei Padolski (BNL)
      • 15:05
        Job outcome prediction with Google's AutoML Tables 25m
        Speakers: Kevin Michael Retzke (Fermi National Accelerator Lab. (US)), Shreyas Bhat
    • 13:30 15:30
      Technical Discussions: Storage One West (WH1W)

      Convener: Eric Vaandering (Fermi National Accelerator Lab. (US))
      • 13:30
        dCache QoS and Storage Events 20m

        dCache is a highly scalable distributed storage system that is used to implement storage elements with and without tape back-ends. dCache offers a comprehensive RESTful data management interface that uses the language of QoS states and transitions to steer the data life-cycle. This interface provides functionality inspired by the experiences of the LHC and other data-intensive experiments. Additionally, dCache provides storage events, a publish-subscribe notification subsystem which lends itself to greater scalability compared to polling dCache for data states. A data management system like Rucio can take advantage of these features to provide a more robust, efficient and scalable data delivery solution.

        Speaker: Dmitry Litvintsev (Fermi National Accelerator Lab. (US))
      • 13:50
        CERN Tape Archive status and plans (Remote) 20m

        CTA is designed to replace CASTOR as the CERN Tape Archive solution, in order to face scalability and performance challenges arriving with LHC Run-3.

        This presentation will focus on the current CTA deployment and will provide an up-to-date snapshot of CTA achievements.

        It will also cover the final Run-3 CTA service architecture and the underlying hardware that were deployed at the end of 2019.

        Speaker: Julien Leduc (CERN)
      • 14:10
        Connecting Xcache and RUCIO for User Analysis (Remote) 20m
        Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
      • 14:30
        Discussion 20m
    • 15:30 16:00
      Break 30m
    • 16:00 17:00
      CANCELLED: Fermilab Colloquium: The Science of LSST and the big data it will produce 1h One West (WH1W)

      https://events.fnal.gov/colloquium/events/event/open-17/

    • 20:00 22:00
      Conference dinner One West (WH1W)

    • 09:00 10:30
      Community reports One West (WH1W)

      Convener: Andrew John Norman (Fermi National Accelerator Lab. (US))
      • 09:00
        IceCube 20m
        Speaker: David Schultz (University of Wisconsin-Madison)
      • 09:20
        Data management for the XENON Collaboration with Rucio 20m

        The search for Dark Matter in the XENON experiment at the LNGS laboratory in Italy enters a new phase, XENONnT, in 2020. Managed by the University of Chicago, XENON's Rucio deployment plays a central role in the data management between the collaboration's end points. In preparation for the new phase, there have been notable upgrades in components of the production and analysis pipeline and the way they interface with Rucio services, such as the inclusion of processed data for distribution between the different RSEs. We will describe some of the changes in the pipeline and focus on discussing the aDMiX wrapper, which calls the Rucio API directly to ingest data into Rucio from the DAQ.

        Speaker: Paschalis Paschos (University of Chicago)
      • 09:40
        Rucio at BNL and NSLS2 (Remote) 20m

        Rucio has evolved as a distributed data management system to be used by scientific communities beyond High Energy Physics. This includes disengaging its core code from a specific file transfer tool. In this talk I will discuss using Globus Online as a file transfer tool with Rucio, the current state of testing and the possibilities for the future in light of NSLSII's data ecosystem.

        Speaker: Matthew Snyder (Brookhaven National Laboratory)
      • 10:00
        Rucio @ LIGO-Virgo-KAGRA (Remote) 25m
        Speaker: Mr Gabriele Gaetano Fronze' (Universita e INFN Torino (IT), Subatech Nantes (FR))
    • 10:30 11:00
      Break 30m
    • 11:00 12:20
      Technical Discussions: Rucio One West (WH1W)

      Convener: Kevin Pedro (Fermi National Accelerator Lab. (US))
      • 11:00
        FTS news and plans (Remote) 20m

        The File Transfer Service (FTS) distributes the majority of the LHC data across the WLCG infrastructure; in 2019, it transferred more than 800 million files and a total of 0.95 exabytes of data. It is used by more than 28 experiments at CERN and in other data-intensive sciences outside of the LHC and even the High Energy Physics domain.

        The FTS team has been very active, making several significant performance improvements to the FTS core to prepare for the LHC Run-3 data challenges, supporting the new CERN Tape Archive (CTA) system, which has been stress-tested by the ATLAS Data Carousel activity, supporting a more user-friendly authentication and delegation method using tokens, and supporting the Third Party Copy WLCG DOMA activity. This talk will cover all these developments.

        Speaker: Edward Karavakis (CERN)
      • 11:20
        Multi-VO Rucio 20m
        Speaker: Eli Benjamin Chadwick (Science and Technology Facilities Council STFC (GB))
      • 11:40
        Rucio Roadmap (Remote) 20m
        Speaker: Martin Barisits (CERN)
      • 12:00
        Discussion 20m
    • 12:20 12:45
      Conclusion & Summary One West (WH1W)

    • 12:45 14:00
      Lunch break 1h 15m