3rd Rucio Community Workshop

Name: 3rd Rucio Community Workshop
Start: 2020-03-10T09:00:00-05:00
End: 2020-03-12T15:00:00-05:00
Location: Fermilab

10 Mar 2020, 09:00 → 12 Mar 2020, 15:00 US/Central

One West (WH1W) (Fermilab)

Description

Rucio is a software framework that provides functionality to organize, manage, and access large volumes of scientific data using customisable policies. The data can be spread across globally distributed locations and across heterogeneous data centers, uniting different storage and network technologies as a single federated entity. Rucio offers advanced features such as distributed data recovery and adaptive replication, and is highly scalable, modular, and extensible. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is continuously extended to support LHC experiments and other diverse scientific communities.

Important Travel Advisory:
If you are planning to come to Fermilab in person, please take note of Fermilab's travel restrictions due to the Coronavirus (COVID-19) situation.

In particular, we’d like to ask you to connect remotely if you are traveling from, are transiting through, or have visited in the last 14 days any of the following countries:

China
Iran
South Korea
Italy
Japan

We sincerely apologize for the inconvenience this will cause.
If you have questions, feel free to contact the organizers at rucio-workshop-2020-loc@cern.ch.

For the 3rd Rucio Community Workshop we will meet at the Fermilab LPC (LHC Physics Center), close to Chicago, USA. The workshop will be devoted to the exchange of information between the Rucio developers, service administrators and the various interested communities in order to collect feedback and requirements.

On Monday March 9th there will be a WLCG DOMA Face to Face meeting at Fermilab.

On Thursday/Friday March 12-13, directly after the workshop, there will be a Rucio Coding Camp for Rucio developers and people interested in Rucio development.

We set up a mailing list to which you can subscribe and where we will send more details about the program in the coming weeks.

We also created a Slack channel dedicated to the workshop discussion on the Rucio Slack workspace (Invitation Link). Join #workshop.

All participants must follow Fermilab's code of conduct:

http://directorate-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=174

We would really appreciate it if you could fill in the following participant feedback form, whether you attended this workshop live or offline. It will help us improve the LPC events offering:

https://forms.gle/hVdTVFTPzSk3stYK8

Contact

rucio-workshop-2020-pc@cern.ch

rucio-workshop-2020-loc@cern.ch

Participants

80

Tuesday 10 March
- Welcome & Introduction One West (WH1W)
  
  Convener: Gabriele Benelli (Brown University (US))
  - 1
    
    Welcome to Fermilab
    
    Speaker: Elizabeth Sexton-Kennedy (Fermi National Accelerator Lab. (US))
  - 2
    
    Logistics
    
    Speaker: Gabriele Benelli (Brown University (US))
    
    RucioCommunityWorkshopLogistics_v3.pdf
  - 3
    
    Introduction
    
    Speaker: Martin Barisits (CERN)
    
    Rucio-Introduction.pdf
  - 4
    
    Keynote I: Quantum Computing at Fermilab
    
    Speaker: Dr Adam Lyon (Fermilab)
    
    QuantumComputing_Rucio_workshop.pdf
- 10:30
  
  Group Photo
- 10:35
  
  Break
- Community reports One West (WH1W)
  
  Convener: Bo Jayatilaka (Fermi National Accelerator Lab. (US))
  - 5
    
    ATLAS (Remote)
    
    Speakers: David Michael South (Deutsches Elektronen-Synchrotron (DE)), Mario Lassnig (CERN)
    
    RucioAndATLAS2020.pdf
  - 6
    
    ESCAPE Project (Remote)
    
    The European Union-funded ESCAPE project aims at integrating facilities of astronomy, astroparticle and particle physics into a single collaborative cluster or data lake. The data requirements of such a data lake are at the exabyte scale, and the data should follow the FAIR principles (Findable, Accessible, Interoperable, Re-usable). To fulfill those requirements, significant R&D is foreseen with regard to data orchestration, management and access. To set up the ESCAPE data lake, Rucio will be used as a reference implementation. We are therefore contributing to the Rucio development, integration and commissioning effort, particularly for the functionalities needed by the ESCAPE partners.
    
    Speaker: Aristeidis Fkiaras (CERN)
    
    ESCAPE Project Use Case (2).pdf
  - 7
    
    DUNE Data Management Experience with Rucio
    
    The DUNE collaboration has been using Rucio since 2018 to transport data to our many European remote storage elements. We currently have 13.8 PB of data under Rucio management at 13 remote storage elements. We present our experience thus far, as well as our future plans to make Rucio our sole file location catalog.
    We will present our planned data discovery system, and the role of Rucio in the data ingest system and in the delivery of files to jobs. We will describe the associated metadata service, which is in development. Finally, we will describe some of the unique challenges of configuring Rucio for the tape-backed dCache/Enstore disk store at Fermilab.
    
    Speaker: Steven Timm (Fermi National Accelerator Lab. (US))
    
    Rucio_workshop_2020.pptx
  - 8
    
    Using Rucio for LCLS (Remote)
    
    We will describe our plans for using Rucio within the data management system at the Linac Coherent Light Source (LCLS) at SLAC. We will present an overview of the LCLS data management system and the role Rucio will play in cataloging, distributing and archiving the data files. We are still in the testing phase but plan to use Rucio in production within the next few months.
    
    Speaker: Wilko Kroeger (SLAC National Accelerator Laboratory)
    
    LCLS_3rd_RucioWorkshop.pdf
- 12:30
  
  Lunch break
- Technical Discussions: Operations & Deployment One West (WH1W)
  
  Convener: Dr Kenneth Richard Herner (Fermi National Accelerator Laboratory (US))
  - 9
    
    Kubernetes & Rucio (Remote)
    
    Speaker: Thomas Beermann (Bergische Universitaet Wuppertal (DE))
    
    Kubernetes and Rucio.pdf
  - 10
    
    Tales From ATLAS DDM Operations (Remote)
    
    Speaker: Dimitrios Christidis (University of Texas at Arlington (US))
    
    2020.03.10 Tales From ATLAS DDM Operations.pdf
  - 11
    
    CRIC: Computing Resource Information Catalogue as a topology system for computing infrastructures and an interface for effortless Rucio configuration (Remote)
    
    CRIC is a high-level information system which provides a flexible, reliable and complete topology and configuration description for a large-scale distributed heterogeneous computing infrastructure. CRIC aims to facilitate distributed computing operations for HEP experiments and consolidate WLCG topology information. Being a topology framework, CRIC offers a generic solution with out-of-the-box interfaces, APIs, authentication and authorisation mechanisms, advanced logging and much more. Every community, small or big, can take advantage of CRIC’s capabilities. In close collaboration with the Rucio team, CRIC can provide interfaces to configure Rucio and tie this configuration to the actual topology of the computing infrastructure of any Rucio user. Configuring RSEs that run on top of the same physical storage through CRIC can drastically reduce the number of attributes that need to be filled in by Rucio operators. The complex transfer matrix between all the RSEs can be bootstrapped and maintained through a simple table, and all the information regarding users and permissions can be organised through CRIC’s A&A system and propagated into Rucio.
    The contribution describes the overall CRIC architecture, the new lightweight CRIC standalone service that can be easily installed, and how one can fully exploit Rucio’s capabilities with minimum effort using the CRIC framework.
    
    Speaker: Panos Paparrigopoulos (CERN)
    
    CRIC - Rucio Workshop (5).pdf
  - 12
    
    Discussion
- Community reports One West (WH1W)
  
  Convener: Nick Smith (Fermi National Accelerator Lab. (US))
  - 13
    
    Belle II (Remote)
    
    Speaker: Cedric Serfon (Brookhaven National Laboratory (US))
    
    2020-03-09 - Belle2 Distributed Data Management.pdf
  - 14
    
    Data Management Needs in Cancer Research (Remote)
    
    MSKCC's Computational Oncology group performs prospective and retrospective studies on a number of cancer types with a focus on cancer evolution. The data being collected and managed for research comes from many sources. Broadly, the data may be categorized into molecular, imaging and clinical data types. The studies tend to be cross-sectional and longitudinal. Users require heterogeneous permissions to the data with varying levels of control. The storage and compute infrastructure is expected to span on-premises clusters and public and private clouds. This presentation will elaborate on the group's data management needs in comparison to Rucio's current feature set.
    
    Speaker: Arfath Pasha (MSKCC)
    
    MSK_cancer_research_needs.pdf
Wednesday 11 March
- 15
  
  Keynote: ESnet: DOE's data circulatory system One West (WH1W)
  
  Speaker: Inder Monga (ESNet)
  
  ESnet Keynote Rucio Workshop Chicago 2020.pptx
- Community reports One West (WH1W)
  
  Convener: Robert Illingworth (Fermi National Accelerator Lab. (US))
  - 16
    
    Rucio at RAL and the UK
    
    Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
    
    UKupdate_RucioWorkshopMar2020.pdf
  - 17
    
    CMS Transition to Rucio
    
    An update on the CMS transition to Rucio, expected to be completed this year, will be given.
    
    Results of scale tests, data consistency work, and improvements in the Kubernetes infrastructure will be the focus of this talk.
    
    Speaker: Eric Vaandering (Fermi National Accelerator Lab. (US))
    
    Rucio Workshop 2020.pdf
- 10:30
  
  Break
- Community reports One West (WH1W)
  
  Convener: Eric Vaandering (Fermi National Accelerator Lab. (US))
  - 18
    
    EGI data management requirements and plans (Remote)
    
    The data management requirements coming from the EGI and EOSC-Hub user communities have identified Rucio (together with a data transfer engine) as one of the possible solutions for their needs. Since the 2nd Rucio workshop, a number of enhancements and new developments have been implemented (chiefly the support for OIDC and the Kubernetes deployment improvements), and they point towards an easier integration of Rucio in the EGI and EOSC environment. In this talk we would like to highlight the progress made and the functionalities and integration activities that are still missing.
    
    Speaker: Andrea Manzi
    
    EGI - Rucio Workshop 2020.pdf
    
    EGI - Rucio Workshop 2020.pptx
- Technical Discussions: WFMS One West (WH1W)
  
  Convener: Eric Vaandering (Fermi National Accelerator Lab. (US))
  - 19
    
    iDDS: A New Service with Intelligent Orchestration and Data Transformation and Delivery (Remote)
    
    The Production and Analysis system (PanDA system) has been evolving continuously in order to cope with rapidly changing computing infrastructure and paradigms. The system is required to be more dynamic and proactive to integrate emerging workflows such as data carousel and active learning, in contrast to conventional HEP workflows such as Monte-Carlo simulation and data reprocessing.
    
    The Intelligent Data Delivery Service (iDDS) is an experiment-agnostic service to orchestrate workload management and data management systems, in order to transform and deliver data and let clients consume data in near real-time. iDDS has been actively developed by ATLAS and IRIS-HEP. iDDS has a modular structure that separates core functions from workflow-specific plugins to meet a diversity of requirements in various workflows, simplify the development and operation of new workflows, and provide a uniform monitoring view. The goal of iDDS is the seamless integration of new workflows as well as to address performance issues and suboptimal resource usage in existing workflows.
    
    This talk will give an architectural overview of iDDS and report on the orchestration of PanDA and Rucio for optimal storage usage in data carousel, dynamic task chaining in the ATLAS production system with instant decision making for active learning, data streaming with on-demand marshaling to minimize data delivery from the data ocean to analysis facilities and users, the integration of iDDS with other workload management systems, and plans for the future.
    
    Speaker: Wen Guan (University of Wisconsin (US))
    
    idds_20200311_rucio_workshop(2).pdf
  - 20
    
    Rucio-DIRAC Integration at Belle II (Remote)
    
    Speaker: Cedric Serfon (Brookhaven National Laboratory (US))
    
    2020-03-10 - Rucio-DIRAC Integration at Belle II.pdf
  - 21
    
    Discussion
- 12:30
  
  Lunch break
- Operational Intelligence meeting Racetrack (WH7XO)
  
  Conveners: Federica Legger (Universita e INFN Torino (IT)), Panos Paparrigopoulos (CERN), Alessandro Di Girolamo (CERN)
  - 22
    
    Operational Intelligence - General Introduction
    
    In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets require a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals by allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters and operational teams is needed to efficiently manage such heterogeneous infrastructures. On the other hand, logging information from computing services and systems is being archived on ElasticSearch, Hadoop, and NoSQL data stores. Such a wealth of information can be exploited to increase the level of automation in computing operations by using adequate techniques, such as machine learning (ML), tailored to solve specific problems. The Operational Intelligence project is a joint effort from various WLCG communities aimed at increasing the level of automation in computing operations. We discuss how state-of-the-art technologies can be used to build general solutions to common problems and to reduce the operational cost of the experiment computing infrastructure.
    
    Speaker: Federica Legger (Universita e INFN Torino (IT))
    
    Rucio OpInt - 110320.pdf
  - 23
    
    Automation of Rucio operations
    
    Speaker: Dimitrios Christidis (University of Texas at Arlington (US))
    
    2020.03.11 Automation of Rucio Operations.pdf
  - 24
    
    Framework Design
    
    This contribution describes how and why we decided to create the "OpInt Framework", what it offers and how we architected it. Last year we began the development of the "Rucio OpInt" project in order to optimise the operational effort and minimise human interventions in distributed data management.
    When we brought "Rucio OpInt" to the Operational Intelligence forum, we realized that there were a lot of shared requirements with other projects and that there was a need for a framework to host all those shared components. After researching the open source market and realizing there was no out-of-the-box solution, we decided to architect our own solution, which offers APIs, authentication, authorization, source data fetching mechanisms and machine learning pipelines to the whole OpInt community.
    
    Speaker: Panos Paparrigopoulos (CERN)
    
    OpInt framework - FNAL Workshop (1).pdf
  - 25
    
    JobsBuster
    
    Reliable automation of the root cause analysis procedure is an essential prerequisite for the deployment of Operational Intelligence. This kind of data processing is important as an input for automatic decision making and has its own value as an instrument for offloading shifter operations. The failure rate in distributed computing, for instance in the ATLAS experiment, is on the order of ten thousand jobs a day. This is why manual problem identification requires significant effort. We created a prototype system, called Jobs Buster, which finds the least common denominator of computational job failures. In this talk, we provide an overview of this system, its current status and development plans.
    
    Speaker: Siarhei Padolski (BNL)
    
    RucioWorkshopJB.pdf
  - 26
    
    Job outcome prediction with Google's AutoML Tables
    
    Speakers: Kevin Michael Retzke (Fermi National Accelerator Lab. (US)), Shreyas Bhat
    
    Job Outcome Prediction with Google's AutoML Tables.pdf
- Technical Discussions: Storage One West (WH1W)
  
  Convener: Eric Vaandering (Fermi National Accelerator Lab. (US))
  - 27
    
    dCache QoS and Storage Events
    
    dCache is a highly scalable distributed storage system that is used to
    implement storage elements with and without tape back-ends.
    dCache offers a comprehensive RESTful data management interface
    that uses the language of QoS states and transitions to steer the data
    life-cycle. This interface provides functionality inspired by the
    experiences of the LHC and other data-intensive experiments. Additionally,
    dCache provides storage events, a publish-subscribe notification
    subsystem which lends itself to greater scalability compared to
    polling dCache for data states. A data management system like Rucio
    can take advantage of these features to provide a more robust,
    efficient and scalable data delivery solution.
    
    Speaker: Dmitry Litvintsev (Fermi National Accelerator Lab. (US))
    
    dCacheQos.pdf
    
    dCacheQos.pptx
  - 28
    
    CERN Tape Archive status and plans (Remote)
    
    CTA is designed to replace CASTOR as the CERN Tape Archive solution, in order to face scalability and performance challenges arriving with LHC Run-3.
    
    This presentation will focus on the current CTA deployment and will provide an up-to-date snapshot of CTA achievements.
    
    It will also cover the final Run-3 CTA service architecture and the underlying hardware that were deployed at the end of 2019.
    
    Speaker: Julien Leduc (CERN)
    
    200311_rucioWS_CTA_status_and_plans.pdf
  - 29
    
    Connecting Xcache and RUCIO for User Analysis (Remote)
    
    Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
    
    Rucio and Xcache for User Analysis.pdf
  - 30
    
    Discussion
- 15:30
  
  Break
- 31
  
  CANCELLED: Fermilab Colloquium: The Science of LSST and the big data it will produce One West (WH1W)
  
  https://events.fnal.gov/colloquium/events/event/open-17/
- Conference dinner One West (WH1W)
  
  Details
Thursday 12 March
- Community reports One West (WH1W)
  
  Convener: Andrew John Norman (Fermi National Accelerator Lab. (US))
  - 32
    
    IceCube
    
    Speaker: David Schultz (University of Wisconsin-Madison)
    
    Rucio and IceCube v1.pdf
  - 33
    
    Data management for the XENON Collaboration with Rucio
    
    The search for Dark Matter in the XENON experiment at the LNGS laboratory in Italy enters a new phase, XENONnT, in 2020. Managed by the University of Chicago, XENON's Rucio deployment plays a central role in the data management between the collaboration's end points. In preparation for the new phase, there have been notable upgrades in components of the production and analysis pipeline and the way they interface with Rucio services, such as the inclusion of processed data for distribution between the different RSEs. We will describe some of the changes in the pipeline and focus on the aDMiX wrapper, which calls the Rucio API directly to ingest data into Rucio from the DAQ.
    
    Speaker: Paschalis Paschos (University of Chicago)
    
    Rucio_Workshop_Xenon.pdf
  - 34
    
    Rucio at BNL and NSLS2 (Remote)
    
    Rucio has evolved as a distributed data management system to be used by scientific communities beyond High Energy Physics. This includes disengaging its core code from a specific file transfer tool. In this talk I will discuss using Globus Online as a file transfer tool with Rucio, the current state of testing, and the possibilities for the future in light of NSLSII's data ecosystem.
    
    Speaker: Matthew Snyder (Brookhaven National Laboratory)
    
    rucio-workshop-2020.odp
    
    rucio-workshop-2020.pdf
  - 35
    
    Rucio @ LIGO-Virgo-KAGRA (Remote)
    
    Speaker: Mr Gabriele Gaetano Fronze' (Universita e INFN Torino (IT), Subatech Nantes (FR))
    
    Slides
- 10:30
  
  Break
- Technical Discussions: Rucio One West (WH1W)
  
  Convener: Kevin Pedro (Fermi National Accelerator Lab. (US))
  - 36
    
    FTS news and plans (Remote)
    
    The File Transfer Service (FTS) distributes the majority of the LHC data across the WLCG infrastructure and, in 2019, it transferred more than 800 million files and a total of 0.95 exabytes of data. It is used by more than 28 experiments at CERN and in other data-intensive sciences outside of the LHC and even outside the High Energy Physics domain.
    
    To prepare for the LHC Run-3 data challenges, the FTS team has made several significant performance improvements to the FTS core. It has also added support for the new CERN Tape Archive (CTA) system, which has been stress-tested by the ATLAS Data Carousel activity, for a more user-friendly authentication and delegation method using tokens, and for the Third Party Copy WLCG DOMA activity. This talk will cover all these developments.
    
    Speaker: Edward Karavakis (CERN)
    
    FTS_RucioWorkshop.odp
    
    FTS_RucioWorkshop.pdf
  - 37
    
    Multi-VO Rucio
    
    Speaker: Eli Benjamin Chadwick (Science and Technology Facilities Council STFC (GB))
    
    Multi-VO Rucio.pdf
  - 38
    
    Rucio Roadmap (Remote)
    
    Speaker: Martin Barisits (CERN)
    
    Rucio Internals.pdf
  - 39
    
    Discussion
- Conclusion & Summary One West (WH1W)
  
  Rucio Workshop Summary.pdf
- 12:45
  
  Lunch break
