21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)

Name: 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)
Start: 2015-04-13T09:00:00+09:00
End: 2015-04-17T16:00:00+09:00
Location: OIST

Asia/Tokyo timezone

Scalable and fail-safe deployment of the ATLAS Distributed Data Management system Rucio

Not scheduled

15m

OIST

1919-1 Tancha, Onna-son, Kunigami-gun Okinawa, Japan 904-0495

poster presentation Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing

Speaker: Dr Mario Lassnig (CERN)

This contribution details the deployment of Rucio, the ATLAS Distributed Data Management system. The main complication is that Rucio interacts with a wide variety of external services and connects globally distributed data centres under different technological and administrative control, at an unprecedented data volume. It is therefore not possible to create a duplicate instance of Rucio for testing or integration. Every software upgrade or configuration change is thus potentially disruptive and requires fail-safe software and automatic error recovery. Rucio uses a three-layer scaling and mitigation strategy based on quasi-realtime monitoring. This strategy mainly employs independent stateless services, automatic failover, and service migration. The technologies used for deployment and mitigation include OpenStack, Puppet, Graphite, HAProxy, Apache, and nginx. The contribution discusses the reasons and design decisions behind the deployment, the actual implementation, and an evaluation of all involved services and components.
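
As an illustration only, automatic failover across independent stateless services can be sketched as below. This is a minimal Python sketch under stated assumptions, not Rucio's implementation: the abstract lists the technologies used but does not describe their configuration. The `Backend` class, the `update_health` and `pick_backend` functions, the error-rate metric, and the 0.5 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Backend:
    """One replica of a stateless service (hypothetical model)."""
    name: str
    healthy: bool = True


def update_health(backends: List[Backend],
                  error_rates: Dict[str, float],
                  threshold: float = 0.5) -> None:
    """Refresh health flags from monitoring data.

    A backend is healthy only if a metric was reported for it and the
    error rate does not exceed the threshold. Treating a missing metric
    as unhealthy is an assumption made for this sketch.
    """
    for backend in backends:
        rate = error_rates.get(backend.name)
        backend.healthy = rate is not None and rate <= threshold


def pick_backend(backends: List[Backend], request_id: int) -> Backend:
    """Choose a healthy backend, spreading requests by request_id.

    Because the services hold no state, any healthy replica can serve
    any request, so unhealthy replicas are simply skipped.
    """
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backend available")
    return healthy[request_id % len(healthy)]


if __name__ == "__main__":
    pool = [Backend("a"), Backend("b"), Backend("c")]
    # Hypothetical monitoring snapshot: backend "b" is failing.
    update_health(pool, {"a": 0.01, "b": 0.9, "c": 0.02})
    print([pick_backend(pool, i).name for i in range(4)])
```

In this sketch, requests are routed around the failing replica without any client-side change, which is only possible because no replica holds request state.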

Authors: Dr Mario Lassnig (CERN), Ralph Vigne (University of Vienna (AT))

Co-authors: Cedric Serfon (CERN), Martin Barisits (CERN), Thomas Beermann (Bergische Universitaet Wuppertal (DE)), Vincent Garonne (CERN)

Poster

Scalable_and_fail-safe_deployment_of_Rucio.pdf
