General description of the EOS service @CERN
Improving EOS monitoring of finished transfers, with a hands-on look at eos io stat output.
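As a taste of what the hands-on part covers, here is a minimal sketch in Go that shells out to `eos io stat -m` and folds each line into a map. The command is a real EOS CLI subcommand; the assumption that `-m` emits one record per line of space-separated key=value pairs follows the usual EOS monitoring format, and the field names are illustrative, not authoritative.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	// Assumption: "eos io stat -m" prints one record per line as
	// space-separated key=value pairs (EOS monitoring format).
	out, err := exec.Command("eos", "io", "stat", "-m").Output()
	if err != nil {
		fmt.Println("eos io stat failed:", err)
		return
	}
	sc := bufio.NewScanner(bytes.NewReader(out))
	for sc.Scan() {
		record := map[string]string{}
		for _, tok := range strings.Fields(sc.Text()) {
			if k, v, ok := strings.Cut(tok, "="); ok {
				record[k] = v // field names depend on the EOS version
			}
		}
		fmt.Println(record)
	}
}
```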
Prometheus is a modern, simple and scalable monitoring system with an easy-to-use query language based on labels. The EOS operators team has developed a fully functional EOS Prometheus exporter in Golang to monitor all EOS metrics, including space, group, node, filesystem, I/O and namespace stats collectors. In this talk, the tool will be showcased and made available to the EOS Community.
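For readers unfamiliar with the exporter pattern, the sketch below shows the basic shape in Go using prometheus/client_golang: register a labelled gauge, set it, and serve /metrics. The metric name, label names, values and port are illustrative assumptions, not the actual exporter's metrics.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metric; a real collector would fill this from MGM queries.
var spaceUsedBytes = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "eos_space_used_bytes",
		Help: "Bytes used per EOS space.",
	},
	[]string{"instance", "space"},
)

func main() {
	prometheus.MustRegister(spaceUsedBytes)
	// Placeholder values for illustration only.
	spaceUsedBytes.WithLabelValues("eospublic", "default").Set(1.3e16)
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9986", nil) // arbitrary port
}
```

Prometheus then scrapes the /metrics endpoint on the configured port and stores the labelled time series.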
Presentation on the new recording plug-in, which allows I/O sampling, and on the replay tool.
With 100GE technology and erasure coding we discovered new bottlenecks and challenges. This presentation will recap the state of the art of the ALICEO2 EOS instance and show benchmarks, including a real and a replayed physics analysis use case.
LHC Data Storage: RUN 3 Data Taking Commissioning
Update on the setup and operations at the Vienna Tier-2 site.
Fermilab has been running an EOS instance since testing began in June 2012. By May 2013, before it became production storage, 600 TB had been allocated to EOS. Today, approximately 13 PB of storage is available in the EOS instance.
An update on our current experiences and challenges running an EOS instance for use by the Fermilab LHC Physics Center (LPC) computing cluster. The LPC...
As part of its storage migration plan, the CMS Tier-2 center at Purdue University is preparing an EOS deployment of ~10PB, which will serve as the main Storage Element of the site, as well as a basis for the future Analysis Facility currently in development. We adopted a fully containerized approach with Kubernetes, which allows us to better share available hardware resources...
This is a brief presentation on the operational status of the Custodial Disk Storage (CDS) system provided for the ALICE experiment as a tape endpoint. The CDS is built on EOS, using its erasure-coding implementation (RAIN) for data protection. The CDS joined the WLCG Tape Challenges in the previous year, and about one PB of data has been transferred from the experiment. A...
In this communication, we present the deployment project of the EOS storage software solution at the GRIF site. GRIF is a distributed site made of four subsites at different locations in the Paris region. The worst-case network latency between the subsites is 2-4 ms, with three of them connected via 100G links. The objective is to consolidate the four (4)...
This talk introduces the GroupBalancer and what it does. We also cover the GroupBalancer improvements introduced from the 4.8.78 release onward, how to configure them for deployments, some figures from existing deployments, and what the roadmap for the future holds for these functionalities.
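As a hedged illustration of the configuration step, the sketch below applies GroupBalancer settings through the eos CLI from Go. `eos space config` is a real command, but the exact key names and values differ across releases, so every key shown here is an assumption to verify against the documentation for your EOS version.

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Assumed space-level keys; check your release's docs before use.
	settings := []string{
		"space.groupbalancer=on",
		"space.groupbalancer.min_threshold=60", // assumed key
		"space.groupbalancer.max_threshold=80", // assumed key
	}
	for _, s := range settings {
		cmd := exec.Command("eos", "space", "config", "default", s)
		if out, err := cmd.CombinedOutput(); err != nil {
			fmt.Printf("%s failed: %v\n%s", s, err, out)
		}
	}
}
```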
Migrating the AMS experiment data from EOSPUBLIC to EOSAMS02 stimulated development of tools which might be useful in general for similar exercises in the future. We will show the work in progress.
In preparation for Run-3 we faced the following problem: the usage of IO resources has to be balanced between individual activities, which led to the implementation of IO priorities and bandwidth-regulation policies. While commissioning the ALICEO2 EOS instance we observed that write performance using the buffer cache is a bottleneck on storage nodes. Direct IO helps to improve...
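To make the direct-IO point concrete, here is a minimal Linux sketch in Go (not the EOS FST code): O_DIRECT bypasses the page cache, which is why both the buffer address and the write size must be block-aligned. The path and the 4096-byte alignment are illustrative assumptions.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

const align = 4096

// alignedBlock returns an n-byte slice whose start address is
// align-byte aligned, as required by O_DIRECT.
func alignedBlock(n int) []byte {
	buf := make([]byte, n+align)
	off := align - int(uintptr(unsafe.Pointer(&buf[0]))%align)
	return buf[off : off+n]
}

func main() {
	// The target filesystem must support O_DIRECT (tmpfs does not).
	f, err := os.OpenFile("/var/tmp/direct.dat",
		os.O_WRONLY|os.O_CREATE|syscall.O_DIRECT, 0644)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()
	block := alignedBlock(align) // write size is a multiple of 4096
	if _, err := f.Write(block); err != nil {
		fmt.Println("direct write failed:", err)
	}
}
```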
With XRootD 5 the on-the-wire protocol provides confidentiality of data inside the transport layer. However, data files remain human-readable on storage nodes and can be accessed and downloaded by any EOS administrator and any person with read access. Filesystem-level encryption on storage nodes does not solve this confidentiality problem.
To provide better data privacy the most recent versions...
Physics and CERNBOX instances at CERN are exposed to O(10^4) mount clients simultaneously. Overloads from batch access are not a new phenomenon: for years the AFS filesystem has suffered more or less frequent volume overloads. During overload episodes, meta-data access at the MGM slows down significantly because thousands of batch nodes compete against a few interactive clients and sync & share access. To...
A primer on xrdcp's new (and old) features like zip append, metalink support, retries and many more.
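A hedged usage sketch driving xrdcp from Go: `--retry` and `--zip` are xrdcp options in the XRootD 5 era, but verify them with `xrdcp --help` on your build; the hosts, paths and file names below are placeholders.

```go
package main

import (
	"fmt"
	"os/exec"
)

// run invokes xrdcp with the given arguments and reports failures.
func run(args ...string) {
	if out, err := exec.Command("xrdcp", args...).CombinedOutput(); err != nil {
		fmt.Printf("xrdcp %v failed: %v\n%s", args, err, out)
	}
}

func main() {
	// Retry a failed transfer up to 3 times (placeholder URL).
	run("--retry", "3",
		"root://eos.example.cern.ch//eos/demo/in.dat", "/tmp/in.dat")
	// Extract a single member file from a remote ZIP archive.
	run("--zip", "member.dat",
		"root://eos.example.cern.ch//eos/demo/archive.zip", "/tmp/member.dat")
}
```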
Context: productisation of a native Windows connection to EOS.
Objectives: a professional integration of EOS with the Windows platform should allow seamless usage of EOS as a Windows local disk with all the EOS benefits, such as low latency, high throughput, and high reliability.
Method: Implementation of the EOS client for the Windows...
The EOS durability machinery is a set of operator scripts, tools and EOS components to classify, monitor and repair unhealthy files. The EOS filesystem check (fsck) was enabled in 2021, but one should still keep track of each instance's state and investigate the root causes of the problems found.
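A small operator-side sketch of the kind of tracking described above: `eos fsck stat` and `eos fsck report` are real subcommands, though their output format changes between EOS versions, so this sketch only surfaces the raw text for an operator dashboard or cron job.

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	for _, sub := range []string{"stat", "report"} {
		out, err := exec.Command("eos", "fsck", sub).CombinedOutput()
		if err != nil {
			fmt.Printf("eos fsck %s failed: %v\n", sub, err)
			continue
		}
		fmt.Printf("== eos fsck %s ==\n%s", sub, out)
	}
}
```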
CERNBox is a key enabler service built on top of EOS for users at CERN and beyond. The service is used by more than 37K users and stores over 15PB of data, representing all the user communities at the laboratory.
In this talk we will explain the current status of the service, the challenges we faced in 2021 and our vision for the future: CERNBox as the gateway for a federation of...
EOS provides the backend to CERNBox, the cloud sync and share service implementation used at CERN. EOS for CERNBox stores 12PB of user and project space data across 9 different instances running in multi-fst configuration. This presentation will give an overview of the 2021 challenges, how we tried to address them, and the roadmap for the service in 2022.
More than 300 million CERNBox files are processed daily using the cback backup tool, which ensures that files are safely stored in a different geographical area and on a different storage backend. The backup tool has not stopped evolving and was extended to support CephFS mount backups alongside EOS mounts under the same infrastructure. This talk will present the current status of the project...
The CERNBox service is currently backed by 13PB of EOS storage distributed across more than 3,000 drives. EOS has proven to be a reliable and highly performing backend throughout. On the other hand, the CERN Storage Group also operates CephFS, which has been previously evaluated in combination with EOS as a potential solution for large scale physics data taking [1]. This work seeks to further...
To consolidate the concept of sharing implemented inside EOS for any access protocol, we are currently adding a new type of ACL which defines a 'share'. One of the new characteristics of a share ACL is that it is not influenced by POSIX or classic ACLs. We support additional ACL capabilities such as 'can share'.
A second important new concept is ownership by an EGROUP. Ownership...
EOS provides a very detailed log system with useful information about all the user and system operations performed at any time. Each EOS daemon has its own log file, and tracing operations that involve different components can be a time-consuming task (MGM -> FST1 -> FST2). With Grafana Loki and Promtail, we set up a log-aggregation system that allows tracing operations...
In this talk we present the evolution of the CERNBox Samba service that we operate in front of EOS. An important recent change is the adoption of a new layout based on bind mounts: this allows us to operate a smaller number of EOS mounts and to federate multiple EOS instances in a single namespace. We will discuss further measures adopted to address the ever-increasing load from the...
Understanding the configuration and logic used by eosxd on /eos/ is not straightforward, in particular in containerized environments. This short presentation tries to explain the basics.
This contribution illustrates how we have evolved file locking in CERNBox and EOS. Initially introduced to support Office online applications, the functionality has been extended to be an integral part of Reva, the engine powering CERNBox. We will describe the implementation in the EOS storage system, and the foreseen extensions to cover Linux file locks (flocks) as supported for FUSE and...
In this presentation, we will report on how we at AARNet deployed CTA along with the restic backup client as a backup/archive solution for our production EOS clusters. The solution has been in production since late 2021. This presentation will cover why we chose CTA, how CTA is deployed, and how it is integrated into our backup workflow.
EOS is now the main storage system for IHEP experiments like LHAASO and JUNO. Castor has long been used at IHEP to back up experiment data, but it has difficulty satisfying the data backup requirements of new experiments like LHAASO and JUNO. As EOSCTA became stable enough to replace Castor in production, we started the EOSCTA evaluation and the Castor migration. In this talk, we will give a brief...
An EOSCTA instance is an EOS instance, commonly called a tape buffer, configured with a CERN Tape Archive (CTA) back-end.
This EOS instance is entirely bandwidth-oriented: it offers an SSD-based tape interconnection, it can contain spinning disks if needed, and it is optimized for the various tape workflows.
This talk will present how to enable EOS for tape using CTA and the Swiss horology...
CTA uses the access mechanisms provided by EOS and adds a tape-specific layer. If one of these elements is misconfigured, a user won't be able to read a file or, on the contrary, unauthorized access may be granted.
This talk explains how the combination of ACLs, Unix permissions and mount rules works in CTA. We show which tools we use for permissions management and what the capabilities...
Explanation of the CTA Tape Drive status during a data transfer session.
This talk summarizes the new file-restoring feature of CTA: how it works, how to configure it, when it should be used, and its current limitations.
This presentation summarizes the current effort to detect, and thereby subsequently remedy, inconsistencies in the file metadata stored in EOS and CTA.
We show how we combine and validate EOSCTA namespaces in order to produce a summary of healthy files for experiments and a troubleshooting tool for operators.
Fermilab is the primary research lab dedicated to particle physics in the United States and is home to the largest archival HEP data store outside of CERN. Fermilab currently employs an HSM based on Enstore, a Fermilab product, for tape, and dCache for disk. This Enstore+dCache HSM manages nearly 300 PB of active data on tape. Because of the necessary development work to...
This talk will present details of the deployment of Antares, the EOS-CTA service at RAL Tier-1, which replaces Castor.
The ever-increasing amount of data produced by modern scientific facilities like EuXFEL or the LHC puts high pressure on the data management infrastructure at the laboratories. This includes poorly shareable archival storage resources, typically tape libraries. To achieve maximal efficiency of the available tape resources, a deep integration between hardware and software components...
Report on the latest tests done at SLAC with the native XRootD EC library.
This presentation will introduce the roadmap for EOS5 during the Run-3 period.