The 4th EOS workshop is being prepared to bring the EOS community together.
This three-day event at CERN is organized to provide a platform for exchange between developers, users and sites running EOS.
The first day of the workshop takes place in the IT amphitheater and covers a wide range of topics: an overview from the community about their usage of EOS services in production environments, followed by EOS-related development, an update on the tape back-end functionality with the CERN Tape Archive (CTA), service operations, applications, collaborations and various use cases!
During the second day we will bring interested participants to discover some of the amazing CERN facilities with a special visit, followed by a dedicated session on the usage of EOS in Tier-1 and Tier-2 storage setups and the next challenges of using EOS in the online/offline activities of the experiments.
The third day will focus on operational aspects, demonstrations, and hands-on tutorials with a deep dive, as well as the future roadmap and service evolution. The practical sessions with the EOS and CTA team will take place in a CERN computer center meeting room (513-1-024).
We invite all participants to join a social dinner on Monday evening (at their own expense).
Workshop participation does not require a fee.
Please register for the workshop here. Don't forget to submit an abstract if you would like to share your experience/ideas with the EOS community.
If you are interested in joining the EOS community, this is the perfect occasion!
We look forward to seeing and talking to many of you in February 2020!
Your CERN EOS team.
A short introduction to the scope of the workshop, organization and logistics.
The CERN IT Storage group operates multiple distributed storage systems to support all CERN data storage requirements: the physics data generated by LHC and non-LHC experiments; object and file storage for infrastructure services; block storage for the CERN cloud system; filesystems for general use and specialized HPC clusters; content distribution filesystem for software distribution and condition databases; and sync&share cloud storage for end-user files.
We report about the current disk-storage offer, its evolution and future outlook towards Exabyte-scale storage with a dedicated look at the EOS storage systems.
The CERN Tape Archive (CTA) is the tape back-end to EOS. Configuring EOS to work with CTA allows event-based triggering of tape archivals and retrievals. As well as controlling the tape hardware (libraries and drives), CTA provides an advanced queue manager and scheduler to manage how and when tapes will be mounted, to optimise the use of the tape infrastructure. This presentation will provide an overview of CTA's features and how they integrate with EOS. There will be a separate hands-on session on Day 3 to learn how to configure EOS+CTA.
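As an illustration of the retrieval path, the following is a minimal sketch, assuming the XRootD Python bindings are installed; the EOSCTA endpoint and file path are hypothetical placeholders. It issues a prepare (stage) request to queue a tape recall and then polls the file metadata.

```python
# Minimal sketch (assumptions: XRootD Python bindings installed; endpoint and
# path are hypothetical) of triggering a tape recall on an EOS+CTA instance.
from XRootD import client
from XRootD.client.flags import PrepareFlags

fs = client.FileSystem('root://eosctapublic.example.cern.ch')

# Issue a prepare/stage request: EOS forwards it to CTA, which queues the
# tape mount and copies the file back to the disk buffer.
status, _ = fs.prepare(['/eos/ctapublic/archive/run1234/data.raw'],
                       PrepareFlags.STAGE)
print('prepare:', status.message)

# Once staged, the file can be inspected and read like any other EOS file.
status, info = fs.stat('/eos/ctapublic/archive/run1234/data.raw')
if status.ok:
    print('size on disk buffer:', info.size)
```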
The Joint Research Centre (JRC) of the European Commission has set up the JRC Earth Observation Data and Processing Platform (JEODPP) as an infrastructure to enable the JRC projects to process and analyze big data, extracting knowledge and insights in support of EU policy making. The main focus is on geospatial data, but the platform has been extended to other data domains. EOS is the main storage component of the platform, operationally used since mid-2016, maintained and extended with support from the CERN EOS team.
The JEODPP is actively used by more than 40 JRC projects as a platform for data science, covering a wide range of data analysis activities. In order to serve the growing needs of the JRC projects for data storage and processing capacity, the platform was extended in 2019. It currently consists of the EOS system as storage back-end with a total gross capacity of 15.5 PB, and service nodes with a total of 2000 CPU cores.
The main changes in 2019 were the migration of the EOS service to the QuarkDB namespace and the migration of half of the service nodes to the FUSEX client. The presentation will give an overview of the implemented platform, its current status, the experience gained, and the issues identified with EOS as the main storage back-end of the JRC data science platform.
This contribution presents the current status of the project to procure and deploy a custodial storage system based on high-density JBOD enclosures with EOS RAIN. System design and considerations, preliminary test results, procurement, installation and the basic deployment setup will be presented.
During 2019 CERN decommissioned its data centre in Wigner, Hungary. This contribution gives an overview of how 90 PB of EOS-managed storage was successfully dismantled and the data relocated to the central Geneva site, while maintaining full availability of the service and the data.
2019 was the year of migration on many levels. In this presentation we would like to outline the deployment of the new namespace backend, based on QuarkDB, replacing the in-memory namespace. We will talk about the reasons behind the change, the planning, initial implementation, some challenges and the outcome of this migration.
EOS Home (CERNBox) now has more than 20k user and project spaces which need to be backed up every day. In order to improve our current backup system, we developed a distributed backup/restore system based on the open-source tool restic, which stores backup data on the CERN S3 service. We present an update on the current status of the project and future challenges.
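As a rough illustration of one such per-user backup job, the sketch below assumes restic is installed; the S3 endpoint, bucket naming and EOS FUSE mount path are hypothetical placeholders, and the production system distributes many such jobs across workers.

```python
# Sketch of one per-user restic backup job (assumptions: restic installed;
# S3 endpoint, bucket naming and EOS FUSE mount path are hypothetical).
import os
import subprocess

user = 'jdoe'
env = dict(os.environ,
           AWS_ACCESS_KEY_ID='backup-account-key',
           AWS_SECRET_ACCESS_KEY='backup-account-secret',
           RESTIC_PASSWORD='per-repository-encryption-password')

repo = f's3:https://s3.example.cern.ch/eosbackup-{user}'

# Initialise the repository the first time (ignored if it already exists),
# then take an incremental snapshot of the user's EOS home directory.
subprocess.run(['restic', '-r', repo, 'init'], env=env, check=False)
subprocess.run(['restic', '-r', repo, 'backup', f'/eos/user/{user[0]}/{user}'],
               env=env, check=True)
```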
A brief overview of how EOS FUSEX (and clients in general) are rolled out and configured at CERN.
Over the last year, as part of developing our S3 gateway service, we have started deploying EOS using Kubernetes at AARNet. This will be a quick talk on how we set up our environments and the tooling we use to make deployments easy, and will include a short section on our current EOS usage and the instances we run.
EOS is a disk-based file system designed to be fast, low-latency, highly available and elastic storage to capture the results of CERN experiments. EOS has now been developed to the stage where it becomes potentially interesting for enterprise users, and that is where Comtrade, with its vast experience in enterprise software development, comes in.
The documentation development process is proposed to be divided into six books and two supplements.
The result of the presented documentation process is a prototype for the EOS documentation. This prototype is validated both by domain experts (developers) and by end users. Feedback is incorporated into a version that is edited by professional proofreaders. Iterations of revisions between proofreaders and experts are an ongoing process. The final version of this prototype is also designed by the graphic design team.
The presented documentation process is provided as the first step towards establishing EOS as a viable enterprise storage product.
You can have lunch at R2 across the road.
In this talk we give a brief overview of the successful migration to the new namespace. Practically all EOS instances at CERN are currently on QuarkDB, the new namespace is officially boring technology, and MGM boot time a distant memory.
We will also discuss future plans and ideas to further improve the scalability and performance of the namespace, in particular with respect to locking, as well as the planned end of support for the in-memory legacy namespace and miscellaneous namespace-related news.
Between EOS versions 4.5 and 4.6 the need to read-lock listings has changed. This presentation will explain why that matters!
Being the foundation and main component of numerous solutions employed within the WLCG collaboration, most notably the EOS storage system, XRootD grew into one of the most important storage technologies in the High Energy Physics (HEP) community. With the upcoming major release (5.0.0), the XRootD framework will not only bring functional enhancements and a TLS based, secure version of the xroot/root data access protocol, but also introduce architectural improvements that set the stage for new exciting developments.
In this contribution we explain the xroots/roots protocol mechanics and focus on the implementation of the encryption component engineered to ensure low latencies and high throughput. We also give an overview of other developments finalized in release 5.0.0 and we discuss future directions of the project.
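For readers curious what the secure protocol looks like from the client side, here is a minimal sketch, assuming an XRootD 5 server configured with TLS and the XRootD Python bindings on the client; the host and path are hypothetical, and the only visible change is the roots:// URL scheme.

```python
# Illustrative sketch: selecting the TLS-secured protocol is done through the
# URL scheme (roots:// instead of root://). Host and path are hypothetical;
# the server must be built and configured with TLS support.
from XRootD import client

f = client.File()
status, _ = f.open('roots://xrootd5.example.cern.ch//eos/dev/tls-test/file.dat')
if status.ok:
    status, data = f.read(0, 1024)   # first kilobyte, transferred over TLS
    print(len(data), 'bytes read over the encrypted channel')
else:
    print('open failed:', status.message)
f.close()
```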
This will be a status update on the eosxd implementation and its recent evolution.
Lessons learned from releasing EOS for CentOS 8.
This presentation will show the steps and difficulties encountered when moving the build and testing infrastructure to a new platform.
""
Presentation about the new FSCK functionality.
This is a short talk on the status of our highly available Samba setup in front of CERNBox/EOS.
Deployed in production in September 2019, its role is becoming more and more critical as a larger Windows-based user community moves to CERNBox, following the planned phase-out of DFS at CERN.
XDC (eXtreme DataCloud) is a 2-year R&D project involving multiple EU partner institutes. In its second year, the EOS features developed within the project revolve around QoS. The presentation will showcase QoS classes, file conversions, a rework of the Converter Engine and how they all tie together.
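As a purely illustrative sketch of how such a QoS interface might be driven from a script: the eos qos subcommands, class names and paths below are assumptions made for illustration, not a definitive description of the released CLI.

```python
# Illustrative only: command names, QoS class names and paths are assumptions
# based on the QoS interface described in the talk.
import subprocess

def eos(*args):
    """Run an EOS CLI command on a node with the eos client installed."""
    return subprocess.run(['eos', *args], capture_output=True, text=True).stdout

# Inspect the QoS class currently assigned to a file.
print(eos('qos', 'get', '/eos/dev/xdc/sample.root'))

# Request a different QoS class; behind the scenes this would schedule a file
# conversion (e.g. replica -> erasure-coded layout) through the Converter Engine.
print(eos('qos', 'set', '/eos/dev/xdc/sample.root', 'disk_erasure'))
```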
The current EOS version (Citrine v4.6.8) supports access from Windows clients using the Samba interoperability suite. As Samba is just an additional layer between EOS on Linux and the Windows clients, it introduces additional possibilities for unexpected issues:
1. Access and data transfer speed
2. Problems with access/read/write file and directories
3. Problems with ACLs
4. Filename issues
EOS Windows Native Client (EOS Wnc) is an implementation of the EOS Linux client for the Windows platform, and it should improve EOS usability for Windows clients. Development of EOS Wnc is carried out as Comtrade's research project within EOS openlab R&D Topic 1: Data-centre technologies and infrastructures.
The following steps are proposed as a starting point for the development of a prototype of the EOS Wnc:
1. Study of the architecture of existing EOS Linux Client
2. Identify potential risks and incompatibilities with Windows philosophy
3. Review of available Windows disk-based storage systems
4. Set up the Windows development environment with native Windows libraries
5. Porting of EOS Linux Client with Microsoft Visual Studio IDE
6. Identify Linux functionality that cannot simply be ported
Implementation is the most important step in porting the EOS client to Windows. The proposed high-level design for porting the EOS client using Microsoft Visual Studio is:
1. Provide a prototype version of EOS Wnc with basic functionalities
- Access to Windows filesystem
- Read/write Windows filesystem
2. Upgrade of the prototype
- Add user roles and permissions
- Adjust Windows and Linux ACL policy
- Check and finalize security model
- Check and finalize file authentication process
- Improve performance
- Improve Windows code (possible refactoring)
The first prototype of the EOS Wnc is proposed for March 2020 and the first release version is proposed for September 2020.
The Reva project is dedicated to creating a platform to bridge the gap between cloud storages and application providers by making them talk to each other in an interoperable fashion, leveraging the community-driven CS3 APIs. For this reason, the goal of the project is not to recreate other services but to offer a straightforward way to connect existing services in a simple, portable and scalable way.
In this contribution we explain the roots of the project and how Reva interplays with EOS, bringing new access methods and many advanced features to provide a new generation of sync and share services based on ownCloud for the CERNBox service.
At AARNet the Cloud Services team have invested a lot of research and development effort in the last year to lift CloudStor performance to meet the demands of research data generators and those who need to access, move, share and consume high volume research data. Much of our storage infrastructure had previously operated at modest speeds, often below 1Gbps for users accessing CloudStor services across 10Gb AARNet links. As universities start to adopt AARNet’s 100Gbps service the disparity between what our network can deliver and what our storage can achieve started to grow. There are many reasons for this disparity, including organic growth of the service, new service expectations and the evolution of the applications that CloudStor is built upon. But a confluence of advancements in the applications and operational software, combined with improved hardware, and plain old hard work, has given us the opportunity to deliver a service that pushes the boundaries of speed. This talk will highlight the steps we have taken over the past year to improve the performance of the CloudStor platform and how we are preparing for the high performance infrastructure required to meet the challenge of low-latency international data sharing.
CERN is moving the majority of end-user storage use-cases to CERNBox, CERN's cloud and sync&share platform: it's now common to see many different usages, like data analysis and collaborative editing of office documents, coupled with multiple access protocols.
With EOS powering CERNBox, this presentation will talk about how the service is being used and how we plan to evolve it, making CERNBox - and EOS - the center of our collaborative future.
What did we achieve, where should we go?
EOS in the landscape of available open-source storage software. You are also invited to bring forward your own ideas!
The EOS storage system has been steadily deployed on the Grid SEs for ALICE over the past years. Presently, more than half of the total ALICE disk storage capacity is managed by EOS, with this fraction increasing every day. For the ALICE upgrade, EOS will also be installed on the large (60 PB) disk buffer, which will hold the RAW data collected throughout the data-taking year.
A 2020 update of our current experiences and challenges with running an EOS instance for use by the Fermilab LHC Physics Center (LPC) computing cluster. The LPC cluster is a 4500-core user analysis cluster with 7 PB of EOS storage which is an increase of about 40% over 2018. The LPC cluster supports several hundred active CMS users at any given time. We will also discuss our recent upgrades and our plans for moving to IPv6 and the QDB namespace.
The Institute of High Energy Physics (IHEP) undertakes many large scientific engineering projects in China. These large scientific projects generate a large amount of data every year and require a computing platform for analysis and processing. Among them, the LHAASO experiment, to be completed in 2021, will generate about 6 PB of data each year. EOS, the main data storage system of the LHAASO data processing platform, has been deployed at IHEP for online use since 2016 and is used in production with strong support from the CERN EOS team.
Currently, a total of 4 EOS instances are deployed, of which 3 serve the LHAASO and HXMT experiments, and the other serves IHEPBOX. After multiple expansions in 2018 and 2019, the current total capacity is 8 PB, and it is expected to grow by another 6 PB in 2020.
The presentation will give an overview of the deployment status, the issues encountered with the new JBOD layout (84 x 12 TB drives), and user experiences from moving from FUSE to XRootD access.
The root protocol is a standard way to access data from EOS in a Grid environment. The performance of storage is crucial for the effective use of computing resources. One question that a person who sends jobs to the Grid might ask is the following: how many jobs may simultaneously download data from storage and still be effective? In order to answer this question, a simple test was created at JINR. The idea is the following: send 100-200 jobs, wait until all of them are in the running state, initiate file downloads from EOS, and estimate how long it took to download all the data in all jobs. It is a purely functional test that gives an understanding of the real performance of a storage element. The tests were performed at JINR: EOS was used as the storage element, the local CE was used to run jobs, DIRAC was used as the workload management system, and standard DIRAC utilities were used to initiate the transfers.
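A minimal sketch of the per-job measurement could look as follows, assuming xrdcp is available on the worker node; the EOS endpoint and file names are hypothetical placeholders.

```python
# Rough sketch of the per-job measurement (assumptions: xrdcp available on the
# worker node; the EOS endpoint and file list are hypothetical placeholders).
import subprocess
import time

files = ['root://eos.jinr.example//eos/test/file_%03d.dat' % i for i in range(10)]

start = time.time()
for url in files:
    # Download to /dev/null: we only care about transfer time, not the payload.
    subprocess.run(['xrdcp', '-f', '-s', url, '/dev/null'], check=True)
elapsed = time.time() - start

# Each job reports its wall-clock download time; aggregating these values over
# 100-200 concurrent jobs gives the effective throughput of the storage element.
print(f'downloaded {len(files)} files in {elapsed:.1f} s')
```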
After a long time of under-delivering on our storage pledges for our Tier-2, we are in the process of procuring new storage for our Tier-2 facility: 3 PB usable in total, 1.5 PB each for ALICE and ATLAS. We will present what we intend to purchase and what motivated those decisions.
Current status of the distributed storage with EOS across two Asian sites, KISTI in Korea and SUT in Thailand, will be presented. EOS deployment is performed using container technology with the help of automation scripts.
This talk will explain the road chosen from the experiment specifications up to the currently running implementation of an online storage buffer for a high-speed DAQ experiment (ProtoDUNE-DP).
After building a proof of concept and running some initial tests, the final system was built, tested again, and has been in production since mid-2019. We will share our experience with the daily maintenance of this EOS DAQ buffer.
ALICE is upgrading the detector and the data processing to allow the collection, processing and analysis of Pb-Pb collisions at 50 kHz starting in 2021. ALICE will run with no trigger selection; the data from the electronics front-ends will be collected in a continuous way and organised in 20-ms Time Frames. All Time Frames will be processed online to achieve a compression factor of the order of 35, reducing the raw data rate from 3.5 TB/s to about 100 GB/s. The compressed Time Frames will be stored on EOS before tape archival, further processing and data distribution.
In this presentation we review the design parameters, the current status of the prototyping and the plans for the final system and its operation during the next years.
Brief overview of the LHCb online storage requirements for future LHC runs.
I will present a few measurements of EC configurations on the latest CERN hardware and discuss the status of EC support in EOS.
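For context, erasure-coded (RAIN) layouts in EOS are selected per directory through extended attributes; the sketch below is a hedged example using the sys.forced.* attribute convention, with a hypothetical directory path and stripe count.

```python
# Hedged sketch: directory attributes follow the EOS "sys.forced.*" convention;
# the directory path and stripe count are hypothetical examples.
import subprocess

directory = '/eos/dev/ec-test'

def eos_attr_set(key, value, path):
    subprocess.run(['eos', 'attr', 'set', f'{key}={value}', path], check=True)

# Newly created files below this directory inherit an erasure-coded layout
# (RAID-6-like RAIN with 10 stripes, i.e. 8 data + 2 parity).
eos_attr_set('sys.forced.layout', 'raid6', directory)
eos_attr_set('sys.forced.nstripes', '10', directory)
eos_attr_set('sys.forced.blockchecksum', 'crc32c', directory)
```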
This is a short overview of some improvements and extensions to the file and block checksum support in EOS.
This presentation will summarize a new buzz technology in WLCG: tokens. In January 2020, during a hackathon at CERN, the major steps to enable this technology in WLCG were taken. The talk will briefly compare the WLCG token model to the native EOS token model and discuss how WLCG token support can be enabled for HTTP(S) and XRootD access.
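A minimal sketch of what token-based HTTPS access looks like from a client, assuming the EOS instance exposes an HTTPS port that accepts bearer tokens; the host, port, path and token file are hypothetical placeholders.

```python
# Minimal sketch (assumptions: EOS instance with an HTTPS port accepting bearer
# tokens; host, port, path and token file are hypothetical placeholders).
import requests

token = open('/tmp/wlcg_token').read().strip()   # token obtained from the VO's token issuer

url = 'https://eos.example.cern.ch:8443/eos/dev/token-test/hello.txt'
headers = {'Authorization': f'Bearer {token}'}

# Upload and read back a small file, authorizing both requests with the token.
requests.put(url, data=b'hello from a WLCG token\n', headers=headers).raise_for_status()
print(requests.get(url, headers=headers).text)
```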
Configuration setup and guidelines for TPC support with delegated certificates and tokens.
The File Transfer Service (FTS) is distributing the majority of the LHC data across the WLCG infrastructure and, in 2019, it has transferred more than 800 million files and a total of 0.95 exabyte of data. It is used by more than 25 experiments at CERN and in other data-intensive sciences outside of the LHC and even outside of the High Energy Physics domain. This brief talk will cover the work performed by the FTS team for the support of the new CERN Tape Archive (CTA) system which has been stress tested by the ATLAS Data Carousel activity and for supporting a more user-friendly authentication and delegation method using tokens.
Data science and other communities have moved a lot of their workloads to web based applications, with notebooks being one of the most popular ways to perform data analysis or machine learning. In the same way, large scale computation services are often offered via web portals with similar requirements for authentication and authorization.
End users use SSO to access the frontend application, making the required OAuth2 or SAML tokens available. If all remaining backend services support the same authentication and authorization mechanisms, we can drop the additional and cumbersome step of asking the user to pass X.509 certificates or Kerberos tokens.
In this short presentation we describe a client side use case and deployment accessing EOS using OAuth2 tokens, including a quick demo of how the tokens are propagated and how the necessary renewal is handled for long lived processes.
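The renewal step for long-lived processes can be sketched as a standard OAuth2 refresh-token exchange; the token endpoint, client registration and EOS path below are hypothetical placeholders, not the actual CERN configuration.

```python
# Sketch of token renewal for a long-lived process (assumptions: a standard
# OAuth2 token endpoint; endpoint URL, client id/secret and paths are hypothetical).
import requests

TOKEN_ENDPOINT = 'https://auth.example.cern.ch/realms/cern/protocol/openid-connect/token'

def refresh_access_token(refresh_token):
    """Exchange a refresh token for a fresh, short-lived access token."""
    resp = requests.post(TOKEN_ENDPOINT, data={
        'grant_type': 'refresh_token',
        'refresh_token': refresh_token,
        'client_id': 'eos-client',        # hypothetical client registration
        'client_secret': 'client-secret',
    })
    resp.raise_for_status()
    return resp.json()['access_token']

# A long-lived worker periodically renews its token and keeps using it
# against the EOS HTTPS endpoint.
access_token = refresh_access_token('stored-refresh-token')
headers = {'Authorization': f'Bearer {access_token}'}
print(requests.get('https://eos.example.cern.ch:8443/eos/user/j/jdoe/',
                   headers=headers).status_code)
```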
Comtrade is a member of CERN openlab, and an important aim of this membership is the integration of knowledge and experience from CERN as a research institution and Comtrade as an industry partner.
The most valuable benefits of this collaboration for CERN are:
a. New technologies and prototypes
b. Access and use of the research environments that are corner cases of industrial environments
c. Fewer restrictions on experimenting with no evident financial success
The most valuable benefits of this collaboration for Comtrade are:
a. R&D processes optimized and focused on financially successful projects
b. Focused development without deviations caused by outside influences
c. R&D focused to satisfy the end-user
The EOS Storage Appliance Prototype is proposed as a result of merging know-how, knowledge and experience from research and industry processes. Such a prototype should combine some of the following activities and goals:
1. On-premises data and applications
2. Integration of storage and AI/ML
3. Integration of storage and data analysis
4. Integration of hardware, software and support
Development of the EOS Storage Appliance Prototype should not be limited to the initial goals but should be shaped according to ongoing needs and requests. The result is provided as a prototype that should be able to evolve into an appropriate software product.
An overview of how to build and run EOS with code coverage support. The EOS CI setup is used as an example of how to run an EOS cluster and aggregate a final code coverage report.
An overview of the EOS CI nightly build setup.
Topics such as multiple build platforms and compiler sanitizers will be shown, together with examples of what the nightly build discovered.
Development work-flows and quality assurance of tagged releases are of pivotal importance in any large scale software project environment.
In this presentation we follow up on the evolution of the CI operations, where we explored the opportunity to deploy EOS on top of Kubernetes, consolidated a seamless autonomous setup workflow, and exploited distributed virtual clusters to perform the testing stage of our pipelines.
As a plus, taking advantage of disposable EOS-on-Kubernetes instances, the test suite is being enriched with continuous structure-aware fuzz testing, using LLVM's libFuzzer and libprotobuf-mutator.
Most of the namespace data stored inside QuarkDB is in binary form as protobuf structures. While this is efficient and works well in practice, manually inspecting the namespace contents directly from QuarkDB is difficult.
eos-ns-inspect is a standalone tool which connects directly to QuarkDB, and offers a full view into the namespace contents, no MGM needed. In this short demonstration, we will showcase a few of its capabilities, including dumping the entire contents, looking up files by name or ID, as well as correcting inconsistencies.
Practical overview of operating EOS with a tape infrastructure.
In this session, EOS and CTA developers, users and operators from CERN and outside will meet to help set up EOS installations, debug problems in your installations, share knowledge and answer questions.