Conveners
EOS 3: Development & Operation
- Elvin Alin Sindrilaru (CERN)
General update from the XRootD project.
std::atomic, introduced in C++11, is used as a building block for lock-free programming. However, while the default flags provide maximum consistency, they do come with a performance penalty and may not be what you want in all cases. We will look under the hood, at a high level, at what the processor sees when an atomic is encountered, the acquire and release semantics, which are...
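As an illustration of the acquire and release semantics mentioned in this abstract, here is a minimal sketch of a one-shot hand-off between two threads. It is not taken from the talk: the names `Channel`, `produce`, `consume` and `run_handoff` are hypothetical, and the example only assumes standard C++11 facilities. The writer publishes a plain value with a release store; the reader waits with an acquire load, which guarantees it sees the value written before the flag was set. Using the default `std::memory_order_seq_cst` here would also be correct, but stricter than this pattern needs.

```cpp
#include <atomic>
#include <thread>

// Hypothetical type for illustration: a plain payload guarded by an atomic flag.
struct Channel {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
};

// Writer: fill in the payload, then publish it with a release store.
// Writes made before the release store cannot be reordered after it.
void produce(Channel& ch, int value) {
    ch.payload = value;
    ch.ready.store(true, std::memory_order_release);
}

// Reader: spin until the flag is observed with an acquire load.
// Once the load returns true, the payload write is guaranteed to be visible.
int consume(Channel& ch) {
    while (!ch.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return ch.payload;
}

// Runs one writer and one reader thread and returns what the reader saw.
int run_handoff(int value) {
    Channel ch;
    int seen = 0;
    std::thread consumer([&ch, &seen] { seen = consume(ch); });
    std::thread producer([&ch, value] { produce(ch, value); });
    producer.join();
    consumer.join();  // join() makes the write to 'seen' visible here
    return seen;
}
```

Calling `run_handoff(42)` returns 42. Replacing both orderings with `std::memory_order_relaxed` would remove that guarantee and turn the access to `payload` into a data race, which is the kind of trade-off between consistency and cost the abstract refers to.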
Improving EOS monitoring of finished transfers. Hands-on eos io stat output.
Prometheus is a modern, simple and scalable monitoring system with an easy-to-use query language based on labels. The EOS Operators team has developed a fully functional EOS Prometheus exporter in Golang to monitor all EOS metrics. This includes space, group, node, filesystem, I/O and namespace stats collectors. In this talk, the tool will be showcased and made available to the EOS Community.
Presentation on the new recording plug-in that allows I/O sampling and the replay tool.
With 100GE technology and erasure coding we discovered new bottlenecks and challenges. This presentation will recap the state of the art of the ALICEO2 EOS instance and show benchmarks including a real and a replayed physics analysis use case.
This contribution reports on the recent revamping of ScienceBox: The container-based stack for science with EOS, CERNBox, and SWAN services for Kubernetes-orchestrated clusters.
ScienceBox has been rebuilt from its foundations using modern cloud-native technologies for better service configuration and improved reliability, without compromising on deployment flexibility. Rethinking the whole...
In preparation for Run-3 we have faced the following problem: we have to balance the usage of IO resources between individual activities, which has led to the implementation of IO priorities and bandwidth regulation policies. While commissioning the ALICEO2 EOS instance we have observed that write performance using the buffer cache is a bottleneck on storage nodes. Direct IO helps to improve...
With XRootD5 the on-the-wire protocol provides confidentiality of data inside the transport layer. However, data files are human readable on storage nodes and can be accessed and downloaded by any EOS administrator and any person with read access. Filesystem-level encryption on storage nodes does not solve this confidentiality problem.
To provide better data privacy the most recent versions...
Physics and CERNBOX instances at CERN are exposed to O(4) mount clients simultaneously. Overload from batch access is not new: for years the AFS filesystem has suffered more or less frequent volume overloads. During overload episodes, meta-data access at the MGM slows down significantly because thousands of batch nodes compete against a few interactive clients and sync & share access. To...
A primer on new (and old) xrdcp features like zip append, metalink support, retries and many more.
Context: Productisation of a native connection of EOS to the Windows operating system.
Objectives: A professional implementation of EOS on the Windows platform should allow seamless usage of EOS as a Windows local disk with all the EOS benefits, such as low latency, high throughput, and high reliability.
Method: Implementation of the EOS client for the Windows...
EOS durability machinery is a set of (operator's) scripts, tools and EOS components to classify, monitor and repair unhealthy files. EOS filesystem check (fsck) was enabled in 2021, but one should keep track of the instances' state, and investigate root causes for the problems found.