Description
At photon-science facilities such as the European XFEL, large data volumes are generated at multiple experiment stations and under frequently changing configurations.
The experiments that produce these data typically last only a few days and are carried out by external user teams.
In this environment, effective management of experimental data is essential for delivering timely, high-quality scientific results:
data must be reliably captured, accessed, processed, and preserved.
We present the architecture and operation of the European XFEL data management infrastructure,
built around a four-tier storage model tailored to the different phases of the data lifecycle.
An online storage layer located close to the instruments is designed for high performance and exceptional reliability.
It buffers the data produced by instruments at extreme rates, reaching up to 15 GB/s per individual detector.
A high-performance storage layer, located in the DESY computing centre, supports both prompt processing during beam time and subsequent offline analysis.
The data management infrastructure is connected to the European XFEL experiment hall’s InfiniBand fabric via a 4.4 km, 1 Tb/s link.
Mid-term access to data is provided by a mass storage layer, while a tape archive ensures reliable long-term preservation with a retention time of at least 10 years.
Together, these systems handle and process up to 2 PB of newly recorded data per day. They are tightly integrated with a shared compute cluster
for near-online analysis and support remote analysis by external users for several years after the experiment.
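To put the quoted figures in relation to one another, the following minimal sketch converts them into common units. It assumes decimal (SI) prefixes, and it assumes for illustration only that the full daily volume crosses the 1 Tb/s link, which is not stated above. All variable and function names are hypothetical.

```python
# Back-of-the-envelope conversion of the figures quoted in the abstract.
# Assumption: decimal (SI) units, i.e. 1 PB = 1e15 bytes and 1 GB = 1e9 bytes.
PB = 1e15
GB = 1e9
SECONDS_PER_DAY = 24 * 60 * 60

daily_volume_bytes = 2 * PB            # up to 2 PB of new data per day
detector_peak_bytes_per_s = 15 * GB    # up to 15 GB/s per detector
link_bits_per_s = 1e12                 # 1 Tb/s link


def average_rate_gb_per_s(volume_bytes, seconds=SECONDS_PER_DAY):
    """Average throughput in GB/s needed to move volume_bytes in the given time."""
    return volume_bytes / seconds / GB


# Sustained average rate implied by 2 PB per day.
avg_rate = average_rate_gb_per_s(daily_volume_bytes)

# Raw link capacity expressed in GB/s (8 bits per byte, protocol overhead ignored).
link_capacity = link_bits_per_s / 8 / GB

# Share of the link used on average, ASSUMING all daily data crosses it.
link_share = avg_rate / link_capacity

# Volume a single detector would produce if it ran at peak rate for a full day.
detector_daily_pb = detector_peak_bytes_per_s * SECONDS_PER_DAY / PB

print(f"average rate for 2 PB/day: {avg_rate:.1f} GB/s")
print(f"link capacity:             {link_capacity:.0f} GB/s")
print(f"average link share:        {link_share:.1%}")
print(f"one detector at peak:      {detector_daily_pb:.2f} PB/day")
```

Under these assumptions, 2 PB per day corresponds to a sustained average of roughly 23 GB/s, compared with a raw link capacity of 125 GB/s. Real traffic is bursty, so peak load matters more than this average.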
In the context of environmental sustainability, the current and future operation of European XFEL will require a review of resource consumption and usage policies.
We therefore also discuss emerging sustainability measures, including per-user and per-job energy and emissions reporting,
comprehensive power metering of data centre infrastructure, and dynamic resource provisioning linked to user demand and green energy availability,
developed in the context of projects such as RF2.0.
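As a rough illustration of what per-job and per-user reporting involves, the sketch below converts a job's average power draw and runtime into energy and estimated emissions. It is not the reporting system described above: the record layout, the function names, and the numbers in the usage example are hypothetical, and the grid carbon intensity is an input that would have to come from metering or an external data source.

```python
from collections import defaultdict
from dataclasses import dataclass

JOULES_PER_KWH = 3.6e6


@dataclass
class JobRecord:
    """Hypothetical accounting record for one finished compute job."""
    user: str
    avg_power_watts: float   # average power attributed to the job
    runtime_seconds: float


def job_energy_kwh(job):
    """Energy in kWh: power (W) times time (s) gives joules, then convert."""
    return job.avg_power_watts * job.runtime_seconds / JOULES_PER_KWH


def job_emissions_g(job, intensity_g_per_kwh):
    """Estimated emissions in grams of CO2-equivalent for one job."""
    return job_energy_kwh(job) * intensity_g_per_kwh


def per_user_report(jobs, intensity_g_per_kwh):
    """Sum energy (kWh) and emissions (g) per user over a list of jobs."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for job in jobs:
        totals[job.user][0] += job_energy_kwh(job)
        totals[job.user][1] += job_emissions_g(job, intensity_g_per_kwh)
    return {user: tuple(values) for user, values in totals.items()}


# Usage with made-up numbers: the intensity value is illustrative only.
jobs = [
    JobRecord("alice", avg_power_watts=1000.0, runtime_seconds=3600.0),
    JobRecord("alice", avg_power_watts=500.0, runtime_seconds=7200.0),
    JobRecord("bob", avg_power_watts=2000.0, runtime_seconds=1800.0),
]
report = per_user_report(jobs, intensity_g_per_kwh=400.0)
for user, (kwh, grams) in sorted(report.items()):
    print(f"{user}: {kwh:.2f} kWh, {grams:.0f} g CO2e")
```

Passing the carbon intensity as a parameter keeps the energy accounting separate from the emissions estimate, so the same job records can be re-evaluated as the energy mix changes.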