Speaker
Description
The High Energy Photon Source (HEPS), a new fourth-generation high-energy synchrotron radiation facility, is set to become fully operational by the end of 2025. With its greatly enhanced brightness and detector performance, HEPS will generate over 300 PB of experimental data annually across the 14 beamlines of Phase I, quickly reaching the exabyte (EB) scale. HEPS supports a wide range of experimental techniques, including imaging, diffraction, scattering, and spectroscopy, which differ significantly in data throughput and volume. Meanwhile, increasingly complex experimental methods pose unprecedented challenges for data processing.
To meet the future EB-scale data processing demands of HEPS, we have developed DAISY (Data Analysis Integrated Software System), a general-purpose software framework for scientific data processing. DAISY is designed to improve the integration, standardization, and performance of experimental data processing at HEPS. Its key capabilities include high-throughput data I/O, multimodal data parsing, and multi-source data access. It supports elastic, distributed, heterogeneous computing to accommodate processing at different scales and throughput levels, as well as low-latency requirements. It also offers a general workflow orchestration system that adapts flexibly to different experimental data processing modes. Additionally, it provides interfaces for integrating user software and a development environment, facilitating the standardization and integration of methodological algorithms and software across multiple disciplines.
Based on the DAISY framework, we have developed multiple domain-specific scientific applications covering imaging, diffraction, scattering, and spectroscopy, and we are continuously expanding to more scientific domains. We have also optimized key software components and algorithms to significantly improve data processing efficiency. Several DAISY-based scientific applications have already been deployed on HEPS beamlines, supporting online data processing for users. The remaining applications are scheduled for full deployment within the year, further strengthening HEPS’s data analysis capabilities.
Significance
We will present our recent progress in streaming data processing, distributed data processing, and data processing efficiency optimization, as well as the deployment of the DAISY scientific software on HEPS beamlines.
Experiment context, if any
HEPS