Mar 17 – 19, 2026
University of Oslo
Europe/Zurich timezone

From Primary Data to Computational Analysis: Bridging HPC Workflows and Research Documentation

Mar 18, 2026, 10:00 AM
15m
Gamle Festsal (University of Oslo), Karl Johans gate 47
FAIR and Open Data, Research Data Lifecycle, Data Science Environments & Preservation Data Science Environments & HPC integration

Speakers

Mr Ramon D'Agosta (ResearchSpace), Mr Rory Macneil (ResearchSpace)

Description

Research institutions face a critical challenge in managing large-scale data, such as sequencing data, across its complete lifecycle (from generation of the primary data on an instrument, through HPC analysis, to long-term archival storage) while maintaining data provenance and FAIR compliance. Researchers currently lack integrated tools to coordinate data movement across storage tiers. The resulting manual steps create compliance risks, inflate storage costs through duplicated data, and break the connections between primary data, experimental context, and computational results. As computational methods become increasingly intrinsic to research, integrating active research documentation with compute infrastructure becomes essential for efficient workflows that produce FAIR data.
At CS3 2025 we presented our approach to vertical interoperability, demonstrating how RSpace, an open-source research data management platform that includes an electronic laboratory notebook, can provide a user-friendly frontend for institutional file sync and share solutions such as iRODS. We are now extending this approach to follow data from its primary origin, through its use in HPC workflows, to the outputs those workflows produce. RSpace's S3 integration lets researchers manage sequencing data across distributed storage locations through the same interface they already use daily for experiment documentation and sample management. The solution provides intuitive file operations between S3 buckets and other RSpace-supported storage systems, links rich contextual metadata from experimental documentation and sample records to files in S3 storage, and connects HPC workflow outputs, together with the relevant run metadata, back to the originating experiments.
A typical workflow illustrates the value: researchers document experiments in RSpace with linked samples and protocols; connect raw sequencing data, regardless of where it is stored, to that documentation along with contextual metadata; transfer the data and the metadata collected in RSpace to an HPC environment through RSpace's unified interface; and, after the HPC analysis, link the result files back to the original experiments, preserving complete lineage.
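To make the lineage idea in this workflow concrete, the following is a minimal in-memory sketch of the kind of provenance record such a workflow could maintain. It does not use or reproduce RSpace's actual API: the classes (`Experiment`, `DataFile`, `HpcRun`), their fields, and the example identifiers and S3 URIs are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical lineage model for illustration only; NOT the RSpace API.


@dataclass
class DataFile:
    uri: str   # location in any backend, e.g. "s3://bucket/key" (illustrative)
    role: str  # "raw" for primary instrument data, "result" for HPC outputs


@dataclass
class HpcRun:
    job_id: str    # scheduler job ID (assumed field)
    pipeline: str  # name of the analysis workflow (assumed field)
    inputs: list[DataFile]
    outputs: list[DataFile] = field(default_factory=list)


@dataclass
class Experiment:
    experiment_id: str
    samples: list[str]
    raw_data: list[DataFile] = field(default_factory=list)
    runs: list[HpcRun] = field(default_factory=list)

    def link_raw_data(self, uri: str) -> DataFile:
        """Attach primary data to the experiment, wherever it is stored."""
        data = DataFile(uri=uri, role="raw")
        self.raw_data.append(data)
        return data

    def record_run(self, job_id: str, pipeline: str,
                   inputs: list[DataFile], output_uris: list[str]) -> HpcRun:
        """Register an HPC run; every input must already be linked to this experiment."""
        for item in inputs:
            if item not in self.raw_data:
                raise ValueError(f"{item.uri} is not linked to {self.experiment_id}")
        run = HpcRun(job_id, pipeline, list(inputs),
                     [DataFile(uri=u, role="result") for u in output_uris])
        self.runs.append(run)
        return run

    def lineage(self, result_uri: str) -> dict | None:
        """Trace a result file back through its run to the raw data and experiment."""
        for run in self.runs:
            if any(out.uri == result_uri for out in run.outputs):
                return {
                    "experiment": self.experiment_id,
                    "samples": list(self.samples),
                    "raw_inputs": [i.uri for i in run.inputs],
                    "run": run.job_id,
                    "pipeline": run.pipeline,
                }
        return None  # result file not produced by any recorded run


if __name__ == "__main__":
    exp = Experiment("EXP-001", samples=["SAMPLE-42"])
    raw = exp.link_raw_data("s3://seq-raw/run1/sample42_R1.fastq.gz")
    exp.record_run("job-1234", "variant-calling", [raw],
                   ["s3://seq-results/run1/sample42.vcf.gz"])
    print(exp.lineage("s3://seq-results/run1/sample42.vcf.gz"))
```

Rejecting unlinked inputs in `record_run` mirrors the step of connecting raw data to the experiment before it is sent to HPC; a production system would persist these links in the research data platform rather than hold them in memory.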
This approach, developed in collaboration with the Leibniz Supercomputing Centre (LRZ) and the University of Göttingen, addresses barriers to FAIR adoption in three ways: it provides seamless access to data regardless of storage backend, it ensures that experimental context travels with the data for discoverability and lineage tracking, and it enables cost optimization through efficient data lifecycle management across storage tiers. By integrating institutional storage infrastructure directly into researchers' daily workflows, we lower usability barriers while improving FAIR compliance, making good data management practices easier to adopt and sustain.

Suggested contribution type: Regular Talk (15–30 min)

Authors

Mohamad Hayek (LRZ), Mr Ramon D'Agosta (ResearchSpace), Mr Rory Macneil (ResearchSpace), Tilo Mathes
